[openbiblio-dev] References in Wikipedia and BibJSON

Jim Pitman pitman at stat.Berkeley.EDU
Tue Feb 14 16:35:55 UTC 2012


Etienne Posthumus <etienne.posthumus at okfn.org> wrote:

> Nosing around the docs I found this:
> http://en.wikipedia.org/wiki/Wikipedia:Citation_templates
>
> Trying to understand how it works:
> On a page you edit, the citations are made using the above template
> style. When the page is saved the citations are saved and displayed at
> the end of the article in a neat (numbered & hyperlinked) reference
> list. Because of the template, certain fields become useful, e.g. DOIs
> become linkable, wiki pages become links, etc.
> Is this correct? So we could make a Wikipedia-page-to-BibJSON scraper
> that takes a given Wikipedia page URL and returns a BibJSON of all
> cited items?

I think so. The annoying thing about these templates is their variety of
styles, but it should be fairly easy to parse the different flavors
and map them to something we would accept as BibJSON. I'll be glad to work on this.
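
A minimal sketch of that parsing step, in Python, assuming the page's wikitext
is already in hand. The function names are hypothetical, and the field mapping
(author as a list of {"name": ...}, journal as {"name": ...}, DOI under
"identifier") is one guess at what we would accept as BibJSON, not an agreed
schema. It only looks at top-level {{cite ...}} and {{citation}} templates and
leaves any wiki markup inside field values untouched.

```python
import re

def find_templates(wikitext):
    """Yield the inner text of each top-level {{...}} template."""
    depth, start, i = 0, None, 0
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            if depth == 0:
                start = i + 2
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            if depth == 0:
                yield wikitext[start:i]
            i += 2
        else:
            i += 1

def split_params(body):
    """Split a template body on '|' that is not inside [[...]] or {{...}}."""
    parts, buf, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            buf.append(pair)
            i += 2
        elif pair in ("}}", "]]") and depth > 0:
            depth -= 1
            buf.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(buf))
            buf = []
            i += 1
        else:
            buf.append(body[i])
            i += 1
    parts.append("".join(buf))
    return parts

def template_to_record(body):
    """Map one citation template to a BibJSON-like dict (assumed mapping)."""
    parts = split_params(body)
    name = " ".join(parts[0].split()).lower()
    if not (name == "citation" or name.startswith("cite ")):
        return None  # not a citation template, e.g. {{reflist}}
    fields = {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if sep and value.strip():
            fields[key.strip().lower()] = value.strip()
    kind = name[len("cite "):] if name.startswith("cite ") else "citation"
    record = {"type": kind}
    if "title" in fields:
        record["title"] = fields["title"]
    authors = []
    for n in [""] + [str(k) for k in range(1, 10)]:
        last, first = fields.get("last" + n), fields.get("first" + n)
        if last:
            authors.append({"name": f"{last}, {first}" if first else last})
        elif fields.get("author" + n):
            authors.append({"name": fields["author" + n]})
    if authors:
        record["author"] = authors
    year = re.search(r"\b(1[5-9]|20)\d{2}\b",
                     fields.get("year") or fields.get("date", ""))
    if year:
        record["year"] = year.group(0)
    if "journal" in fields:
        record["journal"] = {"name": fields["journal"]}
    if "doi" in fields:
        record["identifier"] = [{"type": "doi", "id": fields["doi"]}]
    if "url" in fields:
        record["link"] = [{"url": fields["url"]}]
    return record

def wikitext_to_bibjson(wikitext):
    """Return one record per citation template found in the wikitext."""
    records = (template_to_record(body) for body in find_templates(wikitext))
    return [r for r in records if r is not None]
```

Calling wikitext_to_bibjson(text) on a page's wikitext returns a list of dicts
that can be passed to json.dumps. Citations typed by hand inside <ref> tags
without a template are not picked up by this approach.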

> What is the 'correct' way to get a structured data output of a
> Wikipedia page? (showing even more ignorance on my part)

There is an API call which I have used before. The return is not very structured, but it's fairly
clean and typically better than trying to scrape the raw HTML, which they discourage anyway. It should be easy
to pull the bibliographic items out of the API response. I'll look in my files for how the API call works and
follow up on this.
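
For illustration, one way to get a page's raw wikitext is the MediaWiki action
API (action=query with prop=revisions). This is an assumption about which call
is meant here, not a confirmation of it. The sketch below is Python using only
the standard library; the function names and the User-Agent string are
placeholders.

```python
import json
from urllib.parse import urlencode, urlparse, unquote
from urllib.request import Request, urlopen

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"

def title_from_url(page_url):
    """Turn .../wiki/Some_Title into the page title 'Some Title'."""
    path = urlparse(page_url).path
    prefix = "/wiki/"
    if not path.startswith(prefix):
        raise ValueError("not a /wiki/ page URL: " + page_url)
    return unquote(path[len(prefix):]).replace("_", " ")

def build_query_url(title):
    """Build the API URL asking for the latest revision's wikitext."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    return API_ENDPOINT + "?" + urlencode(params)

def extract_wikitext(response):
    """Pull the wikitext string out of a decoded API response."""
    page = response["query"]["pages"][0]
    if page.get("missing"):
        raise LookupError("page not found: " + page.get("title", ""))
    return page["revisions"][0]["slots"]["main"]["content"]

def fetch_wikitext(page_url):
    """Fetch the wikitext for a Wikipedia page URL (performs a network call)."""
    request = Request(
        build_query_url(title_from_url(page_url)),
        # Placeholder identification string; replace with real contact details.
        headers={"User-Agent": "bibjson-sketch/0.1 (example script)"},
    )
    with urlopen(request, timeout=30) as reply:
        return extract_wikitext(json.load(reply))
```

fetch_wikitext("http://en.wikipedia.org/wiki/Wikipedia:Citation_templates")
would return the page source with the citation templates still in their
{{cite ...}} form, which is easier to parse than the rendered reference list.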

--Jim




