[pdb-discuss] british library screen scraping

Dan Leech dan.t.leech at gmail.com
Mon Apr 10 14:43:37 UTC 2006


hello all

I could probably knock up a scraper in PHP; it looks pretty
straightforward. The dates of birth and death are given next to each name
in the list of composers.

http://en.wikipedia.org/wiki/List_of_Classical_composers
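
A rough sketch of the parsing step, written in Python rather than PHP purely
for illustration. It assumes each entry on the list page reads like
"Name (birth-death)" with either a hyphen or an en dash between the years;
that shape is an assumption and has not been checked against the real page
markup. The names parse_composer and ENTRY_RE are hypothetical.

```python
import re

# Assumed entry shape: "Name (1685-1750)" or "Name (1935-)" for a living
# composer. The real Wikipedia markup may differ and need a different pattern.
ENTRY_RE = re.compile(
    r"^\s*(?P<name>.+?)\s*\(\s*(?P<birth>\d{3,4})\s*[-\u2013]\s*(?P<death>\d{3,4})?\s*\)"
)


def parse_composer(line):
    """Return a dict with name/birth/death, or None if the line does not match."""
    match = ENTRY_RE.match(line)
    if match is None:
        return None
    death = match.group("death")
    return {
        "name": match.group("name"),
        "birth": int(match.group("birth")),
        # A missing death year is stored as None rather than guessed.
        "death": int(death) if death else None,
    }


if __name__ == "__main__":
    print(parse_composer("Johann Sebastian Bach (1685-1750)"))
    # {'name': 'Johann Sebastian Bach', 'birth': 1685, 'death': 1750}
```

Lines that do not match come back as None, so they can be logged and checked
by hand instead of being silently dropped.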

It would be useful to have an updated DB plan that reflects Rufus's model.
Would we need any fields other than name/birth/death?
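
To make the question concrete, here is one possible shape for the tables,
sketched with Python's sqlite3 module. Every table and column name below
(composers, recordings, birth_year, death_year, composer_id, title) is an
assumption for illustration and is not taken from Rufus's model. The
upsert_composer helper is also hypothetical; it updates an existing row
instead of replacing it, so that ids referenced from recordings stay stable
when the scraper is re-run.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical layout: recordings point at composers by id.
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS composers (
            id         INTEGER PRIMARY KEY,
            name       TEXT NOT NULL UNIQUE,
            birth_year INTEGER,
            death_year INTEGER
        );
        CREATE TABLE IF NOT EXISTS recordings (
            id          INTEGER PRIMARY KEY,
            title       TEXT NOT NULL,
            composer_id INTEGER REFERENCES composers(id)
        );
        """
    )


def upsert_composer(conn, name, birth_year, death_year):
    """Insert a composer, or refresh the dates if the name already exists."""
    cur = conn.execute(
        "UPDATE composers SET birth_year = ?, death_year = ? WHERE name = ?",
        (birth_year, death_year, name),
    )
    if cur.rowcount == 0:
        cur = conn.execute(
            "INSERT INTO composers (name, birth_year, death_year) VALUES (?, ?, ?)",
            (name, birth_year, death_year),
        )
        return cur.lastrowid
    row = conn.execute("SELECT id FROM composers WHERE name = ?", (name,)).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    composer_id = upsert_composer(conn, "Johann Sebastian Bach", 1685, 1750)
    conn.execute(
        "INSERT INTO recordings (title, composer_id) VALUES (?, ?)",
        ("Example recording", composer_id),
    )
    conn.commit()
```

Matching on name alone is a simplification; two composers with the same name
would need an extra distinguishing field.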

On 09/04/06, Timothy Cowlishaw <timcowlishaw at gmail.com> wrote:
>
>
> On 9 Apr 2006, at 10:11, Rufus Pollock wrote:
>
> > Feel free to comment on/amend this (either on the wiki or here)
>
>
> One thought....
>
> "Add information on composers (specifically date of birth)"
>
> Could we branch this data off into a separate table, and even develop
> a screenscraper for Wikipedia to get composers' dates of birth for us?
>
> If we had a script which scraped the wikipedia list of classical
> composers, and dug down to each article to get the date of birth,
> then dumped the name and date of birth in a database table,  and set
> this to run at regular (infrequent) intervals, all the 'composer'
> fields in the table of 'recordings' could reference this 'composers'
> table...
>
>
> cheers,
>
> Tim
>



--
Dan Leech
Virtual Art Solutions
www.dantleech.com