[pdb-discuss] Re: Author name lists

Andrew Gray shimgray at gmail.com
Wed Jun 27 00:22:41 UTC 2007


On 25/06/07, Rufus Pollock <rufus.pollock at okfn.org> wrote:
> Wow, Andrew this is fantastic. To respond:
>
> a) This is definitely useful to us.
>
> b) Suggestions on how to proceed: we are currently focusing on
> recordings so we are particularly interested in composers and data on
> recordings themselves.

Mmmm. This is the problem - the data is comprehensive but almost
totally unsorted. No keywording, no consistent subject headings, no
indexing...

Let's look at some random (post-processing) records:

n  42006838
[Name:] Clavel, Bernard,
[Associated dates:] 1923-
[Title of a work:] Colonnes du ciel (1976)
[Source data found in:] Clavel, B. Marie bon pain, c1980.
[Source data found in:] Clavel, B. La saison des loups, c1976
[Information found:] t.p. (Les colonnes du ciel; with a single star to
indicate v. 1)

n  42008378
[Name:] Coulson, Juanita.
[Title of a work:] Children of the stars
[Source data found in:] Outward bound. 1982.

n  42015898 [Old reference number:] n  81074968
[Name:] Marais, Marin, [Associated dates:] 1656-1728.
[Title of a work:] Instrumental music
[Also under: name:] Marais, Marin,
[Also under: dates:] 1656-1728.
[Also under: work:] Instrumental works
[Source data found in:] Pièces à une et à deux violes (1686-89), 1980.
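
For anyone wanting to reproduce this, here is a minimal sketch of a parser for the post-processed layout above. It assumes that records are separated by blank lines, that each record opens with its control number ("n  42006838"), that fields look like "[Label:] value" (several may share a line, as in the Marais record), and that a line with no label continues the previous field. `parse_records` is a hypothetical helper, not the parser actually used here.

```python
import re
from collections import defaultdict

# A field label such as "[Name:]" or "[Also under: work:]"; the group keeps the label text.
LABEL = re.compile(r"\[([^\]]+?):\]\s*")
# The control number that opens a record, e.g. "n  42006838".
CONTROL = re.compile(r"^n\s+(\d+)")


def parse_records(text):
    """Split post-processed authority records into dicts of label -> list of values.

    Assumed layout: blank lines between records, control number on the first
    line, "[Label:] value" fields, unlabelled lines continuing the last field.
    """
    records = []
    for block in re.split(r"\n\s*\n", text.strip()):
        record = defaultdict(list)
        last = None  # label of the most recent field, for continuation lines
        for i, line in enumerate(block.splitlines()):
            if i == 0:
                m = CONTROL.match(line)
                if m:
                    record["id"].append(m.group(1))
                    line = line[m.end():]
            # With a capturing group, split yields [lead, label1, value1, label2, value2, ...]
            parts = LABEL.split(line)
            lead = parts[0].strip()
            if lead:
                if last is None:
                    record["_text"].append(lead)
                else:
                    record[last][-1] += " " + lead  # wrapped line: join onto previous value
            for label, value in zip(parts[1::2], parts[2::2]):
                record[label].append(value.strip())
                last = label
        records.append(dict(record))
    return records
```

Repeated labels such as "Source data found in" simply accumulate as multiple list entries, so nothing is lost when a record cites several sources.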

Sadly lacking is any information on who or what these people are. You
or I can glance at them and say "French novelist", "sf writer",
"baroque(?) musician", but eyeballing doesn't scale and extracting
specifically composers is going to be tricky. I don't know how
widespread and useful notes like those in the Marais record are - or how
productive a general search on "instrument*", "music*", etc., would
be. More research needed...
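
To get a feel for how productive such a search would be, something like the following could be run over the whole corpus. It is a sketch that assumes each record has already been broken into fields (a dict of label -> list of values); the function names are hypothetical, and the default patterns are just the two truncations mentioned above, matched word by word and case-insensitively.

```python
import fnmatch
import re


def matching_terms(record, patterns=("instrument*", "music*")):
    """Return the sorted words in a record that match any truncation pattern.

    `record` maps field labels to lists of values, e.g.
    {"Title of a work": ["Instrumental music"]}. Matching is per word and
    case-insensitive, so "music*" hits "musical" but not "nonmusical".
    """
    hits = set()
    for values in record.values():
        for value in values:
            for word in re.findall(r"\w+", value.lower()):
                if any(fnmatch.fnmatchcase(word, p) for p in patterns):
                    hits.add(word)
    return sorted(hits)


def candidate_composers(records, patterns=("instrument*", "music*")):
    """Yield (name, matched words) for every record whose text hits a pattern."""
    for record in records:
        hits = matching_terms(record, patterns)
        if hits:
            yield record.get("Name", ["?"])[0], hits
```

This only measures recall on the wording actually present: it would flag the Marais record via "Instrumental music", but it would also flag a novel with "music" in its title and would miss composers whose records never use these words, which is exactly the open question.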

> We've already got a bit of data on composers (see recent blogpost[1])
> but we need more. However what we are really lacking is data on the
> recordings themselves. One possibility is to extract this kind of info
> from the Library of Congress as well (this was briefly discussed in [2])
> -- given your expertise in this area this might be a good way to go.

Ha! I have no expertise in this area; I'm just a cataloguer playing
with a parser. The material was scraped (painstakingly) by a third
party and displayed here:

http://www.ibiblio.org/fred2.0/ [Personal authority records are in the
/100 directory]

Presumably one could do the same for the actual catalogue data
(authority records are mainly metadata), but judging by the effort he
went to, it would be quite tough to set up.

However, if we can replicate the scraping, it should be possible to
target it relatively easily - feed the catalogue with individual known
composers and scrape the corpus of their work.
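
The targeted approach might look roughly like this. It is only a sketch: the catalogue's query interface isn't described anywhere here, so `fetch` is a hypothetical, caller-supplied function that takes a heading string and returns that composer's records, and `heading`/`scrape_works` are illustrative names. Each query is built from the name and dates in the authority record, with a pause between requests to avoid hammering the server.

```python
import time
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def heading(name: str, dates: Optional[str] = None) -> str:
    """Join a name and its dates into one heading, e.g. 'Marais, Marin, 1656-1728'."""
    name = name.strip().rstrip(",")
    if dates:
        return f"{name}, {dates.strip().rstrip('.')}"
    return name


def scrape_works(
    composers: Iterable[Tuple[str, Optional[str]]],
    fetch: Callable[[str], List[str]],
    delay: float = 1.0,
) -> Dict[str, List[str]]:
    """Query the catalogue once per known composer and collect the results.

    `fetch` is hypothetical: it stands in for whatever request/scrape step the
    real catalogue needs. `delay` is the pause in seconds between queries.
    """
    corpus: Dict[str, List[str]] = {}
    for i, (name, dates) in enumerate(composers):
        if i and delay:
            time.sleep(delay)  # be polite to the server between requests
        key = heading(name, dates)
        try:
            corpus[key] = fetch(key)
        except OSError as err:  # network-style failure: report it and carry on
            print(f"skipped {key}: {err}")
    return corpus
```

Keeping `fetch` separate means the hard part (replicating the original scraping) can be developed and tested on its own, while the loop over known composers stays trivial.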

> c) We are currently storing our code (though not all the data, as it is too
> large ...) in a subversion repository:
>
>    http://p.knowledgeforge.net/pdw/svn/
>
> If you'd like to use this you'd be very welcome. Just sign up for an
> account on http://www.knowledgeforge.net/ and let me know your username
> and you can have commit access.

Thanks - I'll look into it next week.

-- 
- Andrew Gray
  andrew.gray at dunelm.org.uk




More information about the pd-discuss mailing list