[pdb-discuss] Pulling data from musicbrainz into the web-app: recent progress

Rufus Pollock rufus.pollock at okfn.org
Tue Aug 14 13:08:32 UTC 2007


We're now able to pull data from MusicBrainz (with links back to the 
source records there) and set the copyright status:

$ bin/pdw-admin --config=production.ini mb2dm 1900 1915
Starting download of information from MusicBrainz
## 23 Releases to process
Processing Victor 17166
-- Processing track: Eskimo Rag
-- Processing track: Mysterious Moon
...

$ bin/pdw-admin --config=production.ini calculate_status
Processing: Performance of: Eskimo Rag
-- Status: 1
Processing: Performance of: Mysterious Moon
-- Status: 1
...

The resulting information is visible at:

   http://db.publicdomainworks.net/performance/

I've also improved the web interface to display source information (in 
particular, links back to MusicBrainz) so people can fix the original 
info with a single click.

Specific results are items such as:

http://db.publicdomainworks.net/performance/read/3

   - Oh How That German Could Love! (1910)
   - Authored by Irving Berlin
   - In copyright because Berlin d. 1989

http://db.publicdomainworks.net/performance/read/13

   - Nobody (1906)
   - Authored by Bert Williams d. 1924
   - So Public Domain (hurrah)

http://db.publicdomainworks.net/performance/read/2

   - Mysterious Moon (1912)
   - Authored by Edna Brown (at least that's my guess from mb data)
   - No death date is recorded, so copyright status is unknown
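
The three outcomes above follow a simple pattern, which can be sketched 
in Python as below. This is only an illustration: it assumes a flat term 
of 70 years after the author's death, which may not be what copyright.py 
actually does, and the function name and status labels are hypothetical.

```python
from typing import Optional

# Assumption: a flat "life plus 70 years" term. The real rules in
# copyright.py may differ (jurisdiction, work type, performers' rights).
TERM_YEARS = 70


def copyright_status(death_year: Optional[int], current_year: int) -> str:
    """Return 'unknown', 'in_copyright' or 'public_domain' (hypothetical labels)."""
    if death_year is None:
        # Without a death date the term cannot be computed at all.
        return "unknown"
    # Protection is assumed to last through the 70th year after death.
    if current_year > death_year + TERM_YEARS:
        return "public_domain"
    return "in_copyright"


if __name__ == "__main__":
    # Death years as given in the examples above; 2007 is the year of this post.
    examples = [
        ("Oh How That German Could Love!", 1989),
        ("Nobody", 1924),
        ("Mysterious Moon", None),
    ]
    for title, death_year in examples:
        print(title, "->", copyright_status(death_year, 2007))
```

The "unknown" branch is the important one here: a missing death date has 
to stay unknown instead of defaulting to either of the other statuses.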

As is apparent from a quick browse:

a) MusicBrainz data is often wrong (see Known Issues below for a 
specific example)
b) the majority of items, even though pre-selected to have a performance 
date prior to 1916, have an unknown status because the death date of the 
creator is unknown

Regards,

Rufus


The Code
========

The MusicBrainz loader code is here:

   http://p.knowledgeforge.net/pdw/svn/trunk/src/pdw/mb2dm.py

Copyright status code:

   http://p.knowledgeforge.net/pdw/svn/trunk/src/pdw/copyright.py


Known Issues
============

* We have to do a lot of educated guessing about who is the performer 
and who is the composer, because MusicBrainz does not store this (see 
the detailed comments in the mb2dm.py file)

* Quite a bit of MusicBrainz data is wrong, e.g. Rock Around the Clock 
is listed as being performed in 1900 (by a performer who was only born 
in 1948!):

http://db.publicdomainworks.net/performance/read/93
http://musicbrainz.org/track/32d79a50-babd-4a30-8199-fb92c1a1b306.html
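
Errors of this kind could be flagged automatically before the status is 
calculated. The following is a minimal sketch, not something taken from 
mb2dm.py; the function name and its parameters are hypothetical.

```python
from typing import Optional


def predates_birth(performance_year: int, birth_year: Optional[int]) -> bool:
    """Return True if a performance is dated before its performer was born.

    Hypothetical helper: a missing birth year is treated as "cannot tell",
    so the record is not flagged.
    """
    if birth_year is None:
        return False
    return performance_year < birth_year


if __name__ == "__main__":
    # The Rock Around the Clock case: performed "1900", performer born 1948.
    print(predates_birth(1900, 1948))
    # A record with no birth year on file is left alone.
    print(predates_birth(1912, None))
```

A check like this only catches impossible dates. A wrong but plausible 
year would still get through.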



