[pdb-discuss] Pulling data from musicbrainz into the web-app: recent progress
Rufus Pollock
rufus.pollock at okfn.org
Tue Aug 14 13:08:32 UTC 2007
We're now able to pull data from musicbrainz (with links back to the
source records there) and set the copyright status:
$ bin/pdw-admin --config=production.ini mb2dm 1900 1915
Starting download of information from MusicBrainz
## 23 Releases to process
Processing Victor 17166
-- Processing track: Eskimo Rag
-- Processing track: Mysterious Moon
...
$ bin/pdw-admin --config=production.ini calculate_status
Processing: Performance of: Eskimo Rag
-- Status: 1
Processing: Performance of: Mysterious Moon
-- Status: 1
...
The results are visible at:
http://db.publicdomainworks.net/performance/
I've also improved the web interface to display source information (in
particular links back to musicbrainz) so people can get to the original
record in a single click and fix it there.
Specific results are items such as:
http://db.publicdomainworks.net/performance/read/3
- Oh How That German Could Love! (1910)
- Authored by Irving Berlin
- In copyright because Berlin d. 1989
http://db.publicdomainworks.net/performance/read/13
- Nobody (1906)
- Authored by Bert Williams d. 1924
- So Public Domain (hurrah)
http://db.publicdomainworks.net/performance/read/2
- Mysterious Moon (1912)
- Authored by Edna Brown (at least that's my guess from mb data)
- As there is no death date, copyright status is unknown
As is apparent from a quick browse:
a) musicbrainz data is often wrong (see below for specific example)
b) the majority of items, even though pre-selected to have a performance
date prior to 1916, have an unknown status because the death date of the
creator is unknown
Regards,
Rufus
The Code
========
Loader code from musicbrainz is here:
http://p.knowledgeforge.net/pdw/svn/trunk/src/pdw/mb2dm.py
Copyright status code:
http://p.knowledgeforge.net/pdw/svn/trunk/src/pdw/copyright.py
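For readers who do not want to open the repository, the reasoning behind
the three example results above (Berlin, Williams, Brown) can be sketched
roughly as below. This is only an illustration and not the contents of
copyright.py: the names Status and copyright_status, and the 70-year
post-mortem term, are assumptions made for the sketch. It also looks only
at the authors of the work and ignores any separate rights in the
recording itself.

```python
from enum import Enum
from typing import List, Optional


class Status(Enum):
    # Hypothetical labels; the numeric codes printed by calculate_status
    # are not reproduced here.
    PUBLIC_DOMAIN = "public domain"
    IN_COPYRIGHT = "in copyright"
    UNKNOWN = "unknown"


# Assumption: protection lasts for the author's life plus 70 years.
TERM_YEARS = 70


def copyright_status(death_years: List[Optional[int]],
                     as_of_year: int,
                     term_years: int = TERM_YEARS) -> Status:
    """Classify a work from the death years of its authors.

    death_years holds one entry per author; None means the death
    date is missing from the source data.
    """
    # No authors, or any author without a death date: we cannot decide.
    if not death_years or any(year is None for year in death_years):
        return Status.UNKNOWN
    # The term runs from the death of the last surviving author.
    expiry_year = max(death_years) + term_years
    if as_of_year > expiry_year:
        return Status.PUBLIC_DOMAIN
    return Status.IN_COPYRIGHT


# The three examples from this message, evaluated as of 2007.
berlin = copyright_status([1989], as_of_year=2007)
williams = copyright_status([1924], as_of_year=2007)
brown = copyright_status([None], as_of_year=2007)
```

Treating a missing death date as "unknown" rather than guessing is what
produces the large number of unknown results mentioned above.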
Known Issues
============
* We have to do a lot of educated guessing about performer / composer
status because musicbrainz does not store this (see detailed comments
in the mb2dm.py file)
* Quite a bit of musicbrainz data is wrong, e.g. Rock Around the Clock is
listed as being performed in 1900 (by a performer who was only born in
1948!):
http://db.publicdomainworks.net/performance/read/93
http://musicbrainz.org/track/32d79a50-babd-4a30-8199-fb92c1a1b306.html
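Errors like the Rock Around the Clock one could in principle be flagged
automatically before a status is calculated. The check below is a minimal
sketch of that idea and is not part of mb2dm.py; the function name
implausible_performance and its parameters are hypothetical.

```python
from typing import Optional


def implausible_performance(performance_year: Optional[int],
                            performer_birth_year: Optional[int]) -> bool:
    """Return True when a performance is dated before its performer's birth.

    Missing values are not flagged: with no date there is nothing to
    compare, so the record is left for manual review instead.
    """
    if performance_year is None or performer_birth_year is None:
        return False
    return performance_year < performer_birth_year


# The case above: listed as performed in 1900, performer born in 1948.
flagged = implausible_performance(1900, 1948)
```

Flagged records could then be shown with their link back to musicbrainz so
the source entry gets corrected rather than just skipped.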