[pdb-discuss] Something concrete we could do in the next week

Rufus Pollock rufus.pollock at okfn.org
Sun Mar 11 14:05:57 UTC 2007


Nathan Lewis wrote:
> 
> Hi Rufus,
> 
> Have you done any of these things so far?

Yes.

1. Parser: I have just (re-)written a parser for the composer data in 
Python:

<http://project.knowledgeforge.net/pdw/svn/trunk/src/pdw/parse_composer_data.py>
<http://project.knowledgeforge.net/pdw/svn/trunk/src/pdw/parse_composer_data_test.py>

(I know we already had your Perl parseComp.pl, but my Perl skills are 
non-existent and the new code (I think) adds some extra functionality, 
like trying to standardize the date format, extra aliases wherever 
possible, etc. Full details are in the commit messages, which I've 
included at the end of the email.)

2. Musicbrainz interface: comments below

> I have had a play with the musicbrainz client but I found that the 
> deprecated rdf based perl binding does not appear to work very well and 
> the new ReST based perl binding is very incomplete. How are the python 
> bindings?

Python bindings seem to work fine. As I posted a couple of weeks back, I 
got something simple working, which I put in Subversion:

http://project.knowledgeforge.net/pdw/svn/trunk/src/pdw/mb.py

Demoed (very simply) in:

http://project.knowledgeforge.net/pdw/svn/trunk/src/pdw/mb_test.py

> If neither are up to the task then we will have to write code to use the 
> ReST api directly which wouldn't be too difficult but we will still need 
> some tricky heuristics to work out which artist to go with when more 
> than one match the name.

As you said, we will need some heuristics, but I do not think it will be 
that hard ...
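
A rough sketch of the kind of heuristic that might do: score each candidate on name and on birth/death year against our composer data, and refuse to choose when nothing matches on a date or when the top two tie. Everything here is hypothetical for illustration (the function name, the dict keys, the weights); it is not the MusicBrainz API or what is in mb.py.

```python
def pick_artist(composer, candidates):
    """Return the best-matching candidate dict, or None if the match is unclear.

    Assumed (hypothetical) shape for both sides:
    {"name": str, "birth": "YYYY[-MM-DD]" or None, "death": ... or None}
    """
    def year(value):
        # Compare on the year only, since our dates are often partial.
        return (value or "")[:4]

    def score(cand):
        points = 0
        if cand["name"].lower() == composer["name"].lower():
            points += 1
        if year(composer.get("birth")) and year(cand.get("birth")) == year(composer.get("birth")):
            points += 2
        if year(composer.get("death")) and year(cand.get("death")) == year(composer.get("death")):
            points += 2
        return points

    if not candidates:
        return None
    ranked = sorted(candidates, key=score, reverse=True)
    best = ranked[0]
    if score(best) < 2:
        return None  # no date agrees: a name match alone is not enough
    if len(ranked) > 1 and score(ranked[1]) == score(best):
        return None  # tie for first place: leave it for a human
    return best
```

Anything that comes back as None would go on a list for checking by hand.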

> I think the musicbrainz cross referencing might be too much to get 
> working in one week.

Sure but we can try.

Regards,

Rufus

## Log Message

### data/composers.txt:

r29 | rgrp | 2007-03-11 13:50:18 +0000 (Sun, 11 Mar 2007) | 35 lines

Make various changes to source composer data file in order to make it 
easier to parse (and to fix some bugs in the data). More work is needed 
but this is a start. Justification of the changes provided below.

1. Rudolf FRIML:

-(Karl) Rudolf FRIML, * 1879 or 1884 Dec 7, + 1972 Nov 12
+(Karl) Rudolf FRIML, * 1879 Dec 7, + 1972 Nov 12

Checking on the internet (wikipedia and elsewhere) indicated that 1879 
was the correct year of birth.

2. Chris PATTON:

-Chris(=Christopher W) PATTON, * @57, + 2006 Apr 25
+Chris(=Christopher W) PATTON, * 1957, + 2006 Apr 25

Assume @57 is a simple typo (internet searching did not give any indicators)

3. Tennyson JESSE:

-Fryniwyd(=Wynifried Margaret?) Tennyson JESSE, Mrs HARWOOD, * 1888 or 1889, + 1958 Aug 6
+Fryniwyd(=Wynifried Margaret?) Tennyson JESSE, Mrs HARWOOD, * 1888, + 1958 Aug 6

Internet research (e.g. 
http://www.classiccrimefiction.com/f-tennyson-jesse.htm) indicates 1888 
as the correct year of birth.

4. Eduardo SANCHEZ De FUENTES y PELAEZ:

-Eduardo SANCHEZ De FUENTES y PELAEZ, * 1876 Apr 3, * 1944 Sep 7, + ?
+Eduardo SANCHEZ De FUENTES y PELAEZ, * 1876 Apr 3, + 1944 Sep 7

Original entry looks like a typo (and a brief bit of internet browsing 
seemed to suggest he had lived in the early 20th century).

5. Various others (see diff): fix aliases so the parser works

The irregular structure has already caused quite a few problems 
(sometimes aliases are separated by commas, sometimes they are bracketed, 
etc.). Here I have just fixed cases like RUSSELL(-BROWN) by removing the 
bracketed section and creating a new alias with the stuff in the 
brackets unbracketed.
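
The rule applied to the data for this case can be sketched in code as below. This is only an illustration of the transformation (the fix itself was made in composers.txt), and the function name is made up. The `(=...)` form, as in Chris(=Christopher W), is a different alias convention and is deliberately left untouched.

```python
import re

# A name followed by one bracketed suffix, e.g. RUSSELL(-BROWN).
# Suffixes starting with "=" are excluded on purpose.
_BRACKETED = re.compile(r"^(?P<base>[^()]+)\((?P<extra>[^()=][^()]*)\)$")

def expand_bracketed(name):
    """Return (base, alias); alias is None when there is no bracketed suffix."""
    name = name.strip()
    match = _BRACKETED.match(name)
    if match is None:
        return name, None
    base = match.group("base").strip()
    # The alias is the base with the bracket contents appended unbracketed.
    return base, base + match.group("extra").strip()
```

So RUSSELL(-BROWN) gives the name RUSSELL plus the alias RUSSELL-BROWN.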


### src/pdw/parse_composer_data.py

r30 | rgrp | 2007-03-11 13:52:17 +0000 (Sun, 11 Mar 2007) | 9 lines

Code to parse the composers.txt data file on composer birth and death 
dates into a usable form.
* trunk/src/pdw/parse_composer_data.py,
   trunk/src/pdw/parse_composer_data_test.py:
   Create a ComposerFileParser class which parses each line in the 
composers.txt file and returns a dictionary containing (among others):
     * last name
     * first name
     * birth date
     * death date
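
To give a feel for the output, here is a minimal sketch of parsing one line into such a dictionary. It is not the ComposerFileParser code itself: the function names and dictionary keys are assumptions, as are the rules that `*` marks birth, `+` marks death, and the last name is the trailing run of upper-case words (which is too simple for entries like SANCHEZ De FUENTES y PELAEZ).

```python
import re

MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

def normalize_date(text):
    """Turn '1879 Dec 7' into '1879-12-07'; partial dates stay partial."""
    parts = text.split()
    if not parts or not parts[0].isdigit():
        return None  # e.g. '@57' or '?'
    out = parts[0]
    if len(parts) > 1 and parts[1] in MONTHS:
        out += "-%02d" % MONTHS[parts[1]]
        if len(parts) > 2 and parts[2].isdigit():
            out += "-%02d" % int(parts[2])
    return out

def parse_composer_line(line):
    fields = [field.strip() for field in line.split(",")]
    result = {"first_name": None, "last_name": None,
              "birth_date": None, "death_date": None}
    words = fields[0].split()
    # Walk back from the end while the words look like an upper-case surname.
    split_at = len(words)
    while split_at > 0 and re.fullmatch(r"[A-Z][A-Z'\-]+", words[split_at - 1]):
        split_at -= 1
    result["last_name"] = " ".join(words[split_at:]) or None
    result["first_name"] = " ".join(words[:split_at]) or None
    for field in fields[1:]:
        if field.startswith("*"):
            result["birth_date"] = normalize_date(field[1:])
        elif field.startswith("+"):
            result["death_date"] = normalize_date(field[1:])
    return result
```

For the FRIML line from r29 this gives last name FRIML, first name (Karl) Rudolf, and the two dates in year-month-day form.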



