[pdb-discuss] Something concrete we could do in the next week
Rufus Pollock
rufus.pollock at okfn.org
Sun Mar 11 14:05:57 UTC 2007
Nathan Lewis wrote:
>
> Hi Rufus,
>
> Have you done any of these things so far?
Yes.
1. Parser: I have just (re-)written a parser for the composer data in
Python:
<http://project.knowledgeforge.net/pdw/svn/trunk/src/pdw/parse_composer_data.py>
<http://project.knowledgeforge.net/pdw/svn/trunk/src/pdw/parse_composer_data_test.py>
(I know we already had your Perl parseComp.pl, but my Perl skills are
non-existent, and I think the new code adds some extra functionality,
such as trying to standardize the date format and adding extra aliases
wherever possible. Full details are in the commit messages, which I've
included at the end of this email.)
2. Musicbrainz interface: comments below
> I have had a play with the musicbrainz client but I found that the
> deprecated rdf based perl binding does not appear to work very well and
> the new ReST based perl binding is very incomplete. How are the python
> bindings?
Python bindings seem to work fine. As I posted a couple of weeks back, I
got something simple working, which I put in Subversion:
http://project.knowledgeforge.net/pdw/svn/trunk/src/pdw/mb.py
Demoed (very simply) in:
http://project.knowledgeforge.net/pdw/svn/trunk/src/pdw/mb_test.py
> If neither are up to the task then we will have to write code to use the
> ReST api directly which wouldn't be too difficult but we will still need
> some tricky heuristics to work out which artist to go with when more
> than one match the name.
As you said, we will need some heuristics, but I do not think they will
be that hard.
> I think the musicbrainz cross referencing might be too much to get
> working in one week.
Sure, but we can try.
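To illustrate the kind of heuristic meant above, here is a minimal sketch of picking one artist when a MusicBrainz search returns several matches for a name. It assumes each result can be reduced to a name plus begin and end years; `Candidate`, `score_candidate` and `pick_artist` are hypothetical names and not part of the Python bindings, and the weights and threshold are arbitrary choices.

```python
from collections import namedtuple

# Hypothetical stand-in for an artist search result (not the bindings' own type).
Candidate = namedtuple("Candidate", ["name", "begin_year", "end_year"])


def score_candidate(candidate, last_name, birth_year=None, death_year=None):
    """Score how plausibly a candidate artist is the given composer."""
    score = 0
    if last_name.lower() in candidate.name.lower():
        score += 1
    # Matching life dates are the strongest evidence when names collide.
    if birth_year is not None and candidate.begin_year == birth_year:
        score += 2
    if death_year is not None and candidate.end_year == death_year:
        score += 2
    return score


def pick_artist(candidates, last_name, birth_year=None, death_year=None,
                min_score=3):
    """Return the best-scoring candidate, or None if none is convincing."""
    scored = [(score_candidate(c, last_name, birth_year, death_year), c)
              for c in candidates]
    if not scored:
        return None
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best if best_score >= min_score else None
```

With `min_score=3`, a name match alone is never enough: at least one life date must also agree, so ambiguous cases come back as `None` and can be left for manual review.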
Regards,
Rufus
## Log Message
### data/composers.txt:
r29 | rgrp | 2007-03-11 13:50:18 +0000 (Sun, 11 Mar 2007) | 35 lines
Make various changes to the source composer data file in order to make it
easier to parse (and to fix some bugs in the data). More work is needed,
but this is a start. Justifications for the changes are given below.
1. Rudolf FRIML:
-(Karl) Rudolf FRIML, * 1879 or 1884 Dec 7, + 1972 Nov 12
+(Karl) Rudolf FRIML, * 1879 Dec 7, + 1972 Nov 12
Checking on the internet (Wikipedia and elsewhere) indicated that 1879
was the correct year of birth.
2. Chris PATTON:
-Chris(=Christopher W) PATTON, * @57, + 2006 Apr 25
+Chris(=Christopher W) PATTON, * 1957, + 2006 Apr 25
Assume @57 is a simple typo for 1957 (internet searching did not turn up
any further information).
3. Tennyson JESSE:
-Fryniwyd(=Wynifried Margaret?) Tennyson JESSE, Mrs HARWOOD, * 1888 or 1889, + 1958 Aug 6
+Fryniwyd(=Wynifried Margaret?) Tennyson JESSE, Mrs HARWOOD, * 1888, + 1958 Aug 6
Internet research (e.g.
http://www.classiccrimefiction.com/f-tennyson-jesse.htm) indicates 1888
as the correct year of birth.
4. Eduardo SANCHEZ De FUENTES y PELAEZ
-Eduardo SANCHEZ De FUENTES y PELAEZ, * 1876 Apr 3, * 1944 Sep 7, + ?
+Eduardo SANCHEZ De FUENTES y PELAEZ, * 1876 Apr 3, + 1944 Sep 7
The original entry looks like a typo: the second "*" (birth) marker
should be a "+" (death) marker. A brief bit of internet browsing also
suggested that he lived into the early 20th century.
5. Various others (see diff): fix aliases so the parser works
The irregular structure has already caused quite a few problems
(sometimes aliases are separated by commas, sometimes they are
bracketed, etc.). Here I have just fixed cases like RUSSELL(-BROWN) by
removing the bracketed section and creating a new alias in which the
bracketed text appears without brackets.
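The fixes above were made by hand in data/composers.txt. As a rough sketch of how the same problems might be detected, or partly fixed, automatically, the functions below flag the patterns seen in items 1 to 4 (ambiguous "or" years, stray "@" characters, duplicate birth markers) and expand a RUSSELL(-BROWN) style name into two aliases. The function names are hypothetical, and the sketch assumes fields are comma-separated with "*" marking birth and "+" marking death, as in the diffs shown.

```python
import re

# Matches a name with a bracketed optional suffix, e.g. "RUSSELL(-BROWN)".
# Bracketed alternatives such as "Chris(=Christopher W)" are not matched.
OPTIONAL_SUFFIX_RE = re.compile(r"^(\S+?)\(([^=)][^)]*)\)$")


def expand_optional_suffix(name):
    """Split 'RUSSELL(-BROWN)' into ['RUSSELL', 'RUSSELL-BROWN'].

    Names without a bracketed suffix come back unchanged in a one-item list.
    """
    match = OPTIONAL_SUFFIX_RE.match(name)
    if match is None:
        return [name]
    base, suffix = match.groups()
    return [base, base + suffix]


def find_problems(line):
    """List reasons why a composers.txt line may need manual fixing."""
    fields = [field.strip() for field in line.split(",")]
    births = [f[1:].strip() for f in fields if f.startswith("*")]
    deaths = [f[1:].strip() for f in fields if f.startswith("+")]
    problems = []
    if len(births) > 1:
        problems.append("more than one birth date")
    if len(deaths) > 1:
        problems.append("more than one death date")
    for date in births + deaths:
        if " or " in date:
            problems.append("ambiguous date: " + date)
        elif "@" in date:
            problems.append("unexpected character in date: " + date)
    # Names like RUSSELL(-BROWN) still need their alias split out.
    for word in fields[0].split():
        if OPTIONAL_SUFFIX_RE.match(word):
            problems.append("bracketed name suffix: " + word)
    return problems
```

Run over the original lines from items 1 to 4, each would be flagged with exactly one problem, while the corrected lines return an empty list.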
### src/pdw/parse_composer_data.py
r30 | rgrp | 2007-03-11 13:52:17 +0000 (Sun, 11 Mar 2007) | 9 lines
Code to parse composers.txt data file on composer birth and death dates
into a usable form.
* trunk/src/pdw/parse_composer_data.py,
trunk/src/pdw/parse_composer_data_test.py:
Create a ComposerFileParser class which parses each line of the
composers.txt file and returns a dictionary containing (among others):
* last name
* first name
* birth date
* death date
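For readers who do not want to open the repository, here is a minimal sketch of the per-line parsing described in this log message. It is not the code in parse_composer_data.py; it assumes that the surname starts at the first all-capitals word, that dates look like "1879 Dec 7" (standardized here to "1879-12-07"), and that "?" or unparseable dates become None. `parse_line`, `standardize_date` and the dictionary keys are hypothetical names, and extra fields such as "Mrs HARWOOD" are simply ignored.

```python
import re

MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# Accepts "1879", "1879 Dec" or "1879 Dec 7".
DATE_RE = re.compile(r"^(\d{4})(?:\s+([A-Z][a-z]{2})(?:\s+(\d{1,2}))?)?$")

# A surname word is written in capitals, e.g. "FRIML" or "PATTON".
SURNAME_WORD_RE = re.compile(r"^[A-Z][A-Z'-]+$")


def standardize_date(raw):
    """Turn '1879 Dec 7' into '1879-12-07'; return None if unparseable."""
    match = DATE_RE.match(raw.strip())
    if match is None:
        return None
    year, month, day = match.groups()
    if month is not None and month not in MONTHS:
        return None
    parts = [year]
    if month is not None:
        parts.append("%02d" % MONTHS[month])
        if day is not None:
            parts.append("%02d" % int(day))
    return "-".join(parts)


def parse_line(line):
    """Parse one composers.txt line into a dictionary of names and dates."""
    fields = [field.strip() for field in line.split(",")]
    words = fields[0].split()
    # Everything before the first all-capitals word is the first name.
    split_at = next((i for i, word in enumerate(words)
                     if SURNAME_WORD_RE.match(word)), len(words))
    record = {
        "first_name": " ".join(words[:split_at]),
        "last_name": " ".join(words[split_at:]),
        "birth_date": None,
        "death_date": None,
    }
    for field in fields[1:]:
        if field.startswith("*"):
            record["birth_date"] = standardize_date(field[1:])
        elif field.startswith("+"):
            record["death_date"] = standardize_date(field[1:])
    return record
```

For example, the corrected FRIML line parses to a first name of "(Karl) Rudolf", a last name of "FRIML", and dates "1879-12-07" and "1972-11-12".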