[pdb-discuss] Re: Questions Questions Questions.. ooh and hello :)

Rufus Pollock rufus.pollock at okfn.org
Fri Mar 16 09:59:33 UTC 2007


> <Unlurk>
> 
> Hey All, apologies for busting in on the discussion.... I've been
> following your work for a little while now, and I'm really interested in
> what's going on, and the data (well, OK, especially the data). I'd love

Great to hear from you.

> to contribute, but alas it's hard to teach an old dog new tricks and I'm
> really a java bod rather than perl or python person.

No problem -- I did quite a bit of java in the past but my preference is 
for python now ...

> I really don't want to tread on any toes or anything, but I've been
> mucking around and got some java code that does some of what you're
> discussing here. If you want to keep the community focused on one
> implementation, I'll just look after my code and keep it as a play
> thing, but if it's more the case that okfn/pdb would like to "Let a
> thousand flowers bloom" (as it were) then I'd be more than happy to
> share what I've tentatively tagged jpdb (A java version of the pdb).

Sharing would be great. If you would like to put it in subversion you'd 
be more than welcome -- just sign up for an account on:

   http://www.knowledgeforge.net/

Tell me your user name and I'll make you an admin or developer on the 
project so you have write access to:

https://project.knowledgeforge.net/pdw/svn/

You could then start putting java stuff in e.g. src/java/

> Right now, I'm just parsing the composers file and buggering about with
> the web services. But before long I should be able to make the thing
> searchable via SRW/SRU.

Sounds great. As you might have seen, the parsing of the composers file 
has already been done (though in Python ...).

> What I really wanted to ask tho.... was the composers file.. I've had a
> dig around and I can't actually see a concrete definition for it.. Right
> now.. I'm using something like this as a spec:

There is no concrete definition. We were given it by Philip Harper of 
kingkong.demon.co.uk in exactly the form provided (I think it was 
designed more for grepping than machine processing). Having just written 
a parser for it I can personally say that there is quite a lot of 
special case stuff. I've done my best to deal with it but some bits will 
just have to be managed by hand (e.g. stuff like ... Macall(-Smith)). 
See log message appended to my recent email:

http://lists.okfn.org/pipermail/pdb-discuss/2007-March/000169.html

> ComposerNameEntry ::= [Title] Name DOB DOD
> Title ::= String
> Name ::= NameComponents AdditionalNameSpec
> NameComponents ::= NameComponent +
> NameComponent ::= String ( / AlternateSpellingString )+
> AdditionalNameSpec ::= ([&]ps: Name [,Name]+ )
> DOB ::= * Date
> DOD ::= + Date
> Date ::= Year [Month [Day]] [(Date Comment)]
> 
> Does that pretty much tie up with the general understanding? I wasn't
> sure what the difference between ps: and &ps: is... any thoughts?

This looks pretty accurate. For the time being I've just shunted all 
the alias stuff into fields called 'extra' and 'aliases'. 
To see the parsing results just do:

python <path-to-trunk>/src/pdw/parse_composer_data.py

This should spit out the parsed version of composers.txt (with 
Python dictionaries converted to strings).
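
For anyone who wants to experiment with the grammar quoted above, here is 
a minimal Python sketch of how one entry might be split up. It is not the 
code in parse_composer_data.py. The sample line, the function names, the 
'born' and 'died' keys and the guess that months can be names or numbers 
are all assumptions made for illustration. Titles and the "/" alternate 
spellings are not handled, and the alias text is simply kept raw in 
'extra'.

```python
import re

# Date ::= Year [Month [Day]] [(Date Comment)]
# Assumption: months may be names or numbers; the real file may differ.
DATE_RE = re.compile(
    r"^\s*(?P<year>\d{3,4})"
    r"(?:\s+(?P<month>[A-Za-z]+|\d{1,2})(?:\s+(?P<day>\d{1,2}))?)?"
    r"(?:\s*\((?P<comment>[^)]*)\))?\s*$"
)

# AdditionalNameSpec ::= ([&]ps: Name [,Name]+ )
ALIAS_RE = re.compile(r"\s*\((&?ps:[^)]*)\)")


def parse_date(text):
    """Return the date parts found in text, or the raw text if it does not fit."""
    match = DATE_RE.match(text)
    if match is None:
        return {"raw": text.strip()}
    return {key: value for key, value in match.groupdict().items() if value is not None}


def parse_entry(line):
    """Split one entry into name, alias text, birth date and death date."""
    rest = line.strip()
    born = died = None
    if " +" in rest:  # DOD ::= + Date
        rest, died = rest.split(" +", 1)
    if " *" in rest:  # DOB ::= * Date
        rest, born = rest.split(" *", 1)
    alias = ALIAS_RE.search(rest)
    return {
        "name": ALIAS_RE.sub("", rest).strip(),
        "extra": alias.group(1).strip() if alias else None,
        "born": parse_date(born) if born is not None else None,
        "died": parse_date(died) if died is not None else None,
    }


# Hypothetical sample line, not taken from the real composers file.
entry = parse_entry("Smith, John (ps: Jones, J.) *1800 Mar 2 +1870 (approx)")
print(entry)
```

Dates that do not fit the pattern come back as {'raw': ...} rather than 
raising an error, which seems the safer choice given how many special 
cases the file contains.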

Regards,

Rufus



