[pdb-discuss] Re: Somewhere to start

Rufus Pollock rufus.pollock at okfn.org
Mon Jul 16 20:06:36 UTC 2007


James Casbon wrote:
> On 16/07/07, Rufus Pollock <rufus.pollock at okfn.org> wrote:
>> This looks really excellent. I've just ported our existing domain model
>> to sqlalchemy/elixir and you can see the results at:
>>
>> http://p.knowledgeforge.net/pdw/svn/trunk/src/pdw/www/models/recording.py
>>
>> Unit tests showing usage at:
>>
>> <http://p.knowledgeforge.net/pdw/svn/trunk/src/pdw/www/tests/models/test_recording.py> 
>>
>>
> 
> Rufus, that looks like a really good start.  The python pedant in me
> can't help but mention that if you're going to use all the XUnit cruft
> then self.assertEquals gives you better messages than plain assert.

I agree with you on the XUnit cruft, and plain assert is a move back 
to something simpler. Used in combination with py.test or nosetests it 
delivers more information, and reads more plainly, than plain old 
unittest.
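To illustrate the point: py.test and nose introspect a failing plain assert and report both operands, so the XUnit-style self.assertEquals buys little. A minimal sketch (the normalise_title helper is hypothetical, just something to test):

```python
def normalise_title(title):
    """Hypothetical helper: collapse whitespace and lowercase a title."""
    return " ".join(title.split()).lower()

def test_normalise_title():
    # On failure, py.test/nose print both the actual and the expected
    # value -- no assertEquals needed for a useful message.
    assert normalise_title("  My   Song ") == "my song"
```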

> As for the model, again looks good.  I like the idea.
> 
>>
>> We'd welcome assistance and I very much think this could be useful. Our
>> next task is to complete the domain model and then start pushing/pulling
>> data from musicbrainz to a local db on which we can run the web app.
>>
> 
> Presumably you're well aware of: 
> http://musicbrainz.org/doc/PythonMusicBrainz2

Yup. Ongoing work to use this for our own purposes is here:

http://p.knowledgeforge.net/pdw/svn/trunk/src/pdw/mb.py

> Part of me is wondering if the idea of replicating some of the
> musicbrainz database doesn't smell slightly bad.  Apologies if this
> has already been discussed, but I'm new here.

No, this is an excellent question.

> Wouldn't it be better to try and link each recording to the musicbrainz 
> id.  eg:
> the MBID field from here.
> http://musicbrainz.org/track/9f429c1b-ac0b-4e7a-8686-8804b09d5c67.html

Yes, this is exactly what we plan to do. The only problem occurs if we 
have data that is not yet in MB (though that could be solved by 
ensuring that all data we have *is* uploaded to MB before pulling it 
back into our web app).
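The linking idea might look like this in our domain model -- a hedged sketch (the Recording class and MBID validation here are illustrative, not our actual elixir model), using the MBID from the track page above:

```python
import re

# An MBID is a UUID; validate the shape before storing the link.
MBID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$")

class Recording:
    """Sketch: a local recording linked to MusicBrainz by MBID,
    rather than replicating the MB database."""

    def __init__(self, title, mbid=None):
        if mbid is not None and not MBID_RE.match(mbid):
            raise ValueError("not a valid MBID: %r" % mbid)
        self.title = title
        self.mbid = mbid

    def musicbrainz_url(self):
        """Link back to the canonical MB track page, if we have an MBID."""
        if self.mbid is None:
            return None
        return "http://musicbrainz.org/track/%s.html" % self.mbid

r = Recording("Example Track", "9f429c1b-ac0b-4e7a-8686-8804b09d5c67")
```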

> Then
>  1. write a service that returns a recording for a given MBID
>  2. hack up a greasemonkey script to overlay onto the MB site that a
> recording is available

I hadn't thought of the greasemonkey approach -- interesting idea. 
Having our own backend had, at present, simply been about performance 
-- I just don't know how happy MB would be if we were hitting their 
DB very regularly. So instead we would do a regular search (say once 
weekly) of their stuff and then cache the out-of-copyright items (not 
the whole record, but at least an id + title ...)

> Then you put all the chore of domain models and databases on MB.
> Obviously this can't provide you search for tracks where there *is* a
> recording available.   But hey, for each track available you could walk
> the MB model and get the data as text, then create a web page for each
> track and let google do the searching for you - they may be better at
> it ;-)

If I understand correctly: we crawl MB on some regular basis and then 
create a page for each PD recording. If so, this is very similar to 
what I had in mind, except we run off a DB backend (i.e. we crawl 
regularly, cache identifiers and a small bit of info on each PD 
recording, and then use this to create the pages -- with a linkback from 
each recording to the original MB page for e.g. modifications etc.)
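The crawl-and-cache scheme above could be sketched roughly like this (all names here -- PDCache, the page format -- are hypothetical; the real thing would sit on our DB backend, not a dict):

```python
class PDCache:
    """Sketch: store only id + title for each public-domain recording
    found by a periodic crawl, and generate pages from that cache,
    linking back to the original MB entry."""

    def __init__(self):
        self._by_mbid = {}

    def update(self, crawled):
        """crawled: iterable of (mbid, title) pairs judged out of
        copyright by the weekly crawl."""
        for mbid, title in crawled:
            self._by_mbid[mbid] = {
                "title": title,
                "mb_url": "http://musicbrainz.org/track/%s.html" % mbid,
            }

    def page_for(self, mbid):
        """Render a minimal page for one PD recording, or None if the
        recording is not in the cache."""
        entry = self._by_mbid.get(mbid)
        if entry is None:
            return None
        return '<h1>%s</h1><a href="%s">MusicBrainz</a>' % (
            entry["title"], entry["mb_url"])
```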

~rufus




More information about the pdb-discuss mailing list