[pdb-discuss] parsing the bbc data

James Casbon casbon at gmail.com
Tue Jul 24 15:50:44 UTC 2007


I had a first shot at parsing the BBC data (see the attached file if
you're interested).  It seems pretty well structured, but can anyone
tell what the columns are?

Please have a look at http://p.knowledgeforge.net/pdw/A1_parsed_20070721.csv

The first column is the title and the second is the 'pre title' (i.e.
the, das, etc.), but what about the rest?

I suppose for pdw we're only interested in a few of them.
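
One way to get a feel for the unknown columns might be to count how
often each one is filled in and look at its most common values.  This
is only a rough sketch and not what the attached parse.py does:
summarise_columns and the SAMPLE rows are made up for illustration, and
it assumes the file is plain comma-separated text with no header row.

```python
import csv
import io
from collections import Counter


def summarise_columns(csv_text, max_samples=3):
    """Return, per column index, the fill count and a few common values."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    # Rows may be ragged, so use the widest row as the column count.
    width = max((len(r) for r in rows), default=0)
    summary = []
    for i in range(width):
        # Keep only non-empty cells for this column.
        values = [r[i].strip() for r in rows if i < len(r) and r[i].strip()]
        samples = [v for v, _ in Counter(values).most_common(max_samples)]
        summary.append(
            {"column": i, "non_empty": len(values), "samples": samples}
        )
    return summary


# Made-up rows for illustration only; not taken from the real file.
SAMPLE = "Planets,The,1918\nFour Seasons,The,\nMessiah,,1741\n"

for col in summarise_columns(SAMPLE):
    print(col)
```

To try it on the real data, the contents of a local copy of
A1_parsed_20070721.csv could be read into a string and passed in place
of SAMPLE.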
-------------- next part --------------
A non-text attachment was scrubbed...
Name: parse.py
Type: application/octet-stream
Size: 1332 bytes
Desc: not available
URL: <http://lists.okfn.org/pipermail/pd-discuss/attachments/20070724/a8b67728/attachment-0001.obj>

