[openbiblio-dev] First cut of AsyncUpload branch
Etienne Posthumus
etienne.posthumus at okfn.org
Mon Feb 13 10:11:53 UTC 2012
On 10 February 2012 16:59, Jim Pitman <pitman at stat.berkeley.edu> wrote:
> I have one suggestion, a practice I have found extremely useful in my parser work and which I'd like to see provided
> as a small but expected supplement to each source-specific parser.
>
> That is, each parser should contain a method which, when offered a URL, will
> respond with the canonical form of the URL from which the parser knows how to scrape data, and the API call
> for doing that. Users may or may not want the parser to continue from there.
Jim, do I understand it correctly that you suggest some sort of
'string-sniffing' support in ALL the parsers?
In other words, when called in some conventional manner, e.g.
someparser -s "arXiv:1201.6450"
it returns some structured output along the lines of:
{ "recognised": true/false,
  "canonical": "http://arxiv.org/abs/1201.6450",
  "metadata": "http://somemetadataurl" }
Can you contribute a simple Python script that does what you suggest?
(no parsing needed yet)
Then we can see if this is general enough to recommend as a convention
for other parser/scrapers.
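
To make the idea concrete, here is a minimal sketch of what such a sniffer might look like for arXiv. It is only an illustration under stated assumptions, not an agreed convention: the function name sniff, the "-s" handling, and the choice to drop any version suffix are all hypothetical, only new-style identifiers (e.g. 1201.6450) are handled, and the metadata URL assumes the arXiv export API query endpoint is the right "API call" for this source.

```python
import json
import re
import sys

# Assumption: only new-style arXiv identifiers are recognised, optionally
# prefixed with "arXiv:" or given as an abs URL, with an optional version.
ARXIV_RE = re.compile(
    r"^(?:arxiv:|https?://arxiv\.org/abs/)?(\d{4}\.\d{4,5})(v\d+)?$",
    re.IGNORECASE,
)


def sniff(text):
    """Return a dict saying whether text looks like an arXiv reference.

    Hypothetical helper: the keys mirror the structured output suggested
    in the message above. No fetching or parsing is done here.
    """
    match = ARXIV_RE.match(text.strip())
    if match is None:
        return {"recognised": False, "canonical": None, "metadata": None}
    # Design choice for this sketch: the version suffix is dropped.
    arxiv_id = match.group(1)
    return {
        "recognised": True,
        "canonical": "http://arxiv.org/abs/" + arxiv_id,
        # Assumed metadata endpoint: the arXiv export API query URL.
        "metadata": "http://export.arxiv.org/api/query?id_list=" + arxiv_id,
    }


# Only act when invoked as: someparser -s "<string>"
if __name__ == "__main__" and len(sys.argv) == 3 and sys.argv[1] == "-s":
    print(json.dumps(sniff(sys.argv[2])))
```

Saved as, say, someparser.py, it could be run as python someparser.py -s "arXiv:1201.6450" and would print a JSON object with the three keys above. A string it does not recognise gives "recognised": false with the other two fields null, so a caller can try several parsers in turn.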