[openbiblio-dev] First cut of AsyncUpload branch

Fri Feb 17 05:01:41 UTC 2012

Tom Morris <tfmorris at gmail.com> wrote:

> On Mon, Feb 13, 2012 at 5:18 AM, Mark MacGillivray <mark at odaesa.com> wrote:
> > On Mon, Feb 13, 2012 at 10:11 AM, Etienne Posthumus
> > <etienne.posthumus at okfn.org> wrote:
> >> Jim, do I understand it correctly that you suggest some sort of
> >> 'string-sniffing' support in ALL the parsers?
> >> IOW, when called in some manner as a convention, e.g.
> >> someparser -s "arXiv:1201.6450"
> >
> > This sounds to me like something that should come before parsing -
> > e.g. send a string to a URL, get back details of which parser would
> > parse it, then submit to that parser. Does not actually need to be
> > written into the parsers though.

Yes, I think it does, for the reasons Tom outlines below. The main point is that the community
knowledge about how to handle inputs from a particular source should all be in one place;
you should not have to hunt around for it. All the documentation, information about identifiers,
capabilities, parsing rules and so on should be kept together. It seems to me the best way to do that is
to keep it all in the parser module. If the parser is a directory, with different files expected there for
different purposes, I suppose that's OK. But the heart of it will be a parser, so why not just fatten up the
parser module with the additional info, some of which has to be machine-actionable?
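To make "fatten up the parser module" concrete, here is a minimal sketch, assuming a Python parser module that keeps its machine-actionable declarations (content types, URI patterns, an identifier pattern) as module-level constants right next to a string-sniffing function. A convention like the someparser -s idea above could simply call can_parse_string(). The module and every name in it (NAME, CONTENT_TYPES, URI_PATTERNS, IDENTIFIER_PATTERN, can_parse_string, parse_string) are hypothetical, not an existing openbiblio API.

```python
"""Hypothetical arXiv parser module that carries its own capability metadata."""
import re

# Machine-actionable declarations kept next to the parsing code.
# All of these names are illustrative assumptions, not an existing openbiblio API.
NAME = "arxiv"
DESCRIPTION = "Parses arXiv identifiers and abstract pages."
CONTENT_TYPES = ("application/atom+xml",)
URI_PATTERNS = (re.compile(r"^https?://arxiv\.org/abs/"),)
# New-style arXiv ids such as arXiv:1201.6450, optionally with a version suffix (v2).
IDENTIFIER_PATTERN = re.compile(r"^(?:arXiv:)?(\d{4}\.\d{4,5})(v\d+)?$", re.IGNORECASE)


def can_parse_string(s):
    """Return True if this module recognises the string as one of its identifiers."""
    return IDENTIFIER_PATTERN.match(s.strip()) is not None


def parse_string(s):
    """Turn an identifier string into a minimal record (no network access)."""
    m = IDENTIFIER_PATTERN.match(s.strip())
    if m is None:
        raise ValueError("not an arXiv identifier: %r" % s)
    return {"id": "arXiv:" + m.group(1), "source": NAME}
```

Because the declarations live in the same file as the parsing logic, they are less likely to drift out of sync with what the parser can actually do.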

--Jim
>
> It's actually pretty useful to have format handlers (e.g. parsers)
> either register what they can handle or provide a method by which the
> calling framework can query whether or not they can handle something.
>
> In Google Refine, import format handlers are required to provide a
> method by which the app can: 1) query whether they can handle a given
> HTTP ContentType and 2) query whether they can handle a given URI (can
> trivially be used to implement a file-extension filter or, more
> extensively, full-blown content analysis).  Given a set of candidate
> format handlers, Refine can query them all to see which ones can
> handle the given content.
>
> Another way to do something similar is to have static registration of
> a set of content types, URI RegEx patterns, etc.  Either way, it's
> useful to have the parser writer declare this stuff explicitly rather
> than having it decoupled in a way that allows it to get out of sync
> with the actual capabilities of the parser.
>
> Tom
>
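The query-based approach Tom describes, where the framework asks every candidate handler whether it can take a given content type or URI, can be sketched in Python roughly as follows. This is not Google Refine's actual (Java) API. The class and method names, the two example handlers, and the BibTeX content types are assumptions chosen for illustration. The same candidates() call would also answer the "send a string, get back which parser" question.

```python
import re


class FormatHandler:
    """Hypothetical base class: each handler answers capability queries about itself."""
    name = "base"

    def can_handle_content_type(self, content_type):
        return False

    def can_handle_uri(self, uri):
        return False


class BibtexHandler(FormatHandler):
    name = "bibtex"

    def can_handle_content_type(self, content_type):
        # Assumed (unregistered but common) BibTeX media types.
        return content_type in ("application/x-bibtex", "text/x-bibtex")

    def can_handle_uri(self, uri):
        # The trivial case: a file-extension filter.
        return uri.lower().endswith(".bib")


class ArxivHandler(FormatHandler):
    name = "arxiv"
    _abs = re.compile(r"^https?://arxiv\.org/abs/\d{4}\.\d{4,5}")

    def can_handle_uri(self, uri):
        return self._abs.match(uri) is not None


def candidates(handlers, content_type=None, uri=None):
    """Ask every registered handler and return the names of those that accept the input."""
    found = []
    for h in handlers:
        if content_type is not None and h.can_handle_content_type(content_type):
            found.append(h.name)
        elif uri is not None and h.can_handle_uri(uri):
            found.append(h.name)
    return found
```

For example, candidates([BibtexHandler(), ArxivHandler()], uri="http://arxiv.org/abs/1201.6450") returns ["arxiv"]. The static-registration alternative would replace the methods with declared lists of content types and URI regexes, which the framework then matches itself.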