[openbiblio-dev] a bibtex oddity

Peter Murray-Rust pm286 at cam.ac.uk
Sun Feb 5 21:47:56 UTC 2012


On Sun, Feb 5, 2012 at 9:12 PM, Mark MacGillivray <
mark.macgillivray at okfn.org> wrote:

> Recently we saw an example of an import from a bibtex file where the
> records were separated by commas. We had not seen this before, and are
> unsure if we can achieve parsing of that along with the other bits and
> pieces we have to watch out for.
>
> We are wondering if it is worth the effort to fix - is it common to do
> this? Previous examples have included curly braces or not, and a
> certain amount of whitespace and newlines, but never and/or commas
> between records.
>

<warning>Strongly held views</warning>

I don't know the answer to the specific problem, but I have much experience
of the generic one. It can be stated:
* a format is developed. It's more or less well described syntactically and
semantically
* a small set of tools (often only one) is developed.
* Things progress sort of OK
* someone invents an "enhancement" to the file. They create a new tool that
processes the extension (which is not documented). The extension gets
widespread use.
* someone else sees the new format, guesses at its syntax and "improves" it.
* now we have chaos. A sends B a file which crashes the program that C wrote
(or worse, simply corrupts the information).
* chaos continues for 20 years or more.

It happened with HTML. Every possible horror was thrust upon the world. The
only saving feature was that most HTML was designed to be read by humans
and they are pretty good at reading grot. They recognise a blank page as
grot - a fouled encoding as grot.

But machines can't. So writing browsers cost literally millions on millions
of dollars. And finally - after 20 years - the community agrees that
conformance to a standard is not a bad thing after all.

So if we chase a broken toolchain we'll spend all that time chasing
toolchains instead of doing bibliography.

We have the same thing with the NIH "RIS". AFAICS they've just taken
something a bit like RIS and made up the rest. It breaks every parser we
have got. So what do we do?

* convince the NIH to put out conformant RIS? Assuming that such a thing
exists. Hmmm, little chance
* document the NIH-RIS and write a special parser? Looks like a necessary
stopgap

*** convince the NIH to emit BibJSON!!

Because there IS a BibJSON specification, and we CAN validate it. And if
the community wants to extend its power, it has a forum and a community in
which to react and proceed in a responsible way.
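
To make "we CAN validate it" concrete, here is a minimal sketch of what a
validator might look like. It is not the BibJSON specification: the two rules
it checks (a non-empty "title" string, and "author" as a list of objects each
carrying a "name" string) are illustrative assumptions, and the function names
validate_record and validate_text are hypothetical. The point is only that a
parser for a specified format can reject bad input with a clear message
instead of guessing.

```python
import json


def validate_record(record):
    """Return a list of problems found in one BibJSON-style record.

    The rules below are illustrative assumptions, not the full specification.
    """
    if not isinstance(record, dict):
        return ["record is not a JSON object"]

    problems = []

    # Assumed rule: every record carries a non-empty title string.
    title = record.get("title")
    if not isinstance(title, str) or not title.strip():
        problems.append("missing or empty 'title'")

    # Assumed rule: authors, if present, are a list of objects with a name.
    authors = record.get("author", [])
    if not isinstance(authors, list):
        problems.append("'author' must be a list")
    else:
        for i, author in enumerate(authors):
            if not (isinstance(author, dict) and isinstance(author.get("name"), str)):
                problems.append(f"author[{i}] must be an object with a 'name' string")

    return problems


def validate_text(text):
    """Parse JSON text and validate each record; return all problems found."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        # Malformed input is reported, never silently "repaired".
        return [f"not valid JSON: {exc.msg}"]

    records = data if isinstance(data, list) else [data]
    problems = []
    for n, record in enumerate(records):
        problems.extend(f"record {n}: {p}" for p in validate_record(record))
    return problems


if __name__ == "__main__":
    sample = '{"title": "", "author": ["P. Murray-Rust"]}'
    for problem in validate_text(sample):
        print(problem)
```

An empty list means the input passed these checks. Anything else is a list of
human-readable complaints that can be sent back to whoever produced the file.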

>
> Mark
>



-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069