pitman at stat.Berkeley.EDU
Tue Jun 7 20:12:22 UTC 2011
William Waites <ww at styx.org> wrote:
> * There's already an RDF mapping for the schema.org stuff -
> schema.rdfs.org so interoperating isn't very hard and can be largely automatic
> * This is a "standard" dictated by a cartel that largely ignores previous and current work.
Right, but this cartel provides real incentives for adoption of the standard. I am not aware
of any such incentives to export RDFa.
> * schema.org is very close to RDFa, why didn't they just use RDFa?
> Their arguments boil down to "because we felt like it" which is a
> bit anti-social - see the excellent article by Manu Sporny
You mean http://ht.ly/5cncU, "Is Schema.org Really a Google Land Grab?"? Of course it is a land grab, and
a very smart one, I think. But it is one that those with lightweight data and interests can easily adapt to.
I think it is important for BibJSON/BibServer/SchHTML to be responsive to whatever emerging standards
there are, to promote release and processing of biblio data subject to those standards, and to
find incentives for people to do that. The most obvious incentive for providing well-structured
data is for it to be well harvested and indexed by search engines. We can't compete with the big players
for large scale search and indexing. Our role is to provide incentives to academic communities
to make their data available in ways which directly enhance the value of that data, by provision of
low cost high quality services over highly structured open data.
> * Are you seriously suggesting to embed XML fragments in JSON?
Yes. One can just as well use JSON to embed LaTeX fragments or a BibTeX entry, or any other piece
of structured text. An abstract in XML seems perfectly acceptable. The JSON should declare the format of the
field, either once and for all in a metadata indication (e.g. "abstract" is always in XML subject to some DTD, or always
in LaTeX), or record by record, which may be necessary for mashed-up collections.
We would prefer a purer JSONic expression, but in the short term we have to deal with data in all kinds
of formats, and there is great benefit in taking data from whatever source, mapping it quickly to
JSON without loss of information, then processing the JSON, e.g. regularizing it to something we call BibJSON.
BibJSON -> BibJSON processing is then greatly facilitated. It is a judgement call, of course, how pure we make the BibJSON.
But initially I think it best not to be too pure, and to allow possibly different formats in different fields of the same record.
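As a rough sketch of what a per-record format declaration might look like, here is a hypothetical BibJSON-style record that embeds an XML abstract as an ordinary JSON string and labels its format. The field names ("format", "value") and the record layout are illustrative assumptions, not a fixed BibJSON specification:

```python
import json

# Hypothetical BibJSON-style record: the "abstract" field carries its own
# format declaration, so a mashed-up collection can mix XML, LaTeX, BibTeX,
# or plain-text abstracts record by record.
record = {
    "type": "article",
    "title": "An Example Paper",
    "author": [{"name": "A. Author"}],
    "abstract": {
        "format": "xml",  # could equally be "latex", "bibtex", "plain", ...
        "value": "<abstract><p>We study an example.</p></abstract>",
    },
}

# Round-trip through JSON: the embedded XML survives unchanged as a string,
# so no information is lost by the JSON mapping.
restored = json.loads(json.dumps(record))
print(restored["abstract"]["format"])
print(restored["abstract"]["value"])
```

A collection-level alternative would be to declare once, in collection metadata, that every "abstract" is XML under some DTD; the per-record form above trades purity for flexibility when merging heterogeneous sources.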
> The two middle points may be true but it may also be that because the
> cartel is so powerful it gains some sort of critical mass (and maybe not, this is not the first time Google et al. have proposed
> microformats of various types which have not been widely adopted).
Right. But we should be prepared to pick up on whatever structured data format the data publishers are incentivized to provide.
We can provide some small incentives ourselves, but only in the biblio space, and that is a small fraction of the larger document
space that the search engines are operating over. So I think we have to adapt our formats to whatever is taking off in the larger domain.
> The real news is that this is an admission by the big three search engines that heuristics and natural language techniques are not enough
> - this is a significant departure from their previous positions. This much is a very good thing.
Yes. It is very encouraging. The big three will continue to be able to deal with data that is much dirtier than we would like to
see for academic biblio purposes, and we should leverage that as best we can to refine and enhance datasets we care about.
Director, Bibliographic Knowledge Network Project
Professor of Statistics and Mathematics
University of California
367 Evans Hall # 3860
Berkeley, CA 94720-3860
ph: 510-642-9970 fax: 510-642-7892
e-mail: pitman at stat.berkeley.edu