[open-bibliography] Post about openbiblio data from Finland's Vaski consortia

Jim Pitman pitman at stat.Berkeley.EDU
Fri Oct 14 16:13:26 UTC 2011


Karen Coyle <kcoyle at kcoyle.net> wrote:

> > If deliberate, where exactly is the "BY" requirement defined?
> On the CC web site -- all of the licenses are defined there.

Sorry, I meant where exactly I could find the particular attribution statement
expected for this instance of CC-BY, i.e. whether the attribution should be to the Finnish
library or to OKFN. But this is moot if it is a mistake.

> There's been a short discussion on the list for the Digital Public  
> Library of America about the fact that there is no reliable provenance  
> in CC licenses. They at least need to be digitally signed. So this  
> "who" question is inherent in CC.

Yes, this is an interesting discussion on DPLA. Years ago Nelson Beebe provided checksums
on his BibTeX datasets for just this reason. I hope it may be possible to do that with some form of BibJSON,
both at the record level and at the collection or dataset level. BTW, Mark MacGillivray and I have agreed
that until some better consensus emerges from the openbiblio community, for purposes of BibJSON development we are using the words
"dataset" and "collection" interchangeably. This Finnish deposit exemplifies what we mean by either term. We are open to
suggestions about how to distinguish the terms "collection" and "dataset" for purposes of BibJSON/BibSoup.
For reasons I have not yet understood, the term "collection" seems to set off alarm bells which "dataset" does not.
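
To make the idea of record-level and dataset-level checksums concrete, here is a minimal sketch in Python. Nothing here is an agreed BibJSON convention: the canonical form (sorted keys, no extra whitespace, UTF-8), the choice of SHA-256, and the function names are all assumptions made for illustration.

```python
import hashlib
import json


def canonical_bytes(record):
    # Assumed canonical form: keys sorted, no insignificant whitespace, UTF-8.
    text = json.dumps(record, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")


def record_checksum(record):
    # Checksum of one record, independent of the key order in the input.
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


def collection_checksum(records):
    # Hash the sorted record checksums, so the order of records in the
    # dataset does not affect the dataset-level checksum.
    digests = sorted(record_checksum(r) for r in records)
    return hashlib.sha256("\n".join(digests).encode("ascii")).hexdigest()
```

With this choice, two copies of a dataset that differ only in key order or record order get the same checksum, while any change to a field value changes it.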

It would be great if OKFN could promote some simple form of digital signature for open biblio records and datasets.
This should also encourage those wishing to improve large open datasets to do so by publishing
diffs or increments. That would hopefully reduce the problem of duplicated records, and lead us to welcome and
encourage copying of records rather than fear it.
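
As a rough illustration of publishing diffs or increments, the sketch below compares two versions of a dataset by per-record checksum and reports which records were added and which were removed. The checksum scheme (SHA-256 over JSON with sorted keys) and the function names are assumptions for the example, not anything the openbiblio community has settled on.

```python
import hashlib
import json


def checksum(record):
    # Assumed canonical form: sorted keys, compact separators, UTF-8.
    text = json.dumps(record, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_datasets(old_records, new_records):
    # Index both versions by record checksum; identical records collapse
    # onto the same key, so unchanged records drop out of the diff.
    old_index = {checksum(r): r for r in old_records}
    new_index = {checksum(r): r for r in new_records}
    added = [r for k, r in new_index.items() if k not in old_index]
    removed = [r for k, r in old_index.items() if k not in new_index]
    return added, removed
```

A modified record shows up as one removal plus one addition, which is enough for someone holding the old version to reconstruct the new one from the increment alone.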

Digital signatures raise the issue of what the canonical form of a structured text dataset is,
be it encoded as BibTeX or XML or JSON or whatever. If we are going to recommend checksums on canonical forms, we should
ensure the canonical form has desirable technical properties, like the metadata and license being in predictable places near the top of the file,
from which they could easily be extracted as standalone and equally valid metadata-only records separated from the data.
This is especially important if the data is in a big file.
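
One way to get metadata in a predictable place near the top of the file is sketched below. It assumes a hypothetical line-oriented layout in which the first line is a JSON object holding the metadata and license, and every later line is one record. This layout and the function names are my illustration only, not a BibJSON specification.

```python
import io
import json


def write_dataset(stream, metadata, records):
    # First line: the metadata-only record (license, title, and so on).
    stream.write(json.dumps({"metadata": metadata}, sort_keys=True) + "\n")
    # Remaining lines: one record per line.
    for record in records:
        stream.write(json.dumps(record, sort_keys=True) + "\n")


def read_metadata(stream):
    # Only the first line is read, so the cost does not grow with the
    # number of records in the file.
    first_line = stream.readline()
    return json.loads(first_line)["metadata"]


# Usage with an in-memory file; the metadata values are placeholders.
buffer = io.StringIO()
write_dataset(buffer, {"license": "CC0", "title": "Example dataset"},
              [{"title": "A"}, {"title": "B"}])
buffer.seek(0)
header = read_metadata(buffer)
```

The first line is itself a valid standalone JSON document, so it can be copied out and circulated as a metadata-only record without touching the data.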

--Jim

----------------------------------------------
Jim Pitman
Professor of Statistics and Mathematics
University of California
367 Evans Hall # 3860
Berkeley, CA 94720-3860

ph: 510-642-9970  fax: 510-642-7892
e-mail: pitman at stat.berkeley.edu
URL: http://www.stat.berkeley.edu/users/pitman



