[open-bibliography] CUL dataset release

Tue Oct 19 20:38:07 UTC 2010

Ben O'Steen <bosteen at gmail.com> wrote:

> Unfortunately, the main description and grouping of this dataset is
> simply "the metadata records that Cambridge University Library feel that
> haven't been copied or combined with data from OCLC and other suppliers"

Thanks Ben, it would be very useful if that description could be
posted in the OKFN record for the dataset.
A general question about all these library releases: what is the increment
in value over the OL dataset, which already has a nice API?
Which agent is willing to track the deltas as new datasets come online?
Another question is the extent to which OL data qualifies under the
OKFN's open data criteria. I find the OL terms of use statement, http://www.archive.org/about/terms.php,
fairly opaque. OL used to have a statement that they regarded their data as public
domain, but this seems to have been withdrawn. Maybe Karen could comment on this?

> I would very much favour a wiki or similar for CKAN packages, where data
> triage and characterisation can be discussed on a per-package basis.

My strong support for this.
To avoid duplication of effort, it is important to have a central place where agents planning or
conducting data triage and characterisation can report their plans and activities. Then we should
try to encourage a culture in this group of posting to that place, and of looking there before
embarking on such efforts.

I also strongly agree with Peter that we should encourage dumps of open data first, and
ask questions about quantity and quality later. The rate of dumps of library data currently
exceeds the capacity of this community to process them into usable forms. But that should not slow
down the open dumping. I think that the process of data cleaning and enhancement by agents with
domain-specific interests would be accelerated if some agents could do preliminary triage and
characterisation and report on the results. Then other agents might be motivated to step in and
improve the data further.

--Jim

>
> On Mon, 2010-10-18 at 21:15 -0700, Jim Pitman wrote:
> > Can someone please provide a description of this dataset? Some idea
> > of what range of years and subjects? Or how this dataset was collected or conceived?
> > Of course it's nice to see any dataset in PDDL. But it is not much of a service to
> > release hundreds of thousands of records with no indication of what's in there.
> >
> > Is it expected that everyone on the list is going to jump in and see if there's anything there
> > they care about? I'd like to see a higher standard of dataset description before dataset release
> > announcements on this list.
> > 
> > Or maybe we need some preliminary stage inviting volunteers to provide dataset descriptions if
> > dataset providers are unwilling or unable to do so.
> > 
> > many thanks
> > 
> > --Jim
> > ----------------------------------------------
> > Jim Pitman
> > Director, Bibliographic Knowledge Network Project
> > http://www.bibkn.org/
> > 
> > Professor of Statistics and Mathematics
> > University of California
> > 367 Evans Hall # 3860
> > Berkeley, CA 94720-3860
> > 
> > ph: 510-642-9970  fax: 510-642-7892
> > e-mail: pitman at stat.berkeley.edu
> > URL: http://www.stat.berkeley.edu/users/pitman
> > 
> > 
> > _______________________________________________
> > open-bibliography mailing list
> > open-bibliography at lists.okfn.org
> > http://lists.okfn.org/mailman/listinfo/open-bibliography