[open-bibliography] CUL dataset release

Ben O'Steen bosteen at gmail.com
Tue Oct 19 18:51:12 UTC 2010


Unfortunately, the main description and grouping of this dataset is
simply "the metadata records that Cambridge University Library feels
have not been copied or combined with data from OCLC and other suppliers".

As such, it typically refers to items that are peculiar to Cambridge's
collection, varying from maps to manuscripts. As I've stated, the rhyme
and reason of this collection has nothing to do with the content - the
aggregation is due solely to licence considerations made by CUL.

Not much of a theme, but such is the data. This package is purely the
source material we received from CUL; nothing has been done to it. It is
published because it is the key data on which any subsequent set relies,
and it is the package that defines the base licence of any subsequent
curated sets.

So, yes, it's not that interesting and it is very ill-defined from a
content POV, but it needs to exist so we can indicate the provenance of
the data and its licence. Without it, any questionable assertion, bias,
encoding, ontological choice or assumption I make in creating a more
'reusable' or curated version cannot be checked against the original
material. (This applies especially to CUL's interpretation of MARC fields
and the way they have extended them over the years, which is an issue
not found only with CUL data, of course.)

I would very much favour a wiki or similar for CKAN packages, where data
triage and characterisation can be discussed on a per-package basis.

Ben

On Mon, 2010-10-18 at 21:15 -0700, Jim Pitman wrote:
> Can someone please provide a description of this dataset? Some idea
> of what range of years and subjects? Or how this dataset was collected or conceived?
> Of course it's nice to see any dataset in PDDL. But it is not much of a service to
> release hundreds of thousands of records with no indications of what's in there. 
> 
> Is it expected that everyone on the list is going to jump in and see if there's anything there 
> they care about?  I'd like to see a higher standard of dataset description before dataset release 
> announcements on this list.
> 
> Or maybe we need some preliminary stage inviting volunteers to provide dataset descriptions if
> dataset providers are unwilling or unable to do so.
> 
> many thanks
> 
> --Jim
> ----------------------------------------------
> Jim Pitman
> Director, Bibliographic Knowledge Network Project
> http://www.bibkn.org/
> 
> Professor of Statistics and Mathematics
> University of California
> 367 Evans Hall # 3860
> Berkeley, CA 94720-3860
> 
> ph: 510-642-9970  fax: 510-642-7892
> e-mail: pitman at stat.berkeley.edu
> URL: http://www.stat.berkeley.edu/users/pitman
> 
> 
> _______________________________________________
> open-bibliography mailing list
> open-bibliography at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/open-bibliography
