[pdb-discuss] Some thoughts on an open metadata db for cultural works

Rufus Pollock rufus.pollock at okfn.org
Thu Jul 6 09:43:29 UTC 2006


Below are some general thoughts on the value of a general metadata 
database and how different groups could collaborate on such a project. 
Comments welcome.

Regards,

Rufus

= Metadata Database =

A metadata database for cultural works (books/recordings/films etc) is a 
prerequisite for all kinds of activities from a database of public 
domain works to a mass digitization project such as Google Print.

At base this database models two types of objects (more if we include 
items such as Recordings and Films):

   Artist
     * name
     * date of birth
     * date of death
     * ...
   Work
     * title
     * creators (link to artists)
     * date created
     * ...
   [Recording]
   ...
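
As a rough illustration, the two core object types above could be sketched 
as follows. This is only a minimal sketch, not a proposed schema: the field 
names (`date_of_birth`, `creator_ids`), the use of random UUIDs as 
identifiers, and the example values are all assumptions made for 
illustration, and the fields elided above are left out.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional
import uuid


def new_id() -> str:
    """Return a fresh opaque identifier (UUIDs are an assumption here)."""
    return uuid.uuid4().hex


@dataclass
class Artist:
    name: str
    date_of_birth: Optional[date] = None
    date_of_death: Optional[date] = None
    id: str = field(default_factory=new_id)


@dataclass
class Work:
    title: str
    # Links to artists are stored as Artist.id values, not as copies
    # of the artist records themselves.
    creator_ids: List[str] = field(default_factory=list)
    date_created: Optional[date] = None
    id: str = field(default_factory=new_id)


# Placeholder data, purely illustrative.
artist = Artist(
    name="Example Artist",
    date_of_birth=date(1850, 1, 1),
    date_of_death=date(1900, 1, 1),
)
work = Work(
    title="Example Work",
    creator_ids=[artist.id],
    date_created=date(1880, 1, 1),
)
```

Storing creators as identifiers rather than embedded records means an 
artist's details live in one place and can be corrected once.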

== Desirable properties of such a metadata DB ==

1. **Persistent** identifiers for Artists and Works. That way one can 
have a single central registry which can then be *reused* by other 
projects via the identifiers (e.g. a public domain project and orphan 
work project could both reuse the same metadata db).

2. Data should be 'open' (see http://www.okfn.org/okd/) so that anyone 
can reuse and redistribute it (without this it will be much harder for 
different projects to agree to collaborate on a *central* database). One 
of the main benefits of the data being open is that it makes it easy for 
someone to 'branch' the database -- i.e. create their own version under 
their control. This might be useful if an organization had specific 
needs, for example needing to ensure the data was ultra reliable.

3. Versioned and community editable: this way one can enlist a community 
of volunteers to improve and add to the database. At the same time 
versioning means that contributions can be checked, audited and reverted.

== Collaboration ==

How could two different organizations collaborate with respect to such a 
database? There are several ways:

1. Share data. Both organizations could contribute their data into an 
open central repository (maintained by either of them or even by a third 
party).

2. Collaborate in developing the database itself (database structure, 
maintenance etc).

3. Collaborate on developing tools to utilize and interface with the 
database. For example, one would like to have a web front end to the 
database through which members of the general community could edit and 
add information.

With open data, only the first option is essential to the collaboration 
process: it is entirely possible for two groups to share data but to 
build their own infrastructure and interfaces to that data.



