[open-bibliography] comprehensive bibliographic database of "open" resources?

Wed Aug 18 09:35:48 UTC 2010

Hello,

I am not a lawyer but I've just started working with one on a guide for libraries about opening up catalog data. To consider the legal status of catalog data we have to distinguish the different types of data in a catalog as well as distinguishing between individual records and collections/databases.

So, what's in a catalog?

* bibliographic data (the part that gets the most interest here),
* subject headings and notations,
* item-level data (number of holdings, call number),
* meta-metadata (administrative metadata such as the last-modified date, probably often created automatically),
* catalogue enrichment like:
   o links to Wikipedia, Google Books, Amazon etc.
   o cover images (self-scanned or from Amazon)
   o links to digitized tables of contents, indexes, perhaps bibliographies (I don't understand why libraries didn't include the books' bibliographies when digitizing for catalogue enrichment)
   o abstracts
   o blurbs
   o reviews (user-generated or from a reviewing service),
   o user-generated tags.

Furthermore there are the authority records like:

    * name authority files
    * subject authority files
    * classifications (I file these under authority files, though strictly speaking that might not be right)

Of all these, only three can be individually protected by copyright: abstracts, blurbs and cover images. These assumptions correspond to the practice in Germany and to the contracts and agreements between library organizations and publishers.

So you can take individual records that don't contain copyrighted material without any legal risk.
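To make the distinction concrete, here is a minimal sketch of separating the three individually copyrightable elements from the rest of a record. The field names are hypothetical and chosen only for illustration; real catalog formats name and structure these elements differently. It only covers individual records, not rights in a database as a whole.

```python
# Hypothetical field names for the three individually copyrightable elements
# named above (abstracts, blurbs, cover images). Real formats will differ.
COPYRIGHTABLE_FIELDS = {"abstract", "blurb", "cover_image"}


def has_copyrightable(record: dict) -> bool:
    """True if the record carries a non-empty copyrightable element."""
    return any(record.get(field) for field in COPYRIGHTABLE_FIELDS)


def strip_copyrightable(record: dict) -> dict:
    """Return a copy of the record without the copyrightable elements."""
    return {k: v for k, v in record.items() if k not in COPYRIGHTABLE_FIELDS}


# An invented example record, not taken from any real catalog.
record = {
    "title": "Example Title",
    "author": "Doe, Jane",
    "subject_headings": ["Cataloging"],
    "call_number": "AB 1234",
    "abstract": "A publisher-supplied summary.",
    "cover_image": "cover.jpg",
}

cleaned = strip_copyrightable(record)
```

After this, `cleaned` keeps the title, author, subject headings and call number, and drops the abstract and the cover image.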

What about databases? 
Thomas already talked about American law. European law differs from it by also protecting factual databases. The practice in libraryland seems to be not to claim copyright on individual records. Even OCLC has abandoned the idea of copyrighting individual records (they tried to at first). The policy now reads in 2. C (http://www.oclc.org/worldcat/recorduse/policy/default.htm#2C): "OCLC claims copyright rights in WorldCat as a compilation, it does not claim copyright ownership of individual records". (During the debate about OCLC's policy I asked some questions on OCLC's "Online Community Forum" about whether individual members could publish their data under a Public Domain license. OCLC admitted that members would have the legal right to do so but that the social contract of the policy is intended to prevent such actions. See http://www.oclc.org/worldcat/recorduse/policy/forum/forum.pdf.)

In European law there are two distinct but related rights associated with databases: copyright and the sui generis database right. So, even if you just collect facts and cannot claim any copyright on your database, your "qualitatively and/or quantitatively substantial investment in either the obtaining, verification or presentation of the contents" is protected: others are not allowed to copy and publish the whole database or "substantial parts" of it. This "substantial part" bit is pretty vague. The one thing that is clear is that more than half of a database represents a "substantial part", which doesn't mean that you are on the safe side by taking less than half.

So, legally you can automatically collect metadata, but obviously not without restrictions, at least according to European law. Opening up a bigger part of OCLC's WorldCat would most probably lead to being sued by them. (Additionally, if you've got a contract with a metadata provider, its terms might actually forbid the transmission of any one record to somebody else...)

Adrian

>>> Thomas Krichel <krichel at openlib.org> wrote on Wednesday, 18 August 2010 at 00:13:
> Peter Murray-Rust writes
> 
>> We absolutely need some consensus on this. 
> 
>   A consensus reached here is of no use if it does not match
>   legal reality.
> 
>> Some people say that we can collect this metadata without
>> restrictions - others say we can't.
> 
>   The way I see it (I have no formal legal education) is that, from a
>   US perspective at least, a la Feist vs Rural Telephone, a compendium
>   of factual data can not be copyrighted. The factual parts of a
>   bibliographic description are author name expressions, titles, and
>   location information for full-text. Classification data may or may
>   not be considered as factual. Abstracts are definitely not.
> 
>   Fortunately, for author identification, we only need the
>   factual components. These are the ones available at 3lib.org
> 
>> There is no technical reason why we cannot extract the metadata
>> automatically ...
> 
>   But there are economic reasons. It's expensive to maintain
>   scraping when sites change.
> 
>> For example we are indexing 10,000 articles from Acta
>> Crystallographica and I doubt a single one is on any reading list.
>> Yet they are critical for data-driven science.
> 
>   Again, when will this be available? 
> 
>   Cheers,
> 
>   Thomas Krichel                    http://openlib.org/home/krichel 
>                                 http://authorclaim.org/profile/pkr1 
>                                                skype: thomaskrichel