[okfn-discuss] Submitting comments to the Library of Congress?

Jonathan Gray jonathan.gray at okfn.org
Tue Dec 11 20:42:45 UTC 2007

Hi all,

Below is a (rather long) draft response to the LoC draft report:


Any comments would be much appreciated!

(It's also inline below.)



= Open bibliographic data? - comments on draft report by the Working 
Group on the Future of Bibliographic Control at the Library of Congress =

14th December 2007

'''Rufus Pollock, The Open Knowledge Foundation''' [[BR]]
'''Jonathan Gray, The Open Knowledge Foundation''' [[BR]]
'''Peter Suber, The Scholarly Publishing and Academic Resources 
Coalition''' [[BR]]
'''Aaron Swartz, The Open Library'''[[BR]]

== Introduction ==

This document is a response to the call for comments on a draft released 
by the Working Group on the Future of Bibliographic Control on 30th 
November 2007 [1].

We think it is laudable that the Working Group have recommended that the 
Library of Congress take a more active role in leading the library world 
into the 21st century. Their vision of a bibliographic control ecosystem 
which is "collaborative, decentralized, international in scope and 
web-based" (p. 1) is timely.

However, we are concerned that there is no explicit mention of the 
potential benefits of open licensing for bibliographic data. Over the 
past few years, open licensing has facilitated the explosive growth of a 
'knowledge commons'. To give a few prominent examples: Open Access 
journals, Open Educational Resources and Open Data in scientific 
research [2] have all been enabled by licenses which permit material to 
be freely re-used and re-distributed [3].

We believe open licensing would strongly help to catalyse the 
flourishing of an information ecology for bibliographic data - by 
allowing and encouraging anyone to share, modify and build on it. Openly 
licensed bibliographic data would allow users and developers to:
 * improve the quality of the data by correcting errors, and adding 
ancillary information;
 * attempt to harmonise and integrate data that is from multiple 
sources, in different formats and which adheres to different sets of 
 * use technologies such as wikis and versioning systems to facilitate 
the collaborative development of data [4];
 * host bibliographic data and experiment with distributed data 
provision and access;
 * combine bibliographic datasets with other material - such as 
user-contributed reviews, images and 'tags';
 * build innovative (web) applications to explore and represent the 
wealth of information contained in bibliographic records, e.g. through 
datamining and/or visualization technologies [5];
 * extract structured, machine-readable data from bibliographic records 
and link it to other open datasets in the emerging semantic web of 
data [6].

New kinds of technologies are emerging very rapidly - and we think that 
one of the best ways for the library community to see the fruits of 
these developments applied to bibliographic data is to permit greater 
experimentation with the data by the wider technical community - and the 
general public. Placing restrictions on how bibliographic records may be 
re-used effectively inhibits community-led development and innovative 
'tinkering'. One of the implicit principles of more 'open' models of 
development is that 'the most interesting thing to be done with your 
material will be thought of by someone else'. This kind of thought 
resonates strongly with the "decentralised", "dynamic", "collaborative" 
ethos propagated in the report, in which users and third party 
organisations are encouraged to play a more active role in bibliographic 
control.

== Summary of key comments ==

 * The potential benefits of open licensing should be mentioned in the 
draft. We've identified several places where such mention may be 
 * The draft should strive to acknowledge a broad spectrum of parties 
who may contribute to an ecosystem of bibliographic control, and who 
benefit from shared bibliographic data - including individual technical 
developers, enthusiasts and a diverse variety of third party 
organisations - rather than simply libraries, library users and 
commercial contractors. (Cf. comments on p. 1, par. 1)
 * Open licensing can help to lower or remove transaction costs. (Cf. 
comments on p. 1, par. 1)
 * We urge that even if value-added data products or services are sold 
in order to recover costs, openly licensing 'raw' bibliographic data is 
still considered. (Cf. comments on p. 4, par. 3)
 * The LC should take into account short and long term opportunities to 
create 'public value', as well as opportunities for market growth, when 
considering alterations to its pricing structure. (Cf. comments 
on p. 8, par. 1; p. 13, sect. 1.1.4)
 * The report should explicitly acknowledge significant work by 
non-profit organisations in the areas of digitisation and bibliographic 
control as well as contributions of commercial vendors. (Cf. comments on 
p. 8, par. 2)
 * The Library of Congress should take a leading role in encouraging 
bibliographic data to be shared - encouraging other individual libraries 
to make their data available under an open license where possible. (Cf. 
comments on p. 8, par. 5)
 * Open bibliographic data would encourage relevant groups to improve 
and build on each other's work rather than doubling up effort in 
parallel development. (Cf. comments on p. 9, par. 1)
 * A strong culture of sharing bibliographic information may help 
libraries avoid becoming over-dependent on third party contractors to 
replace work currently done by the Library of Congress. (Cf. comments on 
p. 15, sect. 1.2)
 * The products of digitizing material that is in the public domain 
should be made available under an open license where possible. (Cf. 
comments on pp. 19-20, sect. 2; p. 21, sect. 2.4)
 * The Library of Congress should implement changes in metadata 
standards so that there is a field within each bibliographic record 
to specify the license the record is available under. (Cf. comments on 
pp. 21-26, sect. 3.)

== Comments on the Draft Report ==

N.B. We take 'bibliographic data' to refer to metadata concerning 
library holdings - primarily in the form of bibliographic records.

=== Introduction ===

p. 1, par. 1

"Its realization will occur in cooperation with the private sector, and 
with the active collaboration of library users."
 * The implied distinction - between formal cooperation with the private 
sector and input from ordinary library users - may become increasingly 
blurred. We think it would be valuable to recognise that there is a 
broad spectrum of potential collaborators ranging between these two 
poles - including individual technical developers and smaller groups 
who might wish to re-use or add value to bibliographic data without 
necessarily, e.g., contracting with the relevant producer.

"Data will be gathered from multiple sources; change will happen 
quickly; and bibliographic control will be dynamic, not static."
 * Open licensing would help to ensure that bibliographic control is 
dynamic and that change happens quickly by eradicating the requirement 
that every user asks permission from every data producer for each new 
application of bibliographic information.

"Libraries must continue the transition to this future without delay in 
order to retain their relevance as information providers."
 * As mentioned above, openly licensing bibliographic material would 
help to accelerate this transition by allowing third parties to 
experiment with innovative ways of re-using it and building on it - 
including the development of new kinds of applications, services, 
plugins, and so on.

=== Background ===

p. 4, par. 3

"According to current congressional regulations, LC is permitted to 
recover only direct costs for services provided to others. As a result, 
the fees that the Library charges do not cover the most expensive aspect 
of cataloging: namely, the cost of the intellectual work. . The 
economics of creating LC's products have changed dramatically since the 
time when the Library was producing cards for library catalogs. It is 
now time to reevaluate the pricing of LC's product line in order to 
develop a business model that allows LC to more substantially recoup its 
actual costs."

 * Reevaluating product pricing is arguably one way among several 
towards cost recovery. Also, while the LC might recoup costs through 
revenue generated through value-added products and services - we hope 
this does not preclude any effort to encourage the circulation of its 
raw data.

=== Guiding Principles ===

p. 7, par. 3

"Different communities of bibliographic practice have grown up around 
different resource types: library collections of books and journals, 
archives, journal articles, and museum objects and images. As these 
resources and others become increasingly accessible through the Web, 
separation of the communities of practice that manage them is no longer 
desirable, sustainable, or functional. Bibliographic control is 
increasingly a matter of managing relationships—among works, names, 
concepts, and object descriptions—across communities. Consistency of 
description within any single environment, such as the library catalog, 
is becoming less significant than the ability to make connections 
between environments: Amazon to WorldCat to Google to PubMed to 
Wikipedia, with library holdings serving as but one node in this web of 
connectivity. In today's environment, bibliographic control cannot 
continue to be seen as limited to library catalogs."

 * Again, open licensing could be mentioned here, given this projected 
decentralisation and the importance of widespread collaboration among 
many different parties.

p. 8, par. 1

"Once considered a public good, information access is today a commodity 
in a rapidly-growing marketplace. Many information resources formerly 
managed in the not-for-profit sector are now the objects of a 
significant for-profit economy. Entities in this latter economy have 
financial capabilities far beyond those of libraries. Further, they have 
the resources to engage in large scale research and development."

 * We think it's crucial to strike a balance here between 
encouraging public benefit and market growth. Openly licensed 
bibliographic data would allow the general public to benefit from being 
able to freely re-use and re-distribute it, as well as commercial 
organisations to benefit from being able to re-use it in their products 
and services. Increased commercial exploitation would also arguably 
indirectly generate more revenue for government organisations such as LC 
through an increase in taxable profits. Open licensing also allows 
community driven development, which may in some cases yield similar or 
even preferable results to well funded closed models of development. 
Open licensing is also becoming increasingly popular among large 
for-profit entities, who may, for example, charge for associated services.

p. 8, par. 2

"Libraries of today need to recognize that they are but one group of 
players in a vast field, and that market conditions necessitate that 
libraries interact increasingly with the commercial sector. One example 
of such interaction can be found in the various mass digitization 
projects in which for-profit organizations are making use of library 
resources and library metadata."

 * It is also important to recognise new partnerships with non-profit 
organisations in this area - such as the important digitisation work 
being carried out by the Internet Archive and by The Open Library with 
members of the Open Content Alliance.

p. 8, par. 5

"Sharing, however, is not a strategy for LC alone. The entire library 
community and its many partners must also be part of it."

 * Again, by advocating liberal licensing practices on a wide scale - 
the LC could effectively encourage libraries to scale their 
bibliographic control operations by sharing their data.

p. 9, par. 1

"Is there duplicate effort being expended? Are there possible 
partnerships that could reduce the burden on the Library?"

 * Open licensing in this area would encourage relevant groups to 
improve and build on each other's work rather than doubling up effort in 
parallel development.

p. 9, par. 4

"In addition, the standards landscape in the library field is murky, 
with many different organizations working on similar standards in a 
non-coordinated fashion."

 * See comments on p. 9, par. 1, above.

=== Findings and Recommendations ===

p. 11, sect. 1.1

"The Working Group identified three primary areas of redundancy in the 
bibliographic production process:
  1. the supply chain, wherein some data are created by publishers and 
vendors and later re-created by library catalogers;
  2. the modification of records within the library community, wherein 
such modifications are not shared, even though they could be useful to 
others; and
  3. the expenses that are incurred when individual libraries must 
purchase records because the sharing of those records is prohibited or 

 * This whole section on increased sharing and eliminating redundancies 
is an opportune place to allude to the potential of open licensing.

p. 12, sect. &

" All: Be more flexible in accepting bibliographic data from 
others (e.g., publishers, foreign libraries) that do not conform 
precisely to U.S. library standards."

" All: Develop workflow and mechanisms to use data and metadata 
from network resources, such as abstracting and indexing services, 
Amazon, IMDb, etc., where those can enhance the user's experience in 
seeking and using information."

 * It's likely that some form of liberal licensing is requisite for 
utilising third party data and for re-purposing existing metadata on a 
large scale.

p. 13, sect. 1.1.4

"1.1.4 Re-Examine the Current Economic Model for Data Sharing in the 
Networked Environment

LC: Convene a representative group consisting of libraries 
(large and small), vendors, and OCLC members to address costs, barriers 
to change, and the value of potential gains arising from greater sharing 
of data, and to develop recommendations for change.

LC: Promote widespread discussion of barriers to sharing data.

LC: Reevaluate the pricing of LC's product line with a view to 
developing a business model that enables more substantial cost recovery."

 * We strongly suggest that the public good (or the economic notion of 
'social welfare'), in addition to cost recovery, should be taken into 
account in the analysis of these issues. This is particularly important 
given the trend-setting role that the report suggests LC take in the 
wider world of bibliographic control.

p. 15, sect. 1.2

"Long-term dependence on Library of Congress bibliographic services 
leaves the users of those services increasingly vulnerable to any 
changes in them.

Long-term reliance on Library of Congress leadership and on its 
provision of cataloging records leads libraries—even some large 
libraries with relatively plentiful staff—to think that they bear no 
responsibility, individually or collectively, for sharing substantively 
in the work of bibliographic control."

 * Note the same would be true if, for example, more libraries 
outsourced bibliographic work to 'closed' private contractors to replace 
core functions that had previously been fulfilled by LC. It seems that a 
stronger culture of sharing and exchanging data between libraries 
(perhaps in addition to third party contractors and contributions) is a 
more sustainable strategy that would leave libraries in a better 
position in the longer term - and able to do at least some work 'in house'.

p. 16, sect. 1.2

"All types of libraries will contribute to the best of their abilities 
and resources to the "public good" that comes from bibliographic control 
and resource sharing."

 * Again, we strongly suggest that this is factored into the kinds of 
discussions and analyses recommended in 1.1.4 (cf. comments on p. 13, 
sect. 1.1.4).

p. 18, sect. 1.3

"There will be increased sharing of authority data between libraries and 
between library systems and systems from other communities, with library 
authority data available to anyone working with bibliographic data. 
Economies will be realized by minimizing the number of times the same 
entity needs to be researched. Exchange of information about the same 
name from one system to another will be made simpler and more reliable. 
Access to data will be unimpeded and barriers to using data will be 

 * This is another opportune moment to mention the potential benefits of 
open licensing.

pp. 19-20, sect. 2 & p. 21, sect. 2.4

"2.4 Encourage Digitization to Allow Broader Access"

 * Though, as stated above, our primary interest in these comments is in 
bibliographic metadata, we also advocate making the digitised images of 
material that is in the public domain available under an open license 
where possible.

pp. 21-26, sect. 3

 * It would be extremely valuable if LC encouraged all library records 
to have a standard metadata field that included information on the 
license of the library record itself.

p. 23, sect. 3.1

"Library bibliographic data will move from the closed database model to 
the open Web-based model wherein records are addressable by programs and 
are in formats that can be easily integrated into Web services and 
computer applications. This will enable libraries to make better use of 
networked data resources and to take advantage of the relationships that 
exist (or could be made to exist) among various data sources on the Web."

 * Open licensing could greatly help to facilitate the emergence of such 
an 'open' model.

p. 28, sect. 4.1

"Library bibliographic data will be used in a wide variety of 
environments, and interoperability between library and non-library 
bibliographic applications will increase/improve.

Library catalogs are seen as valuable components in an interlocking 
array of discovery tools."

 * Again, this is a particularly opportune place to mention the 
possibility of using a liberal license.

p. 31, sect.

" LC: Provide LCSH openly for use by library and non-library 

 * Ditto.

pp. 33-4, sect. 5.1

 * We strongly advocate that the 'public good' be taken into account 
while building an evidence base. (Cf. p. 13, sect. 1.1.4)

== References ==

All page numbers refer to the Draft Final Report of the Working Group 

[1] Letter from the Working Group – November 30, 2007 

[2] According to the Directory of Open Access Journals 
<http://www.doaj.org/> there are now just under 3000 Open Access 
journals with over 160,000 articles. See Open Access News 
<http://www.earlham.edu/~peters/fos/fosblog.html> for more on open 
projects in scholarly publishing and research. OER (Open Educational 
Resources) Commons <http://www.oercommons.org/> is a major portal for 
open course content. Science Commons <http://sciencecommons.org/> is a 
significant proponent of open licensing for scientific research data.

[3] Creative Commons and Talis both maintain open licenses such as the 
Creative Commons Attribution license 
<http://creativecommons.org/licenses/by/2.0/> and the Open Database 
Another frequently used open license is the GFDL, which Wikipedia's 
content is licensed under. For a more comprehensive list see 

[4] The Open Library is a prominent project that is currently 
experimenting with versioning in bibliographic data 

[5] To give an example, many developers are exploring different uses of 
the open-source suite of tools from MIT's Simile project 
<http://simile.mit.edu/>, which allows large datasets to be represented 
on a timeline.

[6] The W3C Community Project 'Linking Open Data', 
which includes Tim Berners-Lee, is currently pioneering work in this area.

Rufus Pollock wrote:
> Jonathan Gray wrote:
>> Hi all,
>> The Library of Congress has asked for comments on a draft produced by 
>> a Working Group they initiated on the 'Future of Bibliographic Control'.
>> As some of you may have seen I recently blogged about this:
>> http://blog.okfn.org/2007/12/06/the-future-of-bibliographic-control-and-licensing-policies-for-bibliographic-data/ 
>> The deadline for public comments is 15th December. I think it would 
>> be great if we could submit some brief notes on the potential 
>> benefits of openly licensing bibliographic data!
> We should definitely draft something. Would you be happy to put 
> something together and then post it to the list (or on the wiki with a 
> link for the list).
> Looking through the PR from the LC in your mail (not included here) 
> some  potential points to make would be:
> * Best way to achieve sharing and deliver value for a publicly funded 
> ORG such as LC is to make biblio metadata *openly* available. Why?
>   * More bugfixing, possibilities for 'wiki-like' management of data 
> etc => better quality data
>   * Allows for possibility of distributed data provision and access 
> (reducing load, reducing latency, risk of downtime etc etc)
>   * Better access both in terms of multiple forms/formats and others 
> designing a better interface (see comments in a recent blog post [1])
>   * Possibilities for reuse and recombination with other data sources
> * Must emphasize we are not talking about their content at this point 
> just the *metadata*
> * Might also want to point out that for content in which there are no 
> rights problems (i.e. public domain) should make that stuff openly 
> available for same reason.
> * Overall: for publicly funded bodies open approaches maximize social 
> welfare!
> [1]:<http://blog.okfn.org/2007/10/31/british-history-online-why-the-restrictions/> 
>> Does anyone know if any groups or individuals have already submitted 
>> comments along these lines?
> No-one to my knowledge but that might not be saying much ...
>> Can anyone think of any organisations/individuals who might be 
>> interested in helping out with this?
> Should obviously contact archive.org/openlibrary. Paul from Talis has 
> already posted so it looks like something is happening there. Might 
> want to also try contacting CC (perhaps Jon Phillips) though the time 
> constraints might be a little tight.
> ~rufus
