[okfn-discuss] Submitting comments to the Library of Congress?
jonathan.gray at okfn.org
Tue Dec 11 20:42:45 UTC 2007
Below is a (rather long) draft response to the LoC draft report:
Any comments would be much appreciated!
(It's also inline below.)
= Open bibliographic data? - comments on draft report by the Working
Group on the Future of Bibliographic Control at the Library of Congress =
14th December 2007
'''Rufus Pollock, The Open Knowledge Foundation''' [[BR]]
'''Jonathan Gray, The Open Knowledge Foundation''' [[BR]]
'''Peter Suber, The Scholarly Publishing and Academic Resources
Coalition''' [[BR]]
'''Aaron Swartz, The Open Library'''[[BR]]
== Introduction ==
This document is a response to the call for comments on a draft released
by the Working Group on the Future of Bibliographic Control on 30th
November 2007.
We think it is laudable that the Working Group have recommended that the
Library of Congress takes a more active role in leading the library world
into the 21st century. Their vision of a bibliographic control ecosystem
which is "collaborative, decentralized, international in scope and
web-based" (p. 1) is timely.
However, we are concerned that there is no explicit mention of the
potential benefits of open licensing for bibliographic data. Over the
past few years, open licensing has facilitated the explosive growth of a
'knowledge commons'. To give a few prominent examples: Open Access
journals, Open Educational Resources and Open Data in scientific
research have all been enabled by licenses which permit material to
be freely re-used and re-distributed.
We believe open licensing would strongly help to catalyse the
flourishing of an information ecology for bibliographic data - by
allowing and encouraging anyone to share, modify and build on it. Openly
licensed bibliographic data would allow users and developers to:
* improve the quality of the data by correcting errors, and adding
* attempt to harmonise and integrate data that is from multiple
sources, in different formats and which adheres to different sets of
* use technologies such as wikis and versioning systems to facilitate
the collaborative development of data;
* host bibliographic data and experiment with distributed data
provision and access;
* combine bibliographic datasets with other material - such as
user-contributed reviews, images and 'tags';
* build innovative (web) applications to explore and represent the
wealth of information contained in bibliographic records, e.g. through
datamining and/or visualization technologies;
* extract structured, machine-readable data from bibliographic records
and to link this to other open datasets in the emerging semantic web of
New kinds of technologies are emerging very rapidly - and we think that
one of the best ways for the library community to see the fruits of
these developments applied to bibliographic data is to permit greater
experimentation with the data by the wider technical community - and the
general public. Placing restrictions on how bibliographic records may be
re-used effectively inhibits community-led development and innovative
'tinkering'. One of the implicit principles of more 'open' models of
development is that 'the most interesting thing to be done with your
material will be thought of by someone else'. This kind of thought
resonates strongly with the "decentralised", "dynamic", "collaborative"
ethos propagated in the report, in which users and third party
organisations are encouraged to play a more active role in bibliographic
== Summary of key comments ==
* The potential benefits of open licensing should be mentioned in the
draft. We've identified several places where such mention may be
* The draft should strive to acknowledge a broad spectrum of parties
who may contribute to an ecosystem of bibliographic control, and who
benefit from shared bibliographic data - including individual technical
developers, enthusiasts and a diverse variety of third party
organisations - rather than simply libraries, library users and
commercial contractors. (Cf. comments on p. 1, par. 1)
* Open licensing can help to lower or remove transaction costs. (Cf.
comments on p. 1, par. 1)
* We urge that even if value-added data products or services are sold
in order to recover costs, openly licensing 'raw' bibliographic data is
still considered. (Cf. comments on p. 4, par. 3)
* The LC should take into account short- and long-term opportunities to
create 'public value' as well as opportunities for market growth when
considering alterations to its pricing structure. (Cf. comments
on p. 8, par. 1; p. 13, sect. 1.1.4)
* The report should explicitly acknowledge significant work by
non-profit organisations in the areas of digitisation and bibliographic
control as well as contributions of commercial vendors. (Cf. comments on
p. 8, par. 2)
* The Library of Congress should take a leading role in encouraging
bibliographic data to be shared - encouraging other individual libraries
to make their data available under an open license where possible. (Cf.
comments on p. 8, par. 5)
* Open bibliographic data would encourage relevant groups to improve
and build on each other's work rather than doubling up effort in
parallel development. (Cf. comments on p. 9, par. 1)
* A strong culture of sharing bibliographic information may help
libraries avoid becoming over-dependent on third party contractors to
replace work currently done by the Library of Congress. (Cf. comments on p.
15, sect. 1.2)
* The products of digitizing material that is in the public domain
should be made available under an open license where possible. (Cf.
comments on pp. 19-20, sect. 2; p. 21, sect 2.4)
* The Library of Congress should implement changes in metadata
standards so that there is a field within each bibliographic record
to specify the license the record is available under. (Cf. comments on
pp. 21-26, sect. 3.)
== Comments on the Draft Report ==
N.B. We take 'bibliographic data' to refer to metadata concerning
library holdings - primarily in the form of bibliographic records.
=== Introduction ===
p. 1, par. 1
"Its realization will occur in cooperation with the private sector, and
with the active collaboration of library users."
* The implied distinction - between formal cooperation with the private
sector and input from ordinary library users - may become increasingly
blurred. We think it would be valuable to recognise that there is
potential for a broad spectrum of collaborators ranging
between these two poles - including individual technical developers and
smaller groups who might wish to re-use or add value to bibliographic
data without necessarily, e.g., contracting with the relevant producer.
"Data will be gathered from multiple sources; change will happen
quickly; and bibliographic control will be dynamic, not static."
* Open licensing would help to ensure that bibliographic control is
dynamic and that change happens quickly by eradicating the requirement
that every user asks permission from every data producer for each new
application of bibliographic information.
"Libraries must continue the transition to this future without delay in
order to retain their relevance as information providers."
* As mentioned above, openly licensing bibliographic material would
help to accelerate this transition by allowing third parties to
experiment with innovative ways of re-using it and building on it -
including the development of new kinds of applications, services,
plugins, and so on.
=== Background ===
p. 4, par. 3
"According to current congressional regulations, LC is permitted to
recover only direct costs for services provided to others. As a result,
the fees that the Library charges do not cover the most expensive aspect
of cataloging: namely, the cost of the intellectual work. . The
economics of creating LC's products have changed dramatically since the
time when the Library was producing cards for library catalogs. It is
now time to reevaluate the pricing of LC's product line in order to
develop a business model that allows LC to more substantially recoup its
* Reevaluating product pricing is arguably one way among several
towards cost recovery. Also, while the LC might recoup costs through
revenue generated through value-added products and services - we hope
this does not preclude any effort to encourage the circulation of its
=== Guiding Principles ===
p. 7, par. 3
"Different communities of bibliographic practice have grown up around
different resource types: library collections of books and journals,
archives, journal articles, and museum objects and images. As these
resources and others become increasingly accessible through the Web,
separation of the communities of practice that manage them is no longer
desirable, sustainable, or functional. Bibliographic control is
increasingly a matter of managing relationships—among works, names,
concepts, and object descriptions—across communities. Consistency of
description within any single environment, such as the library catalog,
is becoming less significant than the ability to make connections
between environments: Amazon to WorldCat to Google to PubMed to
Wikipedia, with library holdings serving as but one node in this web of
connectivity. In today's environment, bibliographic control cannot
continue to be seen as limited to library catalogs."
* Again, open licensing could be mentioned here, given this projected
decentralisation and the importance of widespread collaboration among
many different parties.
p. 8, par. 1
"Once considered a public good, information access is today a commodity
in a rapidly-growing marketplace. Many information resources formerly
managed in the not-for-profit sector are now the objects of a
significant for-profit economy. Entities in this latter economy have
financial capabilities far beyond those of libraries. Further, they have
the resources to engage in large scale research and development."
* We think it's crucial to strike a balance here between
encouraging public benefit and market growth. Openly licensed
bibliographic data would allow the general public to benefit from being
able to freely re-use and re-distribute it, as well as commercial
organisations to benefit from being able to re-use it in their products
and services. Increased commercial exploitation would also arguably
indirectly generate more revenue for government organisations such as LC
through an increase in taxable profits. Open licensing also allows
community driven development, which may in some cases yield similar or
even preferable results to well funded closed models of development.
Open licensing is also becoming increasingly popular among large
for-profit entities, who may, for example, charge for associated services.
p. 8, par. 2
"Libraries of today need to recognize that they are but one group of
players in a vast field, and that market conditions necessitate that
libraries interact increasingly with the commercial sector. One example
of such interaction can be found in the various mass digitization
projects in which for-profit organizations are making use of library
resources and library metadata."
* It is also important to recognise new partnerships with non-profit
organisations in this area - such as the important digitisation work
being carried out by the Internet Archive and by The Open Library with
members of the Open Content Alliance.
p. 8, par. 5
"Sharing, however, is not a strategy for LC alone. The entire library
community and its many partners must also be part of it."
* Again, by advocating liberal licensing practices on a wide scale -
the LC could effectively encourage libraries to scale their
bibliographic control operations by sharing their data.
p. 9, par. 1
"Is there duplicate effort being expended? Are there possible
partnerships that could reduce the burden on the Library?"
* Open licensing in this area would encourage relevant groups to
improve and build on each other's work rather than doubling up effort in
p. 9, par. 4
"In addition, the standards landscape in the library field is murky,
with many different organizations working on similar standards in a
* See comments on p. 9, par. 1, above.
=== Findings and Recommendations ===
p. 11, sect. 1.1
"The Working Group identified three primary areas of redundancy in the
bibliographic production process:
1. the supply chain, wherein some data are created by publishers and
vendors and later re-created by library catalogers;
2. the modification of records within the library community, wherein
such modifications are not shared, even though they could be useful to
3. the expenses that are incurred when individual libraries must
purchase records because the sharing of those records is prohibited or
* This whole section on increased sharing and eliminating redundancies
is an opportune place to allude to the potential of open licensing.
p. 12, sect. 126.96.36.199 & 188.8.131.52
"184.108.40.206 All: Be more flexible in accepting bibliographic data from
others (e.g., publishers, foreign libraries) that do not conform
precisely to U.S. library standards."
"220.127.116.11 All: Develop workflow and mechanisms to use data and metadata
from network resources, such as abstracting and indexing services,
Amazon, IMDb, etc., where those can enhance the user's experience in
seeking and using information."
* It's likely that some form of liberal licensing is requisite for
utilising third party data (18.104.22.168) and in re-purposing existing
metadata (22.214.171.124) on a large scale.
p. 13, sect. 1.1.4
"1.1.4 Re-Examine the Current Economic Model for Data Sharing in the
1.1.4.1 LC: Convene a representative group consisting of libraries
(large and small), vendors, and OCLC members to address costs, barriers
to change, and the value of potential gains arising from greater sharing
of data, and to develop recommendations for change.
1.1.4.2 LC: Promote widespread discussion of barriers to sharing data.
1.1.4.3 LC: Reevaluate the pricing of LC's product line with a view to
developing a business model that enables more substantial cost recovery."
* We strongly suggest that the public good (or the economic notion of
'social welfare'), in addition to cost recovery, should be taken into
account in the analysis of these issues. Particularly given the trend
setting role it is suggested that LC takes in the wider world of
p. 15, sect. 1.2
"Long-term dependence on Library of Congress bibliographic services
leaves the users of those services increasingly vulnerable to any
changes in them.
Long-term reliance on Library of Congress leadership and on its
provision of cataloging records leads libraries—even some large
libraries with relatively plentiful staff—to think that they bear no
responsibility, individually or collectively, for sharing substantively
in the work of
* Note the same would be true if, for example, more libraries
outsourced bibliographic work to 'closed' private contractors to replace
core functions that had previously been fulfilled by LC. It seems that a
stronger culture of sharing and exchanging data between libraries
(perhaps in addition to third party contractors and contributions) is a
more sustainable strategy that would leave libraries in a better
position in the longer term - and able to do at least some work 'in house'.
p. 16, sect. 1.2
"All types of libraries will contribute to the best of their abilities
and resources to the "public good" that comes from bibliographic control
and resource sharing."
* Again, we strongly suggest that this is factored into the kinds of
discussions and analyses recommended in 1.1.4 (cf. comments on p. 13,
p. 18, sect. 1.3
"There will be increased sharing of authority data between libraries and
between library systems and systems from other communities, with library
authority data available to anyone working with bibliographic data.
Economies will be realized by minimizing the number of times the same
entity needs to be researched. Exchange of information about the same
name from one system to another will be made simpler and more reliable.
Access to data will be unimpeded and barriers to using data will be
* This is another opportune moment to mention the potential benefits of
pp. 19-20, sect. 2 & p. 21, sect 2.4
"2.4 Encourage Digitization to Allow Broader Access"
* Though, as stated above, our primary interest in these comments is in
bibliographic metadata, we also advocate making the digitised images of
material that is in the public domain available under an open license
pp. 21-26, sect. 3
* It would be extremely valuable if LC encouraged all library records
to have a standard metadata field that includes information on the
license of the library record itself.
p. 23, sect. 3.1
"Library bibliographic data will move from the closed database model to
the open Web-based model wherein records are addressable by programs and
are in formats that can be easily integrated into Web services and
computer applications. This will enable libraries to make better use of
networked data resources and to take advantage of the relationships that
exist (or could be made to exist) among various data sources on the Web."
* Open licensing could greatly help to facilitate the emergence of such
an 'open' model.
p. 28, sect. 4.1
"Library bibliographic data will be used in a wide variety of
environments, and interoperability between library and non-library
bibliographic applications will increase/improve.
Library catalogs are seen as valuable components in an interlocking
array of discovery tools."
* Again, this is a particularly opportune place to mention the
possibility of using a liberal license.
p. 31, sect. 220.127.116.11
"18.104.22.168 LC: Provide LCSH openly for use by library and non-library
pp. 33-4, sect. 5.1
* We strongly advocate that the 'public good' be taken into account
while building an evidence base. (Cf. p. 13, sect. 1.1.4)
== References ==
All page numbers refer to the Draft Final Report of the Working Group
 Letter from the Working Group – November 30, 2007
 According to the Directory of Open Access Journals
<http://www.doaj.org/> there are now just under 3000 Open Access
journals with over 160,000 articles. See Open Access News
<http://www.earlham.edu/~peters/fos/fosblog.html> for more on open
projects in scholarly publishing and research. OER (Open Educational
Resources) Commons <http://www.oercommons.org/> is a major portal for
open course content. Science Commons <http://sciencecommons.org/> is a
significant proponent of open licensing for scientific research data.
 Creative Commons and Talis both maintain open licenses such as the
Creative Commons Attribution license
<http://creativecommons.org/licenses/by/2.0/> and the Open Database
Another frequently used open license is the GFDL, which Wikipedia's
content is licensed under. For a more comprehensive list see
 The Open Library is a prominent project that is currently
experimenting with versioning in bibliographic data
 To give an example, many developers are exploring different uses of
the open-source suite of tools from MIT's Simile project
<http://simile.mit.edu/>, which allows large datasets to be represented
on a timeline.
 The W3C Community Project 'Linking Open Data',
which includes Tim Berners-Lee, is currently pioneering work in this area.
Rufus Pollock wrote:
> Jonathan Gray wrote:
>> Hi all,
>> The Library of Congress has asked for comments on a draft produced by
>> a Working Group they initiated on the 'Future of Bibliographic Control'.
>> As some of you may have seen I recently blogged about this:
>> The deadline for public comments is 15th December. I think it would
>> be great if we could submit some brief notes on the potential
>> benefits of openly licensing bibliographic data!
> We should definitely draft something. Would you be happy to put
> something together and then post it to the list (or on the wiki with a
> link for the list)?
> Looking through the PR from the LC in your mail (not included here)
> some potential points to make would be:
> * Best way to achieve sharing and deliver value for a publicly funded
> ORG such as LC is to make biblio metadata *openly* available. Why?
> * More bugfixing, possibilities for 'wiki-like' management of data
> etc => better quality data
> * Allows for possibility of distributed data provision and access
> (reducing load, reducing latency, risk of downtime etc etc)
> * Better access both in terms of multiple forms/formats and others
> designing a better interface (see comments in a recent blog post )
> * Possibilities for reuse and recombination with other data sources
> * Must emphasize we are not talking about their content at this point
> just the *metadata*
> * Might also want to point out that for content in which there are no
> rights problems (i.e. public domain) should make that stuff openly
> available for same reason.
> * Overall: for publicly funded bodies open approaches maximize social
>> Does anyone know if any groups or individuals have already submitted
>> comments along these lines?
> No-one to my knowledge but that might not be saying much ...
>> Can anyone think of any organisations/individuals who might be
>> interested in helping out with this?
> Should obviously contact archive.org/openlibrary. Paul from Talis has
> already posted so it looks like something is happening there. Might
> want to also try contacting CC (perhaps Jon Phillips) though the time
> constraints might be a little tight.