[open-bibliography] Introducing Knowledge for All

Tue Oct 25 00:53:45 UTC 2011

Mark Leggott <mleggott at k4all.ca> wrote:

> The idea of a "repository for biblio datasets" is part of the K4All plan. 

Excellent. I expect Thomas and I will be glad to push copies of all of our data currently on 3lib into such a repo.
I would be glad to discuss the versioning needs of users further.
arXiv has done a terrific job of meeting these by providing a stack of versions and returning the most recent
one by default. Hopefully your setup will have that capability, which would be a major improvement over all currently
available biblio stores I am aware of.
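To make the arXiv-style behavior concrete, here is a minimal sketch, assuming a simple in-memory store keyed by a dataset identifier. The class and method names (VersionedStore, deposit, get, history) are hypothetical and are not part of arXiv, Fedora, or any K4All API. Each deposit pushes a new version onto the stack, and a plain lookup returns the latest one.

```python
from typing import Dict, List, Optional


class VersionedStore:
    """Hypothetical store: each dataset id maps to a stack of versions (v1, v2, ...)."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[bytes]] = {}

    def deposit(self, dataset_id: str, content: bytes) -> int:
        """Push a new version onto the stack and return its 1-based version number."""
        stack = self._versions.setdefault(dataset_id, [])
        stack.append(content)
        return len(stack)

    def get(self, dataset_id: str, version: Optional[int] = None) -> bytes:
        """Return the most recent version by default, or a specific earlier one."""
        stack = self._versions[dataset_id]  # raises KeyError if never deposited
        if version is None:
            return stack[-1]
        if not 1 <= version <= len(stack):
            raise ValueError(f"{dataset_id} has no version v{version}")
        return stack[version - 1]

    def history(self, dataset_id: str) -> List[int]:
        """List the available version numbers, oldest first."""
        return list(range(1, len(self._versions[dataset_id]) + 1))


store = VersionedStore()
store.deposit("example-biblio", b"@article{a, title={First}}")
store.deposit("example-biblio", b"@article{a, title={First}}\n@book{b, title={Second}}")
latest = store.get("example-biblio")             # the second deposit
original = store.get("example-biblio", version=1)  # the first deposit
```

Keeping old versions addressable while defaulting to the newest means users who cite a dataset can always recover exactly what they used, without everyone else having to ask for "latest" explicitly.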

> We will be posting more detail on the technical architecture shortly, 

I look forward to seeing that, especially what you propose to require in terms of metadata standards for biblio datasets.
It would be desirable to sync those requirements with ongoing BibJSON development discussions.

> the goal is to have a rich internal XML schema for metadata that can be mapped to other schemas/formats, similar to what you are suggesting below. 

Great, provided that supplying rich metadata is not a cost of admission. Requiring it would turn away potential data providers.

I wonder what you are thinking in terms of licensing requirements. Some sort of click-through CC license seems the
way to go. I would like to see more discussion on this list of what the most appropriate license for biblio data is, and how, if at all, to determine
when the line is crossed from structured text to database. Should a large BibTeX or BibJSON file be considered a text file to be licensed under, e.g.,
CC0 or some flavor of CC-BY, or a database to be licensed under the ODbL (http://opendatacommons.org/licenses/odbl/) or similar?

> The data will be stored in the Fedora repository framework, which provides a great deal of flexibility in how you get the data in and out. 

Sounds great to me. I really look forward to interacting with this repository.

As I see it, BibSoup/BibServer would interact with your proposed repo by providing various services and views over the biblio datasets held in your
repo, allowing individual research communities to construct views and services suited to their needs.
Ideally, there should be many biblio dataset repos; we should not have to rely on a centralized system. But right now no dataset
repos provide the necessary services, so it is great that there is one on the horizon. I wonder what could be done to promote mirrors and the
like, perhaps through Talis or Amazon? A number of such services now offer to mirror suitably licensed public datasets.
I do think it is important to encourage replication while maintaining clarity about where the source data repository lies.
arXiv exemplifies this well: it is clear to everyone that arXiv is where content is deposited, while numerous
front ends provide various services and added value.

with appreciation for your initiative in this space,

--Jim 
----------------------------------------------
Jim Pitman
Professor of Statistics and Mathematics
University of California
367 Evans Hall # 3860
Berkeley, CA 94720-3860

ph: 510-642-9970  fax: 510-642-7892
e-mail: pitman at stat.berkeley.edu
URL: http://www.stat.berkeley.edu/users/pitman