[open-bibliography] Introducing Knowledge for All
Jim Pitman
pitman at stat.Berkeley.EDU
Tue Oct 25 00:53:45 UTC 2011
Mark Leggott <mleggott at k4all.ca> wrote:
> The idea of a "repository for biblio datasets" is part of the K4All plan.
Excellent. I expect Thomas and I will be glad to push copies of all of our data currently on 3lib into such a repo.
I would be glad to discuss further the versioning needs of users.
arXiv has done a terrific job of meeting these, by providing a stack of versions, and returning the most recent
one by default. Hopefully your setup will have that capability, which would be a major improvement on all currently
available biblio stores I am aware of.
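As a rough illustration of the "stack of versions, most recent by default" behavior described above, here is a minimal sketch. The names VersionedRecord, deposit, and get are hypothetical and chosen only for this example; this is not how arXiv or the K4All repository is actually implemented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VersionedRecord:
    """Hypothetical record holding every deposited version of one dataset."""
    identifier: str
    versions: List[str] = field(default_factory=list)

    def deposit(self, content: str) -> int:
        """Push a new version onto the stack; return its 1-based version number."""
        self.versions.append(content)
        return len(self.versions)

    def get(self, version: Optional[int] = None) -> str:
        """Return the requested version, or the most recent one by default."""
        if not self.versions:
            raise LookupError(f"no versions deposited for {self.identifier}")
        if version is None:
            return self.versions[-1]
        if not 1 <= version <= len(self.versions):
            raise LookupError(f"{self.identifier} has no version {version}")
        return self.versions[version - 1]


# Illustrative data only: two successive deposits of the same bibliography.
record = VersionedRecord("example-bibliography")
record.deposit("@book{a, title={First draft}}")
record.deposit("@book{a, title={Corrected title}}")
latest = record.get()    # no version given: the most recent deposit
first = record.get(1)    # an explicit version number: that earlier deposit
```

Earlier versions are never overwritten in this sketch, so anything that cited version 1 can still retrieve exactly what it cited.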
> We will be posting more detail on the technical architecture shortly,
I look forward to seeing that, especially what you propose to require in terms of metadata standards for biblio datasets.
It would be desirable to sync those requirements with ongoing BibJSON dev discussions.
> the goal is to have a rich internal XML schema for metadata that can be mapped to other schemas/formats, similar to what you are suggesting below.
Great, provided that supplying rich metadata is not a cost of admission. Such a requirement would turn away potential data providers.
I wonder what you are thinking in terms of licensing requirements. Some sort of click-through CC license seems the
way to go. I would like to see more discussion on this list of which license is most appropriate for biblio data, and of how, if at all, to determine
when the line is crossed from structured text to database. Should a large BibTeX or BibJSON file be considered a text file, to be licensed under e.g.
CC0 or some flavor of CC-BY, or a database, to be licensed under http://opendatacommons.org/licenses/odbl/ or similar?
> The data will be stored in the Fedora repository framework, which provides a great deal of flexibility in how you get the data in and out.
Sounds great to me. I really look forward to interacting with this repository.
As I see it, BibSoup/BibServer would interact with your proposed repo by providing various services and views over the biblio datasets it holds,
which research communities could then customize to construct views and services according to their needs.
Ideally, there should be many biblio dataset repos; we should not have to rely on a centralized system. But right now there are no dataset
repos providing the necessary services. It is great that there is one on the horizon. I wonder what could be done to promote mirrors and the
like, e.g. perhaps via Talis or Amazon? There are now a number of such services which offer to mirror suitably licensed public datasets.
I do think it is important to encourage replication, while maintaining clarity about where the source data repository service lies.
This is exemplified well by arXiv, which everyone clearly recognizes as the place where content is deposited, while numerous
front ends are free to provide various services and added value.
with appreciation of your initiative in this space,
--Jim
----------------------------------------------
Jim Pitman
Professor of Statistics and Mathematics
University of California
367 Evans Hall # 3860
Berkeley, CA 94720-3860
ph: 510-642-9970 fax: 510-642-7892
e-mail: pitman at stat.berkeley.edu
URL: http://www.stat.berkeley.edu/users/pitman