[open-bibliography] BibSoup/BibServer collaboration model?

Thu Feb 2 17:54:21 UTC 2012

Tom Morris <tfmorris at gmail.com> wrote:

> I'm trying to wrap my head around how BibSoup/BibServer works in the
> greater ecosystem of bibliographic data.

Lots of important questions here. Following is my take on these issues.
Little of this is laid out in detail anywhere on a web page. But Naomi
has recently started work on pulling these sorts of materials together.

> In my perfect world, I'd never have to enter any information which is
> not unique to me.  Books, authors, journals, articles, author
> affiliations would all be magically known to the system and all I'd
> need to add would be my own special value.  

My long term goal also.

> Reading list?  Just pick the books to be included on the list.
> Annotated bibliography?  Pick the journal articles and add my comments
> or category tags of whatever other type of annotation I like.
> Reference list? Pick any CSL template and book list and get my
> reference list generated in the appropriate format to include in my
> publication.

Exactly.

> Recognizing that this nirvana is a long way off and being willing to
> help get there, I realize that I'm going to have to enter the data if
> someone else hasn't already, but I'd like others to benefit from that
> work and, conversely, be able to reuse the work that they've done to
> save myself effort.  I know that some of my "data entry" will actually
> be converting existing resources that I have into a usable form rather
> than sitting down at the keyboard and retyping things.

Also my view.

> Control of my work is important, so I'd like to be able to choose how
> much, if any, of my annotations, lists, data entry, etc gets shared
> (or is visible) to others.

Yes.  Delicious manages this issue well. You can bookmark and annotate
a website, and then make this public or private. But Delicious has only a
few fields: url/title/notes/tags. A lot can be done just with these fields.
If you replace the notes field with something comparable to a BibTeX entry,
and have slightly more structured tags, then something that incorporated
the best features of Delicious and the current BibSoup should serve these
requirements. I have been working on components of this, managing the
bookmarking and tagging, but the permissions/privacy issue remains
challenging. I think the right data model is something like the current
BibJSON model for a record, with a record-specific or, in some cases,
field-specific tag which indicates the level of privacy.
I don't think it is useful to have privacy indicators for basic public
domain metadata. I do think it is useful to have privacy indicators at the
record level, and within records for some annotation fields. This is a
feature request for BibJSON/BibServer.
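
To make the record-level and field-level idea concrete, here is a minimal
Python sketch. The keys "_privacy" and "_private_fields" are hypothetical
names chosen for illustration; they are not part of BibJSON or BibServer,
and the sample records are made up.

```python
# Hypothetical convention, not part of BibJSON:
#   "_privacy": "public" or "private"   (record level, default "public")
#   "_private_fields": [field names]    (field level, e.g. annotations)

def public_view(records):
    """Return copies of the records with private material removed."""
    visible = []
    for record in records:
        if record.get("_privacy", "public") != "public":
            continue  # the whole record is private, so skip it
        private_fields = set(record.get("_private_fields", []))
        cleaned = {
            key: value
            for key, value in record.items()
            if key not in private_fields and not key.startswith("_")
        }
        visible.append(cleaned)
    return visible


records = [
    {
        "title": "An Example Book",
        "year": "2010",
        "note": "my private annotation",
        "_private_fields": ["note"],
    },
    {"title": "A Draft Reading", "year": "2011", "_privacy": "private"},
]

shared = public_view(records)
```

With these sample records, only the first record is shared, and its
"note" field is left out. The basic metadata (title, year) carries no
privacy marker of its own.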

> What space, if any, does BibSoup/BibServer occupy in this world?  

I think they are a major step towards realizing the goals indicated above.

> If Jim Pitman has a public reading list and I have a reading list, can I
> easily create a merged reading list with the books from both?  

I can, currently, with simple Python code operating over BibJSON lists.
This is not yet available as a feature in BibServer, but it could fairly
easily be added. Another feature request.
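
As a rough illustration of the kind of Python code meant here (a sketch,
not the actual code in use): two reading lists are treated as lists of
BibJSON-style dicts and merged on a matching key. The function names and
the matching rule (DOI if present, otherwise normalized title plus year)
are assumptions made for this example, as is the sample data.

```python
def record_key(record):
    """Hypothetical matching key: a DOI if present, else title and year."""
    for ident in record.get("identifier", []):
        if ident.get("type") == "doi" and ident.get("id"):
            return ("doi", ident["id"].lower())
    # Collapse case and whitespace so trivially different titles match.
    title = " ".join(record.get("title", "").lower().split())
    return ("title", title, str(record.get("year", "")))


def merge_lists(*lists):
    """Merge reading lists, keeping the first copy of each matched record."""
    merged = {}
    for records in lists:
        for record in records:
            merged.setdefault(record_key(record), record)
    return list(merged.values())


jim = [
    {"title": "First Example Book", "year": "2001"},
    {"title": "Second Example Book", "year": "2005"},
]
tom = [
    {"title": "second  example BOOK", "year": "2005"},
    {"title": "Third Example Book", "year": "2010"},
]

merged = merge_lists(jim, tom)
```

Here the merged list has three entries, and the copy of the shared book
that appears first (Jim's) is the one that is kept.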

> If Jim's bibliography and my bibliography have a different list of
> authors for a journal article, is there a collaboration mechanism
> other than email for coming to agreement, if we both desire, on the
> correct set of authors and updating the bibliographic database with
> the results of that agreement?

This is a mixed socio/technical question.
I have dealt with this issue for many thousands of instance records by
now, and have a fairly clear view of how it should be handled, at least
technically, and some idea of the social side.

You have your record in BibJSON, I have mine. They differ. Simple Python
code allows the two records to be compared. Trivial differences can be
ignored, according to some normalization convention. Substantial
differences can be treated field by field.
For quick applications, a hybrid record is machine-created by indicating
in a config file, for each field, which source is to be preferred, and
ignoring the other source's value for that field.
For more serious applications, like creation of an authoritative personal
bibliography of a deceased author, an editor looks at the two fields and
makes a judgement, to take one or the other or do a hand-edited merge.
I have primitive code and UIs for all these tasks.
Simple merging code is easy. Providing adequate UIs is hard. This sort of
task can also potentially be crowdsourced, but the programming for that is
beyond my current capabilities.

Now, when the merged record is created, there is the question of who owns
or controls it, committing it back to a database, etc. My view is that
various biblio agents should be responsible for the cleanliness of their
own biblio stores, hence responsible for what they write to those stores.
If you create a hybrid record and include it in your collection, that is
up to you. The hybrid record can also be created on the fly and stored in
an index, and that could be done by any particular BibServer instance,
according to the editorial policy governing that instance.
I don't think we know the answer to how best to manage the governance and
quality control of such operations. But we do now have at least primitive
tools to perform them.
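
The compare-then-merge steps described above can be sketched in Python as
follows. This is an illustration under stated assumptions, not the actual
code: the normalization convention (case, whitespace, trailing period),
the function names, the shape of the per-field preference mapping, and the
fallback to the other source when the preferred one lacks a field are all
hypothetical choices, and the records are made up.

```python
def normalize(value):
    """Assumed convention: ignore case, extra whitespace, trailing periods."""
    if isinstance(value, str):
        return " ".join(value.casefold().split()).rstrip(".")
    if isinstance(value, list):
        return [normalize(item) for item in value]
    if isinstance(value, dict):
        return {key: normalize(item) for key, item in value.items()}
    return value


def substantial_differences(mine, theirs):
    """Map each field that differs after normalization to both raw values."""
    diffs = {}
    for field in sorted(set(mine) | set(theirs)):
        a, b = mine.get(field), theirs.get(field)
        if normalize(a) != normalize(b):
            diffs[field] = (a, b)
    return diffs


def hybrid_record(mine, theirs, preferred, default="mine"):
    """Build a hybrid record from a per-field source preference.

    `preferred` plays the role of the config file: it maps a field name to
    "mine" or "theirs". If the preferred source lacks the field, the other
    source is used (an assumption of this sketch).
    """
    sources = {"mine": mine, "theirs": theirs}
    other = {"mine": "theirs", "theirs": "mine"}
    hybrid = {}
    for field in sorted(set(mine) | set(theirs)):
        choice = preferred.get(field, default)
        if field not in sources[choice]:
            choice = other[choice]
        hybrid[field] = sources[choice][field]
    return hybrid


mine = {
    "title": "An Example Article.",
    "author": [{"name": "A. Author"}],
    "year": "2011",
}
theirs = {
    "title": "an  example article",
    "author": [{"name": "A. Author"}, {"name": "B. Writer"}],
    "year": "2011",
}

diffs = substantial_differences(mine, theirs)
hybrid = hybrid_record(mine, theirs, preferred={"author": "theirs"})
```

In this example the two titles differ only trivially, so the only
substantial difference reported is the author list. The hybrid record
takes the author list from the second source and everything else from the
first. For the editorial case, the output of substantial_differences is
what an editor would review field by field.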

> Perhaps it would be illustrative to compare and contrast with other
> existing widely known services and tools such as Zotero, Mendeley,
> CiteUlike, and the venerable emacs/Bibtex/LaTex.  What is better,
> worse, or just different?  Which sets of things are alternatives to
> each other and which complement each other?  What are the things which
> make BibSoup/BibServer unique?

Excellent suggestion, and I would be glad to contribute to a wiki or other 
collaborative document space to respond to this. Naomi or Mark, could you
suggest how to organize a community response to these questions?
My own way would be to start in a Google Doc, which I would be glad to initiate,
but before doing that let us see what Naomi and Mark suggest.

Many thanks for the questions/suggestions.

--Jim

----------------------------------------------
Jim Pitman
Professor of Statistics and Mathematics
University of California
367 Evans Hall # 3860
Berkeley, CA 94720-3860

ph: 510-642-9970  fax: 510-642-7892
e-mail: pitman at stat.berkeley.edu
URL: http://www.stat.berkeley.edu/users/pitman