[open-bibliography] BibSoup/BibServer collaboration model?

Karen Coyle kcoyle at kcoyle.net
Fri Feb 3 15:15:49 UTC 2012



On 2/2/12 9:54 AM, Jim Pitman wrote:

> This is a mixed socio/technical question.
> I have dealt with this issue for many thousands of instance records by now, and have a
> fairly clear view of how it should be handled, at least technically, and some idea of the
> social side.
> You have your record in BibJSON, I have mine. They differ. Simple Python code allows the
> two records to be compared. Trivial differences can be ignored, according to some normalization
> convention. Substantial differences can be treated field by field.
> For quick applications, a hybrid record is machine-created: a config file indicates, for each
> field, which source is to be preferred, and the other source's value for that field is ignored.
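
To make the compare-and-merge step concrete, here is a rough sketch of what it
could look like. It is not Jim's actual code. It assumes each record is a flat
Python dict of strings, and the names normalize, diff_records, merge_records
and PREFER are hypothetical, invented only for this illustration.

```python
import re

def normalize(value):
    """Collapse case, punctuation and whitespace so trivial differences vanish."""
    text = re.sub(r"[^\w\s]", "", str(value).lower())
    return " ".join(text.split())

def diff_records(a, b):
    """Return the fields whose normalized values differ between two records."""
    fields = sorted(set(a) | set(b))
    return [f for f in fields if normalize(a.get(f, "")) != normalize(b.get(f, ""))]

def merge_records(a, b, prefer):
    """Build a hybrid record; prefer maps a field name to "a" or "b"."""
    merged = {}
    for field in sorted(set(a) | set(b)):
        first, second = (a, b) if prefer.get(field, "a") == "a" else (b, a)
        # Assumption: fall back to the other source if the preferred one
        # has no value for this field at all.
        merged[field] = first.get(field, second.get(field))
    return merged

# Hypothetical sample records and per-field preferences (the "config file").
mine = {"title": "Brownian Motion.", "year": "1999", "journal": "Ann. Probab."}
yours = {"title": "brownian  motion", "year": "2000", "journal": "Annals of Probability"}
PREFER = {"year": "a", "journal": "b"}

differences = diff_records(mine, yours)
hybrid = merge_records(mine, yours, PREFER)
```

With these sample records the titles count as a trivial difference, while year
and journal are reported as substantial and resolved from the preference table.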

There is another option, which many systems use:
Rather than merge the records (and risk losing data), the records are 
held in a cluster. For basic displays, one record in the cluster is 
selected (so that you don't get two titles, two journals, etc.), but at 
any point someone using the database can select their preferred citation 
to download or use.

Sometimes the members of the cluster are stored in a somewhat "offline" 
fashion so they don't increase the size of the database for storage and 
retrieval, but there are pointers to the original records.

If you merge and select fields, you cannot easily correct merging 
errors. Keeping all of the incoming records is the only way to re-run 
clustering over time.
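
A minimal sketch of the cluster idea, assuming records are plain Python dicts
and that a simple match key decides which records belong together. The names
Cluster and cluster_records are hypothetical, and real systems use much more
careful matching than this.

```python
from dataclasses import dataclass, field

@dataclass
class Cluster:
    """Records believed to describe the same work; nothing is discarded."""
    records: list = field(default_factory=list)
    display_index: int = 0  # which member a basic display shows

    def add(self, record):
        self.records.append(record)

    def display(self):
        """The single record shown in basic displays."""
        return self.records[self.display_index]

    def choose(self, index):
        """Let a user pick their preferred citation from the cluster."""
        return self.records[index]

def cluster_records(records, key):
    """Group records by a match key; all incoming records are kept."""
    clusters = {}
    for record in records:
        clusters.setdefault(key(record), Cluster()).add(record)
    return list(clusters.values())

# Hypothetical incoming records from two sources.
records = [
    {"title": "Brownian Motion", "year": "1999", "source": "mine"},
    {"title": "brownian motion", "year": "2000", "source": "yours"},
    {"title": "Random Trees", "year": "2006", "source": "mine"},
]

# First pass: match on title only.
by_title = cluster_records(records, key=lambda r: r["title"].lower())

# Later pass with a stricter rule; possible only because the originals survive.
by_title_year = cluster_records(
    records, key=lambda r: (r["title"].lower(), r["year"])
)
```

The second pass splits the first cluster apart again, which is exactly the kind
of correction that is lost once fields have been merged and the originals
thrown away.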

kc

> For more serious applications, like creation of an authoritative personal bibliography of a
> deceased author, an editor looks at the two fields and makes a judgement, to take one or the
> other or do a hand-edited merge. I have primitive code and UIs for all these tasks.
> Simple merging code is easy. Providing adequate UIs is hard. This sort of task can also
> potentially be crowdsourced, but the programming for that is beyond my current capabilities.
> Now, when the merged record is created, there is the question of who owns or controls it,
> committing it back to a database, etc. My view is that various biblio agents should be
> responsible for the cleanliness of their own biblio stores, hence responsible for what they
> write to those stores. If you create a hybrid record and include it in your collection, that
> is up to you.
> The hybrid record can also be created on the fly and stored in an index, and that could be done
> by any particular BibServer instance, according to editorial policy governing that instance.
> I don't think we know the answer to how best to manage the governance and quality control of
> such operations. But we do now have at least primitive tools to perform them.
>
>> Perhaps it would be illustrative to compare and contrast with other
>> existing widely known services and tools such as Zotero, Mendeley,
>> CiteULike, and the venerable Emacs/BibTeX/LaTeX.  What is better,
>> worse, or just different?  Which sets of things are alternatives to
>> each other and which complement each other?  What are the things which
>> make BibSoup/BibServer unique?
>
> Excellent suggestion, and I would be glad to contribute to a wiki or other
> collaborative document space to respond to this. Naomi or Mark, could you
> suggest how to organize a community response to these questions?
> My own way would be to start in a Google Doc, which I would be glad to initiate,
> but before doing that let us see what Naomi and Mark suggest.
>
> Many thanks for the questions/suggestions.
>
> --Jim
>
> ----------------------------------------------
> Jim Pitman
> Professor of Statistics and Mathematics
> University of California
> 367 Evans Hall # 3860
> Berkeley, CA 94720-3860
>
> ph: 510-642-9970  fax: 510-642-7892
> e-mail: pitman at stat.berkeley.edu
> URL: http://www.stat.berkeley.edu/users/pitman
>
> _______________________________________________
> open-bibliography mailing list
> open-bibliography at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/open-bibliography

-- 
Karen Coyle
kcoyle at kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet



