[open-bibliography] BibSoup/BibServer collaboration model?

Fri Feb 3 15:31:13 UTC 2012

On Fri, Feb 3, 2012 at 3:15 PM, Karen Coyle <kcoyle at kcoyle.net> wrote:
>
>
> On 2/2/12 9:54 AM, Jim Pitman wrote:
>
>> This is a mixed socio/technical question.
>> I have dealt with this issue for many thousands of instance records by
>> now, and have a fairly clear view of how it should be handled, at least
>> technically, and some idea of the social side.
>> You have your record in BibJSON, I have mine. They differ. Simple Python
>> code allows the two records to be compared. Trivial differences can be
>> ignored, according to some normalization convention. Substantial
>> differences can be treated field by field.
>> For quick applications, a hybrid record is machine-created by indicating
>> in a config file, for each field, which source is to be preferred, and
>> ignoring the other field.
>
>
> There is another option, which many systems use:
> Rather than merge the records (and risk losing data), the records are held
> in a cluster. For basic displays, one record in the cluster is selected (so
> that you don't get two titles, two journals, etc.), but at any point someone
> using the database can select their preferred citation to download or use.
>
> Sometimes the members of the cluster are stored in a somewhat "offline"
> fashion so they don't increase the size of the database for storage and
> retrieval, but there are pointers to the original records.
>
> If you merge and select fields, you cannot easily correct merging errors.
> Keeping all of the incoming records is the only way to re-run clustering
> over time.
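The comparison and config-driven hybrid record Jim describes can be sketched in Python. This is a minimal, hypothetical sketch, not his code or BibServer's: records are assumed to be plain dicts in a BibJSON-like shape, the names `normalize`, `diff_records` and `hybrid_record` are invented for illustration, a dict stands in for the config file, and the normalization rule (case, accents, punctuation, whitespace) is just one possible convention.

```python
# Hypothetical sketch, not Jim's or BibServer's code: records are plain dicts
# in a BibJSON-like shape; function names and the normalization rule are
# illustrative assumptions.
import re
import unicodedata


def normalize(value):
    """Reduce trivial differences: case, accents, punctuation, whitespace."""
    if isinstance(value, list):
        return [normalize(v) for v in value]
    if isinstance(value, dict):
        return {k: normalize(v) for k, v in value.items()}
    if not isinstance(value, str):
        return value
    text = unicodedata.normalize("NFKD", value)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def diff_records(mine, yours):
    """Return {field: (mine_value, yours_value)} for substantial differences."""
    diffs = {}
    for name in sorted(set(mine) | set(yours)):
        a, b = mine.get(name), yours.get(name)
        if normalize(a) != normalize(b):
            diffs[name] = (a, b)
    return diffs


def hybrid_record(records, preferences, default):
    """Machine-create a hybrid record from several sources.

    records:     {source_name: record}
    preferences: {field: source_name}, standing in for the config file
    default:     source used for fields not listed in preferences
    """
    names = set().union(*records.values())
    hybrid = {}
    for name in sorted(names):
        source = preferences.get(name, default)
        if name in records[source]:
            hybrid[name] = records[source][name]
        else:
            # Assumption: if the preferred source lacks the field, take it
            # from the first other source that has it rather than drop it.
            for rec in records.values():
                if name in rec:
                    hybrid[name] = rec[name]
                    break
    return hybrid


if __name__ == "__main__":
    mine = {"title": "On the Theory of Records.", "year": "2011",
            "author": [{"name": "Doe, J."}]}
    yours = {"title": "on the theory of records", "year": "2012",
             "journal": {"name": "Example Journal"}}
    print(sorted(diff_records(mine, yours)))  # ['author', 'journal', 'year']
    print(hybrid_record({"mine": mine, "yours": yours}, {"year": "yours"}, "mine"))
```

Here the two titles differ only in case and punctuation, so they are not reported; the year comes from "yours" because the preference table says so. As Karen points out, the hybrid discards the non-preferred values, so the source records are worth keeping alongside it.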

We are going to keep records separate - one record per collection. So
each person will maintain their own record. However, all records can
be searched across a bibsoup, so I am thinking of adding a simple
dropdown to the front end that shows "more like this". So you could
see similar records and probably choose to copy them and so on. In
addition to this, we will probably implement a "parent/slave/sameas"
sort of relationship so a record can be copied from another and know
its provenance, and whether or not to track changes.
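Karen's cluster model and the plan above (one record per collection, a "more like this" view, and a parent/sameas link that records provenance and whether to track changes) can be sketched together. This is a hypothetical illustration, not BibServer's implementation: `Store`, `match_key` and the `_provenance` key are invented names, and clustering on normalized title plus year is a deliberately crude assumption.

```python
# Hypothetical sketch: keep every collection's record untouched, group records
# into clusters instead of merging them, and record provenance on copies.
# Class, method and key names are illustrative assumptions.
import re
from dataclasses import dataclass, field


def match_key(record):
    """Crude clustering key (an assumption): normalized title plus year."""
    title = re.sub(r"[^\w\s]", "", record.get("title", "").lower())
    return (" ".join(title.split()), record.get("year"))


@dataclass
class Store:
    # (collection, record_id) -> record; every incoming record is kept as-is.
    records: dict = field(default_factory=dict)

    def add(self, collection, record_id, record):
        self.records[(collection, record_id)] = record

    def clusters(self):
        """Recompute clusters from the stored records; safe to re-run."""
        groups = {}
        for ref, rec in self.records.items():
            groups.setdefault(match_key(rec), []).append(ref)
        return groups

    def display_record(self, refs):
        """Pick one cluster member for basic display: the one with most fields."""
        return max(refs, key=lambda ref: len(self.records[ref]))

    def more_like_this(self, ref):
        """Other records that fall in the same cluster as `ref`."""
        key = match_key(self.records[ref])
        return [r for r in self.clusters()[key] if r != ref]

    def copy(self, source_ref, collection, record_id, track_changes=False):
        """Copy a record into another collection, remembering its parent."""
        rec = dict(self.records[source_ref])  # shallow copy; source untouched
        rec["_provenance"] = {"parent": list(source_ref),
                              "track_changes": track_changes}
        self.add(collection, record_id, rec)
        return rec
```

Because clusters are recomputed from the untouched records, changing `match_key` and calling `clusters()` again regroups everything, which is the point of keeping all incoming records rather than merging them.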

These will be in the upcoming sprint plans.

Mark

>
> kc
>
>
>> For more serious applications, like creation of an authoritative personal
>> bibliography of a deceased author, an editor looks at the two fields and
>> makes a judgement: take one or the other, or do a hand-edited merge. I have
>> primitive code and UIs for all these tasks.
>> Simple merging code is easy. Providing adequate UIs is hard. This sort of
>> task can also potentially be crowdsourced, but the programming for that is
>> beyond my current capabilities.
>> Now, when the merged record is created, there is the question of who owns
>> or controls it, who commits it back to a database, etc. My view is that
>> the various biblio agents should be responsible for the cleanliness of
>> their own biblio stores, and hence for what they write to those stores.
>> If you create a hybrid record and include it in your collection, that is
>> up to you.
>> The hybrid record can also be created on the fly and stored in an index;
>> that could be done by any particular BibServer instance, according to the
>> editorial policy governing that instance.
>> I don't think we know the answer to how best to manage the governance and
>> quality control of such operations. But we do now have at least primitive
>> tools to perform them.
>>
>>> Perhaps it would be illustrative to compare and contrast with other
>>> existing widely known services and tools such as Zotero, Mendeley,
>>> CiteULike, and the venerable Emacs/BibTeX/LaTeX. What is better,
>>> worse, or just different?  Which sets of things are alternatives to
>>> each other and which complement each other?  What are the things which
>>> make BibSoup/BibServer unique?
>>
>>
>> Excellent suggestion, and I would be glad to contribute to a wiki or other
>> collaborative document space to respond to this. Naomi or Mark, could you
>> suggest how to organize a community response to these questions?
>> My own way would be to start in a Google Doc, which I would be glad to
>> initiate, but before doing that let us see what Naomi and Mark suggest.
>>
>> Many thanks for the questions/suggestions.
>>
>> --Jim
>>
>> ----------------------------------------------
>> Jim Pitman
>> Professor of Statistics and Mathematics
>> University of California
>> 367 Evans Hall # 3860
>> Berkeley, CA 94720-3860
>>
>> ph: 510-642-9970  fax: 510-642-7892
>> e-mail: pitman at stat.berkeley.edu
>> URL: http://www.stat.berkeley.edu/users/pitman
>>
>> _______________________________________________
>> open-bibliography mailing list
>> open-bibliography at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/open-bibliography
>
>
> --
> Karen Coyle
> kcoyle at kcoyle.net http://kcoyle.net
> ph: 1-510-540-7596
> m: 1-510-435-8234
> skype: kcoylenet
>
>
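The editor workflow Jim describes above (for each differing field, take one record's value or the other's, or do a hand-edited merge) can be sketched as below. This is a hypothetical illustration, not his primitive code: the `resolve` function and the decision format (`"mine"`, `"yours"`, or `{"value": ...}`) are assumptions. Fields with no decision are returned separately, which is where an editing UI or a crowdsourcing queue would pick up.

```python
# Hypothetical sketch of editor-driven, field-by-field resolution.
# The decision format and names are illustrative assumptions.
KEEP_MINE, KEEP_YOURS = "mine", "yours"


def resolve(mine, yours, decisions):
    """Return (merged_record, unresolved_fields).

    decisions maps field -> KEEP_MINE, KEEP_YOURS, or {"value": edited_value}.
    Fields that are equal in both records need no decision.
    """
    merged, unresolved = {}, []
    for name in sorted(set(mine) | set(yours)):
        a, b = mine.get(name), yours.get(name)
        if a == b:
            merged[name] = a
            continue
        choice = decisions.get(name)
        if choice == KEEP_MINE:
            value = a
        elif choice == KEEP_YOURS:
            value = b
        elif isinstance(choice, dict) and "value" in choice:
            value = choice["value"]  # hand-edited merge
        else:
            unresolved.append(name)  # still needs an editor's judgement
            continue
        if value is not None:  # choosing an absent value drops the field
            merged[name] = value
    return merged, unresolved
```

For example, with `mine = {"title": "T", "year": "2011", "pages": "1-10"}` and `yours = {"title": "T", "year": "2012", "pages": "1--10", "volume": "3"}`, the decisions `{"year": "yours", "pages": {"value": "1-10"}}` settle two fields and leave `volume` for the editor.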