[openbiblio-dev] Open Biblio call tomorrow

Jim Pitman pitman at stat.Berkeley.EDU
Tue Feb 7 20:19:15 UTC 2012


Peter, thanks for the pointers to the "open-science" and "open-access" lists. For now,
it seems you are monitoring these and are prepared to deal with bridging to open-biblio,
so there is no pressing need for me to join them, right? I am, however, as you know, a big
fan of open-science and open-access, so if there are actions I can take to assist either
of these causes, do let me know. For the moment, I'm focussed on the open-biblio strategy,
which we seem to agree is useful to pursue, even if Harnad does not.

> > > The interactive breakthrough will come (I think) when we can easily
> > > annotate records (I am thinking by adding new fields).
> >
> > Yes. But we need to be very thoughtful about the data model for this to
> > work well. I think the right data model is to allow that
> > agents like MathSciNet, PubMed, Google Scholar and others provide fairly
> > stable records, and even more stable identifiers, and to distinguish
> > what are essentially just copies of these records, which should be
> > acknowledged to preserve provenance, and further derivative records.
> >
> That's the model I have assumed. It also extends to library collections
> such as BNB and the German collection.

Yes. I think the simplest thing for us to do will be to enable users to do simple
selection with queries and checkboxes over a single big collection, and then to mark up
their own copy of a record in various ways. Working over multiple big collections and
asserting equivalence of records would be a next step, but this may be much harder.
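
To make the data model concrete, here is a minimal sketch in Python of a
user's copy of a source record that preserves provenance and accepts added
fields. All names here (SourceRecord, UserRecord, copy_record, annotate) are
hypothetical, chosen only for illustration; this is not existing open-biblio
code, and the real model would need much more thought.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class SourceRecord:
    """A record as provided by an agent such as MathSciNet or PubMed."""
    agent: str               # who provides the record
    identifier: str          # the agent's stable identifier for it
    fields: Dict[str, str]   # bibliographic fields as supplied


@dataclass
class UserRecord:
    """A user's own copy of a source record, with provenance preserved."""
    owner: str
    source_agent: str
    source_identifier: str
    inherited: Dict[str, str]                            # fields taken from the source
    edits: Dict[str, str] = field(default_factory=dict)  # corrections and new fields

    def annotate(self, name: str, value: str) -> None:
        """Correct an existing field or add a supplementary one."""
        self.edits[name] = value

    def is_plain_copy(self) -> bool:
        """True while the record is still essentially a copy of its source."""
        return not self.edits

    def view(self) -> Dict[str, str]:
        """The fields the user sees: source values overridden by their edits."""
        merged = dict(self.inherited)
        merged.update(self.edits)
        return merged


def copy_record(source: SourceRecord, owner: str) -> UserRecord:
    """Create a user-owned copy that acknowledges where it came from."""
    return UserRecord(
        owner=owner,
        source_agent=source.agent,
        source_identifier=source.identifier,
        inherited=dict(source.fields),  # copy, so the source stays untouched
    )
```

The point of keeping "inherited" and "edits" apart is that a plain copy can
always be told from a derivative record, and the source record is never
modified by a user's markup.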

> > Users, in their own collections, should be able to easily supplement such
> > a record from any source with a correction in a field or two, and with
> > supplementary fields.
> > But the more common and effective use case will be for a user to create a
> > new composite record from whatever records of the same object are out there.
> > Daniel Hook's Symplectic software does a great job of this merging.  And I
> > have some passes at this too. Essentially, this creates a new record, which
> > the user owns, and
> > which inherits some properties from the source records, and other
> > properties which might be edits by hand, or provided with some machine
> > processing, e.g. automated name-disambiguation
> > or subject classification.  This is a difficult area of managing
> > workflows for bibliographic data enhancement, but one which may be very
> > rewarding.
> > It will be hard to bound the scope of such efforts. I would be inclined to
> > chip away at it a bit at a time, but with a data model where there is no
> > apparent
> > obstacle to further progress.
> >
> I think that the technology and the community will find many ways of using
> this. In the malaria project there is certainly no pressing need to
> disambiguate or correct - so it's adding information. 

This is the relatively simple use case mentioned above.  What about leveraging
http://acawiki.org/ as a place for users to write summaries of articles?
This might be a quick win, as acawiki seems to be a well-functioning project.
Connecting to their community might also attract open biblio contributors.

> For books collections who knows?

We know that the straight wiki approach does not work too well for books on a large scale:
there are too many problems with spam and neglect. Open Library has shown that. It seems
better to focus on getting domain communities to curate artifacts of all sorts in their
domain (books, articles, videos, ...) rather than working genre by genre. The genre
categories of all articles, all books, ... are too big, and already dominated by the
biggest players. We are better off trying to replicate, and improve with better software,
the subject-specific communities like RePEc and NASA ADS, and making it easier to set up
smaller communities of this type, such as your Malaria community.

--Jim

----------------------------------------------
Jim Pitman
Professor of Statistics and Mathematics
University of California
367 Evans Hall # 3860
Berkeley, CA 94720-3860

ph: 510-642-9970  fax: 510-642-7892
e-mail: pitman at stat.berkeley.edu
URL: http://www.stat.berkeley.edu/users/pitman



