[openbiblio-dev] Open Biblio call tomorrow

Jim Pitman pitman at stat.Berkeley.EDU
Tue Feb 7 19:39:09 UTC 2012


> On Tue, Feb 7, 2012 at 6:01 PM, Naomi Lillie <naomi.lillie at okfn.org> wrote:

> > At last week's meeting it was decided that it is useful to keep these
> > meetings weekly during this busy stage of the project, so there is another
> > catch-up call tomorrow (Wednesday 8th February). Please join us at 16.00
> > GMT using the Etherpad http://openbiblio.okfnpad.org/catchup - all
> > welcome, please navigate here at this time and I will Skype call those present.

I should be there.

Peter Murray-Rust <pm286 at cam.ac.uk> wrote:
> I shan't be able to make it...
> Note that we had an Openbiblio meeting today with AdrianP, Naomi, MarkMcg
> and myself.  

Sorry, I overlooked this one. But some comments below.

> Adrian and others have done a great job on promoting the release of Open
> biblio. The German National Bibliography (?) is available in RDF as CC0.
> Mark will have a look at it - maybe it's easy to convert to BibJSON - but
> anyway we should work with the suppliers to create BibJSON - this is a
> really clear example of why BJ is useful.

Excellent. Strong support from me. The sort of thing I want to be able to do with these
big national collections is to pull all records related to particular subjects or particular authors.
Hopefully that will be facilitated by putting the data into Elasticsearch. This does not seem easy with the
data as RDF. I have asked on the list many times how to do this, and never got a reply. Hopefully we can
demo this functionality with a BibServer instance dedicated to each National Biblio.
Mark, was that your idea? Or do you think you can merge all of these into a single BibSoup instance?
At some stage this must run us into performance issues; I am not sure when. For my own applications, I'd be glad just to be
able to query these big datasets and pull out bits of them I care about, and then mix/match the data into my own collections,
with links back to source.
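
To make this concrete, here is a minimal sketch, in Python, of how I imagine pulling such records once a national bibliography has been indexed as BibJSON in Elasticsearch. The URL, index name and field names (subject, author.name) are just assumptions for illustration, not anything we have agreed on:

    import json
    import requests

    # Hypothetical BibServer/Elasticsearch endpoint holding BibJSON records.
    ES_URL = "http://localhost:9200/bibsoup/record/_search"

    def pull_records(subject=None, author=None, size=100):
        """Fetch BibJSON records matching a subject and/or an author name."""
        must = []
        if subject:
            must.append({"match": {"subject": subject}})
        if author:
            must.append({"match": {"author.name": author}})
        query = {"query": {"bool": {"must": must}}, "size": size}
        resp = requests.post(ES_URL, data=json.dumps(query),
                             headers={"Content-Type": "application/json"})
        resp.raise_for_status()
        return [hit["_source"] for hit in resp.json()["hits"]["hits"]]

    # Pull the bits I care about and keep a link back to the source record.
    records = pull_records(subject="Probability theory", author="Pitman")
    my_collection = [dict(r, source_url=r.get("url")) for r in records]

If something along those lines works against a per-National-Biblio BibServer instance, the mix/match into my own collections becomes a small scripting exercise.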

> Mark and the team continue to make developments daily. I am awestruck. I
> help by finding bugs (one of my skills). I think the first stage of BibSoup
> is now clear - it's straightforward for many people to set up a Bibserver -

A clear goal. But we are not there yet, right? As far as I know we have not yet reached the milestone of a public-facing BibServer 
not controlled by Mark/OKF. I continue to press for that. There are important social barriers to overcome as well as technical ones.
I would love to see a growing list of where these installations are. Start with 1, learn from the socio-technical issues involved with that,
and keep pressing. But let's distinguish between what has already been achieved and what remains to be achieved.

> and it's even easier to upload to Mark's server.

Yes, this is great as proof of concept!
I find BibSoup as presently set up an excellent proving ground for various biblio display efforts, like a sandbox, but I
don't see much future for it without some further partitioning/replication and bibliographic control.
It is all too easy for people to post low-quality datasets, or datasets under development. That openness is very useful, and a
way of attracting users to BibServer, but it is not the same as the well-organized, well-curated collections which I hope we
can start to see emerging soon. Things like the Malaria dataset and the Probability Web dataset should help focus on that.

> We see Open-bib as a centrepiece of our Open Science/Access efforts (see
> open-science and open-access). 

What exactly do you mean by "open-science" and "open-access"? Projects? URLs? I seem to be out of this loop.

> The interactive breakthrough will come (I think) when we can easily annotate records (I am thinking by adding new fields).

Yes. But we need to be very thoughtful about the data model for this to work well. I think the right data model is to allow that
agents like MathSciNet, PubMed, Google Scholar and others provide fairly stable records, and even more stable identifiers, and to distinguish
what are essentially just copies of these records, which should be acknowledged to preserve provenance, from further derivative records.
Users, in their own collections, should be able to easily supplement such a record from any source with a correction in a field or two, and with supplementary fields.
But the more common and effective use case will be for a user to create a new composite record from whatever records of the same object are out there.
Daniel Hook's Symplectic software does a great job of this merging, and I have made some passes at this too. Essentially, this creates a new record, which the user owns, and
which inherits some properties from the source records, and other properties which might be edited by hand or provided by some machine processing, e.g. automated name disambiguation
or subject classification. This is a difficult area of managing workflows for bibliographic data enhancement, but one which may be very rewarding.
It will be hard to bound the scope of such efforts. I would be inclined to chip away at it a bit at a time, but with a data model where there is no apparent
obstacle to further progress.
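
To sketch the sort of data model I have in mind, here is a rough Python pass. Field names like "provenance", and the example records, are illustrative only, not part of any agreed BibJSON spec:

    import copy

    def compose_record(sources, manual_edits=None, machine_fields=None):
        """Build a new, user-owned record from copies of source records.

        sources        -- list of (record, source_id) pairs, e.g. a PubMed copy
                          and a Google Scholar copy of the same work
        manual_edits   -- fields corrected by hand, overriding inherited values
        machine_fields -- fields added by processing, e.g. disambiguated names
        """
        composite = {}
        provenance = []
        for record, source_id in sources:
            for key, value in record.items():
                composite.setdefault(key, copy.deepcopy(value))  # first source wins
            provenance.append({"source": source_id,
                               "identifier": record.get("identifier")})
        composite.update(manual_edits or {})
        composite.update(machine_fields or {})
        composite["provenance"] = provenance  # links back to the stable source records
        return composite

    # Example: merge two copies of the same work, then fix one field by hand.
    pubmed_copy = {"title": "Example article", "year": "1995",
                   "identifier": [{"type": "pmid", "id": "placeholder"}]}
    scholar_copy = {"title": "Example article", "author": [{"name": "J. Smith"}]}
    merged = compose_record([(pubmed_copy, "pubmed"), (scholar_copy, "google-scholar")],
                            manual_edits={"year": "1996"},
                            machine_fields={"subject": ["Probability theory"]})

The point is only that the composite inherits from stable source records, keeps track of where each piece came from, and leaves room for hand edits and machine processing without losing the links back.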

--Jim
----------------------------------------------
Jim Pitman
Professor of Statistics and Mathematics
University of California
367 Evans Hall # 3860
Berkeley, CA 94720-3860

ph: 510-642-9970  fax: 510-642-7892
e-mail: pitman at stat.berkeley.edu
URL: http://www.stat.berkeley.edu/users/pitman



