[open-bibliography] getting a personal bib library out

Wed Jan 18 23:22:05 UTC 2012

Previously I wrote:

> > Then here is the dataset on my Berkeley BibServer
> > http://bibserver.berkeley.edu/cgi-bin/bibs7?https://raw.github.com/langner/library/master/library.small.bib

Sorry, there was a mistake in this url: without the source= parameter it reverts to the default Schramm dataset,
not Karol's. The right url for her collection on the legacy Berkeley BibServer is

http://bibserver.berkeley.edu/cgi-bin/bibs7?source=https://raw.github.com/langner/library/master/library.small.bib
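
A minimal sketch of scripting this address from the source url, in Python. The only thing taken from the
legacy server is the source= parameter shown above; the function name legacy_bibserver_url is made up for
illustration, and the escaping of unusual characters is an assumption about what the server accepts.

```python
from urllib.parse import quote

# Endpoint of the legacy Berkeley BibServer, as given in the url above.
LEGACY_ENDPOINT = "http://bibserver.berkeley.edu/cgi-bin/bibs7"


def legacy_bibserver_url(source_url: str) -> str:
    """Build the legacy BibServer address for a BibTeX file at source_url.

    Hypothetical helper: it only reproduces the ?source=<url> pattern.
    ':' and '/' are left unescaped so the result matches the hand-written
    form; escaping anything else is an assumption, not documented behaviour.
    """
    return LEGACY_ENDPOINT + "?source=" + quote(source_url, safe=":/")


if __name__ == "__main__":
    print(legacy_bibserver_url(
        "https://raw.github.com/langner/library/master/library.small.bib"
    ))
```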

To clarify my general point: the copy of her biblio on bibsoup should be retrievable with just as
simple a url indicating the source. I don't care about the details of the address; I just want to be able to script it
from the source url, and it should be fully supported and documented. I should not have to first learn and keep track of the fact that
bibsoup likes to use karol_m_langer_s_bibliography as the id for this dataset, and that it was user pitman who uploaded it, and hence that

http://bibsoup.net/pitman/karol_m_langer_s_bibliography

is the appropriate address. I should know in advance that if I push a dataset to bibsoup then it will be retrievable at an address which
is a simple function of its source url. It should also be possible to anonymise the upload. There are at least 20K Google Scholar profiles which could
be machine uploaded to bibsoup by script, and it would be easy to write a daemon which checked from time to time for new profiles hidden in Google Scholar
and uploaded them whenever it found them. You don't need personal ownership of datasets for this. These should somehow be communal property.
The same applies to any arguably open bibliographic source; DBLP and OL are other examples. The act of pushing a button to initiate an upload
from such a source to bibsoup should not make a user the owner of the dataset.
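
One way "a simple function of its source url" could look, sketched in Python. This is not how bibsoup works
today: the /source/ path, the function name dataset_address and the choice of a SHA-1 key are all assumptions
made up for illustration. The point is only that the address depends on the source url and on nothing else,
so no uploader name or hand-picked id is needed.

```python
import hashlib
from urllib.parse import urlsplit

# Site root as it appears in the bibsoup address above.
BIBSOUP_ROOT = "http://bibsoup.net"


def dataset_address(source_url: str) -> str:
    """Map a source url to a predictable dataset address (hypothetical scheme).

    The scheme and host are lower-cased first, so trivially different
    spellings of the same source url give the same address.
    """
    parts = urlsplit(source_url.strip())
    normalised = f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path}"
    if parts.query:
        normalised += "?" + parts.query
    # The key is a pure function of the source url: no user, no title.
    key = hashlib.sha1(normalised.encode("utf-8")).hexdigest()
    return f"{BIBSOUP_ROOT}/source/{key}"


if __name__ == "__main__":
    print(dataset_address(
        "https://raw.github.com/langner/library/master/library.small.bib"
    ))
```

Anyone holding the source url can then compute the address without asking who uploaded the dataset or under what id.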

> > Do you mean find the Bibservers or retrieve their content? 
Hopefully clarified above.

> I fully agree that a Bibserver needs something like .dump()

Yes, that too, but it will take a while to run. Maybe something to run daily and post the result somewhere that can handle big files.
It does not seem reasonable or necessary to expect typical bibserver installations to support frequent requests for complete dumps.
Is this something CKAN could handle?
--Jim

----------------------------------------------
Jim Pitman
Professor of Statistics and Mathematics
University of California
367 Evans Hall # 3860
Berkeley, CA 94720-3860

ph: 510-642-9970  fax: 510-642-7892
e-mail: pitman at stat.berkeley.edu
URL: http://www.stat.berkeley.edu/users/pitman