[open-bibliography] getting a personal bib library out

Jim Pitman pitman at stat.Berkeley.EDU
Thu Jan 19 16:43:14 UTC 2012


Thad Guidry <thadguidry at gmail.com> wrote:

> This full URL you gave:
> http://bibsoup.net/collections/?q=source:"
> http://bibserver.berkeley.edu/tmp/testbib.bib"&format=json&meta=false
> sucked into Google Refine easily.  And that should give folks plenty of
> options to get the data out of Bibserver, edit, & export the data out of
> Refine and reuse it however they want. (even templating it for re-use in a wiki web page!)

Thad, that's great. Are you able to
1) provide a wrapper around Google Refine so that it can both import and export BibJSON?
2) provide users with a way to access Google Refine via the web so that it can do this?
That would be major progress, I think.
We might also approach the Google Refine developers and ask them to incorporate the wrapper into their product.

This is a generic issue we have with any desktop product or web service for biblio data processing.
How do we interface the product/service with BibJSON?
How do we persuade the product or service provider to incorporate the BibJSON interface into their product?
Answering these questions will require ongoing effort from BKN staff and volunteers, and that effort needs to be coordinated.
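
For concreteness, the target of any such interface is just a list of records of the shape sketched below,
shown here as a python dict ready for json.dumps. I am writing this from memory, so treat the exact field
names as assumptions to be checked against the BibJSON spec:

# Sketch of a single BibJSON record; field names are my assumptions.
record = {
    'type': 'book',
    'id': 'pitman2006csp',   # hypothetical local identifier
    'title': 'Combinatorial Stochastic Processes',
    'author': [{'name': 'Jim Pitman'}],
    'year': '2006',
    'publisher': {'name': 'Springer'},
}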

I have made a number of such efforts in recent weeks, specifically developing interfaces for the following data sources:
arXiv, MathSciNet, Google Scholar, Google WWW, Springerlink, DBLP ...
In each case, I have a python module which will query the source and convert the response to a BibJSON dataset
suitable for upload to BibSoup. I need help with refactoring, hardening, organizing and publishing these modules.
These modules should be incorporated into a simple python client which will allow anyone with a laptop to query
any of these sources and immediately upload the results to BibSoup.
Right now I have such a python client running on a Berkeley server which I can access with a bookmarklet from my browser.
I'd like to see this python client communally maintained, with frequent updates to deal with changing datasource schemas
and to allow various enhancements. Any volunteers to assist with this development effort?
A sketch of what one of these source modules looks like follows.
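
To give a flavor of these modules, here is a stripped-down sketch in the same python 2 style as the code
appended below. It queries the arXiv Atom API and maps each entry to a BibJSON-style record; the field
mapping and record shape are my assumptions, not a finished module:

import urllib, json
import xml.etree.ElementTree as ET

ATOM = '{http://www.w3.org/2005/Atom}'  # Atom XML namespace used by the arXiv API

def arxiv_to_bibjson(query, max_results=5):
    ''' Query the arXiv API and return a list of BibJSON-style records. '''
    url = ('http://export.arxiv.org/api/query?search_query=' +
           urllib.quote(query) + '&max_results=%d' % max_results)
    tree = ET.parse(urllib.urlopen(url))
    records = []
    for entry in tree.findall(ATOM + 'entry'):
        records.append({
            'type': 'article',
            'title': entry.findtext(ATOM + 'title').strip(),
            'author': [{'name': a.findtext(ATOM + 'name')}
                       for a in entry.findall(ATOM + 'author')],
            'url': entry.findtext(ATOM + 'id'),
            'year': entry.findtext(ATOM + 'published')[:4],
        })
    return records

print json.dumps(arxiv_to_bibjson('all:stochastic'), indent=4)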

Mark, many thanks for advice re accessing datasets by source_url.
The curl request you gave did not work on my machine, but I was able to translate it into python;
see the code appended below.
Where should we put such code fragments, along with documentation of machine access to BibSoup?

Possibly this should be directed to the dev list and not the general open-bibliography list. But I would
still like to invite volunteers from the larger list to contribute to both the social and technical aspects of interfacing
various datasources with BibJSON/BibServer. Hopefully this effort will become better coordinated soon.
A list of volunteers and what they can potentially contribute to this effort would be a good start towards that.

Many thanks

--Jim




######################################################################
## python code ####
import urllib, json


def get_bibsoup_metadata(source_url):
    ''' Return a dictionary of BibSoup metadata for the dataset loaded from
    the source at source_url. The dictionary is empty if BibSoup has no
    copy of the dataset at that url. '''
    # Example query:
    # http://bibsoup.net/collections/?q=source:"http://bibserver.berkeley.edu/tmp/testbib.bib"&format=json&meta=false
    # The JSON result includes [0]['id'] and [0]['owner'], from which the
    # collection page http://bibsoup.net/pitman/test_bib can be inferred.
    # I have added a feature request to the issues to add a /source/<url> routing.
    bibsoup_url = ('http://bibsoup.net/collections/?q=source:"' +
                   source_url + '"&format=json&meta=false')
    ls = json.loads(urllib.urlopen(bibsoup_url).read())
    if ls:
        d = ls[0]
        d['url'] = 'http://bibsoup.net/' + d['owner'] + '/' + d['id']
    else:
        d = {}
    return d


# Test on two datasets known to BibSoup and one deliberately bad URL.
ls_urls = '''
http://bibserver.berkeley.edu/tmp/testbib.bib
http://bibserver.berkeley.edu/tmp/bibjson/data/authors/peres_yuval.json
this_is_not_a_url
'''

for u in ls_urls.split():
    print u
    d = get_bibsoup_metadata(u)
    print json.dumps(d, indent=4)
    print