krichel at openlib.org
Sun Nov 6 14:09:06 UTC 2011
Mark Leggott writes
> Hi Thomas - I thought I would start an offline
you mean offlist ;-)
> thread so we don't need to discuss details on list...
I think we should discuss them on list, because other people may
start to work with this. We need a re-usable base of bibliographic
data that is shared, so we can build services on top of that data that understand
each other. So when, say, AuthorClaim says that user X has
disclaimed document D, that information is reusable. Or when k4all
says that D has been accessed 200 times in November, that information
is reusable. AuthorClaim can report this to user X.
This is how RePEc works. I started this in 1993, and it's been quite
> How much disk space do you need? Depending on what your requirements
> are we may be able to host the full set if that is useful.
There are two parts to this.
The first is the 3lib dataset as used in AuthorClaim and
AuthorProfile. This only contains titles, authors and
URLs. Recently, on the source machine
krichel at wotan:~$ du -s /home/mamf/opt/amf/3lib/
This data is free because it is factual. I think the HAL data also
has abstracts, and for selected sources we may start adding
abstracts. Call that the AMF data.
Then, there are other parts of the data that are sources of the AMF
data, and scripts used to transform the data from source to AMF. Off
the top of my head, this includes PubMed, which I can't redistribute
for licensing reasons. There is CrossRef data the status of which is
For the OKFN machine, I export all that is in the home of mamf
mamf at wotan:~$ crontab -l | grep sofca
53 22 * * * rsync -aq --delete --delete-during --exclude-from ~/etc/sofca.exclude ~/ flib at sofca:opt/wotan/home/mamf
mamf at wotan:~$ cat etc/sofca.exclude
I can try to make that list longer, but I would also have to change
the 3lib.org documentation.
I can't go through all of this now; it's a big body of work. I will
have quite a bit of time to work on this from mid January to
September next year. Right now I am maintaining stuff.
I suggest you get the AMF data soon; it will give you something to
play with. As your business plan says, you want some data that is
essentially CC0. The AMF set is not necessarily de jure CC0, but
de facto it is.
Send me a public key to authorize.
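
The message does not say what kind of key is meant. Assuming it is an
SSH key pair for rsync-over-ssh access, a minimal sketch of generating
one could look like this. The file name, comment, and key type are
hypothetical choices, and the empty passphrase is only there to keep
the demo non-interactive.

```shell
#!/bin/sh
# Assumption: an SSH key pair is what is wanted. File name and comment
# below are placeholders, not anything specified in the message.
set -e

keydir=$(mktemp -d)

# -t selects the key type, -N sets the passphrase (empty here for the
# demo only), -C sets a comment, -f sets the output file, -q is quiet.
ssh-keygen -q -t ed25519 -N "" -C "demo key for AMF data access" \
    -f "$keydir/id_demo"

# Only the .pub half is ever sent to the other side.
cat "$keydir/id_demo.pub"
```

The private half (id_demo in this sketch) stays on the machine that
will pull the data.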
Thomas Krichel http://openlib.org/home/krichel
More information about the open-bibliography