  Mark Leggott writes

> Hi Thomas - I thought I would start an offline

  you mean offlist ;-)

> thread so we don't need to discuss details on list...

  I think we should, because there are other people who may start to
  work with this. We need a re-usable base of bibliographic data that
  is shared, so that the services we build on top of that data
  understand each other. So when, say, AuthorClaim says that user X
  has disclaimed document D, that information is reusable. Or when
  k4all says that D has been accessed 200 times in November, that
  information is reusable: AuthorClaim can report this to user X.

  This is how RePEc works. I started this in 1993, and it's been quite

> How much disk space do you need? Depending on what your requirements
> are we may be able to host the full set if that is useful.

  There are two parts to this.

  The first is the 3lib dataset as used in AuthorClaim and
  AuthorProfile.  This only contains titles, authors and
  URLs. Recently, on the source machine:

krichel at wotan:~$ du -s /home/mamf/opt/amf/3lib/
42022232        /home/mamf/opt/amf/3lib/
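
  For scale: assuming this is GNU du with its default 1 KiB block
  size (an assumption, the thread does not say), that figure comes to
  roughly 40 GiB. A small sketch of the conversion; the function name
  kib_to_gib is made up for illustration:

```shell
# Convert a size in KiB (as printed by `du -s` with GNU defaults) to GiB.
# Assumption: the du figure is in 1024-byte blocks.
kib_to_gib() {
    awk -v kib="$1" 'BEGIN { printf "%.1f\n", kib / (1024 * 1024) }'
}

kib_to_gib 42022232   # prints 40.1
```

  Running du -sh on the same directory would give a similar
  human-readable figure directly.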

  This data is free because it is factual. I think the HAL data also
  has abstracts, and for selected sources we may start adding
  abstracts. Call that the AMF data.

  Then, there are other parts of the data that are sources of the AMF
  data, and scripts used to transform the data from source to AMF. Off
  the top of my head, this includes PubMed, which I can't redistribute
  for licensing reasons. There is also CrossRef data, the status of
  which is dodgy as well.

  For the OKFN machine, I export all that is in the home of mamf:

mamf at wotan:~$ crontab -l | grep sofca
53 22  * * * rsync -aq --delete --delete-during --exclude-from ~/etc/sofca.exclude ~/ flib at sofca:opt/wotan/home/mamf


mamf at wotan:~$ cat etc/sofca.exclude
# .ssh
# data
# lib
# usr
# Mail

  I can try to make that list longer, but I would also have to change
  the 3lib.org documentation.
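
  A note on the exclude file as listed: rsync ignores blank lines and
  lines starting with '#' or ';' in an --exclude-from file, so with
  every entry commented out nothing is currently excluded. The sketch
  below shows one way to check which patterns are actually active. The
  function name and the sample file contents are hypothetical, not
  taken from wotan:

```shell
# List the patterns in an rsync --exclude-from file that are active.
# rsync skips blank lines and lines starting with '#' or ';'.
active_excludes() {
    # grep exits non-zero when nothing is left; that is not an error here.
    grep -v -e '^[#;]' -e '^$' "$1" || true
}

# Hypothetical sample: 'data' is excluded, '.ssh' and 'lib' are comments.
tmp=$(mktemp)
printf '# .ssh\ndata\n\n; lib\n' > "$tmp"
active_excludes "$tmp"   # prints: data
rm -f "$tmp"
```

  Making the list longer would then amount to adding uncommented
  patterns, one per line.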

  I can't go through all of this now; it's a big body of work. I will
  have quite a bit of time to work on this from mid-January to
  September next year. Right now I am maintaining stuff.

  I suggest you get the AMF data soon; it will give you something to
  play with. As your business plan says, you want some data that is
  essentially CC0. The AMF set is not necessarily de jure CC0, but
  de facto it is.

  Send me a public key to authorize.
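
  Presumably this means an SSH public key, since the transfers above
  run rsync over ssh. A minimal sketch of generating a dedicated key
  pair with OpenSSH follows. The function name, key path and comment
  string are placeholders, and the empty passphrase is an assumption
  that suits unattended cron jobs; use a passphrase otherwise.

```shell
# Generate a dedicated ed25519 key pair and print the public half.
# Assumption: OpenSSH's ssh-keygen is installed. Path and comment are
# placeholders, not names from this thread.
make_rsync_key() {
    keyfile="$1"
    ssh-keygen -q -t ed25519 -N "" -C "3lib-rsync" -f "$keyfile" &&
        cat "$keyfile.pub"
}

# Example (creates ~/.ssh/id_ed25519_3lib and ~/.ssh/id_ed25519_3lib.pub):
# make_rsync_key ~/.ssh/id_ed25519_3lib
```

  Only the printed .pub line needs to be sent; the private key file
  stays on the requesting machine.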


