[open-bibliography] OCLC's license for FAST

William Waites ww at styx.org
Wed Jan 4 22:15:31 UTC 2012


On Wed, 4 Jan 2012 22:41:16 +0100, Thomas Krichel <krichel at openlib.org> said:

    >> Somebody harvest it and republish please. The ODC-By license
    >> allows for this.

    pkr1>   If somebody writes a harvesting program, I will be happy to
    pkr1> maintain it on the OKFN sponsored server for 3lib. In fact,
    pkr1> the harvester could run there and store it there.

Easy peasy. Script below. Warning: it's not a very polite crawler (it
doesn't pause between requests) or a very intelligent one (it simply
tries every record number from 1 to 1700000). But it does grab both the
RDF and MARC21 versions.
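The politeness problem is easy to soften. As a minimal sketch, each bare `curl` call in the script below could go through a helper that skips records already on disk (so an interrupted crawl can simply be restarted) and pauses after every request. The name `polite_fetch`, the `DELAY` variable and the 30-second timeout are hypothetical choices, not part of the original script:

```shell
# Hypothetical helper, not part of the original script.
DELAY=${DELAY:-1}   # assumed pause, in seconds, after each request

# polite_fetch URL FILE
# Returns 0 if FILE.gz exists afterwards, 1 if no record was saved.
polite_fetch() {
    url=$1
    out=$2

    # Harvested on an earlier run: don't hit the server again.
    if test -e "${out}.gz"; then
        return 0
    fi

    # --max-time bounds a hung connection; a failed transfer, an empty
    # body or the server's "404: Document not found" page counts as
    # "no record".
    if curl -s --max-time 30 "$url" > "$out" && test -s "$out" &&
       ! grep -q "404: Document not found" "$out"; then
        gzip -9f "$out"
        sleep "$DELAY"
        return 0
    fi

    rm -f "$out"
    sleep "$DELAY"
    return 1
}
```

For example: `polite_fetch "http://experimental.worldcat.org/fast/${i}/rdf.xml" "fast/${d}/${i}.rdf"`. Note that with `DELAY=1`, a fresh pass over 1,700,000 numbers (two requests each) spends about 3,400,000 seconds, roughly 39 days, just sleeping, so a shorter delay or a narrower range may be the better trade-off.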

Cheers,
-w


#!/bin/sh
# Walk FAST record numbers 1 to 1700000 on OCLC's experimental endpoint,
# fetching each record as RDF and as MARC21 XML. Files are stored 10000
# records per directory: fast/0/1.rdf.gz ... fast/1/10000.rdf.gz ...

base=http://experimental.worldcat.org/fast

# fetch URL FILE: save URL to FILE and compress it, unless the transfer
# failed, came back empty, or returned the "404: Document not found" page.
fetch() {
    if curl -s "$1" > "$2" && test -s "$2" &&
       ! grep -q "404: Document not found" "$2"; then
        gzip -9f "$2"
    else
        rm -f "$2"
    fi
}

i=1
while test "$i" -le 1700000; do
    echo "$i"
    d=$((i / 10000))    # directory number: 0 for 1-9999, 1 for 10000-19999, ...
    mkdir -p "fast/${d}"

    fetch "${base}/${i}/rdf.xml" "fast/${d}/${i}.rdf"
    fetch "${base}/${i}/marc21.xml" "fast/${d}/${i}.xml"

    i=$((i + 1))
done
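Because the script buckets records by number, a harvested record can be located without searching the tree. A small sketch, assuming the layout the script produces (directory number = record number divided by 10000, rounded down, with `.rdf.gz` for RDF and `.xml.gz` for MARC21); `fast_path` and `fast_cat` are hypothetical helper names:

```shell
# Hypothetical helpers for reading back the harvest.

# fast_path NUMBER EXT: print where record NUMBER is stored.
# EXT is "rdf" for the RDF file or "xml" for the MARC21 file.
fast_path() {
    echo "fast/$(($1 / 10000))/$1.$2.gz"
}

# fast_cat NUMBER EXT: print a harvested record, decompressing on the fly.
fast_cat() {
    gzip -dc "$(fast_path "$1" "$2")"
}
```

For example, `fast_cat 12345 rdf` would print the RDF for record 12345 from `fast/1/12345.rdf.gz`.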