[open-bibliography] OCLC's license for FAST
William Waites
ww at styx.org
Wed Jan 4 22:15:31 UTC 2012
On Wed, 4 Jan 2012 22:41:16 +0100, Thomas Krichel <krichel at openlib.org> said:
>> Somebody harvest it and republish please. The ODC-By license
>> allows for this.
pkr1> If somebody writes a harvesting program, I will be happy to
pkr1> maintain it on the OKFN sponsored server for 3lib. In fact,
pkr1> the harvester could run there and store it there.
Easy peasy. Script below. Warning: it's not a very polite crawler, or
very intelligent. But it does grab both the RDF and MARC21 versions.
Cheers,
-w
#!/bin/sh
# Harvest the RDF and MARC21 versions of FAST records 1..1700000
# into fast/<d>/, one directory per block of 10000 record numbers.
i=1
d=0
while true; do
    echo $i
    mkdir -p fast/${d}
    curl -s http://experimental.worldcat.org/fast/${i}/rdf.xml > fast/${d}/${i}.rdf
    # Drop "not found" pages; compress only files that were kept.
    if grep -q "404: Document not found" fast/${d}/${i}.rdf; then
        rm fast/${d}/${i}.rdf
    else
        gzip -9 fast/${d}/${i}.rdf
    fi
    curl -s http://experimental.worldcat.org/fast/${i}/marc21.xml > fast/${d}/${i}.xml
    if grep -q "404: Document not found" fast/${d}/${i}.xml; then
        rm fast/${d}/${i}.xml
    else
        gzip -9 fast/${d}/${i}.xml
    fi
    i=$(($i + 1))
    if test $i -gt 1700000; then
        break
    fi
    # Move on to the next output directory every 10000 records.
    if test $(($i % 10000)) -eq 0; then
        d=$(($d + 1))
    fi
done
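
For anyone who wants to run this against the live server, here is a
rough sketch of a somewhat politer variant. It is only one possible
shape, not a tested harvester. The names DELAY, bucket_for, fetch_one
and harvest are made up for illustration, and the one-second pause is
an arbitrary assumption. It pauses between records, skips HTTP errors
as well as the "404" page, and derives the directory from the record
number so that a run can be restarted at any point.

```shell
#!/bin/sh
# Illustrative sketch only: DELAY and the function names are hypothetical.
DELAY=${DELAY:-1}    # seconds to wait between records (assumed value)

# Directory for record number $1, same layout as the script above:
# records 1..9999 land in 0, 10000..19999 in 1, and so on.
bucket_for() {
    echo $(($1 / 10000))
}

# Fetch URL $1 into file $2. Keep and compress it on success; remove it
# if curl reports an HTTP error or the body is the "not found" page.
fetch_one() {
    if curl -s -f "$1" > "$2" && ! grep -q "404: Document not found" "$2"; then
        gzip -9 -f "$2"
    else
        rm -f "$2"
    fi
}

# Harvest records $1..$2 inclusive, sleeping DELAY seconds after each.
harvest() {
    i=$1
    while test "$i" -le "$2"; do
        d=$(bucket_for "$i")
        mkdir -p "fast/$d"
        fetch_one "http://experimental.worldcat.org/fast/$i/rdf.xml" "fast/$d/$i.rdf"
        fetch_one "http://experimental.worldcat.org/fast/$i/marc21.xml" "fast/$d/$i.xml"
        sleep "$DELAY"
        i=$((i + 1))
    done
}
```

Sourcing the file and calling "harvest 1 1700000" would cover the same
range as the original; after an interruption, "harvest 250000 1700000"
would pick up from record 250000 without redoing earlier directories.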