[open-humanities] Forking Project Gutenberg to Github
seth at sethish.com
Tue Aug 19 18:09:20 UTC 2014
I've been working on a project called GITenberg <http://gitenberg.github.io>.
The aim is to move Project Gutenberg's books to GitHub.
As you probably know, Project Gutenberg (PG) is an amazing organization
that has been digitizing public domain books since the 1970s. They have
around 45,000 books.
But PG is hesitant to upgrade its tools and has limited resources to
work on new projects, and there are issues with the current collection.
There are some remaining typos and transcription errors, and many books
use old encoding formats (PG predates Unicode).
I want to help with that, and along the way, produce something that more
developers, OKFN hackers, digital humanists and other groups can readily
use.
GITenberg uses git and GitHub to keep track of books. This adds a number
of features right out of the gate, including:
+ version control via git
+ public bug tracking (PG uses a private RT instance to track reported
  errors)
+ public collaboration (pull requests under public review)
PG's metadata is provided in RDF/XML, in a 230 MB zip file. While this is a
wonderful resource, RDF isn't the easiest format for most developers to
pick up and use. In fact, the .zip file has so many top-level folders that
it can't be completely unpacked on some filesystems (ext3, for example,
caps a directory at roughly 32,000 subdirectories).
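
One way around the unpacking problem is to read records straight out of
the archive instead of extracting it. The following is only a minimal
sketch under assumptions, not how GITenberg does it: it assumes each record
is a member of the zip ending in ".rdf" and that the title lives in a
Dublin Core dcterms:title element. The function name iter_titles is made up
for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumption: titles are stored as Dublin Core terms <dcterms:title> elements.
DCTERMS_TITLE = "{http://purl.org/dc/terms/}title"


def iter_titles(archive):
    """Yield (member_name, title) for every .rdf member of a zip archive.

    `archive` is a path or a binary file-like object. Members are parsed
    in place, so nothing is written to disk and the filesystem's limit on
    subdirectories never comes into play.
    """
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            # Assumption: metadata records are the members ending in ".rdf".
            if not name.endswith(".rdf"):
                continue
            with zf.open(name) as member:
                root = ET.parse(member).getroot()
            node = root.find(f".//{DCTERMS_TITLE}")
            # A record without a title (or with an empty one) yields None.
            title = node.text.strip() if node is not None and node.text else None
            yield name, title
```

Calling list(iter_titles("catalog.zip")) on a local copy of the archive
(the file name here is a placeholder) would give one pair per record.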
I've created repos and included the book source files (often including
images!) for 43,000 of PG's books and put them on GitHub.
There is still a lot that I hope to do, but I would love to get OKFN's
feedback, requests, or assistance!
Uploading script <https://github.com/sethwoodworth/GITenberg>
Mailing list <https://groups.google.com/forum/#!forum/gitenberg-project>
All the best,