[open-humanities] Forking Project Gutenberg to Github
rufus.pollock at okfn.org
Wed Aug 20 17:40:41 UTC 2014
On 20 August 2014 15:31, Seth Woodworth <seth at sethish.com> wrote:
> Ah, I wrote a section about that in my email but removed it for brevity.
> > Is there an easy way of searching Gitenberg for sets of texts, e.g. for
> > all the works by a particular author?
> Not yet.
> Metadata is my next focus for the project. Right now, the only search
> function is the github repo search, which is insufficient.
I wonder if we have a connection with Textus / OpenLiterature - the whole
idea of the recent incarnation was to *not* store the texts in the
service but have them live in flat files or equivalent somewhere. Plus we
always want to import Gutenberg stuff!
At the very least the textus-viewer could be used for viewing the texts (if
we could add the relevant typography stuff).
> Now, metadata is stored in RDF/XML. PG has a metadata file for each book
> (which I'm also tracking with git here
> <https://github.com/sethwoodworth/PG_rdf_metadata>). I've put a copy of
> the metadata file in each book repo as *pg<bookid>.rdf* . This isn't
> I'm extending the recently released python tool gutenberg
> <https://bitbucket.org/c-w/gutenberg> to parse more fields of PG's RDF
> schema. When that is finished, I can include a .json file in each book repo
> and release a big json file of all metadata.
A simple JSON file would be very nice. It might be a stretch but I wonder
if using the datapackage.json structure would be appropriate here - see
http://data.okfn.org/doc/data-package. After all, the text files seem a
natural fit as resources here.
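To make that concrete, here is a rough sketch of what generating a
datapackage.json for one book repo could look like. It is only an
illustration: `make_datapackage`, the book id, the title and the file names
are made up, the `authors` field is a placeholder of mine, and the `name`,
`title` and `resources` keys are how I remember the spec, so check them
against the link above.

```python
import json
import os


def make_datapackage(book_id, title, authors, files):
    """Build a datapackage.json-style dict for a single book repo.

    Hypothetical helper: the layout is an assumption, not an agreed schema.
    """
    return {
        "name": "pg{}".format(book_id),
        "title": title,
        "authors": authors,  # placeholder field, not taken from the spec
        "resources": [
            # One resource per file; "format" is just the file extension.
            {"path": path, "format": os.path.splitext(path)[1].lstrip(".")}
            for path in files
        ],
    }


# Made-up example values, not a real Project Gutenberg record.
pkg = make_datapackage(
    1234,
    "An Example Title",
    ["Example Author"],
    ["1234.txt", "pg1234.rdf"],
)
print(json.dumps(pkg, indent=2))
```

The point is that the text files and the .rdf file simply become entries in
`resources`, so the package description stays a small flat file sitting next
to the texts.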
> Next step, is an API server and simple search tool for people to be able
> to interact with the metadata without having to download >200mb.
> > What metadata is there for each work, and what provision is there for
> > adding to it?
> Book title, Author(s) (and variant spellings), Library of Congress
> Subject Heading <https://github.com/sethwoodworth/LCC>, and some other
> metadata. PG changed their schema recently and I'm not 100% sure how many
> variants the 40k rdf files contain. I will see if I can get an answer to
> that question later today.
> As far as adding new metadata, there are several decisions to make:
> + which is the canonical metadata file, .rdf, .json or both?
> + what schema to use for arbitrary or specific new metadata
I'd go for JSON. I've suggested datapackage.json for the "container", but it
could be worth looking at the Textus stuff for suggestions on particular
fields (it cites bibjson, though the bibjson site seems to be down).
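As a rough sketch of the per-book JSON, the fields Seth lists (title,
authors and variant spellings, subjects) could be pulled out of an RDF file
with nothing more than the Python standard library. Everything specific here
is an assumption on my part: the sample record is invented, `rdf_to_dict` is
a hypothetical helper, and the element names are my understanding of the
newer PG schema, so older files may well need different paths.

```python
import json
import xml.etree.ElementTree as ET

# Namespace URIs assumed for the newer PG RDF files; verify against real data.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "pgterms": "http://www.gutenberg.org/2009/pgterms/",
}

# Invented sample record, shaped the way I believe the RDF files look.
SAMPLE_RDF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:pgterms="http://www.gutenberg.org/2009/pgterms/">
  <pgterms:ebook rdf:about="ebooks/1234">
    <dcterms:title>An Example Title</dcterms:title>
    <dcterms:creator>
      <pgterms:agent>
        <pgterms:name>Author, Example</pgterms:name>
        <pgterms:alias>Author, E.</pgterms:alias>
      </pgterms:agent>
    </dcterms:creator>
    <dcterms:subject>
      <rdf:Description>
        <rdf:value>Example subject</rdf:value>
      </rdf:Description>
    </dcterms:subject>
  </pgterms:ebook>
</rdf:RDF>"""


def rdf_to_dict(xml_text):
    """Extract a few fields from one RDF/XML record into a plain dict."""
    root = ET.fromstring(xml_text)
    book = root.find("pgterms:ebook", NS)
    if book is None:
        raise ValueError("no pgterms:ebook element found")

    def texts(path):
        # Collect the stripped text of every element matching the path.
        return [
            e.text.strip()
            for e in book.findall(path, NS)
            if e.text and e.text.strip()
        ]

    titles = texts("dcterms:title")
    return {
        "id": book.get("{%s}about" % NS["rdf"]),
        "title": titles[0] if titles else None,
        "authors": texts("dcterms:creator/pgterms:agent/pgterms:name"),
        "aliases": texts("dcterms:creator/pgterms:agent/pgterms:alias"),
        "subjects": texts("dcterms:subject/rdf:Description/rdf:value"),
    }


record = rdf_to_dict(SAMPLE_RDF)
print(json.dumps(record, indent=2))
```

A list of such dicts could then be filtered by author with a plain list
comprehension, which would at least cover John's "all the works by a
particular author" case until there is a proper search tool.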
> I would prefer to make these choices as a community rather than just me.
> But that being said, I'm accepting most PR's that come in and keeping
> track in case we need to migrate anything in the future.
> P.S. Thanks for opening issues on GITenberg repos!
> On Wed, Aug 20, 2014 at 5:48 AM, John Levin <john at anterotesis.com> wrote:
>> Hello Gitenberg!
>> On 19/08/2014 19:09, Seth Woodworth wrote:
>>> Hello Humanities!
>>> I've been working on a project called GITenberg
>>> The aim is to move Project Gutenberg's books to github.
>> This is a really interesting project, and one I hope could be adapted for
>> other large collections of texts.
>> Couple of quick questions:
>> Is there an easy way of searching Gitenberg for sets of texts, e.g. for
>> all the works by a particular author?
>> What metadata is there for each work, and what provision is there for
>> adding to it?
>> John Levin
>> open-humanities mailing list
>> open-humanities at lists.okfn.org
>> Unsubscribe: https://lists.okfn.org/mailman/options/open-humanities
Rufus Pollock
Founder and President | skype: rufuspollock | @rufuspollock <https://twitter.com/rufuspollock>
Open Knowledge <http://okfn.org/> - see how data can change the world
http://okfn.org/ | @okfn <http://twitter.com/OKFN> | Open Knowledge on Facebook <https://www.facebook.com/OKFNetwork> | Blog <http://blog.okfn.org/>
The Open Knowledge Foundation is a not-for-profit organisation. It is
incorporated in England & Wales as a company limited by guarantee, with
company number 05133759. VAT Registration № GB 984404989. Registered
office address: Open Knowledge Foundation, St John’s Innovation Centre,
Cowley Road, Cambridge, CB4 0WS, UK.