[open-humanities] Forking Project Gutenberg to Github

James Harriman-Smith james.harriman-smith at okfn.org
Thu Aug 21 10:44:02 UTC 2014


This is just a quick email to say how delighted I am that this project is
getting traction, and may even link up with Textus further down the line.
Seth, if there's anything we can do to help you: user testing, spreading
the word, etc. do let us know.

I'll be setting up the next Open Humanities call soon, just as soon as I
finish running a summer school, and would love to put this on the agenda if
you think some open discussion would be handy.

J


On 20 August 2014 20:21, David Potocnik <david.potocnik at gmail.com> wrote:

> I'd love to work on all of this (it has been a project idea for years)
> if basic subsistence funding would be provided.
> I've looked at Textus and OpenLit and have some proto code of my
> own + a bunch of notes.
>
> I wish there would be a foundation for hackers that would
> 1) cut the red tape and deadlines
> 2) not award a load of cash (like every foundation out there) and a
> jetset lifestyle, just a subsistence fund. like 500€/month.
>
> Sorry to derail, but it's an open invitation.
>
> David
>
> On 20 August 2014 19:40, Rufus Pollock <rufus.pollock at okfn.org> wrote:
> > On 20 August 2014 15:31, Seth Woodworth <seth at sethish.com> wrote:
> >>
> >> Ah, I wrote a section about that in my email but removed it for brevity.
> >>
> >> > Is there an easy way of searching Gitenberg for sets of texts, e.g. for
> >> > all the works by a particular author?
> >>
> >> Not yet.
> >>
> >> Metadata is my next focus for the project.  Right now, the only search
> >> function is the github repo search, which is insufficient.
> >
> >
> > I wonder if we have a connection with Textus / OpenLiterature - the whole
> > idea of the recent incarnation was to *not* store the texts in the
> > service but have them live in flat-files or equivalent somewhere. Plus we
> > always want to import gutenberg stuff!
> >
> > At the very least the textus-viewer could be used for viewing the texts (if
> > we could add the relevant typography stuff)
> >
> >>
> >> Now, metadata is stored in RDF/XML.  PG has a metadata file for each book
> >> (which I'm also tracking with git here).  I've put a copy of the metadata
> >> file in each book repo as pg<bookid>.rdf .  This isn't ideal.
> >>
> >> I'm extending the recently released python tool gutenberg to parse more
> >> fields of PG's RDF schema. When that is finished, I can include a .json file
> >> in each book repo and release a big json file of all metadata.
> >
> >
> > A simple JSON file would be very nice. It might be a stretch but I wonder if
> > using datapackage.json structure would be appropriate here - see
> > http://data.okfn.org/doc/data-package. After all the text files seem a
> > natural fit as resources here.
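
[Editorial sketch, not part of the original message.] To make the datapackage.json suggestion concrete, here is one guess at what a per-book descriptor could contain, with the text files listed as resources. The helper name build_datapackage, the "pg<bookid>" naming scheme, and the example file name are assumptions for illustration only. The thread does not settle on any schema.

```python
import json


def build_datapackage(book_id, title, text_paths):
    """Build a minimal Data Package style descriptor for one book repo."""
    return {
        # Hypothetical naming scheme: a short, lowercase, URL-friendly name.
        "name": "pg{}".format(book_id),
        "title": title,
        # Each text file in the repo becomes one resource entry.
        "resources": [{"path": path} for path in text_paths],
    }


package = build_datapackage(0, "An Example Title", ["0.txt"])
print(json.dumps(package, indent=2))
```

Author names, subject headings and similar fields would still need agreed keys on top of this container, which is the open question raised later in the thread.
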
> >
> >>
> >> The next step is an API server and simple search tool for people to be able
> >> to interact with the metadata without having to download >200mb.
> >
> >
> >>
> >> > What metadata is there for each work, and what provision is there for
> >> > adding to it?
> >>
> >> Book title, Author(s) (and variant spellings), Library of Congress Subject
> >> Heading, and some other metadata. PG changed their schema recently and I'm
> >> not 100% sure how many variants the 40k rdf files contain.  I will see if I
> >> can get an answer to that question later today.
> >>
> >> As far as adding new metadata, there are several decisions to make:
> >> + which is the canonical metadata file, .rdf, .json or both?
> >> + what schema to use for arbitrary or specific new metadata
> >
> >
> > I'd go for JSON. I've suggested datapackage.json for the "container" but
> > could be worth looking at Textus stuff for suggestions on particular fields
> > (which cites bibjson though bibjson seems to be down).
> >
> >>
> >> I would prefer to make these choices as a community rather than just me.
> >> But that being said, I'm accepting most PR's that come in and keeping
> >> track in case we need to migrate anything in the future.
> >
> >
> > Rufus
> >
> >>
> >>
> >>
> >> P.S. Thanks for opening issues on GITenberg repos!
> >>
> >>
> >> On Wed, Aug 20, 2014 at 5:48 AM, John Levin <john at anterotesis.com> wrote:
> >>>
> >>> Hello Gitenberg!
> >>>
> >>>
> >>> On 19/08/2014 19:09, Seth Woodworth wrote:
> >>>>
> >>>> Hello Humanities!
> >>>>
> >>>> I've been working on a project called GITenberg
> >>>> <http://gitenberg.github.io>.
> >>>>
> >>>>
> >>>> The aim is to move Project Gutenberg's books to github.
> >>>>
> >>>
> >>> <snip>
> >>>
> >>> This is a really interesting project, and one I hope could be adapted for
> >>> other large collections of texts.
> >>>
> >>> Couple of quick questions:
> >>> Is there an easy way of searching Gitenberg for sets of texts, e.g. for
> >>> all the works by a particular author?
> >>> What metadata is there for each work, and what provision is there for
> >>> adding to it?
> >>>
> >>> Best,
> >>>
> >>> John
> >>>
> >>> --
> >>> John Levin
> >>> http://www.anterotesis.com
> >>> http://twitter.com/anterotesis
> >>>
> >>> _______________________________________________
> >>> open-humanities mailing list
> >>> open-humanities at lists.okfn.org
> >>> https://lists.okfn.org/mailman/listinfo/open-humanities
> >>> Unsubscribe: https://lists.okfn.org/mailman/options/open-humanities
> >>
> >>
> >>
> >>
> >
> >
> >
> > --
> >
> > Rufus Pollock
> >
> > Founder and President | skype: rufuspollock | @rufuspollock
> >
> > Open Knowledge - see how data can change the world
> >
> > http://okfn.org/ | @okfn | Open Knowledge on Facebook |  Blog
> >
> > The Open Knowledge Foundation is a not-for-profit organisation.  It is
> > incorporated in England & Wales as a company limited by guarantee, with
> > company number 05133759.  VAT Registration № GB 984404989. Registered office
> > address: Open Knowledge Foundation, St John’s Innovation Centre, Cowley
> > Road, Cambridge, CB4 0WS, UK.
> >
> >
> >
>



-- 
James Harriman-Smith
Open Literature Working Group Coordinator
Open Knowledge Foundation
http://okfn.org/members/jameshs
Skype: james.harriman.smith