[open-humanities] Forking Project Gutenberg to Github

David Potocnik david.potocnik at gmail.com
Wed Aug 20 19:21:35 UTC 2014


I'd love to work on all of this (it has been a project idea of mine for
years) if basic subsistence funding were provided.
I've looked at Textus and OpenLit and have some proto code of my
own + a bunch of notes.

I wish there were a foundation for hackers that would
1) cut the red tape and deadlines
2) not award a load of cash (like every foundation out there) and a
jetset lifestyle, just a subsistence fund, like 500€/month.

Sorry to derail, but it's an open invitation.

David

On 20 August 2014 19:40, Rufus Pollock <rufus.pollock at okfn.org> wrote:
> On 20 August 2014 15:31, Seth Woodworth <seth at sethish.com> wrote:
>>
>> Ah, I wrote a section about that in my email but removed it for brevity.
>>
>> > Is there an easy way of searching Gitenberg for sets of texts, e.g. for
>> > all the works by a particular author?
>>
>> Not yet.
>>
>> Metadata is my next focus for the project.  Right now, the only search
>> function is the github repo search, which is insufficient.
>
>
> I wonder if we have a connection with Textus / OpenLiterature - the whole
> idea of the recent incarnation was to *not* store the texts in the
> service but have them live in flat-files or equivalent somewhere. Plus we
> always want to import gutenberg stuff!
>
> At the very least the textus-viewer could be used for viewing the texts (if
> we could add the relevant typography stuff)
>
>>
>> Now, metadata is stored in RDF/XML.  PG has a metadata file for each book
>> (which I'm also tracking with git here).  I've put a copy of the metadata
>> file in each book repo as pg<bookid>.rdf .  This isn't ideal.
>>
>> I'm extending the recently released python tool gutenberg to parse more
>> fields of PG's RDF schema. When that is finished, I can include a .json file
>> in each book repo and release a big json file of all metadata.
>
>
> A simple JSON file would be very nice. It might be a stretch but I wonder if
> using datapackage.json structure would be appropriate here - see
> http://data.okfn.org/doc/data-package. After all the text files seem a
> natural fit as resources here.
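
To make the datapackage.json suggestion concrete, here is a rough sketch
of what a per-book descriptor might look like. Only "name", "title" and
"resources" (with "path" and "format") follow the Data Package
conventions as I understand them; the "authors" key, the "pg<bookid>"
package name and the text file name are assumptions for illustration.

```python
import json


def build_datapackage(book_id, title, authors, text_path):
    """Build a minimal datapackage.json-style dict for one book repo."""
    return {
        "name": "pg{}".format(book_id),  # hypothetical naming scheme
        "title": title,
        "authors": list(authors),        # hypothetical extra field
        "resources": [
            # The book text itself, listed as a resource.
            {"path": text_path, "format": "txt"},
            # The PG metadata file already copied into each repo.
            {"path": "pg{}.rdf".format(book_id), "format": "rdf"},
        ],
    }


package = build_datapackage(
    "1342", "Pride and Prejudice", ["Austen, Jane"], "1342.txt"
)
print(json.dumps(package, indent=2))
```
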
>
>>
>> The next step is an API server and simple search tool so people can
>> interact with the metadata without having to download >200 MB.
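
As a rough idea of the "simple search" part, an author lookup over the
combined JSON metadata could start as an in-memory index like the one
below. The record layout ("id", "title", "authors") is an assumption
carried over from a hypothetical JSON export, and the three records are
illustrative samples; an actual API server would sit on top of this.

```python
from collections import defaultdict


def build_author_index(records):
    """Map each case-folded author name to the ids of their books."""
    index = defaultdict(list)
    for record in records:
        for author in record.get("authors", []):
            index[author.casefold()].append(record["id"])
    return index


def find_by_author(index, query):
    """Return sorted book ids whose author name contains the query."""
    q = query.casefold()
    hits = set()
    for author, ids in index.items():
        if q in author:
            hits.update(ids)
    return sorted(hits)


# Illustrative sample records in the assumed layout.
RECORDS = [
    {"id": "1342", "title": "Pride and Prejudice", "authors": ["Austen, Jane"]},
    {"id": "158", "title": "Emma", "authors": ["Austen, Jane"]},
    {"id": "2701", "title": "Moby Dick; Or, The Whale",
     "authors": ["Melville, Herman"]},
]

index = build_author_index(RECORDS)
print(find_by_author(index, "austen"))
```

Substring matching on the case-folded name is a deliberate simplification;
handling the variant spellings PG records would need an alias table.
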
>
>
>>
>> > What metadata is there for each work, and what provision is there for
>> > adding to it?
>>
>> Book title, Author(s) (and variant spellings), Library of Congress Subject
>> Heading, and some other metadata. PG changed their schema recently and I'm
>> not 100% sure how many variants the 40k rdf files contain.  I will see if I
>> can get an answer to that question later today.
>>
>> As far as adding new metadata, there are several decisions to make:
>> + which is the canonical metadata file, .rdf, .json or both?
>> + what schema to use for arbitrary or specific new metadata
>
>
> I'd go for JSON. I've suggested datapackage.json for the "container" but
> it could be worth looking at the Textus stuff for suggestions on particular
> fields (it cites bibjson, though bibjson seems to be down).
>
>>
>> I would prefer to make these choices as a community rather than just me.
>> But that being said, I'm accepting most PRs that come in and keeping
>> track in case we need to migrate anything in the future.
>
>
> Rufus
>
>>
>>
>>
>> P.S. Thanks for opening issues on GITenberg repos!
>>
>>
>> On Wed, Aug 20, 2014 at 5:48 AM, John Levin <john at anterotesis.com> wrote:
>>>
>>> Hello Gitenberg!
>>>
>>>
>>> On 19/08/2014 19:09, Seth Woodworth wrote:
>>>>
>>>> Hello Humanities!
>>>>
>>>> I've been working on a project called GITenberg
>>>> <http://gitenberg.github.io>.
>>>>
>>>>
>>>> The aim is to move Project Gutenberg's books to github.
>>>>
>>>
>>> <snip>
>>>
>>> This is a really interesting project, and one I hope could be adapted for
>>> other large collections of texts.
>>>
>>> Couple of quick questions:
>>> Is there an easy way of searching Gitenberg for sets of texts, e.g. for
>>> all the works by a particular author?
>>> What metadata is there for each work, and what provision is there for
>>> adding to it?
>>>
>>> Best,
>>>
>>> John
>>>
>>> --
>>> John Levin
>>> http://www.anterotesis.com
>>> http://twitter.com/anterotesis
>>>
>>> _______________________________________________
>>> open-humanities mailing list
>>> open-humanities at lists.okfn.org
>>> https://lists.okfn.org/mailman/listinfo/open-humanities
>>> Unsubscribe: https://lists.okfn.org/mailman/options/open-humanities
>>
>
>
>
> --
>
> Rufus Pollock
>
> Founder and President | skype: rufuspollock | @rufuspollock
>
> Open Knowledge - see how data can change the world
>
> http://okfn.org/ | @okfn | Open Knowledge on Facebook |  Blog
>
> The Open Knowledge Foundation is a not-for-profit organisation.  It is
> incorporated in England & Wales as a company limited by guarantee, with
> company number 05133759.  VAT Registration № GB 984404989. Registered office
> address: Open Knowledge Foundation, St John’s Innovation Centre, Cowley
> Road, Cambridge, CB4 0WS, UK.
>


