[open-humanities] OpenLiterature v1.0 - let's do a reboot!

Rufus Pollock rufus.pollock at okfn.org
Fri Jun 5 17:02:30 UTC 2015

On 5 June 2015 at 10:06, Eric Hellman <eric at hellman.net> wrote:

> Hi Rufus!
> Gitenberg is open source and forkable, we'd welcome any help you can
> contribute. Please join our google group and make suggestions there or
> alternatively, create issues in the gitenberg-dev github repo so the whole
> community can participate in the discussion.

I got that - right now, though, I'm not offering to contribute code.
Rather, based on our experience with Open Shakespeare, Open Literature etc.
over the last 8-9 years, I'm making suggestions for reuse and convergence ...

> Of course it's a compromise to colocate text and markup, but my
> observation has been that separation of text and markup is unnatural for
> most humans; what you end up with in uncontrolled production environments
> is pervasive implicit markup, which is the worst of both worlds. Project
> Gutenberg has played out this conflict for 40 years with Michael Hart
> single-handedly holding back html for many years until he finally relented.
> May he rest in peace.

I got you, though the point is that, especially with annotation, there are
huge technical gains from doing the separation. To be clear, you could take
asciidoc and convert it to Textus format. I assume you have read the Textus
slide deck at: http://okfnlabs.org/textus/
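
To make the separation idea concrete, here is a minimal sketch of plain text
kept apart from its markup. It is illustrative only: the field names
("start", "end", "tag") and the render_html helper are assumptions made for
this example, not the actual Textus schema.

```python
# Illustrative only: the plain text is stored untouched, and markup lives
# in a separate list of character-offset ranges. Field names are assumed,
# not taken from the Textus format.
text = "To be, or not to be, that is the question."

annotations = [
    {"start": 0, "end": 19, "tag": "em"},
    {"start": 29, "end": 41, "tag": "strong"},
]


def render_html(text, annotations):
    """Weave non-overlapping annotations back into the text as HTML tags."""
    out = []
    pos = 0
    for ann in sorted(annotations, key=lambda a: a["start"]):
        out.append(text[pos:ann["start"]])
        out.append("<{0}>{1}</{0}>".format(ann["tag"], text[ann["start"]:ann["end"]]))
        pos = ann["end"]
    out.append(text[pos:])
    return "".join(out)


html = render_html(text, annotations)
```

Because the text itself never changes, a second layer of annotations (say,
reader comments) could point at the same offsets without touching the first.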

> I am curious as to the specifics of your work with Open Shakespeare.
> yaml is a formal superset of json; I suppose that means you could do
> datapackage in yaml. I don't understand the value-add of datapackage
> specifically- which of the many problems that we need to solve will it help
> with?

You could, and you could do it in xml or anything else. But it is done in
json ;-)

The problem Data Package solves for you is converging on a standard
"sidecar" format that is customizable, has been thought through and is
being used - rather than inventing one ab initio. I am happy to go through
yaml vs json if you want, but the basic point is that everything, including
browsers, natively supports JSON whilst that is not true for yaml (and yaml
is ultimately just about as hard to write reliably for non-techies as json).
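
As a rough sketch of what such a sidecar could look like for a book: the
"name", "title" and "resources" keys follow Data Package conventions, while
"creator", "subjects", all the values, and the to_sidecar_json helper are
hypothetical and only show where project-specific fields might go. The check
for required keys is an assumption of this sketch; the spec is the authority
on what is actually required.

```python
import json

# "name", "title" and "resources" follow Data Package conventions;
# every other key and all values are hypothetical, for illustration only.
descriptor = {
    "name": "example-book",
    "title": "An Example Book",
    "resources": [
        {"name": "text", "path": "book.asciidoc"},
    ],
    # Assumed project-specific extension fields (not defined by the spec):
    "creator": "A. N. Author",
    "subjects": ["Fiction"],
}


def to_sidecar_json(descriptor):
    """Serialise the descriptor as it might be stored in datapackage.json."""
    # This sketch insists on these two keys; consult the spec for the
    # authoritative list of required fields.
    for key in ("name", "resources"):
        if key not in descriptor:
            raise ValueError("missing field: " + key)
    return json.dumps(descriptor, indent=2, sort_keys=True)


sidecar = to_sidecar_json(descriptor)
roundtrip = json.loads(sidecar)
```

The same JSON string can be parsed directly in a browser with JSON.parse,
which is the practical advantage over yaml mentioned above.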


> Eric
> On Jun 5, 2015, at 7:48 AM, Rufus Pollock <rufus.pollock at okfn.org> wrote:
> On 4 June 2015 at 11:11, Seth Woodworth <seth at sethish.com> wrote:
>> The primary output of GITenberg is intended to be epub (but not only).
>> We've (probably) solidified on using Asciidoc as it supports most of the
>> markup types we need, is 12 years old already, and has more than one
>> converter implementation (asciidoc and asciidoctor).
> The point is that all markup forms like that whether asciidoc or TEI or
> HTML or ... have the issue that you embed your markup into the text which I
> would suggest here is a *bad* idea (tm). We went the asciidoc style route
> (in fact more markdown) for original open shakespeare and we ended up at
> textus precisely because of benefits of separating markup and plain text.
> To be clear Textus isn't a new markup format specifically: it is more the
> concept of separating plain text and markup. Most of the markup in Textus
> is just HTML plus some TEI for the "semantic" stuff.
>> The asciidoc abstract syntax tree is fairly parse-able.  I could see
>> creating textus as an output format for GITenberg books.  Say I add three
>> paragraphs to the end of a chapter, does textus make it easier for me to
>> re-align annotations with the new document offset?
> When do you add 3 paragraphs to end of the chapter of an existing book?
> But no system is generally that great for that realignment. But yes, the
> algorithm for realignment of annotations in textus would be pretty
> straightforward in that case
>> I like datapackages a great deal.  But I'm not very familiar with the
>> ecosystem of CKAN.  I've looked into adding GITenberg as a package to the
> Let me *repeat* ;-0 - data packages have *nothing* to do with CKAN. As
> explained earlier in the thread I'd be using data packages here plus git or
> s3 for storage - not suggesting using CKAN :-)
> Data Package is a *really* simple standard for the metadata wrapper around
> your data. It sounds like exactly what you are creating here.
>> python library NLTK.  Is CKAN a good place to host text as data?  Eric
>> Hellman of GITenberg (CC'd) is working on a yaml metadata specification
>> that we can map the data to OPDS feeds and MARC records for libraries.  Our
>> argument for yaml was it would be, in theory, easier for librarians to edit
>> by hand.  Would the dpm tool offer us something we are missing?
> I ultimately don't think there is much between yaml and json for editing
> (both will be a little odd). I'd therefore really suggest taking a look at
> extending DataPackage.json for your needs. It would seem a natural fit and
> you can add any fields you need.
> What you get here with Data Package is a) you get a spec that's been
> worked on for a while b) existing tooling (though some of this may be less
> oriented as your payload is just "blobs" of text ;-0 ...)
> Rufus
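
On the realignment question in the quoted thread (adding paragraphs to the
end of a chapter), the following is a minimal sketch of how offset-based
annotations could be shifted after an insertion. The realign function and
the annotation fields are assumptions for illustration, not the Textus
algorithm.

```python
def realign(annotations, insert_at, inserted_len):
    """Return annotations with offsets adjusted for text inserted at insert_at.

    Annotations that end at or before the insertion point are unchanged,
    those that start at or after it are shifted, and those that span it
    are stretched to cover the new text.
    """
    shifted = []
    for ann in annotations:
        start, end = ann["start"], ann["end"]
        if start >= insert_at:
            start += inserted_len
            end += inserted_len
        elif end > insert_at:
            end += inserted_len  # insertion falls inside the annotated span
        shifted.append(dict(ann, start=start, end=end))
    return shifted


chapter = "First paragraph.\nSecond paragraph."
start = chapter.index("Second")
notes = [{"start": start, "end": start + len("Second"), "note": "example"}]

# Appending at the end of the chapter leaves existing offsets untouched.
appended = realign(notes, len(chapter), len("\nThird paragraph."))

# Inserting at the very beginning shifts every annotation.
prefix = "Intro.\n"
prefixed = realign(notes, 0, len(prefix))
```

In the append case nothing moves, which is why that particular realignment
is straightforward; edits in the middle of annotated spans are the harder
problem.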


Rufus Pollock
Founder and President | skype: rufuspollock | @rufuspollock
<https://twitter.com/rufuspollock>
Open Knowledge <http://okfn.org/> - see how data can change the world
http://okfn.org/ | @okfn <http://twitter.com/OKFN> | Open Knowledge on Facebook
<https://www.facebook.com/OKFNetwork> | Blog <http://blog.okfn.org/>