[open-humanities] OpenLiterature v1.0 - let's do a reboot!

Eric Hellman eric at hellman.net
Fri Jun 5 14:06:51 UTC 2015


Hi Rufus!

GITenberg is open source and forkable, and we'd welcome any help you can contribute. Please join our Google group and make suggestions there or, alternatively, create issues in the gitenberg-dev GitHub repo so the whole community can participate in the discussion.

Of course it's a compromise to colocate text and markup, but my observation has been that separating them is unnatural for most humans. What you end up with in uncontrolled production environments is pervasive implicit markup, which is the worst of both worlds. Project Gutenberg has played out this conflict for 40 years, with Michael Hart single-handedly holding back HTML for many years until he finally relented. May he rest in peace.
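To make sure we mean the same thing by "separation", here is a minimal sketch of separated (standoff) markup and of the realignment question Seth raised. The names Annotation and shift_after_insert are made up for illustration and are not Textus's actual data model or API; I'm assuming annotations are plain character offsets into the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical standoff annotation: markup kept apart from the plain text."""
    start: int  # offset of the first character covered
    end: int    # offset one past the last character covered
    tag: str    # e.g. an HTML or TEI element name


def shift_after_insert(annotations, position, length):
    """Realign annotations after `length` characters are inserted at `position`.

    Text inserted at or before an annotation's start pushes the annotation right;
    text inserted strictly inside it stretches the annotation;
    text inserted at or after its end leaves it untouched.
    """
    shifted = []
    for a in annotations:
        start = a.start + length if a.start >= position else a.start
        end = a.end + length if a.end > position else a.end
        shifted.append(Annotation(start, end, a.tag))
    return shifted


text = "To be, or not to be, that is the question."
annotations = [Annotation(0, 19, "quote"), Annotation(33, 41, "em")]

# Appending at the end of the text moves nothing.
appended = shift_after_insert(annotations, len(text), 100)

# Inserting a prefix at the front shifts every annotation by its length.
prefix = "HAMLET: "
new_text = prefix + text
moved = shift_after_insert(annotations, 0, len(prefix))
```

Appending is the easy case, since no offset changes. Edits in the middle of annotated text (deletions, rewrites) need real diffing, which this sketch does not attempt.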

I am curious as to the specifics of your work with Open Shakespeare. 

YAML is a formal superset of JSON, so I suppose you could do datapackage in YAML. I don't understand the value-add of datapackage specifically: which of the many problems that we need to solve will it help with?
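For concreteness, here is roughly what I understand a datapackage descriptor for one of our books would look like. This is a sketch, not our metadata spec: the "name", "title" and "resources"/"path" keys follow my reading of the Data Package standard, while the book name, the file name and the "x_gitenberg" extension block are hypothetical.

```python
import json


def build_descriptor(name, title, text_path, extra=None):
    """Build a minimal Data Package-style descriptor as a plain dict.

    `extra` holds project-specific fields; the key "x_gitenberg" is a
    made-up extension name, not part of any published spec.
    """
    descriptor = {
        "name": name,
        "title": title,
        "resources": [{"path": text_path}],
    }
    if extra:
        descriptor["x_gitenberg"] = dict(extra)
    return descriptor


descriptor = build_descriptor(
    name="example-book",            # hypothetical identifier
    title="An Example Book",        # hypothetical title
    text_path="book.asciidoc",      # hypothetical file name
    extra={"source": "Project Gutenberg"},
)

# Serialize as datapackage.json content. Since YAML is a superset of JSON,
# a YAML parser should in principle accept this same text unchanged.
serialized = json.dumps(descriptor, indent=2, sort_keys=True)
print(serialized)
```

If that is about right, then the question stands: the wrapper is easy to produce in either syntax, so the value has to come from the spec and tooling around it.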

Eric

> On Jun 5, 2015, at 7:48 AM, Rufus Pollock <rufus.pollock at okfn.org> wrote:
> 
> On 4 June 2015 at 11:11, Seth Woodworth <seth at sethish.com> wrote:
> The primary output of GITenberg is intended to be epub (but not only).  We've (probably) solidified on using Asciidoc as it supports most of the markup types we need, is 12 years old already, and has more than one converter implementation (asciidoc and asciidoctor).  
> 
> The point is that all markup forms like that whether asciidoc or TEI or HTML or ... have the issue that you embed your markup into the text which I would suggest here is a *bad* idea (tm). We went the asciidoc style route (in fact more markdown) for original open shakespeare and we ended up at textus precisely because of benefits of separating markup and plain text. To be clear Textus isn't a new markup format specifically: it is more the concept of separating plain text and markup. Most of the markup in Textus is just HTML plus some TEI for the "semantic" stuff.
>  
> 
> The asciidoc abstract syntax tree is fairly parse-able.  I could see creating textus as an output format for GITenberg books.  Say I add three paragraphs to the end of a chapter, does textus make it easier for me to re-align annotations with the new document offset?
> 
> When do you add 3 paragraphs to end of the chapter of an existing book? But no system is generally that great for that realignment. But yes, the algorithm for realignment of annotations in textus would be pretty straightforward in that case
>  
> I like datapackages a great deal.  But I'm not very familiar with the ecosystem of CKAN.  I've looked into adding GITenberg as a package to the 
> 
> Let me *repeat* ;-0 - data packages have *nothing* to do with CKAN. As explained earlier in the thread I'd be using data packages here plus git or s3 for storage - not suggesting using CKAN :-)
> 
> Data Package is a *really* simple standard for the metadata wrapper around your data. It sounds like exactly what you are creating here.
>  
> python library NLTK.  Is CKAN a good place to host text as data?  Eric Hellman of GITenberg (CC'd) is working on a yaml metadata specification that we can map the data to OPDS feeds and MARC records for libraries.  Our argument for yaml was it would be, in theory, easier for librarians to edit by hand.  Would the dpm tool offer us something we are missing?
> 
> I ultimately don't think there is much between yaml and json for editing (both will be a little odd). I'd therefore really suggest taking a look at extending DataPackage.json for your needs. It would seem a natural fit and you can add any fields you need.
> 
> What you get here with Data Package is a) you get a spec that's been worked on for a while b) existing tooling (though some of this may be less oriented as your payload is just "blobs" of text ;-0 ...)
> 
> Rufus
>  
