[open-humanities] GITenberg update!

Rufus Pollock rufus.pollock at okfn.org
Mon Feb 23 21:06:21 UTC 2015

cc'ing open humanities list

On 23 February 2015 at 19:58, Seth Woodworth <seth at sethish.com> wrote:

> On Mon, Feb 16, 2015 at 1:22 PM, Rufus Pollock <rufus.pollock at okfn.org>
> wrote:
>> ...
>> Would you consider adopting Data Package style stuff and looking at
>> bibjson for additional metadata? Also Textus for the content could be
>> really good.
> I am looking into Data Package.  It could be very valuable to use dpm+ckan
> to store/fetch book resources.  If DPM had an ingestion system for NLTK or
> another natural language processing toolset, it would be a slam dunk.  Does
> datapackage.json double as bibjson?  Would a bibjson file be referred to as
> a resource in a datapackage?

There isn't any specific link between Data Package and CKAN - you can use
Data Package on its own. Data Package is more of an overall "container",
which you can then extend with specific metadata such as bibjson.
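
To make the "container plus extension" idea concrete, here is a rough
sketch of what such a descriptor might look like, built in Python. The
"name" and "resources" fields are core Data Package fields; placing the
bibliographic record under a top-level "bibjson" key is only an assumption
for illustration, and the book title, author and file name are made up.

```python
import json

# Hypothetical descriptor for a single book repository.
# "name" and "resources" come from the Data Package spec; the "bibjson"
# key is an assumed extension, not something the spec itself defines.
descriptor = {
    "name": "example-book",
    "title": "An Example Book",
    "resources": [
        # The book text itself is just a file listed as a resource.
        {"name": "text", "path": "book.asciidoc"},
    ],
    "bibjson": {
        "type": "book",
        "title": "An Example Book",
        "author": [{"name": "Jane Example"}],
        "year": "1900",
    },
}

# Serialise to the text that would be saved as datapackage.json.
serialized = json.dumps(descriptor, indent=2, sort_keys=True)
print(serialized)
```

The alternative raised in the question, keeping the bibjson record in its
own file and listing that file as another entry in "resources", would work
just as well with this layout.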

Here I'm not really sure we need to push stuff to CKAN - I think we
probably want flat file storage (in fact, I rather like GitHub - or, if
needed, S3). In fact Data Package + GitHub is what we are using for the Core
Datasets stuff in Frictionless Data, see:


And see the actual datasets on github here: https://github.com/datasets

I'd then keep your processing pipelines relatively separate from the core
text repository.

> I will send Data Package around my group and see what people think.

Great. They are really just a very simple metadata structure that builds
on the best recent packaging structures (e.g. Node.js) and is JSON.

>> Have you looked at Textus at all? There's a real logic for going the Textus
>> route with fixed text books like this.
>> http://okfnlabs.org/textus/
>>> ...
>> HTMLBook <https://github.com/oreillymedia/HTMLBook> is very interesting
>>> as an output format.  We've had designers try to get involved styling PG
>>> books.  With their non-standard html format, it would be impossible to
>>> share css across multiple books.  With HTMLBook, the structure of the html
>>> is relatively fixed, and a css file can apply to multiple HTMLBooks.
>> This sounds related to the Textus model. There, text and styling get
>> separated and you can then output to multiple formats easily.
> I've looked into Textus and I don't think it fits our use case.  We're
> looking for a markup that has implementations in multiple languages and is
> more human readable than the Textus format.  I can see how Textus solves a
> lot of problems, and I would be very interested in compiling asciidoc to
> Textus if there is a demand.  But much of what we require in terms of
> workflow already exists in the asciidoc universe.

The key point is that in Textus you get to separate markup from your text.
So rather than hardcoding the presentation or structure info into your text
you separate it. This brings a lot of benefits. For example, suppose you
want to introduce pagination information later (e.g. record that this part
of this ebook was this page in the original edition). Normally, you go back
into your asciidoc version and add new markup for that etc. This is not
only painful but it limits you (you can't have different markups) and it
also completely messes with annotation and similar. In addition, you
gradually make your markup language more complex to support all your
different use cases (for example web presentation vs print).
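
As a rough illustration of that separation (this is a generic standoff
markup sketch in Python, not the actual Textus data format; the Annotation
class, the layer names and the page numbers are all made up for the
example):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    start: int       # inclusive character offset into the plain text
    end: int         # exclusive character offset
    kind: str        # e.g. "page" or "emphasis"
    value: str = ""  # optional payload, e.g. a page number

# The text is stored once, with no markup embedded in it.
text = "To be, or not to be, that is the question."

# Presentation markup lives in its own layer, keyed by offsets.
word_start = text.index("question")
emphasis = [Annotation(word_start, word_start + len("question"), "emphasis")]

# Pagination from a hypothetical original edition is added later as a
# separate layer; neither the text nor the emphasis layer is touched.
pagination = [
    Annotation(0, 20, "page", "61"),
    Annotation(20, len(text), "page", "62"),
]

def covering(annotations, offset):
    """Return the annotations whose range contains the given offset."""
    return [a for a in annotations if a.start <= offset < a.end]

# Which page of the original edition does the emphasised word fall on?
pages = [a.value for a in covering(pagination, word_start)]
print(pages)
```

Because every layer only points into the text by offset, you can keep
several independent markups side by side, and annotations made against
the text are not disturbed when a new layer is added.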

Having gone round this a fair bit - we originally did Shakespeare stuff in
markdown, then latex then ... - I can really recommend thinking about the
Textus approach.

> In terms of discussion with the community on the standardization questions
>> I'd recommend open-humanities or okfn-labs list. In terms of pinging people
>> to join the gitenberg list that sounds great - email open-humanities.
>> Thank you, I will send out some messages to those lists this week.

