[humanities-dev] TEXTUS import questions

David Chiles dwalterc at gmail.com
Fri Jun 15 18:48:57 UTC 2012


Hi

I've been having a little trouble importing the JSON files I've created,
which follow the json_import_format spec. What is working: through the web
interface I can upload a plain text file and then read and annotate it; that
all works fine. But when I use the importData function (from
dataStore-elastic.js) on a JSON file, I don't get any errors back and I do
get an id returned. The problem is that when I then look at the text and
annotations in the web interface, no texts are listed.
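For debugging, one cheap sanity check before calling importData is to confirm the payload carries the expected top-level fields. The names below (text, typography, semantics, structure) are inferred from the terms used in this thread, not verified against json_import_format:

```javascript
// Hypothetical pre-import check: the field names are guesses based on this
// thread (text, typography, semantics, structure), not a verified copy of
// the json_import_format spec.
function checkPayload(data) {
  var problems = [];
  if (typeof data.text !== 'string' || data.text.length === 0) {
    problems.push('missing or empty text');
  }
  ['typography', 'semantics', 'structure'].forEach(function (key) {
    if (!Array.isArray(data[key])) {
      problems.push('missing array: ' + key);
    }
  });
  return problems;
}

var payload = { text: 'Some imported text.', typography: [], semantics: [] };
var problems = checkPayload(payload);
// problems === ['missing array: structure']
```

An empty `structure` array would be one candidate explanation for an import that succeeds but shows no texts, given how the 'show all texts' view works (see below in the thread).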

I'm guessing there's something simple I'm missing. Any help would be
appreciated.

Thanks
David

On Mon, Jun 4, 2012 at 2:33 PM, David Chiles <dwalterc at gmail.com> wrote:

>  Thank you for the help; this is exactly what I needed.
>
> On Thursday, May 24, 2012 at 2:21 PM, Tom Oinn wrote:
>
> Hi David
>
> On 24 May 2012 22:10, David Chiles <dwalterc at gmail.com> wrote:
>
> Hi,
>
> I'm new to this list and to TEXTUS as well. I'm working with a large
> collection of works and annotations that I want to import into a local
> TEXTUS instance.
>
>
> Excellent - what's the corpus?
>
> Public domain works originating from Project Gutenberg (Dickens, Austen,
> Milton and many more)
>
>
> Right now each work is split into multiple HTML documents by chapter,
> section, book, … and each one has normal HTML markup. The annotations are
> all, in TEXTUS terms, "textus:comment". Currently the location of each
> annotation is stored as an XPath and character offset for the start and
> end, and the original quoted text is also known.
>
>
> I've looked over the json_import_format spec on the GitHub page, and from
> what I gather all the HTML tags would have to be stripped from the
> documents and captured as typography annotations. Then for the annotations
> all the character offsets would need to be converted into overall offsets
> for the entire document.
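The offset conversion described above could be sketched like this. It's a toy example, not TEXTUS code, and it assumes each chapter has already been reduced to plain text; the field names are illustrative:

```javascript
// Toy sketch: concatenate per-chapter plain text and shift each annotation's
// character offsets by the offset at which its chapter begins.
function flattenChapters(chapters) {
  var text = '';
  var starts = {}; // chapter id -> offset of that chapter's start in the full text
  chapters.forEach(function (ch) {
    starts[ch.id] = text.length;
    text += ch.plainText;
  });
  return { text: text, starts: starts };
}

function shiftAnnotation(ann, starts) {
  var base = starts[ann.chapterId];
  return {
    type: 'textus:comment',
    start: base + ann.start,
    end: base + ann.end,
    text: ann.text // the original quoted text, kept as a cross-check
  };
}

var chapters = [
  { id: 'ch1', plainText: 'It was the best of times. ' },
  { id: 'ch2', plainText: 'It was the worst of times.' }
];
var flat = flattenChapters(chapters);
var shifted = shiftAnnotation(
  { chapterId: 'ch2', start: 11, end: 16, text: 'worst' },
  flat.starts
);
// flat.text.slice(shifted.start, shifted.end) === 'worst'
```

Since the original quoted text is known, comparing it against the slice at the shifted offsets is a cheap way to catch conversion mistakes.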
>
>
> That's correct. I was chatting with Nick Stenning earlier today about
> getting together one day at the OpenBiblio hack meet in a couple of
> weeks to implement exactly this and get the OpenShakespeare data in as
> a test.
>
> Also, I wasn't clear on how the JSON file is actually imported once it has
> been created.
>
>
> Quite - at the moment the set of command line tools is rather random
> and mostly formed of what I was finding useful when testing!
>
> If you look at
> https://github.com/tomoinn/textus/blob/master/src/tools/import-wikisource.js
> you'll see the fairly simple code which imports the data as a new
> document through the datasource implementation (at the moment this
> assumes a default configuration ElasticSearch database running on the
> local machine). Ignore the 'createDummyAnnotations' function, the
> other function shows a couple of things though.
>
> Firstly, at the moment the Textus interface uses the top-level structure
> nodes to show texts in the 'show all texts' view. This means that if your
> import actually consists of multiple texts you can create multiple level 0
> nodes; the description and name properties are used as one might expect.
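As a sketch of what that might look like, an import containing two texts could carry two level 0 structure entries. The field names here (depth, name, description, etc.) are my assumption from this thread rather than a checked copy of the format:

```javascript
// Hypothetical structure array for an import holding two texts; field names
// are guesses, not taken verbatim from json_import_format.
var structure = [
  { type: 'textus:document', start: 0,    end: 1200, depth: 0,
    name: 'Paradise Lost', description: 'Milton, from Project Gutenberg' },
  { type: 'textus:document', start: 1200, end: 2400, depth: 0,
    name: 'Great Expectations', description: 'Dickens, from Project Gutenberg' }
];

// Per the explanation above, depth-0 nodes are what the 'show all texts'
// view would list.
var listed = structure.filter(function (n) { return n.depth === 0; })
                      .map(function (n) { return n.name; });
// listed === ['Paradise Lost', 'Great Expectations']
```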
>
> Secondly all you actually need to do, having acquired a datastore
> object, is use the importData function on it, passing in the data
> structure containing your text, annotations of both kinds and
> structure nodes along with a function which will be called on success
> or failure.
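The call pattern might look roughly like the following. The stub datastore and the error-first (err, id) callback signature are assumptions for illustration; the real object comes from dataStore-elastic.js and its construction is not shown here:

```javascript
// Sketch of the importData call pattern described above, against a stub
// datastore so the shape is visible without a running ElasticSearch.
function importText(datastore, data, done) {
  datastore.importData(data, function (err, textId) {
    if (err) { return done(err); }
    done(null, textId);
  });
}

// Stub standing in for the ElasticSearch-backed datastore; its behaviour
// (and the 'text-1' id) is invented for this example.
var stub = {
  importData: function (data, callback) {
    if (!data || !data.text) { return callback(new Error('no text')); }
    callback(null, 'text-1');
  }
};

var result;
importText(stub,
  { text: '...', typography: [], semantics: [], structure: [] },
  function (err, id) { result = { err: err, id: id }; });
// result.id === 'text-1', result.err === null
```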
>
> Good to have the interest, can you tell us a bit more about your
> project though? Hopefully there'll be some re-use possible if you're
> writing import logic!
>
> Tom
>
> --
> Tom Oinn
> +44 (0) 20 8123 5142 or Skype ID 'tomoinn'
> http://www.crypticsquid.com
>
>
>

