[ckan-dev] Storing/searching/displaying XML resources

Tue May 8 03:10:01 UTC 2012

On 5/7/12 7:01 PM, "Rufus Pollock" <rufus.pollock at okfn.org> wrote:

>On 7 May 2012 17:17, Haq, Salman <Salman.Haq at neustar.biz> wrote:
>> I have a special use case where I want to store XML resources. For each
>> such resource, I want to display a custom view that allows the user to
>> add/edit the data in the resource. This is very similar to what
>> ckanext-datastorer does, except that in my use case the resource has a
>> specialized view.
>>
>> Would the community recommend that I enhance ckanext-datastorer or use it
>> as a template for a new custom extension? I am leaning towards the latter.
>
>I think the latter may be easier -- though medium-term we may want to
>find a way where one can plug in specialist importers to the
>ckanext-datastorer depending on the incoming type of data (and perhaps
>some other info).

Yes, I think a way for plugins (I use that term loosely) to register with
ckanext-datastorer for specific file types would be a good way to go.

>
>> Also, how does ckanext-datastorer store the parsed data? It doesn't appear
>> to have any special models for storing tabular data in the main Postgres
>> db. Does it rely primarily on ElasticSearch as the backing store? Does this
>
>Yes, it uses the CKAN DataStore backed by ElasticSearch rather than
>Postgres.

Just out of curiosity, are there any ckanext extensions that define their
own data models?

>
>> mean that I will have to convert my XML documents into JSON documents and
>> then store them via the data API?
>
>That would be the natural approach if it were possible.
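
A rough sketch of that conversion using only Python's standard library. The layout (attributes under `"@attrs"`, text under `"#text"`, children grouped by tag into lists) is an assumed convention, not something the DataStore requires; `element_to_dict` and `xml_to_json` are hypothetical helper names. The resulting JSON is what would then be sent to the data API.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem: ET.Element) -> dict:
    """Recursively convert an XML element into a JSON-serialisable dict.

    Assumed convention: attributes go under "@attrs", non-whitespace text
    under "#text", and children are grouped by tag into lists so that
    repeated tags are not lost. Tail text (mixed content) is ignored.
    """
    node: dict = {}
    if elem.attrib:
        node["@attrs"] = dict(elem.attrib)
    text = (elem.text or "").strip()
    if text:
        node["#text"] = text
    for child in elem:
        node.setdefault(child.tag, []).append(element_to_dict(child))
    return node


def xml_to_json(xml_text: str) -> str:
    """Parse an XML document and serialise it as a single JSON object."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_dict(root)})
```

Grouping children into lists even when a tag appears once keeps the output shape predictable, at the cost of slightly noisier JSON.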
>
>> Also, from the docs and source code, I still can't figure out what
>> ckanext-archiver does and how it relates to ckanext-datastorer. They both
>> seem to share some common code.
>
>Archiver archives resources: i.e. it looks for resources with remote
>urls and stores a copy of that data into the FileStore (i.e. it
>*archives* it). The DataStorer instead processes the data and puts it
>in the DataStore.

Makes sense now.

>
>> To elaborate more on my use case, the XML document actually represents
>> metadata about a database (e.g. tables, columns, keys, row counts, etc.).
>> One way to think of the extension is as a 'metadatastorer'. The resources
>> could be in XML format or, in the future, in additional formats supported
>> for different types of stores (e.g. NoSQL dbs, etc.).
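
To make the use case concrete, here is a sketch with an invented XML layout: the element and attribute names (`database`, `table`, `column`, `rows`, `key`) are assumptions, since the thread does not show the real format. It parses that layout into plain Python records that a 'metadatastorer' could then index; `parse_db_metadata`, `Table` and `Column` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

# Hypothetical XML layout; the real schema is not given in the thread.
SAMPLE = """<database name="sales">
  <table name="orders" rows="1200">
    <column name="id" type="integer" key="primary"/>
    <column name="placed_at" type="timestamp"/>
  </table>
</database>"""


@dataclass
class Column:
    name: str
    type: str
    is_primary_key: bool = False


@dataclass
class Table:
    name: str
    row_count: int
    columns: List[Column] = field(default_factory=list)


def parse_db_metadata(xml_text: str) -> List[Table]:
    """Turn the assumed <database>/<table>/<column> layout into records."""
    root = ET.fromstring(xml_text)
    tables = []
    for t in root.findall("table"):
        cols = [
            Column(c.get("name"), c.get("type"), c.get("key") == "primary")
            for c in t.findall("column")
        ]
        # A missing rows attribute is treated as 0 rows.
        tables.append(Table(t.get("name"), int(t.get("rows", "0")), cols))
    return tables
```

Supporting another store type (e.g. a NoSQL db) would then mean writing another parser that emits the same `Table`/`Column` records.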
>
>Understood. I note we've also been thinking quite a bit about how to
>specify metadata for datasets. In the simplest case we use the mapping
>metadata in ElasticSearch to store info about fields (type, format
>etc). We're also thinking about using JSON-LD contexts more heavily
>for this purpose (see [1])
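
As an illustration of storing field info in mapping metadata, the sketch below builds the `"properties"` section of an ElasticSearch-style mapping from (column name, SQL type) pairs. The SQL-to-ElasticSearch type table and the fallback to `string` are assumptions; `string`, `long`, `double`, `boolean` and `date` are core types of the ElasticSearch versions current at the time, and `dateOptionalTime` is a built-in date format name in those versions, but both should be checked against the version actually deployed.

```python
from typing import Dict, Iterable, Tuple

# Assumed translation from SQL-ish column types to ElasticSearch core types.
SQL_TO_ES = {
    "integer": "long",
    "bigint": "long",
    "real": "double",
    "text": "string",
    "varchar": "string",
    "boolean": "boolean",
    "timestamp": "date",
}


def build_properties(columns: Iterable[Tuple[str, str]]) -> Dict[str, dict]:
    """Build the "properties" section of an ElasticSearch-style mapping."""
    props: Dict[str, dict] = {}
    for name, sql_type in columns:
        # Unknown types fall back to string rather than failing.
        es_type = SQL_TO_ES.get(sql_type.lower(), "string")
        spec = {"type": es_type}
        if es_type == "date":
            # Built-in ISO 8601 format name in older ElasticSearch versions.
            spec["format"] = "dateOptionalTime"
        props[name] = spec
    return {"properties": props}
```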

That would be good. I guess a 'resource' will become a tuple of 'metadata'
and 'data'.

What are your thoughts about 'Single Point Of Truth' [2]?

It seems a resource could have multiple representations: as a file, as a
JSON object in ES, as a graph in some triple store, etc. Borrowing from
DVCS, these related but separate representations resemble branches. Do
people have thoughts about how this would be handled in the API and the UI?
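
One way to reconcile multiple representations with a single point of truth, sketched in Python under stated assumptions (this is not a CKAN design): a resource is a (metadata, data) pair where the original file is the only authoritative copy, and the ES document, triple-store graph, etc. are derived views that record a digest of the source they were built from and are rebuilt when stale. `Resource`, `Derived`, `derive` and `stale_views` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def digest(data: bytes) -> str:
    """Content hash used to tell which version of the source a view reflects."""
    return hashlib.sha1(data).hexdigest()


@dataclass
class Derived:
    source_digest: str  # digest of the canonical data this view was built from
    content: object


@dataclass
class Resource:
    """Hypothetical model: metadata plus one canonical copy of the data."""
    metadata: dict
    data: bytes  # the single point of truth, e.g. the original file
    views: Dict[str, Derived] = field(default_factory=dict)

    def derive(self, name: str, build: Callable[[bytes], object]) -> object:
        # Rebuild the named view only if it is missing or built from older data.
        current = digest(self.data)
        view = self.views.get(name)
        if view is None or view.source_digest != current:
            view = Derived(current, build(self.data))
            self.views[name] = view
        return view.content

    def stale_views(self) -> List[str]:
        # Names of views whose source digest no longer matches the data.
        current = digest(self.data)
        return sorted(n for n, v in self.views.items() if v.source_digest != current)
```

Unlike DVCS branches, the views in this sketch never diverge independently: edits go to the canonical data, and every other representation behaves like a cache of it.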

Salman

[2]: http://teddziuba.com/2011/06/most-important-concept-systems-design.html

>
>Rufus
>
>[1]: http://lists.okfn.org/pipermail/ckan-discuss/2012-May/002186.html
>
>>
>> Thanks,
>> Salman
>>
>> _______________________________________________
>> ckan-dev mailing list
>> ckan-dev at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/ckan-dev
>>
>
>
>
>-- 
>Co-Founder, Open Knowledge Foundation
>Promoting Open Knowledge in a Digital Age
>http://www.okfn.org/ - http://blog.okfn.org/