[ckan-dev] CKAN and RDF/SPARQL

Edzard Höfig edzard.hoefig at fokus.fraunhofer.de
Thu May 26 21:53:56 UTC 2011


Hi William, David and Seb,

thanks for the pointers!

We plan to set up triplestores to keep some of the open data in RDF anyway, so the extra infrastructure would not pose a problem.
Using the queue in the way William proposes seems to be the most straightforward approach: we would create a worker that receives JSON-formatted entries from the queue and stores them in "our" triplestore, using CKANRDF and/or proprietary tools for the conversion. I don't know the exact delay this introduces, but since the queue seems to be fed directly from the system (rather than by an external, independent process doing cyclic polling), it should be fine for our purposes.
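Roughly, such a worker's conversion step might look like the following sketch (the field names, namespace URIs, and the `queue`/`store` objects are illustrative only; in practice the mapping would come from CKANRDF or our own tools):

```python
import json

# Hypothetical base URIs -- placeholders for illustration; the real
# mapping would come from CKANRDF or a proprietary conversion tool.
CKAN_NS = "http://ckan.net/package/"
DC_NS = "http://purl.org/dc/terms/"

def package_to_triples(payload):
    """Turn one JSON-formatted queue entry (a CKAN package dict)
    into a list of (subject, predicate, object) triples."""
    pkg = json.loads(payload)
    subject = CKAN_NS + pkg["name"]
    triples = []
    # Literal-valued fields map one-to-one onto Dublin Core terms.
    for field, term in [("title", "title"),
                        ("notes", "description"),
                        ("license_id", "license")]:
        if pkg.get(field):
            triples.append((subject, DC_NS + term, pkg[field]))
    # Each tag becomes its own triple.
    for tag in pkg.get("tags", []):
        triples.append((subject, DC_NS + "subject", tag))
    return triples

# The worker's main loop would then be roughly:
#   for message in queue:                  # blocking consume
#       for s, p, o in package_to_triples(message):
#           store.add(s, p, o)             # whatever the triplestore offers

entry = json.dumps({"name": "gold-prices",
                    "title": "Gold Prices",
                    "tags": ["finance", "commodities"]})
print(len(package_to_triples(entry)))  # -> 3
```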

"Danke!" says
Ed

Am 26.05.2011 um 06:01 schrieb William Waites:

> * [2011-05-26 01:35:29 +0100] David Raznick <kindly at gmail.com> wrote:
> 
> ] We could start representing extras values as xml instead in the database and
> ] just decode to python as json like.  We could add this as a configuration
> ] option at least.  This would be stored procedureable in postgres using
> ] xpath.
> 
> rofl. Yes... I was floating this idea as in, "you could do something
> like that if you were insane" though :)
> 
> ] I personally think we should deprecate json/xml in the extras values fields
> ] altogether.  It should not be too difficult or troublesome.  ckan.net, for
> ] example, always uses the extras field as a string anyway (I just looked at
> ] the database); it will not be difficult to sort this out in the harvesting
> ] code, which I think is the only user of it.
> 
> In general I agree. Apart from saving the original document to
> implement a cheap reversible transformation, which the harvesting
> machinery does so that a CKAN instance can itself be harvested by a
> CSW client, stuffing things like JSON and XML into a blob is a pretty
> bad idea, I think, but I'd rather not start a debate about the
> relative merits of different database normalisation strategies.
> 
> ] Should this solve the issue?
> 
> There's another issue with D2R (which I'm now regretting even
> mentioning): because the mapping is defined against the relational
> schema, so that it can translate SPARQL -> SQL, changing the
> relational schema means changing the mapping. Since the API is more
> stable than the relational schema, it is probably less work to use
> the API.
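> To make that concrete, a D2RQ-style mapping fragment looks roughly
> like this (illustrative table and column names, not an actual CKAN
> mapping):
>
>     map:Package a d2rq:ClassMap ;
>         d2rq:dataStorage map:database ;
>         # the URI pattern is tied directly to the "package" table
>         d2rq:uriPattern "package/@@package.name@@" .
>
>     map:packageTitle a d2rq:PropertyBridge ;
>         d2rq:belongsToClassMap map:Package ;
>         d2rq:property dc:title ;
>         # rename this column and the mapping breaks
>         d2rq:column "package.title" .
>
> Every @@table.column@@ reference is the relational schema; any
> migration that touches it means editing the mapping as well.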
> 
> The tradeoff here is maintenance burden (keeping the mapping in sync)
> vs. extra infrastructure (running a triplestore alongside). That's
> really a judgement call for Ed to make.
> 
> Cheers,
> -w
> 
> -- 
> William Waites                <mailto:ww at styx.org>
> http://river.styx.org/ww/        <sip:ww at styx.org>
> F4B3 39BF E775 CF42 0BAB  3DF0 BE40 A6DF B06F FD45
> 
> _______________________________________________
> ckan-dev mailing list
> ckan-dev at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/ckan-dev
