[ckan-discuss] Data Registry Aggregator Experiment

William Waites ww at styx.org
Wed Mar 30 10:42:30 BST 2011


* [2011-03-30 10:06:45 +0100] David Read <david.read at okfn.org> wrote:

] Great to see another way of getting and serving package RDF. I agree
] that using Go or other lower level language suits this particular use
] case when optimising for speed.

I'm not sure I agree that Go is a lower-level language. Having now
written something "real" with it, I would put it squarely in the
higher-level category alongside Python or Ruby; it just doesn't have
their zillions of libraries.

] Are your customers asking you to put the aggregated RDF in a store and
] providing a sparql service?

The users of the RDF will often want it in a SPARQL store as well,
though. That is easily accomplished by telling ckand to produce a
dump (by sending it SIGUSR1) and then importing the dump into the
store.
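For reference, the signal handling amounts to something like the
following (a Go sketch only: writeDump and the dump path are
placeholders, not ckand's actual code):

    package main

    import (
        "log"
        "os"
        "os/signal"
        "syscall"
    )

    // writeDump serialises the aggregated records to a flat file.
    // The body is a stub; the real serialisation lives elsewhere.
    func writeDump(path string) error {
        f, err := os.Create(path)
        if err != nil {
            return err
        }
        defer f.Close()
        // ... serialise the aggregated RDF to f ...
        return nil
    }

    func main() {
        sig := make(chan os.Signal, 1)
        signal.Notify(sig, syscall.SIGUSR1)
        for range sig {
            if err := writeDump("/var/lib/ckand/dump.nt"); err != nil {
                log.Println("dump failed:", err)
            }
        }
    }

Importing the resulting dump is then a job for whichever store's
bulk loader is in use.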

] How does this fit with the rest of the team's existing efforts in CKAN
] data aggregation? I'm thinking of the dcat stuff, repo syncing and
] aggregated search?

Now this is an interesting question. It obviously occupies some of
the same space as the first two, and though it has no search
facilities of its own, it would provide a logical place from which an
index could be built.

With respect to repo syncing, at the moment ckand is just a read-only
aggregator. If write functions were added, any record would still
have a "home" server, and writes to it would go there; everywhere
else would hold a read-only copy. So if the network topology is
complicated, there is a choice to be made between eventual
consistency and what is effectively a cache-poisoning problem. But we
would be guaranteed that all copies on the path between the client
doing the update and the home server are up to date, so somebody
doing a write-read cycle sees current data. This means we don't have
to address the very complex question of multiple writes in multiple
places. It also means that in the event of a network partition a
write can fail, but a read will still work, serving last-known-good
data. All of this comes along for free; it just falls out of the
design.
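In code, the routing looks roughly like this (a sketch under
assumptions: local, homeOf and the URL scheme are invented for
illustration and are not ckand's actual internals):

    package main

    import (
        "io"
        "net/http"
    )

    // local holds the read-only copies, keyed by request path.
    var local = map[string][]byte{}

    // homeOf maps a package to its home server's base URL (stub).
    func homeOf(pkg string) string { return "http://home.example.org" }

    func handle(w http.ResponseWriter, r *http.Request) {
        pkg := r.URL.Path
        if r.Method == "GET" {
            // Reads always work, even under partition, serving
            // the last-known-good local copy.
            if body, ok := local[pkg]; ok {
                w.Write(body)
                return
            }
            http.NotFound(w, r)
            return
        }
        // Writes are forwarded to the record's home server; if it
        // is unreachable, the write fails.
        req, err := http.NewRequest(r.Method, homeOf(pkg)+pkg, r.Body)
        if err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            http.Error(w, "home server unreachable", http.StatusBadGateway)
            return
        }
        defer resp.Body.Close()
        // A real implementation would refresh the local copy here,
        // so a write-read cycle against this node sees current data.
        w.WriteHeader(resp.StatusCode)
        io.Copy(w, resp.Body)
    }

    func main() {
        http.HandleFunc("/", handle)
        http.ListenAndServe(":8080", nil)
    }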

What would need some thought is how to handle authentication. Do the
user's credentials, e.g. the X-CKAN-API-Key header, get passed along
(meaning a user would need an account on, and know a potentially
large number of API keys for, each "home" server)? Or do we let the
edge closest to the user be where they hold their account,
authenticate them there, and then use a set of API keys for
server-to-server use?
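In outline the two options look like this (a sketch: serverKeys and
forwardAuth are invented names, though X-CKAN-API-Key is CKAN's real
API key header):

    package auth

    import "net/http"

    // serverKeys holds one server-to-server API key per home server.
    var serverKeys = map[string]string{}

    // forwardAuth sets credentials on the outgoing request: either
    // pass the user's own key through, or substitute a
    // server-to-server key once the edge has authenticated the user.
    func forwardAuth(out, in *http.Request, home string, passThrough bool) {
        if passThrough {
            // Option 1: the user needs an account (and key) on
            // every home server they might write to.
            out.Header.Set("X-CKAN-API-Key",
                in.Header.Get("X-CKAN-API-Key"))
            return
        }
        // Option 2: the user authenticates at the edge; upstream
        // requests carry the edge's own key for that home server.
        out.Header.Set("X-CKAN-API-Key", serverKeys[home])
    }

Option 2 is friendlier to users, but it means the home server has to
trust the edge's authentication.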

] And looking at it from the reverse direction, (since we know data
] communities love their independent identity too) could the RDF
] function be added into a CKAN extension, say?

A small CKAN extension that properly handled content-type
negotiation would actually be quite useful for this and other
things. It could have, e.g., a small 303-redirect controller and hook
into the routes in certain cases. This would work well with ckand, or
with the existing semantic.ckan.net installation (and the former
catalogue.data.gov.uk), which is just a directory tree of flat files.
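The controller logic would be roughly this (shown in Go to match the
sketches above; the real extension would be Python hooking CKAN's
routes, and the /data and /page paths are placeholders):

    package main

    import (
        "net/http"
        "strings"
    )

    // negotiate 303-redirects to an RDF or HTML rendering of a
    // package depending on the Accept header.
    func negotiate(w http.ResponseWriter, r *http.Request) {
        pkg := strings.TrimPrefix(r.URL.Path, "/dataset/")
        if strings.Contains(r.Header.Get("Accept"), "application/rdf+xml") {
            http.Redirect(w, r, "/data/"+pkg+".rdf", http.StatusSeeOther)
            return
        }
        http.Redirect(w, r, "/page/"+pkg, http.StatusSeeOther)
    }

    func main() {
        http.HandleFunc("/dataset/", negotiate)
        http.ListenAndServe(":8080", nil)
    }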

Cheers,
-w
-- 
William Waites                <mailto:ww at styx.org>
http://river.styx.org/ww/        <sip:ww at styx.org>
F4B3 39BF E775 CF42 0BAB  3DF0 BE40 A6DF B06F FD45


