[ckan-dev] ckan api: remove ES stuff?

Rufus Pollock rufus.pollock at okfn.org
Fri Apr 27 14:13:06 UTC 2012


On 27 April 2012 11:20, James Casbon <casbon at gmail.com> wrote:
> On 26 April 2012 23:38, Rufus Pollock <rufus.pollock at okfn.org> wrote:
>>
>> On 26 April 2012 16:52, James Casbon <casbon at gmail.com> wrote:
>> > Hi All,
>> >
>> > First of all thanks for fixing the CORS requests issues that were preventing
>> > me using the data api in a JS app.  But... is it possible to remove (what
>> > looks like) the search meta information from the results.
>>
> snip
>>
>> simple answer is: no :-)
>>
>> We intentionally return exactly what ElasticSearch gives us so that
>> you can a) use libraries designed for ES with CKAN Data API b) you can
>> switch seamlessly between CKAN data API and ES (if you want to work
>> locally).
>>
>> I know it is slightly annoying but it's a tiny amount of data and you
>> can just do data.hits to get the stuff immediately.
>
>
> I think this is wrong because:
> 1. I don't care about ES, I wanted to use CKANs API and I normally
> expect things to be simple where they can be (i.e. RESTish).

I don't see that response format has a lot to do with REST-ish-ness.
I'd also point out that meta information can be relevant (e.g. if
something breaks).

> 2. If I can just return hits, then you can.  One place (on the server)
> versus all the places (on the clients).

Already got a client that wraps it for you :-)

> 3. You are committing yourself to maintain the ES metadata if you drop
> ES as a backend, otherwise you will break the clients (this is a bad
> design decision IMO)

Right now we don't do (and don't have to do) any processing of the
stuff from ES -- it gets passed straight to clients (after
authorization is done on the way in). Thus from a technical point of
view there is nothing we can do about the output. We could change this
but it would entail both immediate dev cost and significant
performance cost (we now process info twice). Again this makes this
unlikely to happen.

I also don't take the point about maintaining the metadata. The
metadata could be ignored. What we are committed to maintaining is the
structure.

I've looked a lot at output formats for JSON data structures over the
last year and feel that ES do a pretty good job -- you can always
start simple but eventually you end up with a need to support error
info, other info (like facets), count/total info etc. I think that ES
compares favourably with SOLR and other formats I've seen. Having to
do data.hits.hits or data.hits.total does not seem to be a major issue
and any decent results format will require *at* least one nesting (in
order to have error and count info shown).
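
A minimal sketch of what that one level of nesting costs a client,
assuming the usual ElasticSearch search response shape (a top-level
"hits" object holding "total" and a "hits" list whose items carry the
row under "_source"). The function name unwrap_hits is hypothetical,
not part of any CKAN or ES client:

```python
from typing import Any


def unwrap_hits(response: dict[str, Any]) -> tuple[int, list[dict[str, Any]]]:
    """Return (total, rows) from an ES-style search response.

    Assumes the standard ES layout: response["hits"]["total"] and
    response["hits"]["hits"][i]["_source"].
    """
    hits = response["hits"]
    total = hits["total"]
    # Later ES versions report total as {"value": N, "relation": ...};
    # older ones (as in this thread) report a plain integer.
    if isinstance(total, dict):
        total = total["value"]
    rows = [hit["_source"] for hit in hits["hits"]]
    return total, rows
```

A client would call this once on the parsed JSON and then ignore the
remaining search metadata unless it needs it for debugging.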

> Also, I would really like it to be able to just get the resource
> without any limits.  This would work for most small datasets.

No sensible person will implement this by default right :-) It would
be too easy for someone to request 300k rows (though that could get
caught by timeouts). If you think the data is so small just add
?size=1000 to the query ...

> ie http://thedatahub.org/api/data/b9aae52b-b082-4159-b46f-7bb9c158d013/
> should not return 'No handler found for uri
> [/ckan-www.ckan.net/b9aae52b-b082-4159-b46f-7bb9c158d013/] and method
> [GET]'

This was already discussed before right :-) -- see
<https://github.com/elasticsearch/elasticsearch/issues/1826> -- once
we upgrade to ES 0.19 this will be resolved. In the meantime add ?size
to the _search query :-)
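
A small sketch of building such a query URL, assuming the Data API
exposes the resource's ES _search endpoint directly under the
/api/data/<resource id>/ path quoted above. The helper name search_url
is hypothetical:

```python
from urllib.parse import urlencode


def search_url(base: str, resource_id: str, size: int) -> str:
    """Build a _search URL with an explicit size (row limit).

    The path layout <base>/<resource_id>/_search is an assumption
    drawn from the URLs mentioned in this thread.
    """
    query = urlencode({"size": size})
    return f"{base.rstrip('/')}/{resource_id}/_search?{query}"


url = search_url(
    "http://thedatahub.org/api/data",
    "b9aae52b-b082-4159-b46f-7bb9c158d013",
    1000,
)
```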

> It should return the JSON representation of the resource or a paging
> link if the resource is too large
> http://stackoverflow.com/questions/924472/paging-in-a-rest-collection

Again we're going to be sticking with ES results unless there is an
incredibly compelling reason to change. Right now you can always do
paging pretty simply manually: do a query to get counts and then do
incremental requests (I actually have been thinking about doing this
internally as an optimization for big queries so you have a more
responsive UX). I'd even implement it in Recline for you if you want :-)
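
The manual paging described here could be sketched roughly as below.
This is an illustration under assumptions, not the Recline or CKAN
implementation: it assumes ES's standard "from" and "size" search
parameters are passed through by the Data API, and that "total" is a
plain integer. The names iter_rows and make_fake_search are
hypothetical; the fake search function stands in for a real HTTP call
so the example runs without a network.

```python
from typing import Any, Callable, Iterator

Search = Callable[[dict[str, int]], dict[str, Any]]


def iter_rows(search: Search, page_size: int = 100) -> Iterator[dict[str, Any]]:
    """Yield every row by first asking for the count, then paging."""
    # size=0 asks ES for the total count without returning any rows.
    total = search({"size": 0})["hits"]["total"]
    for offset in range(0, total, page_size):
        page = search({"from": offset, "size": page_size})
        for hit in page["hits"]["hits"]:
            yield hit["_source"]


def make_fake_search(rows: list[dict[str, Any]]) -> Search:
    """In-memory stand-in that mimics an ES-style search response."""
    def search(params: dict[str, int]) -> dict[str, Any]:
        start = params.get("from", 0)
        size = params.get("size", 10)
        hits = [{"_source": row} for row in rows[start:start + size]]
        return {"hits": {"total": len(rows), "hits": hits}}
    return search
```

In real use, the search argument would be a function that issues the
HTTP request with those parameters and returns the parsed JSON.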

To end: this feedback is *really* appreciated -- even though I'm not
agreeing on specific points. Please keep it coming :-)

Rufus

PS: are you around online tomorrow in afternoon UK time? Would like to
chat on skype or irc re this stuff and notebook + recline if you have
a spare 30m ...



