[ckan-dev] Only specific field in a search result.

Alice Heaton a.heaton at nhm.ac.uk
Thu Jan 8 09:32:32 UTC 2015


Hi,

Unfortunately not. CKAN datastore is quite slow on large datasets.

Is this your own CKAN server? If so you can try a few things:

- Make sure your datastore database is vacuumed;
- Use PostgreSQL 9.3 (or even 9.4) which will give better performance;
- Look into indexes if you are doing searches.

But this only goes so far. Annoyingly the main thing that causes the CKAN datastore queries to be so slow is
the COUNT(*) included in every query. This is down to the way PostgreSQL works, and has been improved 
in PostgreSQL 9.2 (I think - I can confirm it's better on 9.3!).

Even with PostgreSQL 9.3 though the COUNT(*) has a significant costs. This is annoying because in most cases a
COUNT(*) is actually not needed. At most it is needed once - not for every page requests, and in most cases
you don't need it at all, you just loop requesting pages until there's no more data.

We have a patch somewhere that allows CKAN plugin to remove the COUNT(*) - we use this for internal queries
that use the datastore API, and it makes a significant performance improvement. I can send you the
patch (we'll submit it to CKAN when I get a minute!). With the patch you will need to write your own plugin implementing
the IDatastore interface to decide when a COUNT(*) should or shouldn't be included.

Best Wishes,
Alice


________________________________________
From: ckan-dev [ckan-dev-bounces at lists.okfn.org] on behalf of BLANC Antoine [blanc at kdl.co.jp]
Sent: 08 January 2015 08:27
To: ckan-dev at lists.okfn.org
Subject: [ckan-dev] Only specific field in a search result.

Hello everyone,

I'm developing an web application that use the CKAN API. I make some
search through the API to get datasets corresponding to my query. And it
works perfectly.
But I need only a few fields from each dataset.
So I filter the result in a kind of middleware to get only the fields
that  I need to send to my JavaScript part.
This CKAN is very large (1.8 million datasets), the result can be very
slow. So I am trying to remove some treatments and reduce the network
traffic.

Is it possible get only those fields directly from the API ?

And of course, my web application isn't the only application that use
this CKAN, so I can't modify deeply its configuration

Thanks in advance for your help

Best regards

Antoine.

--
____________________________________________
アントワーヌ ブラン [Antoine BLANC]
株式会社 神戸デジタル・ラボ
〒650-0033 神戸市中央区江戸町93番 栄光ビル5F
TEL:078-327-2280
FAX:078-327-2278
@ : blanc at kdl.co.jp
____________________________________________

_______________________________________________
ckan-dev mailing list
ckan-dev at lists.okfn.org
https://lists.okfn.org/mailman/listinfo/ckan-dev
Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev



More information about the ckan-dev mailing list