[ckan-dev] Only specific field in a search result.

Alice Heaton a.heaton at nhm.ac.uk
Thu Jan 22 10:28:17 UTC 2015


We tried this approach but in our experience, even after a vacuum 
analyze, this can be quite off the mark on a large dataset. It is best 
to get rid of the count, because in most instances it is not needed 
(typically algorithms will loop fetching pages until there is no more 
data returned). I will submit this as a suggestion on Github as soon as 
I get a minute :-)

On 22/01/15 03:11, Alex (Maxious) Sadleir wrote:
> Just so I don't lose it; this is a patch to make the ckan datastore
> COUNT(*) approximate and only recalculated by ANALYZE; you could also
> patch to do this yourself when the data is updated but it's done by
> autovacuum anyway.
> I got this from https://wiki.postgresql.org/wiki/Slow_Counting
>
> diff --git a/ckanext/datastore/db.py b/ckanext/datastore/db.py
> index bec642c..ac6175c 100644
> --- a/ckanext/datastore/db.py
> +++ b/ckanext/datastore/db.py
> @@ -920,8 +920,8 @@ def search_data(context, data_dict):
>
>       sort = _sort(context, data_dict, field_ids)
>
> -    sql_string = u'''SELECT {select}, count(*) over() AS "_full_count" {rank}
> -                    FROM "{resource}" {ts_query}
> +    sql_string = u'''SELECT {select},c.reltuples  AS "_full_count" {rank}
> +                    FROM "{resource}",(SELECT reltuples FROM pg_class
> WHERE relname = '{resource}') as c {ts_query}
>                       {where} {sort} LIMIT {limit} OFFSET {offset}'''.format(
>           select=select_columns,
>           rank=rank_column,
>
> On Thu, Jan 8, 2015 at 8:32 PM, Alice Heaton <a.heaton at nhm.ac.uk> wrote:
>> Hi,
>>
>> Unfortunately not. CKAN datastore is quite slow on large datasets.
>>
>> Is this your own CKAN server? If so you can try a few things:
>>
>> - Make sure your datastore database is vacuumed;
>> - Use PostgreSQL 9.3 (or even 9.4) which will give better performance;
>> - Look into indexes if you are doing searches.
>>
>> But this only goes so far. Annoyingly the main thing that causes the CKAN datastore queries to be so slow is
>> the COUNT(*) included in every query. This is down to the way PostgreSQL works, and has been improved
>> in PostgreSQL 9.2 (I think - I can confirm it's better on 9.3!).
>>
>> Even with PostgreSQL 9.3 though the COUNT(*) has a significant costs. This is annoying because in most cases a
>> COUNT(*) is actually not needed. At most it is needed once - not for every page requests, and in most cases
>> you don't need it at all, you just loop requesting pages until there's no more data.
>>
>> We have a patch somewhere that allows CKAN plugin to remove the COUNT(*) - we use this for internal queries
>> that use the datastore API, and it makes a significant performance improvement. I can send you the
>> patch (we'll submit it to CKAN when I get a minute!). With the patch you will need to write your own plugin implementing
>> the IDatastore interface to decide when a COUNT(*) should or shouldn't be included.
>>
>> Best Wishes,
>> Alice
>>
>>
>> ________________________________________
>> From: ckan-dev [ckan-dev-bounces at lists.okfn.org] on behalf of BLANC Antoine [blanc at kdl.co.jp]
>> Sent: 08 January 2015 08:27
>> To: ckan-dev at lists.okfn.org
>> Subject: [ckan-dev] Only specific field in a search result.
>>
>> Hello everyone,
>>
>> I'm developing an web application that use the CKAN API. I make some
>> search through the API to get datasets corresponding to my query. And it
>> works perfectly.
>> But I need only a few fields from each dataset.
>> So I filter the result in a kind of middleware to get only the fields
>> that  I need to send to my JavaScript part.
>> This CKAN is very large (1.8 million datasets), the result can be very
>> slow. So I am trying to remove some treatments and reduce the network
>> traffic.
>>
>> Is it possible get only those fields directly from the API ?
>>
>> And of course, my web application isn't the only application that use
>> this CKAN, so I can't modify deeply its configuration
>>
>> Thanks in advance for your help
>>
>> Best regards
>>
>> Antoine.
>>
>> --
>> ____________________________________________
>> アントワーヌ ブラン [Antoine BLANC]
>> 株式会社 神戸デジタル・ラボ
>> 〒650-0033 神戸市中央区江戸町93番 栄光ビル5F
>> TEL:078-327-2280
>> FAX:078-327-2278
>> @ : blanc at kdl.co.jp
>> ____________________________________________
>>
>> _______________________________________________
>> ckan-dev mailing list
>> ckan-dev at lists.okfn.org
>> https://lists.okfn.org/mailman/listinfo/ckan-dev
>> Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev
>> _______________________________________________
>> ckan-dev mailing list
>> ckan-dev at lists.okfn.org
>> https://lists.okfn.org/mailman/listinfo/ckan-dev
>> Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev
> _______________________________________________
> ckan-dev mailing list
> ckan-dev at lists.okfn.org
> https://lists.okfn.org/mailman/listinfo/ckan-dev
> Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev




More information about the ckan-dev mailing list