[ckan-discuss] CKAN is slowwwwww

Rufus Pollock rufus.pollock at okfn.org
Sat Oct 2 16:11:05 BST 2010

Dear Christophe,

To follow up David's earlier comments:

* It will probably be *much* more efficient to use the dedicated
resource search api:


This query returned in 390ms :) and immediately tells you there are
169 resources with format 'api/sparql' (note that some of these may
share the same URL, since each resource is associated with a specific
package). The following query:


gives you the full list of resources with package ids; using those,
you can retrieve each package for further analysis.
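
As a rough sketch of the approach described above (not the ckanclient API, which does not cover resource search yet): the endpoint URL and the all_fields parameter are taken from David's message quoted below, while the helper names and the assumed response shape (a JSON object with a "results" list whose entries carry a "package_id" key) are illustrative assumptions.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint as given in David's message; the helpers below are hypothetical.
RESOURCE_SEARCH_URL = "http://ckan.net/api/search/resource"


def build_query(fmt, all_fields=False):
    """Build a resource search URL for the given resource format."""
    params = {"format": fmt}
    if all_fields:
        params["all_fields"] = 1
    # Keep "/" unescaped so the URL reads like the one in the thread.
    return RESOURCE_SEARCH_URL + "?" + urlencode(params, safe="/")


def package_ids(response_text):
    """Return the distinct package ids found in a search response.

    Assumes the response is JSON with a "results" list of objects that
    each have a "package_id" key (an assumption, not a documented fact).
    """
    data = json.loads(response_text)
    ids = []
    for resource in data.get("results", []):
        pid = resource.get("package_id")
        if pid is not None and pid not in ids:
            ids.append(pid)
    return ids


def fetch(url, timeout=30):
    """Fetch a URL and return the body as text (performs a network call)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Something like package_ids(fetch(build_query("api/sparql", all_fields=True))) would then replace the 1500 per-package requests with a single one, with de-duplication handling resources that point at the same package.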

* API slowness is something we will be looking into (in particular
better cache configuration). That said, you are iterating through
every item in the repository :) With more than 1500 packages, at 1s a
package you are looking at around 30m, at 2s a package 1h, and at 4s a
package 2h ... (I note that, on what may be a slow wifi connection,
loading the Google front page or Flickr takes 1-3s.) For this kind
of bulk analysis it may be worth reinstating our daily JSON dumps of
the entire db.
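
The back-of-the-envelope timing above can be checked with a few lines. This is only a sketch: it uses exactly 1500 packages (the real count is somewhat higher, which is why the figures in the text are rounded up), and the function name is made up for illustration.

```python
PACKAGES = 1500  # lower bound from the text; the actual count is higher


def total_minutes(seconds_per_package, packages=PACKAGES):
    """Total time, in minutes, for one sequential request per package."""
    return packages * seconds_per_package / 60


for seconds in (1, 2, 4):
    print(f"{seconds}s per package: {total_minutes(seconds):.0f} min")
```

With exactly 1500 packages this gives 25, 50 and 100 minutes, which matches the rough 30m / 1h / 2h estimates once the extra packages are accounted for.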


2010/9/30 David Read <david.read at okfn.org>:
> Christophe,
> Yes it shouldn't be this slow doing 1500 queries. We've suffered
> performance problems in the past 24 hours and this is probably
> related. Having said that, I've opened a ticket to take a proper look
> at this:
> http://knowledgeforge.net/ckan/trac/ticket/667
> This particular problem sounds like a job for the 'resource search'
> feature, which achieves what you want in one query, taking under a
> second:
> http://ckan.net/api/search/resource?format=api/sparql
> and you could add &all_fields=1 to get all the package properties to process.
> I'm afraid this is a new feature so has not been put into the ckanclient
> yet, but should not be too hard to add in, as package search is almost
> identical. Do write back to the list to let us know how you get on and
> if you want any more help.
> David
> 2010/9/30 Christophe Guéret <cgueret at few.vu.nl>:
>>  Hello!
>> I've made a small script (attached to this mail) using the python CKAN API
>> to browse the content of CKAN in search for SPARQL end points.
>> Everything works fine apart from the fact that this script takes at least 2h
>> to run! I was hoping that it would take no more than a few seconds, or maybe
>> a minute or so. But not hours ;-)
>> Is it normal that CKAN is so slow to browse?
>> Cheers,
>> Christophe
>> --
>> Dr. Christophe Guéret (cgueret at few.vu.nl)
>> http://cgueret.net
>> Postdoc working on SOKS (http://www.few.vu.nl/soks)
>> Knowledge Representation & Reasoning Group
>> Computational Intelligence Group
>> Department of Computer Science, AI
>> VU University Amsterdam
>> _______________________________________________
>> ckan-discuss mailing list
>> ckan-discuss at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/ckan-discuss

Open Knowledge Foundation
Promoting Open Knowledge in a Digital Age
http://www.okfn.org/ - http://blog.okfn.org/
