[ckan-discuss] CKAN is slowwwwww

Rufus Pollock rufus.pollock at okfn.org
Sat Oct 2 16:11:05 BST 2010


Dear Christophe,

To follow up David's earlier comments:

* It will probably be *much* more efficient to use the dedicated
resource search api:

<http://ckan.net/api/search/resource?format=api/sparql>

This query returned in 390ms :) and immediately tells you there are
169 resources with format 'api/sparql' (note that some of these may
share the same url, since each resource is associated with a specific
package). The following query:

<http://ckan.net/api/search/resource?format=api/sparql&limit=169&all_fields=1>

gives you the full list of resources with package ids; using those,
you can retrieve each package for further analysis.
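
As a rough illustration of the two queries above, here is a minimal
Python sketch that builds the search URL and pulls the distinct package
ids out of a response. The response layout (a "results" list whose
entries carry a "package_id" field when all_fields=1 is set) is an
assumption here, as are the helper names; check a real response before
relying on it.

```python
import json
from urllib.parse import urlencode

# Endpoint as quoted in this thread.
BASE_URL = "http://ckan.net/api/search/resource"


def build_resource_search_url(fmt, limit=None, all_fields=False):
    """Build a resource search URL for the given resource format."""
    params = [("format", fmt)]
    if limit is not None:
        params.append(("limit", limit))
    if all_fields:
        params.append(("all_fields", 1))
    # safe="/" keeps 'api/sparql' readable instead of 'api%2Fsparql'.
    return BASE_URL + "?" + urlencode(params, safe="/")


def package_ids(payload):
    """Return the sorted, de-duplicated package ids found in a response.

    Assumes (hypothetically) a JSON object with a "results" list of
    dicts, each holding a "package_id" key.
    """
    data = json.loads(payload)
    return sorted({r["package_id"] for r in data.get("results", [])})
```

De-duplicating matters because several resources can point at the same
package. Fetching the URL itself could be done with
urllib.request.urlopen, after which each package id needs one further
request, which is still far fewer requests than walking every package.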

* API slowness is something we will be looking into (in particular
better cache configuration). That said, you are iterating through
every item in the repository :) With more than 1500 packages, at 1s a
package you are looking at around 30m, at 2s a package 1h, and at 4s
a package 2h ... (I note that, on what may be a slow wifi connection,
loading the google front page or flickr takes between 1 and 3s). For
this kind of bulk analysis it may be worth reinstating our daily json
dumps of the entire db.
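
The back-of-envelope numbers above can be checked with a few lines of
Python. This is only a sketch: the function name is made up, and it
takes exactly 1500 packages fetched one after another, whereas the
figures above are rounded for a repository with somewhat more than
1500 packages.

```python
def total_minutes(n_packages, seconds_per_package):
    """Total wall-clock time, in minutes, for sequential requests."""
    return n_packages * seconds_per_package / 60.0


if __name__ == "__main__":
    for secs in (1, 2, 4):
        minutes = total_minutes(1500, secs)
        print("%ds per package: %.0f minutes" % (secs, minutes))
```

The total grows linearly with both the number of packages and the
per-request latency, which is why a single search query or a bulk dump
helps far more than shaving time off each request.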

Rufus

2010/9/30 David Read <david.read at okfn.org>:
> Christophe,
>
> Yes it shouldn't be this slow doing 1500 queries. We've suffered
> performance problems in the past 24 hours and this is probably
> related. Having said that, I've opened a ticket to take a proper look
> at this:
> http://knowledgeforge.net/ckan/trac/ticket/667
>
> This particular problem sounds like a job for the 'resource search'
> feature, which achieves what you want in one query, taking under a
> second:
> http://ckan.net/api/search/resource?format=api/sparql
>
> and you could add &all_fields=1 to get all the package properties to process.
>
> I'm afraid this is a new feature so has not been put into the ckanclient
> yet, but should not be too hard to add in, as package search is almost
> identical. Do write back to the list to let us know how you get on and
> if you want any more help.
>
> David
>
> 2010/9/30 Christophe Guéret <cgueret at few.vu.nl>:
>>  Hello!
>>
>> I've made a small script (attached to this mail) using the python CKAN API
>> to browse the content of CKAN in search for SPARQL end points.
>> Everything works fine apart from the fact that this script takes at least 2h
>> to run! I was hoping that it would take no more than a few seconds, or maybe
>> a minute or so. But not hours ;-)
>>
>> Is it normal that CKAN is so slow to browse?
>>
>> Cheers,
>> Christophe
>>
>>
>> --
>> Dr. Christophe Guéret (cgueret at few.vu.nl)
>> http://cgueret.net
>> Postdoc working on SOKS (http://www.few.vu.nl/soks)
>> Knowledge Representation & Reasoning Group
>> Computational Intelligence Group
>> Department of Computer Science, AI
>> VU University Amsterdam
>>
>>
>> _______________________________________________
>> ckan-discuss mailing list
>> ckan-discuss at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/ckan-discuss
>>
>>
>



-- 
Open Knowledge Foundation
Promoting Open Knowledge in a Digital Age
http://www.okfn.org/ - http://blog.okfn.org/


