[ckan-discuss] CKAN is slowwwwww

Christophe Guéret cgueret at few.vu.nl
Mon Oct 4 12:56:18 BST 2010


  On 10/02/2010 05:11 PM, Rufus Pollock wrote:
> Dear Christophe,
>
> To follow up David's earlier comments:
>
> * It will probably be *much* more efficient to use the dedicated
> resource search api:
>
> <http://ckan.net/api/search/resource?format=api/sparql>
>
> This query returned in 390ms :) and immediately tells you there are
> 169 resources with format 'api/sparql' (note some of these may be the
> same url since a resource is associated to a specific package). The
> following query:
>
> <http://ckan.net/api/search/resource?format=api/sparql&limit=169&all_fields=1>
>
> Gives you the full list of resources with package ids and using those
> you can retrieve each package for further analysis.
That's indeed better. Next time, I will have a closer look at the API
before implementing some naive approach :-P
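
For reference, the two queries above can be sketched in Python. This is a
minimal sketch, not the ckanclient API: the helper names are hypothetical, and
the response shape (a JSON object with a "results" list whose entries carry a
"package_id" field) is an assumption based on the description above. The
ckan.net endpoint is the one quoted in this thread and may no longer respond.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint as quoted in this thread.
BASE_URL = "http://ckan.net/api/search/resource"


def build_resource_search_url(fmt, limit=None, all_fields=False, base_url=BASE_URL):
    """Build a resource search URL (hypothetical helper)."""
    params = {"format": fmt}
    if limit is not None:
        params["limit"] = limit
    if all_fields:
        params["all_fields"] = 1
    # Keep "/" unescaped so the URL reads like the ones in the thread.
    return base_url + "?" + urlencode(params, safe="/")


def package_ids(response):
    """Collect distinct package ids, assuming the response shape described above."""
    seen = []
    for resource in response.get("results", []):
        pkg = resource.get("package_id")
        if pkg is not None and pkg not in seen:
            seen.append(pkg)
    return seen


def fetch(url):
    """Fetch a URL and decode its JSON body (performs a network request)."""
    with urlopen(url) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

Calling fetch(build_resource_search_url("api/sparql", limit=169, all_fields=True))
and passing the result to package_ids() would then give the packages to retrieve,
one request each, instead of one request per package in the whole repository.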

> * API slowness is something we will be looking into (in particular
> better cache configuration). That said, you are iterating through
> every item in the repository :) With more than 1500 packages at 1s a
> package you are looking at around 30m, at 2s a package 1h, at 4s a
> dataset 2h ... (I note that, on what may be a slow wifi connection,
> loading google front page or flickr takes between 1-3s). For this kind
> of bulk analysis it may be worth reinstating our daily json dumps of
> the entire db.
Right, but 4s a package is still a bit slow. The daily dump and some
optimisation of the API speed would be nice.
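
The estimate quoted above is a simple product of package count and time per
request. A small sketch, taking 1500 packages as the round figure from the
thread (the function name is only illustrative):

```python
def total_minutes(packages, seconds_per_request):
    """Total wall-clock time in minutes for one sequential request per package."""
    return packages * seconds_per_request / 60


for seconds in (1, 2, 4):
    print(f"{seconds}s per package: {total_minutes(1500, seconds):.0f} min")
```

With exactly 1500 packages this gives 25, 50 and 100 minutes; the rounder
figures in the quoted message allow for "more than 1500" packages.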

Christophe




> Rufus
>
> 2010/9/30 David Read <david.read at okfn.org>:
>> Christophe,
>>
>> Yes it shouldn't be this slow doing 1500 queries. We've suffered
>> performance problems in the past 24 hours and this is probably
>> related. Having said that, I've opened a ticket to take a proper look
>> at this:
>> http://knowledgeforge.net/ckan/trac/ticket/667
>>
>> This particular problem sounds like a job for the 'resource search'
>> feature, which achieves what you want in one query, taking under a
>> second:
>> http://ckan.net/api/search/resource?format=api/sparql
>>
>> and you could add &all_fields=1 to get all the package properties to process.
>>
>> I'm afraid this is a new feature so has not been put into the ckanclient
>> yet, but should not be too hard to add in, as package search is almost
>> identical. Do write back to the list to let us know how you get on and
>> if you want any more help.
>>
>> David
>>
>> 2010/9/30 Christophe Guéret <cgueret at few.vu.nl>:
>>>   Hello!
>>>
>>> I've made a small script (attached to this mail) using the python CKAN API
>>> to browse the content of CKAN in search for SPARQL end points.
>>> Everything works fine apart from the fact that this script takes at least 2h
>>> to run! I was hoping that it would take no more than a few seconds, or maybe
>>> a minute or so. But not hours ;-)
>>>
>>> Is it normal that CKAN is so slow to browse?
>>>
>>> Cheers,
>>> Christophe
>>>
>>>
>>> --
>>> Dr. Christophe Guéret (cgueret at few.vu.nl)
>>> http://cgueret.net
>>> Postdoc working on SOKS (http://www.few.vu.nl/soks)
>>> Knowledge Representation & Reasoning Group
>>> Computational Intelligence Group
>>> Department of Computer Science, AI
>>> VU University Amsterdam
>>>
>>>
>>> _______________________________________________
>>> ckan-discuss mailing list
>>> ckan-discuss at lists.okfn.org
>>> http://lists.okfn.org/mailman/listinfo/ckan-discuss
>>>
>>>
>
>


-- 
Dr. Christophe Guéret (cgueret at few.vu.nl)
http://cgueret.net
Postdoc working on SOKS (http://www.few.vu.nl/soks)
Knowledge Representation & Reasoning Group
Computational Intelligence Group
Department of Computer Science, AI
VU University Amsterdam
