[ckan-discuss] CKAN is slowwwwww

Rufus Pollock rufus.pollock at okfn.org
Thu Dec 23 13:31:32 GMT 2010


http://ckan.net/api/search/resource?format=api/sparql&limit=200
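
For anyone scripting this, here is a minimal Python sketch of the same call. It assumes the endpoint answers with JSON holding "count" and "results" keys and that it accepts an "offset" parameter; neither is confirmed in this thread, so check both against a real response. The names build_url, fetch_json and fetch_all are made up for illustration and are not part of ckanclient.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://ckan.net/api/search/resource"  # endpoint from the thread


def build_url(limit, offset=0, fmt="api/sparql"):
    """Build a resource-search URL ('offset' is an assumed parameter)."""
    query = urlencode(
        {"format": fmt, "limit": limit, "offset": offset, "all_fields": 1},
        safe="/",  # keep the slash in 'api/sparql' readable
    )
    return BASE_URL + "?" + query


def fetch_json(url):
    """Fetch one page over HTTP and decode the body as JSON."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def fetch_all(page_size=200, fetch=fetch_json):
    """Collect every result, requesting further pages only when needed."""
    results = []
    while True:
        page = fetch(build_url(page_size, offset=len(results)))
        batch = page.get("results", [])  # assumed response key
        results.extend(batch)
        # Stop on an empty page or once the reported count is reached.
        if not batch or len(results) >= page.get("count", 0):
            return results
```

With a limit at or above the number of matches, fetch_all() should need a single request; a smaller page_size only matters if the server caps the limit.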

Rufus

2010/12/23 Christophe Guéret <c.d.m.gueret at vu.nl>:
> Dear Rufus, David, everyone,
>
> I'm again trying to get a list of SPARQL end points from CKAN and I'm trying
> not to make the same mistake again.
> The code attached is supposed to fetch the desired information using the API
> call you indicated but I get only 20 results out of the 175 available :-/
>
> Is there a way to get all the results at once or shall I do several calls
> with different offsets?
>
> Christophe
>
>
> On 10/02/2010 05:11 PM, Rufus Pollock wrote:
>>
>> Dear Christophe,
>>
>> To follow up David's earlier comments:
>>
>> * It will probably be *much* more efficient to use the dedicated
>> resource search api:
>>
>> <http://ckan.net/api/search/resource?format=api/sparql>
>>
>> This query returned in 390ms :) and immediately tells you there are
>> 169 resources with format 'api/sparql' (note some of these may be the
>> same url since a resource is associated to a specific package). The
>> following query:
>>
>>
>> <http://ckan.net/api/search/resource?format=api/sparql&limit=169&all_fields=1>
>>
>> Gives you the full list of resources with package ids and using those
>> you can retrieve each package for further analysis.
>>
>> * API slowness is something we will be looking into (in particular
>> better cache configuration). That said, you are iterating through
>> every item in the repository :) With more than 1500 packages, at 1s a
>> package you are looking at around 30m, at 2s a package 1h, at 4s a
>> package 2h ... (I note that, on what may be a slow wifi connection,
>> loading google front page or flickr takes between 1-3s). For this kind
>> of bulk analysis it may be worth reinstating our daily json dumps of
>> the entire db.
>>
>> Rufus
>>
>> 2010/9/30 David Read <david.read at okfn.org>:
>>>
>>> Christophe,
>>>
>>> Yes it shouldn't be this slow doing 1500 queries. We've suffered
>>> performance problems in the past 24 hours and this is probably
>>> related. Having said that, I've opened a ticket to take a proper look
>>> at this:
>>> http://knowledgeforge.net/ckan/trac/ticket/667
>>>
>>> This particular problem sounds like a job for the 'resource search'
>>> feature, which achieves what you want in one query, taking under a
>>> second:
>>> http://ckan.net/api/search/resource?format=api/sparql
>>>
>>> and you could add &all_fields=1 to get all the package properties to
>>> process.
>>>
>>> I'm afraid this is a new feature so has not been put into the ckanclient
>>> yet, but should not be too hard to add in, as package search is almost
>>> identical. Do write back to the list to let us know how you get on and
>>> if you want any more help.
>>>
>>> David
>>>
>>>> 2010/9/30 Christophe Guéret <cgueret at few.vu.nl>:
>>>>
>>>>  Hello!
>>>>
>>>> I've made a small script (attached to this mail) using the Python CKAN API
>>>> to browse the content of CKAN in search of SPARQL end points.
>>>> Everything works fine apart from the fact that this script takes at least 2h
>>>> to run! I was hoping that it would take no more than a few seconds, or maybe
>>>> a minute or so. But not hours ;-)
>>>>
>>>> Is it normal that CKAN is so slow to browse?
>>>>
>>>> Cheers,
>>>> Christophe
>>>>
>>>>
>>>> --
>>>> Dr. Christophe Guéret (cgueret at few.vu.nl)
>>>> http://cgueret.net
>>>> Postdoc working on SOKS (http://www.few.vu.nl/soks)
>>>> Knowledge Representation & Reasoning Group
>>>> Computational Intelligence Group
>>>> Department of Computer Science, AI
>>>> VU University Amsterdam
>>>>
>>>>
>>>> _______________________________________________
>>>> ckan-discuss mailing list
>>>> ckan-discuss at lists.okfn.org
>>>> http://lists.okfn.org/mailman/listinfo/ckan-discuss
>>>>
>>>>
>>>
>>
>>
>
>



-- 
Co-Founder, Open Knowledge Foundation
Promoting Open Knowledge in a Digital Age
http://www.okfn.org/ - http://blog.okfn.org/


