[ckan-dev] [#2963] Timeout on tag pages with lots of datasets

Ian Ward ian at excess.org
Mon Jan 14 13:56:18 UTC 2013


It looks like there are 887 datasets on datahub.io tagged "lod", and that's
enough to make it impossible to get those datasets from the web or via the
API. Is that correct?

I'm asking because I will soon be importing datasets where thousands will
be tagged the same way.

Ian

On Jan 11, 2013 7:12 PM, "Vitor Baptista" <vitor at vitorbaptista.com> wrote:

> Hi,
>
> I was looking into this ticket today. Basically, if a tag has many
> datasets (e.g. http://thedatahub.org/tag/lod), the tag page times out.
> Looking into the code, I found a few N+1 query patterns (log attached),
> but could only fix one so far (
> https://github.com/vitorbaptista/ckan/commit/424ac7703fb58202260e4a8cb7ee31cd01a3962d
> ).
>
> The main culprit for these queries (~13 per package) is
> model_dictize.package_dictize. I've tried to refactor it a bit so I could
> start optimizing, but I ran into a few problems. The main one is that the
> relationships are created manually (e.g. Tag.packages), so it's not
> possible to eager load them without changing every method. Is there a
> specific reason it was done this way instead of using SQLAlchemy?
>
> Thanks,
> Vítor.
>
> P.S.: Next time, should I comment here, on Trac, or on a GitHub issue?
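
For readers following the thread, the N+1 pattern described above can be
sketched roughly as follows. This is not CKAN's code: the schema, the table
and column names, and every function name here are hypothetical, and the
sketch uses Python's stdlib sqlite3 module rather than SQLAlchemy. It only
illustrates why fetching each package separately costs 1 + N queries, while a
single JOIN (roughly what eager loading aims for) costs one.

```python
import sqlite3


class CountingConnection:
    """Thin wrapper that counts how many queries are issued (illustrative only)."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def query(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def make_db(n_packages):
    """Build an in-memory database with one tag attached to n_packages packages."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE package (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE package_tag (package_id INTEGER, tag_id INTEGER);
        INSERT INTO tag (id, name) VALUES (1, 'lod');
        """
    )
    for i in range(1, n_packages + 1):
        conn.execute("INSERT INTO package (id, name) VALUES (?, ?)", (i, "pkg-%d" % i))
        conn.execute("INSERT INTO package_tag (package_id, tag_id) VALUES (?, 1)", (i,))
    conn.commit()
    return conn


def packages_n_plus_one(db, tag_name):
    """One query for the package ids, then one more query per package."""
    rows = db.query(
        "SELECT package_tag.package_id FROM package_tag "
        "JOIN tag ON tag.id = package_tag.tag_id "
        "WHERE tag.name = ? ORDER BY package_tag.package_id",
        (tag_name,),
    )
    names = []
    for (package_id,) in rows:
        result = db.query("SELECT name FROM package WHERE id = ?", (package_id,))
        names.append(result[0][0])
    return names


def packages_single_query(db, tag_name):
    """The same data fetched with a single JOIN."""
    rows = db.query(
        "SELECT package.name FROM package "
        "JOIN package_tag ON package_tag.package_id = package.id "
        "JOIN tag ON tag.id = package_tag.tag_id "
        "WHERE tag.name = ? ORDER BY package.id",
        (tag_name,),
    )
    return [name for (name,) in rows]
```

With ~13 queries per package, as reported above, the per-package cost grows
linearly with the number of datasets on the tag, which would be consistent
with the timeout on a tag that has hundreds of datasets.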