[ckan-dev] DataStore seems to grow indefinitely when harvesting

Steven De Costa steven.decosta at linkdigital.com.au
Wed Feb 1 22:42:01 UTC 2017


We will sometimes need to audit tables after unexpected harvesting
behaviour has deleted resources. It isn't a typical use case though.

I'd been thinking it would be a nice feature on the UI for platform
custodians to have a view of table sizes so they can see which ones might
be a burden over time.
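
As a rough sketch of what could feed such a view: PostgreSQL already exposes
per-table size through pg_total_relation_size() and an estimated live row
count in pg_stat_user_tables. The query is meant to be run against the
DataStore database. The names TABLE_SIZE_SQL, human_size and largest_tables
are made up for illustration and are not part of CKAN.

```python
# Illustrative only: nothing here is an existing CKAN API.

# Run against the DataStore database. pg_stat_user_tables and
# pg_total_relation_size() are standard PostgreSQL; n_live_tup is an
# estimate, not an exact row count.
TABLE_SIZE_SQL = """
SELECT relname AS table_name,
       pg_total_relation_size(relid) AS total_bytes,
       n_live_tup AS approx_rows
FROM pg_stat_user_tables
ORDER BY total_bytes DESC
"""


def human_size(num_bytes):
    """Format a byte count using binary units."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return "%.1f %s" % (size, unit)
        size /= 1024
    return "%.1f TiB" % size


def largest_tables(rows, top_n=10):
    """Return the top_n (table_name, size, approx_rows) tuples by size.

    `rows` is any iterable of (table_name, total_bytes, approx_rows),
    for example the result of executing TABLE_SIZE_SQL.
    """
    ranked = sorted(rows, key=lambda r: r[1], reverse=True)
    return [(name, human_size(b), n) for name, b, n in ranked[:top_n]]
```

The ranking is kept separate from the query so it can be fed from any
database cursor, or from a cached snapshot if the UI should not hit the
database on every page load.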

Row count, column count and update frequency multiply together, so a table
can become very large over time. Purging or cleaning of such tables could
be driven by global or local rules, or executed manually via the front end.
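
To make the rules idea a bit more concrete, here is a minimal sketch of how
a global rule with per-table (local) overrides might pick purge candidates.
Everything in it (TableStats, PurgeRule, purge_candidates, the thresholds)
is hypothetical and not an existing CKAN API. It only flags tables and
deletes nothing.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class TableStats:
    name: str               # DataStore table name (hypothetical input)
    total_bytes: int        # size on disk
    last_updated: datetime  # when the table was last written to


@dataclass
class PurgeRule:
    max_bytes: Optional[int] = None       # flag tables larger than this
    max_idle: Optional[timedelta] = None  # flag tables idle longer than this


def purge_candidates(tables, global_rule, local_rules=None, now=None):
    """Return the names of tables that break their applicable rule.

    A rule in `local_rules` (keyed by table name) replaces the global
    rule for that table. A rule with no limits set never flags anything.
    """
    local_rules = local_rules or {}
    now = now or datetime.now()
    flagged = []
    for t in tables:
        rule = local_rules.get(t.name, global_rule)
        too_big = rule.max_bytes is not None and t.total_bytes > rule.max_bytes
        too_idle = rule.max_idle is not None and now - t.last_updated > rule.max_idle
        if too_big or too_idle:
            flagged.append(t.name)
    return flagged
```

A custodian could review the flagged list in the front end before anything
is actually dropped, which would also leave room for the audit case above.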

STEVEN DE COSTA | EXECUTIVE DIRECTOR
www.linkdigital.com.au



On 2 February 2017 at 08:03, Stefan Oderbolz <stefan.oderbolz at liip.ch>
wrote:

> In the meantime I created an issue
> (https://github.com/ckan/ckan/issues/3422) to tackle the deletion of
> DataStore tables when resources are deleted. But looking at the DataStore,
> and especially the state of the DataPusher, led me to believe that no one
> really uses these components in production at the moment. Or am I wrong?
>
> Also, the performance problems we encounter when harvesting ~1500 datasets
> are a sign that this has not yet been tested under real load (and I mean:
> compared to others, this is nothing). We are currently investigating on
> several fronts to see why we have these problems, but I'm curious to hear
> from others. We currently suspect the DataPusher (which runs into
> timeouts). But maybe we also have problems on the PostgreSQL end and need
> to spend some time improving our database server setup.
>
> In case this all comes over as very negative, please don't get me wrong:
> we are happy to contribute and will definitely give back our fixes and
> share our experience!
>
> Best regards Stefan
>
> On Fri, Jan 27, 2017 at 1:43 PM, Stefanie Taepke <stefanie.taepke at liip.ch>
> wrote:
>
>> Hey all!
>>
>> I would like to discuss how DataPusher and DataStore work for you in
>> production, and I hope this is the right place for that.
>>
>> Right now we have set it up for our testing environment. Every time we
>> harvest, the DataPusher is triggered and loads everything it can into the
>> DataStore. I could see that every time we harvest, the datastore_default
>> database grows, even though the original data did not change (as much).
>> Each harvest run for a single harvest job added 2 GB to the DataStore. I
>> assume this is a result of resources being re-harvested every time
>> something on a dataset changes.
>>
>> If I am not mistaken, there is no logic whatsoever governing when data is
>> deleted from the DataStore? Partly this is great, as it means the endpoint
>> of the data does not change if I want to build something with the data
>> from the DataStore.
>>
>> I understand, as explained in https://github.com/ckan/ckan/issues/3268,
>> that it is hard to detect whether the data itself has changed. Sure, we
>> can implement it.
>>
>> What I am wondering is: how are you dealing with this if you use it in
>> production? Does your database grow indefinitely, or am I missing
>> something trivial? Is there something like a cleanup task that we can run
>> (or implement), and are there any plans yet on how to tackle this if you
>> have similar problems?
>>
>>
>> Cheers and thank you for your thoughts and input,
>> Stef
>>
>> _______________________________________________
>> ckan-dev mailing list
>> ckan-dev at lists.okfn.org
>> https://lists.okfn.org/mailman/listinfo/ckan-dev
>> Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev
>>
>>
>
>
> --
> Liip AG  // Limmatstrasse 183 //  CH-8005 Zürich
> Tel +41 43 500 39 80 // GnuPG 0x7B588C67 //
> www.liip.ch
>