[ckan-dev] DataStore seems to grow indefinitely when harvesting

Stefanie Taepke stefanie.taepke at liip.ch
Fri Jan 27 12:43:48 UTC 2017


Hey all!

I would like to discuss how DataPusher and DataStore work for you in production, and I hope this is the right place for that.

Right now we have set it up in our testing environment, so every time we harvest, the DataPusher is triggered and loads everything it can into the DataStore. I noticed that the datastore_default database grows with every harvest, even though the original data has barely changed. Each harvest run of a single harvest job added 2 GB to the DataStore. I assume this is because resources are re-harvested every time something in a dataset changes.

If I am not mistaken, there is no logic at all that decides when data is deleted from the DataStore? In part this is great, because it means the endpoint for the data does not change if I want to build something on top of the data in the DataStore.

I understand that, as explained in https://github.com/ckan/ckan/issues/3268, it is hard to detect whether the data itself has changed. Sure, we could implement that ourselves.
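One way to avoid re-pushing unchanged files might be to keep a content hash per resource and only trigger the DataPusher when that hash differs. A rough sketch, assuming the resource file has already been downloaded and that the previously pushed hashes are kept somewhere (a plain dict here). `needs_push` and `record_push` are hypothetical helpers, not part of CKAN or DataPusher:

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of a resource's raw content."""
    return hashlib.sha256(data).hexdigest()


def needs_push(resource_id: str, data: bytes, known_hashes: dict) -> bool:
    """True if the content differs from what was last pushed for this resource.

    `known_hashes` is a hypothetical store mapping resource IDs to the hash
    of the content that was last loaded into the DataStore.
    """
    return known_hashes.get(resource_id) != content_hash(data)


def record_push(resource_id: str, data: bytes, known_hashes: dict) -> None:
    """Remember the hash; call this only after the push has succeeded."""
    known_hashes[resource_id] = content_hash(data)
```

Keeping the check and the bookkeeping separate means a failed push does not mark the resource as up to date.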

What I am wondering is how you deal with this in production. Does your database grow indefinitely, or am I missing something trivial? Is there something like a cleanup task that we can run (or implement), and are there any plans yet for tackling this, if you have similar problems?
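If part of the growth comes from tables belonging to resources that were removed or replaced, a cleanup task could start by comparing the tables in the DataStore database with the resources that still exist in CKAN. A minimal sketch, assuming each DataStore table is named after its resource ID, that names starting with an underscore are internal and should be skipped, and that both lists have already been fetched. `find_orphaned_tables` is a hypothetical helper; the orphans it returns could then be removed through CKAN's `datastore_delete` action:

```python
from typing import Iterable, List


def find_orphaned_tables(table_names: Iterable[str],
                         active_resource_ids: Iterable[str]) -> List[str]:
    """Return DataStore tables whose resource no longer exists in CKAN.

    Assumes each table is named after its resource ID and that names
    starting with an underscore are internal helpers to be ignored.
    """
    active = set(active_resource_ids)
    return sorted(
        name for name in table_names
        if not name.startswith("_") and name not in active
    )
```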


Cheers and thank you for your thoughts and input,
Stef

