[ECODP-dev] estat ingestion problem

John Glover john.glover at okfn.org
Thu Jun 27 07:23:34 UTC 2013


Hi Bert,

The results of David's investigation into this matter are given below.

Regards,
John

----------------

Investigation into why imports from rdf2ckan have failed.

We have investigated in depth why some data imports from rdf2ckan have
failed. This included:

* Looking through the logs.
* Checking the supplied database dump for performance issues and running
basic integrity checks.
* Importing the datasets repeatedly as a test.
* Profiling the code to see what is taking a long time.
* Experimenting to see if we can reproduce the issue.

Sadly, we have not been able to reproduce the issue in tests on our
servers using the supplied database dump, even after repeated imports.
There is also nothing in the logs.

What we do know is that if any dataset takes over 1 minute to load, the
request can time out, causing the failure.
In our tests with a copy of the production database, no dataset has taken
longer than 9 seconds to import. Tenforce also ran tests while applying a
simulated load to the server, and managed to push the import time for a
single dataset over the 1-minute mark.
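To spot which datasets come close to the 1-minute limit, each import can be
timed individually. Below is a minimal Python sketch, not the rdf2ckan code:
`import_one` is a hypothetical function that loads a single dataset, the
60-second budget mirrors the timeout described above, and the clock is
injectable only so the function can be tested deterministically.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple

# Mirrors the 1-minute request timeout described in the email (assumption).
TIMEOUT_SECONDS = 60.0


def time_imports(
    datasets: Iterable[str],
    import_one: Callable[[str], None],  # hypothetical single-dataset loader
    timeout: float = TIMEOUT_SECONDS,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Dict[str, float], List[str]]:
    """Run import_one on each dataset and record its wall-clock duration.

    Returns (durations, at_risk): at_risk lists the datasets whose import
    took longer than `timeout` and would have been cut off by a timeout
    of that length.
    """
    durations: Dict[str, float] = {}
    at_risk: List[str] = []
    for name in datasets:
        start = clock()
        import_one(name)
        elapsed = clock() - start
        durations[name] = elapsed
        if elapsed > timeout:
            at_risk.append(name)
    return durations, at_risk
```

Running this under normal conditions and again under simulated load would
show which datasets cross the limit, instead of only noticing the failure.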

To get around this, we suggested sending the data-loading requests to Apache
directly, which means that this 1-minute timeout would no longer apply.
After doing this, we were also not able to reproduce the issue, even when
applying simulated load at the same time.

Further suggestions:

We hope that this issue does not occur again, but if it does, it would be
good to make the rdf2ckan loader's logs clearer, so that we can see the
individual RDF and JSON files that failed, whether this is a regular
occurrence for certain datasets, and whether there are any patterns. It
may also be good to skip the ones that fail rather than stopping the
process entirely, as these will be retried when the next Eurostat import
happens (in half a day).
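A skip-and-log loop along these lines might look like the following minimal
Python sketch. It assumes each dataset corresponds to an (RDF file, JSON
file) pair and uses a hypothetical `load_one` callable and logger name; it
is not the actual rdf2ckan loader.

```python
import logging
from typing import Callable, Iterable, List, Tuple

log = logging.getLogger("rdf2ckan.loader")  # hypothetical logger name


def load_all(
    items: Iterable[Tuple[str, str]],
    load_one: Callable[[str, str], None],  # hypothetical per-dataset loader
) -> List[Tuple[str, str, str]]:
    """Load each (rdf_path, json_path) pair, continuing past failures.

    Every failure is logged with both file names so patterns can be spotted
    across runs; failed items are returned instead of aborting the run, on
    the assumption that the next scheduled import will retry them.
    """
    failures: List[Tuple[str, str, str]] = []
    for rdf_path, json_path in items:
        try:
            load_one(rdf_path, json_path)
        except Exception as exc:  # one bad dataset must not stop the run
            log.error("import failed: rdf=%s json=%s error=%r",
                      rdf_path, json_path, exc)
            failures.append((rdf_path, json_path, repr(exc)))
    log.info("import finished: %d failed", len(failures))
    return failures
```

Returning the failures, rather than raising on the first one, keeps the
rest of the batch loading while still leaving a per-file record to compare
between runs.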


On 27 June 2013 00:21, Bert Van Nuffelen <bert.van.nuffelen at tenforce.com> wrote:

> Hi David,
>
> do you have any news on the estat ingestion problem? Why did the
> EU ODP CKAN processes die under heavy load?
> We need this for the release notes.
>
> Bert
>
>
> --
> Bert Van Nuffelen
>
> Semantic Technologies Software Architect at TenForce
> www.tenforce.be
>
> Bert.Van.Nuffelen at tenforce.com
> Office: +32 (0)16 31 48 60
> Mobile:+32 479 06 24 26
> skype: bert.van.nuffelen
>
> _______________________________________________
> Ecodp-dev mailing list
> Ecodp-dev at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/ecodp-dev
>