[ckan-dev] Datastore and linked-to resources that no longer exist

Nigel Babu nigel.babu at okfn.org
Thu Apr 18 14:28:50 UTC 2013


Ah, this is also connected to the dataproxy discussions.

Currently, there's no way for CKAN to know that an attempt was made to load
something into the datastore and that it failed. Almost all the failures
I've noticed in production are caused by the file itself, i.e. they are not
CKAN/datastore issues. When a user attempts to preview such a resource,
there is nothing in the datastore for it, so CKAN attempts to use the
dataproxy, which also fails.

This gives us two options:
1) Remove dataproxy. If a file isn't in the datastore, there's something
wrong with it and it can't be loaded. Offer a download link instead.
2) Have a way to record that loading a file into the datastore was
attempted and failed. If the file isn't in the datastore and this failure
is recorded, the dataproxy should not attempt a preview; only a download
link should be offered.
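Option 2 boils down to a three-way decision on the resource page. Below is
a minimal sketch, assuming two hypothetical flags are available per
resource (`in_datastore`, `load_failed`); the names `Preview` and
`choose_preview` are made up for illustration and are not existing CKAN
APIs:

```python
from enum import Enum


class Preview(Enum):
    """Ways to render a resource page (hypothetical names)."""
    DATASTORE = "datastore"     # preview straight from the datastore
    DATAPROXY = "dataproxy"     # fall back to fetching the file via the dataproxy
    DOWNLOAD_ONLY = "download"  # no preview, just offer a download link


def choose_preview(in_datastore: bool, load_failed: bool) -> Preview:
    """Option 2: use a recorded load failure to skip a doomed dataproxy call."""
    if in_datastore:
        return Preview.DATASTORE
    if load_failed:
        # The datastorer already tried and failed, so the dataproxy
        # would most likely fail as well; only offer a download link.
        return Preview.DOWNLOAD_ONLY
    # Never attempted (or not yet attempted): the dataproxy is still worth a try.
    return Preview.DATAPROXY
```

Option 1 corresponds to dropping the DATAPROXY branch entirely, i.e.
returning DOWNLOAD_ONLY whenever the resource is not in the datastore.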

The datastore_upload[1] script ignores a resource if its download fails.
That means that if there was an existing entry in the datastore, it will
remain there. If a file was updated, the datastore will be updated on the
next run. This behaviour when a source file is deleted may not be entirely
appropriate and should probably be discussed further.

[1]
https://github.com/okfn/ckanext-datastorer/blob/master/ckanext/datastorer/commands.py#L193
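A minimal sketch of the datastore_upload behaviour described above, for
discussion only: it is not the actual code in commands.py. It assumes an
in-memory dict stands in for the datastore and that a hypothetical `fetch`
callable raises a hypothetical `DownloadError` when the source file is
gone. The `failed` set is likewise a hypothetical addition, showing where
the failure marker needed by option 2 could be recorded.

```python
from typing import Callable, Dict, Iterable, List, Set

Rows = List[dict]


class DownloadError(Exception):
    """Raised by the (hypothetical) fetch function when the source is gone."""


def upload_run(
    resource_ids: Iterable[str],
    fetch: Callable[[str], Rows],
    datastore: Dict[str, Rows],
    failed: Set[str],
) -> None:
    """One run of a datastore_upload-style job over a set of resources.

    A failed download is skipped, so any rows already stored for that
    resource stay in place. A successful download replaces the stored rows.
    """
    for res_id in resource_ids:
        try:
            rows = fetch(res_id)
        except DownloadError:
            # Described behaviour: ignore the resource and keep old data.
            # Recording the failure is the hypothetical option 2 extra.
            failed.add(res_id)
            continue
        datastore[res_id] = rows
        # A later successful load clears any earlier failure marker.
        failed.discard(res_id)
```

With this shape, a source file that disappears leaves its last loaded copy
in the datastore, which is exactly the case the quoted message below asks
about: whether that stale copy should be kept or deleted.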

Nigel.



On 18 April 2013 19:07, Sean Hammond <sean.hammond at okfn.org> wrote:

> > Take a look at this dataset on publicdata.eu:
> >
> > http://publicdata.eu/dataset/ministerial-data-cabinet-office
> >
> > If you click on any of the resources you'll get an error:
> >
> > Could not load preview: DataProxy returned an error (Data transformation
> > failed. HTTPError: HTTP Error 404: Not Found)
> >
> > and if you try to download any of the resource files from the source
> > site you'll find they no longer exist, eg:
> >
> >
> > http://www.cabinetoffice.gov.uk/sites/default/files/resources/pm-meetings.csv
> >
> > Related to the new work that's being done around the new datastorer
> > service (data pusher is its current name I think) and the new datastorer
> > paster command/cron job:
> >
> > I'm not sure how we intend to deal with this problem in CKAN -- when a
> > resource file is linked to, and then the source file on the remote site
> > moves or disappears. Once we have the datastorer service and script
> > stuff sorted out, then it can be deployed and a resource file like this
> > would have been pulled into the datastore so could be previewed from the
> > datastore. But what should the datastorer do, when it finds that the
> > original source file is gone? Should it leave the data in the datastore,
> > so that preview and data API keep working? Or should it delete the data
> > in the datastore, and have the resource page display some clear error
> > message that says the source file is no longer there?
>
> Ping. This seems related to the datapusher discussion we had this
> morning.
>

