[ckan-dev] Datastore and linked-to resources that no longer exist

Sean Hammond sean.hammond at okfn.org
Fri Apr 26 07:47:35 UTC 2013


Good thoughts Nigel. Actually, I wonder if it would be worth adding this
to this wiki page?

https://github.com/okfn/ckan/wiki/Spec:-DataStore-and-FileStore-Consolidation
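
For the wiki page, here is a rough sketch of how I read Nigel's option 2
(quoted below): once a failed datastore load has been recorded, the preview
skips the dataproxy and only offers a download link. None of these names
exist in CKAN; `ResourceState`, its two flags, and `choose_preview` are
hypothetical, made up purely to illustrate the decision.

```python
from dataclasses import dataclass
from enum import Enum


class Preview(Enum):
    DATASTORE = "datastore"
    DATAPROXY = "dataproxy"
    DOWNLOAD_LINK = "download_link"


@dataclass
class ResourceState:
    # Both flags are hypothetical; they are assumptions for this sketch,
    # not fields that CKAN or the datastorer currently record.
    in_datastore: bool
    datastore_load_failed: bool


def choose_preview(state: ResourceState) -> Preview:
    """Pick a preview source for a resource, following option 2."""
    if state.in_datastore:
        # Data is available (possibly from an earlier successful load).
        return Preview.DATASTORE
    if state.datastore_load_failed:
        # A load was attempted and failed: don't try the dataproxy,
        # just offer the download link.
        return Preview.DOWNLOAD_LINK
    # No load has been attempted yet, so the dataproxy may still work.
    return Preview.DATAPROXY
```

Note that in this sketch a resource whose source file has since disappeared
but whose data is still in the datastore keeps previewing from the
datastore, which is one possible answer to the delete question, not a
settled one.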

On Thu, Apr 18, 2013 at 07:58:50PM +0530, Nigel Babu wrote:
> Ah, this is also connected to the dataproxy discussions.
> 
> Currently, there's no way for CKAN to know that an attempt to load
> something into the datastore failed. Almost all the failures I've
> noticed in production are because of the file, i.e. not a CKAN/datastore
> issue.  When a user attempts to preview the resource, there will be nothing
> in the datastore for this resource, so CKAN attempts to use the dataproxy
> and fails.
> 
> This gives us two options:
> 1) Remove dataproxy. If a file isn't in the datastore, there's something
> wrong with it and it can't be loaded. Offer a download link instead.
> 2) Have a way to mark that a file was attempted to be loaded into the
> datastore and it failed. If the file isn't in the datastore and this
> failure is marked, dataproxy should not attempt to preview and only offer a
> download link.
> 
> The datastore_upload[1] script will ignore the resource if the download of
> the resource fails. That means if there was an existing entry in the
> datastore, it will continue to remain in the datastore. If a file was
> updated, the datastore will be updated on the next run.  The delete
> behaviour may not be entirely appropriate and should probably be discussed
> further.
> 
> [1]
> https://github.com/okfn/ckanext-datastorer/blob/master/ckanext/datastorer/commands.py#L193
> 
> Nigel.
> 
> 
> 
> On 18 April 2013 19:07, Sean Hammond <sean.hammond at okfn.org> wrote:
> 
> > > Take a look at this dataset on publicdata.eu:
> > >
> > > http://publicdata.eu/dataset/ministerial-data-cabinet-office
> > >
> > > If you click on any of the resources you'll get an error:
> > >
> > > Could not load preview: DataProxy returned an error (Data transformation
> > > failed. HTTPError: HTTP Error 404: Not Found)
> > >
> > > and if you try to download any of the resource files from the source
> > > site you'll find they no longer exist, eg:
> > >
> > >
> > http://www.cabinetoffice.gov.uk/sites/default/files/resources/pm-meetings.csv
> > >
> > > Related to the new work that's being done around the new datastorer
> > > service (data pusher is its current name I think) and the new datastorer
> > > paster command/cron job:
> > >
> > > I'm not sure how we intend to deal with this problem in CKAN -- when a
> > > resource file is linked to, and then the source file on the remote site
> > > moves or disappears. Once we have the datastorer service and script
> > > stuff sorted out, then it can be deployed and a resource file like this
> > > would have been pulled into the datastore so could be previewed from the
> > > datastore. But what should the datastorer do, when it finds that the
> > > original source file is gone? Should it leave the data in the datastore,
> > > so that preview and data API keep working? Or should it delete the data
> > > in the datastore, and have the resource page display some clear error
> > > message that says the source file is no longer there?
> >
> > Ping. This seems related to the datapusher discussion we had this
> > morning.
> >
> > _______________________________________________
> > ckan-dev mailing list
> > ckan-dev at lists.okfn.org
> > http://lists.okfn.org/mailman/listinfo/ckan-dev
> > Unsubscribe: http://lists.okfn.org/mailman/options/ckan-dev
> >
