[ckan-dev] Datastore and linked-to resources that no longer exist

Ross Jones ross at servercode.co.uk
Fri Apr 26 18:29:26 UTC 2013


Hi Darwin,

I understand you're talking about removing the _use_ of the jsonpdataproxy, which sounds sensible, but removing the jsonpdataproxy service itself might cause problems for other sites (though no longer for data.gov.uk). Of course, you could always fix the dataproxy instead, since it has value as a stand-alone service (unless there's a strong objection to, or cost in, using App Engine) ;).

Prior to 2.0, the application.js shipped with CKAN 1.x used jsonpdataproxy.appspot.com, so removing the app from appspot is likely to break other CKAN installations. It's a free service, of course. But before turning off jsonpdataproxy and potentially breaking 1.x sites, it might be best to offer an alternative for users who aren't yet ready to upgrade to 2.0.

Ross


On 26 Apr 2013, at 18:33, Darwin Peltan wrote:

> I'd be +1 for removing the dataproxy and prompting the user to download the file, rather than the current behaviour where the user sees a failed attempt to preview. Is there any reason to keep the dataproxy?
> 
> Darwin Peltan
> Project Manager
> 
> The Open Knowledge Foundation 
> http://www.okfn.org
> 
> Skype: darwinp
> Twitter: @darwin
> 
> 
> On 26 April 2013 08:47, Sean Hammond <sean.hammond at okfn.org> wrote:
> Good thoughts, Nigel. Actually, I wonder if it would be worth adding this
> to this wiki page?
> 
> https://github.com/okfn/ckan/wiki/Spec:-DataStore-and-FileStore-Consolidation
> 
> On Thu, Apr 18, 2013 at 07:58:50PM +0530, Nigel Babu wrote:
> > Ah, this is also connected to the dataproxy discussions.
> >
> > Currently, there's no way for CKAN to know that an attempt to load a
> > resource into the datastore has failed. Almost all the failures I've
> > noticed in production are caused by the file itself, i.e. they are not
> > CKAN/datastore issues. When a user tries to preview such a resource,
> > there is nothing in the datastore for it, so CKAN attempts to use the
> > dataproxy, and that fails too.
> >
> > This gives us two options:
> > 1) Remove the dataproxy. If a file isn't in the datastore, there's
> > something wrong with it and it can't be loaded, so offer a download link
> > instead.
> > 2) Have a way to mark that an attempt to load a file into the datastore
> > failed. If the file isn't in the datastore and this failure is marked,
> > CKAN should not attempt a dataproxy preview and should only offer a
> > download link.
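A minimal Python sketch of the preview decision described in the two options above, assuming a per-resource "load failed" flag exists (it does not today, which is the point of option 2). Passing `use_dataproxy=False` models option 1. The `Resource` fields, `Preview` values and `choose_preview` are hypothetical names for illustration only. They are not CKAN's API or resource schema.

```python
from dataclasses import dataclass
from enum import Enum


class Preview(Enum):
    DATASTORE = "datastore"  # preview rows already loaded into the datastore
    DATAPROXY = "dataproxy"  # fall back to fetching the remote file via the proxy
    DOWNLOAD = "download"    # no preview; offer only a download link


@dataclass
class Resource:
    # Hypothetical fields for illustration; not CKAN's actual resource schema.
    url: str
    in_datastore: bool = False
    datastore_load_failed: bool = False  # the marker proposed in option 2


def choose_preview(resource: Resource, use_dataproxy: bool = True) -> Preview:
    """Pick a preview strategy for a resource.

    use_dataproxy=False models option 1 (dataproxy removed entirely);
    use_dataproxy=True together with the failure flag models option 2.
    """
    if resource.in_datastore:
        return Preview.DATASTORE
    if not use_dataproxy or resource.datastore_load_failed:
        # Either the proxy is gone or we already know the file can't be loaded.
        return Preview.DOWNLOAD
    return Preview.DATAPROXY
```

With this logic, a resource whose load was marked as failed goes straight to a download link instead of producing the "Could not load preview" error.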
> >
> > The datastore_upload[1] script ignores a resource if downloading it
> > fails. That means any existing entry for that resource will remain in
> > the datastore. If a file was updated, the datastore will be updated on
> > the next run. This behaviour when a source file is deleted may not be
> > entirely appropriate and should probably be discussed further.
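To make the keep-versus-delete question concrete, here is a minimal Python sketch of a periodic sync loop. It is not the datastorer's actual code. A plain dict stands in for the datastore, and `fetch_url`, `sync_resources` and the `on_missing` switch are hypothetical names. `on_missing="keep"` mirrors the behaviour Nigel describes (a failed download is skipped and old rows stay). `"delete"` is the alternative Sean raises.

```python
import urllib.request
from typing import Callable, Dict, Optional


def fetch_url(url: str, timeout: float = 30.0) -> Optional[bytes]:
    """Download a source file; return None on HTTP/network errors or a malformed URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except (OSError, ValueError):
        # urllib.error.HTTPError (e.g. 404) and URLError both subclass OSError.
        return None


def sync_resources(
    resources: Dict[str, str],   # resource id -> source URL
    datastore: Dict[str, bytes], # stand-in for the datastore: resource id -> data
    fetch: Callable[[str], Optional[bytes]] = fetch_url,
    on_missing: str = "keep",
) -> Dict[str, str]:
    """Reload every resource; return a per-resource status string."""
    if on_missing not in ("keep", "delete"):
        raise ValueError("on_missing must be 'keep' or 'delete'")
    status: Dict[str, str] = {}
    for res_id, url in resources.items():
        data = fetch(url)
        if data is None:
            if on_missing == "delete":
                datastore.pop(res_id, None)  # preview/API stop working for it
                status[res_id] = "deleted"
            else:
                status[res_id] = "skipped"   # stale data stays previewable
            continue
        datastore[res_id] = data  # updated source files are picked up each run
        status[res_id] = "updated"
    return status
```

Passing a stub `fetch` lets you exercise either policy without network access. Whichever policy is chosen, the "skipped" or "deleted" status is the kind of signal that could feed the failure marker in option 2.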
> >
> > [1] https://github.com/okfn/ckanext-datastorer/blob/master/ckanext/datastorer/commands.py#L193
> >
> > Nigel.
> >
> >
> >
> > On 18 April 2013 19:07, Sean Hammond <sean.hammond at okfn.org> wrote:
> >
> > > > Take a look at this dataset on publicdata.eu:
> > > >
> > > > http://publicdata.eu/dataset/ministerial-data-cabinet-office
> > > >
> > > > If you click on any of the resources you'll get an error:
> > > >
> > > > Could not load preview: DataProxy returned an error (Data transformation
> > > > failed. HTTPError: HTTP Error 404: Not Found)
> > > >
> > > > and if you try to download any of the resource files from the source
> > > > site you'll find they no longer exist, eg:
> > > >
> > > > http://www.cabinetoffice.gov.uk/sites/default/files/resources/pm-meetings.csv
> > > >
> > > > Related to the new work that's being done around the new datastorer
> > > > service (data pusher is its current name, I think) and the new
> > > > datastorer paster command/cron job:
> > > >
> > > > I'm not sure how we intend to deal with this problem in CKAN, i.e. when
> > > > a resource file is linked to and the source file on the remote site then
> > > > moves or disappears. Once the datastorer service and script are sorted
> > > > out and deployed, a resource file like this would already have been
> > > > pulled into the datastore, so it could be previewed from there. But what
> > > > should the datastorer do when it finds that the original source file is
> > > > gone? Should it leave the data in the datastore, so that the preview and
> > > > data API keep working? Or should it delete the data from the datastore,
> > > > and have the resource page display a clear error message saying that
> > > > the source file is no longer there?
> > >
> > > Ping. This seems related to the datapusher discussion we had this
> > > morning.
> > >
> > > _______________________________________________
> > > ckan-dev mailing list
> > > ckan-dev at lists.okfn.org
> > > http://lists.okfn.org/mailman/listinfo/ckan-dev
> > > Unsubscribe: http://lists.okfn.org/mailman/options/ckan-dev
> > >
