[ckan-dev] Harvesting improvement ideas for ArcGIS

Steven De Costa steven.decosta at linkdigital.com.au
Thu Dec 10 03:00:18 UTC 2015


Heya folks,

We are looking at the best way to close the gap on a harvest workflow
between CKAN and ArcGIS.

If you look at the info on the ArcGIS site here:
http://doc.arcgis.com/en/open-data/provider/federating-with-ckan.htm

The problem we want to solve is mentioned at the end of that page:
*"Note: You may notice some strange behavior the first time you try to
preview a CSV or JSON file. Open Data is generating a cache of this data
and CKAN does not know how to handle this case when the data is processing.
This will not occur again the next time you try to preview the file."*

For a site that actually wants to harvest the resource, rather than just
store the resource URL, the effect is that a request for something like a
CSV or GeoJSON file returns a response like this instead of the file
itself:
{"processing_time":"0.005 seconds","count":1,"generating":
{"progress":"100%","start":144215,"csv":"generated","geojson":true,"kml":true}}
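A minimal sketch of how a harvester could recognise such a placeholder. It assumes the "still generating" response is always a JSON object with a top-level "generating" key, as in the example above; we have not checked that against any ArcGIS documentation, and the function name is hypothetical. Note that the example reports "progress":"100%" yet is still a placeholder, so the check relies on the key being present, not on the progress value.

```python
import json


def is_generating_placeholder(body: str) -> bool:
    """Return True if `body` looks like an ArcGIS Open Data 'still generating' stub.

    Assumption: the stub is a JSON object with a top-level "generating" key.
    A real CSV fails to parse as JSON, and a real GeoJSON file is a JSON
    object without that key, so neither is treated as a placeholder.
    """
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        # Not JSON at all (e.g. an actual CSV file): treat it as real content.
        return False
    return isinstance(payload, dict) and "generating" in payload
```

A CSV body such as `"id,name\n1,foo\n"` returns False, while the example response above (with its closing brace) returns True.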

We have decided that we'll need to parse such responses and create a
sub-queue that can be rerun to fetch the actual files once their generation
is complete. We have also found that if ArcGIS is itself harvesting from a
third-party source in real time, the end result might be an error, so we
need to handle those errors in both the main queue and the sub-queue.
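The sub-queue idea could look something like the following sketch. Everything here is hypothetical: `RetrySubQueue`, `PendingFetch`, `run_once` and `max_attempts` are illustrative names, not part of the ckanext-harvest API. `fetch` is whatever downloads a resource URL (it may raise on upstream errors), and `is_pending` decides whether a body is a "still generating" stub. Errors and placeholders are both retried, and a resource is given up on after `max_attempts` passes.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List


@dataclass
class PendingFetch:
    url: str
    attempts: int = 0


@dataclass
class RetrySubQueue:
    fetch: Callable[[str], str]        # downloads a URL; may raise on upstream errors
    is_pending: Callable[[str], bool]  # True if the body is a "still generating" stub
    max_attempts: int = 5
    queue: Deque[PendingFetch] = field(default_factory=deque)
    done: Dict[str, str] = field(default_factory=dict)  # url -> real file body
    failed: List[str] = field(default_factory=list)     # urls we gave up on

    def add(self, url: str) -> None:
        self.queue.append(PendingFetch(url))

    def run_once(self) -> None:
        """Make one pass over the items queued when the pass starts."""
        for _ in range(len(self.queue)):
            item = self.queue.popleft()
            item.attempts += 1
            try:
                body = self.fetch(item.url)
            except Exception:
                # Upstream error (e.g. ArcGIS failing to reach its own source):
                # treat it like a placeholder and retry on a later pass.
                body = None
            if body is not None and not self.is_pending(body):
                self.done[item.url] = body
            elif item.attempts >= self.max_attempts:
                self.failed.append(item.url)
            else:
                self.queue.append(item)  # retried on the next pass, not this one
```

Calling `run_once()` from a periodic job gives ArcGIS time to finish generating between passes, and `max_attempts` bounds how long a resource that never becomes available stays queued. The same error handling could wrap fetches in the main queue.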

I'm not really posting this as a question, but it would be great to know if
anyone has already built this kind of extension to the harvesting process.
We'd rather follow an approach that is commonly used in such cases than
invent a whole new one :)

Any thoughts or pointers?

Cheers,
Steven


STEVEN DE COSTA | EXECUTIVE DIRECTOR
www.linkdigital.com.au