[ckan-dev] CKAN upload custom format tsv on schedule

Denis Zgonjanin deniszgonjanin at gmail.com
Mon Aug 10 13:55:29 UTC 2015


Hi Christopher,

Nothing that meets all of these requirements exists yet. For #2, you can
use DataPusher. For #3, you would need a cron job or a scheduled Celery
task that polls the file periodically and pulls it in. For #1, there is
nothing, and this part may be hard to build.
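For #1, here is a minimal sketch of the kind of preprocessing step someone would have to build. It assumes the layout described in the question below: a fixed number of metadata lines (5), one header line, then data rows whose fields are separated by tabs or spaces. The function names, the `---` missing-value marker and the trailing `*`/`#` flag characters are assumptions for illustration, not anything CKAN provides, so check them against the real file. The metadata comes back as a string you could paste into the resource description, and the data comes back as clean TSV you could upload.

```python
import csv
import io


def clean_value(field):
    """Normalise one field using hypothetical rules for flagged/missing values."""
    if field == "---":          # assumed missing-value marker
        return ""
    return field.rstrip("*#")   # drop assumed trailing flag characters


def split_station_file(text, n_metadata=5):
    """Split a station file into (description, header, rows).

    Assumes n_metadata free-text lines, then one header line, then
    data lines whose fields are separated by tabs or spaces.
    """
    lines = text.splitlines()
    description = "\n".join(line.strip() for line in lines[:n_metadata])
    header = lines[n_metadata].split()
    rows = []
    for line in lines[n_metadata + 1:]:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Extra trailing fields beyond the header (e.g. notes) are dropped.
        row = [clean_value(f) for f in fields[:len(header)]]
        row += [""] * (len(header) - len(row))  # pad short rows
        rows.append(row)
    return description, header, rows


def to_tsv(header, rows):
    """Serialise the cleaned table as tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()
```

The line counts are parameters rather than hard-coded, because a front-end option like the one requested would need to let users describe their own file layout.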

The approach I think you're taking, pulling the file periodically, is a
good one, but it is sub-optimal because there is a delay between when the
data is updated and when your cron job pulls it in.
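For #3, here is a minimal sketch of such a polling job, meant to be run from cron. It assumes three things: a CKAN instance with the DataPusher plugin enabled, a resource that links to the remote file by URL, and an API key allowed to update that resource. The function names and the state-file approach are illustrative. The job downloads the file and compares its SHA-256 hash with the one stored on the previous run. Only when the content has changed does it call CKAN's `datapusher_submit` action, so that DataPusher re-fetches the resource.

```python
import hashlib
import json
import os
import urllib.request


def file_changed(content, state_path):
    """Return True (and record the new hash) if content differs from last run."""
    digest = hashlib.sha256(content).hexdigest()
    previous = None
    if os.path.exists(state_path):
        with open(state_path) as f:
            previous = f.read().strip()
    if digest == previous:
        return False
    with open(state_path, "w") as f:
        f.write(digest)
    return True


def submit_to_datapusher(ckan_url, api_key, resource_id):
    """Ask CKAN to re-run DataPusher for one resource via the action API."""
    body = json.dumps({"resource_id": resource_id}).encode("utf-8")
    req = urllib.request.Request(
        ckan_url.rstrip("/") + "/api/3/action/datapusher_submit",
        data=body,
        headers={"Content-Type": "application/json", "Authorization": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))["success"]


def poll_once(source_url, state_path, ckan_url, api_key, resource_id):
    """One cron run: download, compare, and resubmit only on change."""
    with urllib.request.urlopen(source_url) as resp:
        content = resp.read()
    if file_changed(content, state_path):
        return submit_to_datapusher(ckan_url, api_key, resource_id)
    return False
```

With this in place, the delay described above is bounded by the cron interval. The file is still downloaded on every run, and only the DataPusher job is skipped when nothing has changed. That makes a short interval cheap on the CKAN side.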

In your case, though, the data seems to be updated only once a month, so
the delay is not a big deal. In general, however, I think CKAN is still
waiting for somebody to build a solution to this type of problem, where
data is updated remotely but you want it to be available and always up to
date in the DataStore.

For example, I have a similar problem with data like this:
http://data.ottawa.ca/dataset/recreation-guides/resource/0bed5111-0361-4b1e-8ddc-183428a575ce

That file is updated every 15 minutes with the latest data, and I want it
in the DataStore. If I had the time, I would probably try building
something using Postgres Foreign Data Wrappers, e.g. Multicorn
(http://multicorn.readthedocs.org/en/latest/).
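For the foreign-data-wrapper idea, here is a minimal sketch of a Multicorn wrapper. It assumes Multicorn's Python API (subclass `multicorn.ForeignDataWrapper` and yield one dict per row from `execute`) and a remote TSV file with a header row. It is written for Python 3. `RemoteTsvFdw` and its `url` option are hypothetical names. Because the file is fetched on every query, the table is always as fresh as the remote file. How such a table would be exposed through the CKAN DataStore is left open, as above. The `ImportError` fallback exists only so the class can be exercised outside PostgreSQL.

```python
import csv
import io
import urllib.request

try:
    from multicorn import ForeignDataWrapper
except ImportError:
    # Stand-in base class so the sketch can be run outside PostgreSQL.
    class ForeignDataWrapper(object):
        def __init__(self, fdw_options, fdw_columns):
            pass


class RemoteTsvFdw(ForeignDataWrapper):
    """Hypothetical wrapper exposing a remote TSV file as a foreign table."""

    def __init__(self, options, columns):
        super(RemoteTsvFdw, self).__init__(options, columns)
        self.url = options["url"]            # set via OPTIONS (url '...')
        self.column_names = list(columns)    # foreign-table column names

    def execute(self, quals, columns):
        # Re-fetch on every scan so results track the remote file.
        with urllib.request.urlopen(self.url) as resp:
            text = resp.read().decode("utf-8")
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        for row in reader:
            # Columns missing from the file come back as NULL (None).
            yield {name: row.get(name) for name in self.column_names}
```

On the database side, the wrapper would be registered with `CREATE SERVER ... FOREIGN DATA WRAPPER multicorn OPTIONS (wrapper 'module.RemoteTsvFdw')` and a `CREATE FOREIGN TABLE ... OPTIONS (url '...')` whose column names match the TSV header. The sketch ignores `quals` and relies on PostgreSQL to apply the WHERE clause. As a result, every query re-downloads the whole file, which is fine for small files but not for large ones.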

- Denis

On Mon, Aug 10, 2015 at 8:53 AM, Christopher Njuguna <cnjuguna at rwanda.cmu.edu> wrote:

> Hi,
>
> I am trying to upload custom-formatted data files from the UK climate
> site, e.g. this file
> <http://www.metoffice.gov.uk/pub/data/weather/uk/climate/stationdata/aberporthdata.txt>.
> There are 5 lines of metadata and 1 header line; the file is tab-delimited
> with some special column data.
>
> 1) Can CKAN preprocess the file according to a format I give it, so that
> only the data rows are picked up, possibly saving the metadata in the
> description?
>
> I would prefer a frontend option because I want users to be able to do
> this themselves.
>
> 2) Is it possible to have a dataset uploaded automatically once the URL is
> entered? I currently have to go to the Manage -> DataStore page and click
> "Upload to DataStore" to have the data populated.
>
> 3) Can the dataset be updated at a regular interval?
>
> Thanks,
>
> Chris