[ckan-dev] CKAN upload custom format tsv on schedule

Matthew Fullerton matthew at smartlane.de
Tue Aug 11 08:54:46 UTC 2015


Hi Chris,

We are running into this issue quite a lot and we have two and a half approaches :)


1. Pull the data when needed for visualization, either directly (which rarely works because the format isn't suitable as in your case) or with live processing with a backend. I.e. do you really need the data in your CKAN or do you just need to access it?

2. Pre-process the data at regular intervals and write it to the datastore. There is a nice API for that, and it means you can archive historical values if the external data source only delivers the live values. This is like the cron job option Denis mentioned.

3. Treat the external data as a stream and trigger an update any time the data changes. "Everything's a stream"...

http://www.confluent.io/blog/stream-data-platform-1/ - we are working in this direction and I'd love to talk to others who are interested in seeing this happen. I'm not sure it needs to be part of CKAN but on the other hand some kind of UI for keeping track of everything is important.
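For approach 2, writing pre-processed rows to the datastore could look roughly like the sketch below. It assumes the resource already exists in the DataStore (e.g. created beforehand with datastore_create) and uses the datastore_upsert action with method "insert" so that each run appends rows. The CKAN URL, API key, resource id, and the function names are placeholders for illustration, not anything from this thread.

```python
import json
import urllib.request


def build_datastore_request(ckan_url, api_key, resource_id, records):
    """Build (but do not send) a POST request for CKAN's datastore_upsert action.

    ckan_url, api_key and resource_id are placeholders supplied by the caller.
    method="insert" appends rows and does not need a unique key on the table.
    """
    payload = {
        "resource_id": resource_id,
        "records": records,  # list of dicts, one dict per row
        "method": "insert",
    }
    return urllib.request.Request(
        ckan_url.rstrip("/") + "/api/3/action/datastore_upsert",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )


def push_records(ckan_url, api_key, resource_id, records):
    """Send the rows to CKAN and return the decoded JSON response."""
    request = build_datastore_request(ckan_url, api_key, resource_id, records)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Run from a scheduled job, push_records would be called with whatever rows the pre-processing step produced since the last run.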


Best,

Matt


Projekt SMARTLANE

matthew at smartlane.de
T +49.89.289.28575
F +49.89.289.22333
http://www.smartlane.de/en

EXIST-Gründungsvorhaben „Tapestry“
c/o Lehrstuhl für Verkehrstechnik
Technische Universität München
Arcisstraße 21
80333 München

Funded by the Federal Ministry for Economic Affairs and Technology
on the basis of a decision by the German Bundestag.

________________________________
From: ckan-dev <ckan-dev-bounces at lists.okfn.org> on behalf of Denis Zgonjanin <deniszgonjanin at gmail.com>
Sent: Monday, 10 August 2015 15:55
To: CKAN Development Discussions
Subject: Re: [ckan-dev] CKAN upload custom format tsv on schedule

Hi Christopher,

Something that meets these requirements doesn't exist yet. For #2, you can use datapusher. For #3, you would need a cron job or a scheduled celery task to poll that file periodically and pull it in. For #1, there is nothing, and this part may be hard to build.

The approach that I think you're taking - pulling the file periodically - is a good approach, but sub-optimal because there is a delay between when the data is updated and when your cron job will pull it in.
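A polling job of this kind can at least avoid re-importing unchanged data by comparing a hash of the downloaded file with the one from the previous run. The following is a minimal sketch under that assumption; the function names are made up for illustration, and how the previous digest is stored between runs (a file, a resource extra, etc.) is left open.

```python
import hashlib
import urllib.request


def _download(url):
    """Fetch the remote file and return its raw bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def poll_once(url, previous_digest, fetch=_download):
    """Download the file once and report whether it changed.

    Returns (changed, digest, data). previous_digest is the SHA-256 hex
    digest saved by the last run, or None on the first run.
    """
    data = fetch(url)
    digest = hashlib.sha256(data).hexdigest()
    return digest != previous_digest, digest, data
```

A cron entry or scheduled task would call poll_once at whatever interval is acceptable and only trigger the import when changed is true. The delay described above remains; this only saves unnecessary work.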

For your case though, it seems the data is only updated once a month, so it's not that big of a deal. But in general, I think CKAN is still waiting for somebody to build a solution to this type of problem - where data is updated remotely, but you want it to be available and always up-to-date in the datastore.

For example, I have a similar problem with data like this: http://data.ottawa.ca/dataset/recreation-guides/resource/0bed5111-0361-4b1e-8ddc-183428a575ce

That file is updated every 15 minutes with the latest data, and I want it in the datastore. If I had the time to do this, I would probably try building something using Postgres Foreign Data Wrappers (http://multicorn.readthedocs.org/en/latest/)

- Denis

On Mon, Aug 10, 2015 at 8:53 AM, Christopher Njuguna <cnjuguna at rwanda.cmu.edu> wrote:

Hi,


I am trying to upload custom formatted data files from the UK climate site, e.g. this file <http://www.metoffice.gov.uk/pub/data/weather/uk/climate/stationdata/aberporthdata.txt>. There are 5 lines of metadata and 1 header line; the file is tab-delimited with some special column data.
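For illustration, the layout described above (5 metadata lines, then 1 header line, then tab-delimited rows) could be split outside CKAN with a short script like the sketch below. The function name and its defaults are assumptions based only on that description; the "special column data" is not handled here because its details are not given.

```python
import csv
import io


def split_metadata_and_rows(text, metadata_lines=5, delimiter="\t"):
    """Split raw file text into (metadata, header, rows).

    The first `metadata_lines` lines are returned as one string, the next
    non-empty line is treated as the header, and the rest as data rows.
    """
    lines = text.splitlines()
    metadata = "\n".join(lines[:metadata_lines])
    body = "\n".join(lines[metadata_lines:])
    reader = csv.reader(io.StringIO(body), delimiter=delimiter)
    # Strip whitespace from each cell and drop completely empty lines.
    table = [
        [cell.strip() for cell in row]
        for row in reader
        if any(cell.strip() for cell in row)
    ]
    if not table:
        return metadata, [], []
    return metadata, table[0], table[1:]
```

The metadata string could then go into the resource description, and the header and rows into the datastore.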


1) Can CKAN preprocess the file according to a format I give it, so that only the data are picked up, possibly saving the metadata in the description?

I would prefer a frontend option because I want users to be able to do this themselves.


2) Is it possible to have a dataset uploaded automatically once the URL is entered? I currently have to go to the manage -> datastore page and click on upload to datastore to have the data populated.


3) Can the dataset be updated at a regular interval?


Thanks,


Chris


