[ckan-dev] What's the future of DataPusher?

Ian Ward ian at excess.org
Tue Jan 30 13:37:47 UTC 2018


David's Express loader extension is a really promising replacement for
datapusher: it's simpler (uses background jobs, has no separate
microservice or bidirectional API), more efficient (loads straight into
the database) and more reliable (doesn't guess column types).


+1 to Python scripts and the datastore API; we use those in a few
places. We're using the new datastore trigger functionality to enforce
validation of submitted data and to record row creation times,
modification times and the last user to edit each row:
https://github.com/open-data/ckanext-canada/blob/master/ckanext/canada/triggers.py
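
For anyone who hasn't tried triggers yet, here is a rough sketch of how
one might be registered and attached to a table through the datastore
API. It only builds the JSON payloads for datastore_function_create and
datastore_create. The trigger name, the column names (amount,
record_created, record_modified) and the PL/pgSQL body are made up for
illustration and are not what ckanext-canada actually uses.

```python
import json

# Hypothetical PL/pgSQL trigger body: validates one column and stamps
# creation/modification times. All column names are assumptions.
TRIGGER_BODY = """
BEGIN
    IF NEW.amount IS NULL OR NEW.amount < 0 THEN
        RAISE EXCEPTION 'amount must be a non-negative number';
    END IF;
    IF NEW.record_created IS NULL THEN
        NEW.record_created := now() at time zone 'utc';
    END IF;
    NEW.record_modified := now() at time zone 'utc';
    RETURN NEW;
END;
"""


def function_create_payload(name, definition):
    """Payload for the datastore_function_create action."""
    return {
        "name": name,
        "or_replace": True,      # overwrite an existing function of this name
        "rettype": "trigger",    # the function is used as a table trigger
        "definition": definition,
    }


def table_create_payload(resource_id, fields, primary_key, trigger_name):
    """Payload for datastore_create that attaches the trigger to the table."""
    return {
        "resource_id": resource_id,
        "fields": fields,
        "primary_key": primary_key,
        "triggers": [{"function": trigger_name}],
    }


# Example values; the resource id and trigger name are placeholders.
function_payload = function_create_payload("example_validate_row", TRIGGER_BODY)
table_payload = table_create_payload(
    "my-resource-id",
    [
        {"id": "ref_number", "type": "text"},
        {"id": "amount", "type": "numeric"},
        {"id": "record_created", "type": "timestamp"},
        {"id": "record_modified", "type": "timestamp"},
    ],
    ["ref_number"],
    "example_validate_row",
)

print(json.dumps(function_payload, indent=2))
print(json.dumps(table_payload, indent=2))
```

Each payload would be POSTed to the matching /api/3/action/ endpoint by
a user allowed to create datastore functions (normally a sysadmin).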


open.canada.ca also uses datastore with ckanext-recombinant
https://github.com/open-data/ckanext-recombinant/tree/standalone to
collect/update/delete data. We collect similar data from all our
organizations and publish combined CSV files publicly. The schema for
these files is controlled centrally so services that automatically
load arbitrary tabular data don't fit our use case. Our schemas are
defined in yaml as part of our theme extension
https://github.com/open-data/ckanext-canada/tree/master/ckanext/canada/tables

Recombinant generates Excel templates for editors to upload and update
datastore tables. These templates include drop-downs for controlled
vocabularies and validation of data while it's being entered. Excel
validation highlights cells that will cause an error when the user
tries to upload, so they can find and fix data issues much more
quickly.

Updates are handled through the same templates by using normal
datastore upsert to replace based on primary keys. Deletion is handled
with a form where users can paste the keys for the rows they want
removed. The templates only need the records that are being added or
updated, not the complete data.
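
The update and delete steps can be sketched with plain Python against
the datastore API. This is not Recombinant's code, just a minimal
illustration using only the standard library. The site URL, resource id,
key column and API key are placeholders, and the helper names are
invented for this example. It builds the requests without sending them.

```python
import json
import urllib.request


def build_action_request(site_url, action, payload, api_key):
    """Build (but do not send) a POST request for a CKAN API action."""
    url = "{}/api/3/action/{}".format(site_url.rstrip("/"), action)
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json", "Authorization": api_key}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def upsert_request(site_url, resource_id, records, api_key):
    """Request that inserts new rows and replaces rows whose primary key
    already exists. Only the changed records need to be included."""
    payload = {
        "resource_id": resource_id,
        "method": "upsert",
        "records": records,
    }
    return build_action_request(site_url, "datastore_upsert", payload, api_key)


def delete_requests(site_url, resource_id, key_field, keys, api_key):
    """One datastore_delete request per key the user wants removed."""
    return [
        build_action_request(
            site_url,
            "datastore_delete",
            {"resource_id": resource_id, "filters": {key_field: key}},
            api_key,
        )
        for key in keys
    ]


# Placeholder values for illustration only.
upsert = upsert_request(
    "https://ckan.example.org/",
    "my-resource-id",
    [{"ref_number": "A-001", "amount": 12.5}],
    "my-api-key",
)
deletes = delete_requests(
    "https://ckan.example.org/",
    "my-resource-id",
    "ref_number",
    ["A-002", "A-003"],
    "my-api-key",
)

# To actually send a request: urllib.request.urlopen(upsert)
```

The upsert only works if the table was created with a primary key,
which is what lets the templates carry just the changed rows.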

I'd like to share some of the Recombinant features more widely. Maybe
by integrating with Adrià's table schema work we could provide
templates and validation so users can upload data iteratively, with
live validation based on the schema attached to the resource.

On Tue, Jan 30, 2018 at 3:38 AM, Matthew Fullerton
<matt.fullerton at gmail.com> wrote:
> Hi Florian,
>
> Yes, Python scripts :-) The data is being incrementally updated so
> Datapusher doesn't fit. It is occasionally useful when a CSV file arrives
> though.
>
> Best,
> Matt
>
> On 30 January 2018 at 08:54, <Florian.Brucker at it.karlsruhe.de> wrote:
>>
>> Hi Matthew,
>>
>> thanks for your input!
>>
>> > I have used Datastore a lot without the datapusher because it gives
>> > me a nice HTTP API over the data table entries.
>>
>> So I guess you have some (probably custom) software that you use to upload
>> data to the DataStore via its API in place of DataPusher?
>>
>>
>> Best regards
>> Florian
>>
>>
>> "ckan-dev" <ckan-dev-bounces at lists.okfn.org> wrote on 30.01.2018
>> 08:32:09:
>>
>> > From: Matthew Fullerton <matt.fullerton at gmail.com>
>> > To: CKAN Development Discussions <ckan-dev at lists.okfn.org>,
>> > Date: 30.01.2018 08:32
>> > Subject: Re: [ckan-dev] What's the future of DataPusher?
>> > Sent by: "ckan-dev" <ckan-dev-bounces at lists.okfn.org>
>> >
>> > Hey Florian,
>> >
>> > I have used Datastore a lot without the datapusher because it gives
>> > me a nice HTTP API over the data table entries.
>> >
>> > Although the new feature of correcting column types via the data
>> > dictionary was in my opinion a major improvement, it's also a bit
>> > cumbersome and feels like a plaster.
>> >
>> > Best,
>> > Matt
>> >
>> > On 30 January 2018 at 08:28, <Florian.Brucker at it.karlsruhe.de> wrote:
>> > Hi everyone,
>> >
>> > after running into yet another problem* with DataPusher I was
>> > wondering if there are any plans to replace it with a more modern
>> > solution? In my experience, one reason why working with DataPusher
>> > is often cumbersome is its implementation as a separate service. As
>> > far as I understand, the same functionality could nowadays be
>> > implemented easily as a CKAN extension based on background jobs. In
>> > my opinion, this functionality should even be incorporated directly
>> > into the DataStore extension (honest question: are there many people
>> > using DataStore without the DataPusher, and if so, why?).
>> >
>> >
>> > Best regards,
>> > Florian
>> >
>> >
>> > * after adding a resource with an uploaded CSV file to an existing
>> > dataset, DataPusher not only failed to push the data into DataStore
>> > automatically but also somehow managed to break the DataStore
>> > entries for the dataset's existing resources so that all of them had
>> > to be re-uploaded to the DataStore "manually" by clicking the button
>> > in the UI.
>> >
>> > _______________________________________________
>> > ckan-dev mailing list
>> > ckan-dev at lists.okfn.org
>> > https://lists.okfn.org/mailman/listinfo/ckan-dev
>> > Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev
>>
