[ckan-dev] Distributed data harvesting of both metadata and data

Philip Ashlock - QXA philip.ashlock at gsa.gov
Wed May 16 23:59:23 UTC 2018

I think there are a few examples where CKAN is used to merge similar data
from several disparate sources, e.g. the UK's organogram data, or where
different metadata schemas are used for different kinds of information, as
I believe open.canada.ca has done for certain types of information using
the scheming extension. But I'm curious whether anybody has experimented
with this approach using the DataStore and DataPusher: grabbing multiple
spreadsheets from different URLs, making sure they validate against the
same schema, and then merging them all into the same dataset in the
DataStore?
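
To make the idea concrete, here is a minimal sketch of the validate-then-merge
step, assuming each spreadsheet has already been downloaded as CSV text. The
schema, field names, source URLs and helper functions are all hypothetical.
The only real CKAN API referenced is the DataStore `datastore_create` action
with its `resource_id`, `fields` and `records` parameters. The payload is built
here but not sent anywhere.

```python
import csv
import io
from typing import Dict, List

# Hypothetical shared schema, written like a DataStore "fields" list:
# each field has an "id" and a DataStore type ("text", "int", "numeric").
SCHEMA = [
    {"id": "org", "type": "text"},
    {"id": "year", "type": "int"},
    {"id": "spend", "type": "numeric"},
]

# Python converters for the DataStore types used in SCHEMA.
CASTS = {"text": str, "int": int, "numeric": float}


def validate_and_parse(source: str, csv_text: str) -> List[Dict]:
    """Parse one spreadsheet (as CSV text) and check it against SCHEMA."""
    reader = csv.DictReader(io.StringIO(csv_text))
    expected = [f["id"] for f in SCHEMA]
    if reader.fieldnames != expected:
        raise ValueError(f"{source}: header {reader.fieldnames} != {expected}")
    rows = []
    # Line 1 is the header, so the first data row is line 2.
    for lineno, raw in enumerate(reader, start=2):
        try:
            row = {f["id"]: CASTS[f["type"]](raw[f["id"]]) for f in SCHEMA}
        except (ValueError, TypeError) as exc:
            raise ValueError(f"{source} line {lineno}: {exc}") from exc
        row["source"] = source  # keep provenance of every merged row
        rows.append(row)
    return rows


def merge_sources(sources: Dict[str, str]) -> List[Dict]:
    """Validate every source, then concatenate the rows into one table."""
    merged: List[Dict] = []
    for source, text in sources.items():
        merged.extend(validate_and_parse(source, text))
    return merged


def datastore_payload(resource_id: str, records: List[Dict]) -> Dict:
    """Build a body for CKAN's datastore_create action (not sent here)."""
    fields = SCHEMA + [{"id": "source", "type": "text"}]
    return {"resource_id": resource_id, "fields": fields, "records": records}


if __name__ == "__main__":
    # Stand-ins for spreadsheets fetched from different (hypothetical) URLs.
    sources = {
        "https://example.org/a.csv": "org,year,spend\nA,2017,10.5\n",
        "https://example.org/b.csv": "org,year,spend\nB,2018,3\n",
    }
    payload = datastore_payload("my-resource-id", merge_sources(sources))
    print(len(payload["records"]))  # 2
```

Validating every source before any rows are merged means a single malformed
spreadsheet rejects the whole batch instead of leaving a partially merged
table, and the extra "source" column keeps each row traceable to its origin.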

There are a lot of potential use cases I'd be interested in exploring.

In the long term I'd also be interested in whether that approach could be
developed to take advantage of similar functionality for harvesting, e.g. by
merging some of the concepts of the DataPusher and the harvester.

I'm just thinking about this at a high level right now, but I'm curious
whether there have been any experiments with extensions that take the
distributed, multi-source harvest approach down to specific datasets in
