[ckan-discuss] Harvesting harvesters

Adrià Mercader adria.mercader at okfn.org
Mon Nov 4 10:30:30 GMT 2013


Hi Philip,

Apologies for the late reply.

Although harvest sources are internally implemented using a type of
dataset, they have a separate model and logic. This means that even if
you managed to create a local dataset with the same attributes, you
still wouldn't get a functional harvest source, so I don't think going
the CKAN harvester route is a good option.

Ideally, you would get all harvest sources from the remote CKAN
instance with the harvest_source_list action and recreate them on your
local instance with harvest_source_create. I say ideally because
unfortunately the current implementation of harvest_source_list is
really slow (it will probably time out in a big instance like
data.gov) and does not return the sources in the proper schema that
harvest_source_create would consume. I've created an issue for fixing
this, which should not be difficult to do [1].
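For reference, both harvest_source_list and harvest_source_create are called through CKAN's action API: a POST to /api/3/action/<action name> with a JSON body, where the JSON response wraps the return value in "result" and signals failure via "success" and "error". Below is a minimal Python 3 sketch using only the standard library. The helper names are hypothetical, and it assumes the API key is sent in the Authorization header, which CKAN 2.x accepts. Write actions such as harvest_source_create need a key with rights to create harvest sources; read actions do not.

```python
import json
import urllib.request


def build_action_request(base_url, action, data_dict, api_key=None):
    """Build a POST request for CKAN's action API (/api/3/action/<action>)."""
    url = "%s/api/3/action/%s" % (base_url.rstrip("/"), action)
    body = json.dumps(data_dict).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: the key goes in the Authorization header (CKAN 2.x style)
        headers["Authorization"] = api_key
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def call_action(base_url, action, data_dict, api_key=None, timeout=60):
    """Send the request and return the 'result' part of the JSON response.

    CKAN also reports failures with 4xx status codes, which urlopen raises
    as urllib.error.HTTPError before we get to the 'success' check.
    """
    request = build_action_request(base_url, action, data_dict, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    if not payload.get("success"):
        raise RuntimeError("%s failed: %r" % (action, payload.get("error")))
    return payload["result"]
```

Something like call_action("http://catalog.data.gov", "harvest_source_list", {}) would fetch the remote sources, but as noted above that call may well time out on an instance of that size.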

In the meantime, you can slightly adapt the output of the package_search
call you posted and feed it to harvest_source_create, as this (lightly
tested) script shows:

https://gist.github.com/amercader/7300751
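Roughly, the adaptation means paging through the package_search results and turning each harvest source dataset into a harvest_source_create payload. The sketch below is not the gist's code. It assumes the source-specific fields (url, source_type, frequency, config) appear either at the top level of each result or in its extras, and that owner_org is the id or name of an organization that already exists on your local instance. iter_search_results and to_harvest_source are hypothetical helpers; check the field names against the harvest_source_create schema of your ckanext-harvest version.

```python
def iter_search_results(search, query, rows=100):
    """Yield every dataset matching `query`, fetching `rows` per page.

    `search` is any callable that posts a package_search data_dict to the
    remote instance and returns the action's result dict.
    """
    start = 0
    while True:
        result = search({"q": query, "rows": rows, "start": start})
        for package in result["results"]:
            yield package
        start += rows
        if start >= result["count"] or not result["results"]:
            break


# Assumed harvest source fields; verify against your ckanext-harvest schema.
SOURCE_FIELDS = ("url", "source_type", "frequency", "config")


def _extras_dict(package):
    """Flatten CKAN's [{'key': ..., 'value': ...}] extras list into a dict."""
    return {e["key"]: e["value"] for e in package.get("extras", [])}


def to_harvest_source(package, owner_org):
    """Map one remote harvest source dataset to a harvest_source_create payload."""
    extras = _extras_dict(package)
    source = {
        "name": package["name"],
        "title": package.get("title") or package["name"],
        "notes": package.get("notes") or "",
        "owner_org": owner_org,  # hypothetical: a local organization id or name
    }
    for field in SOURCE_FIELDS:
        # Prefer a top-level value, fall back to the extras
        value = package.get(field, extras.get(field))
        if value not in (None, ""):
            source[field] = value
    return source
```

Each payload would then be posted to harvest_source_create on the local instance. Source types such as csw or waf will presumably only validate if the matching harvester plugins are enabled locally.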


Hope this helps; let us know if you have any questions.

Adrià




[1] https://github.com/okfn/ckanext-harvest/issues/73

On 31 October 2013 16:03, Philip Ashlock <phil at civicagency.org> wrote:
> What's the best way to import harvest sources from another instance of CKAN?
>
> I see that it's easy to make an API call that filters for harvesters in
> particular organizations, eg
> http://catalog.data.gov/api/3/action/package_search?q=type:harvest%20AND%20organization:usgs-gov
>
> but I also see that the stock CKAN harvester filters out harvest sources:
>
>             if package_dict.get('type') == 'harvest':
>                 log.warn('Remote dataset is a harvest source, ignoring...')
>                 return True
>
>
> https://github.com/okfn/ckanext-harvest/blob/master/ckanext/harvest/harvesters/ckanharvester.py#L248
>
> Filtering out harvest sources on an import sounds like a necessary and
> obvious default, but if I actually just wanted to import the harvest sources,
> would it make sense to create a copy of the main CKAN harvester extension
> that keeps only harvest sources instead of filtering them out?


