[ckan-discuss] Help writing a Harvester for a generic Web Accessible Folder

Adrià Mercader adria.mercader at okfn.org
Tue Feb 25 17:41:10 UTC 2014

Hi Stephen,

You can probably reuse all the logic of the WAF harvester [1] in your own
harvester (just don't extend the Spatial Harvester), as its gather_stage
and fetch_stage basically deal with parsing a remote folder,
downloading the contents of the remote files and storing them in the
CKAN db. The only spatial-specific part is a few lines that you can
remove [2].
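
To give a rough idea of what the folder-parsing part boils down to, here
is a minimal standalone sketch using only the Python standard library.
It is not the code from [1]: the helper name list_waf_documents, the
extensions parameter and the assumption that the folder is served as an
Apache-style HTML index are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collects the href of every <a> tag found in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def list_waf_documents(index_html, base_url, extensions=(".xml",)):
    """Return absolute URLs of the documents linked from a folder listing.

    Hypothetical helper (not part of ckanext-spatial). It assumes an
    Apache-style index page and skips column-sort links ("?C=N;O=D"),
    the parent directory link and subfolders.
    """
    parser = _LinkCollector()
    parser.feed(index_html)
    wanted = tuple(ext.lower() for ext in extensions)
    urls = []
    for href in parser.hrefs:
        if href.startswith("?") or href.endswith("/"):
            continue
        if href.lower().endswith(wanted):
            urls.append(urljoin(base_url, href))
    return urls
```

In a real gather_stage you would then create one harvest object per URL
and return the list of their ids, and fetch_stage would download each URL
and store the document in the harvest object's content.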

You will, of course, need to write your own import_stage that
transforms whatever document type you want to harvest into a CKAN dict.
You can look at the ckan-ckan or spatial import_stages, or at [3],
which might be simpler to follow.
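
As an illustration of the kind of mapping an import_stage has to do, here
is a small sketch that turns an XML document into a dataset dict. The
<record> format with title, description and keyword elements is invented
for the example, and so is the function name; your own documents will
need their own parsing.

```python
import re
import xml.etree.ElementTree as ET


def document_to_package_dict(content, guid):
    """Map a made-up <record> XML document to a CKAN-style dataset dict.

    Hypothetical helper: the element names are assumptions, not a real
    metadata standard.
    """
    root = ET.fromstring(content)
    title = (root.findtext("title") or guid).strip()
    # Dataset names have to be lowercase and URL-safe, so derive a slug
    # from the title and cap its length at 100 characters.
    name = re.sub(r"[^a-z0-9_-]+", "-", title.lower()).strip("-")[:100]
    tags = [
        {"name": kw.text.strip()}
        for kw in root.iter("keyword")
        if kw.text and kw.text.strip()
    ]
    return {
        "name": name,
        "title": title,
        "notes": (root.findtext("description") or "").strip(),
        "tags": tags,
        "extras": [{"key": "guid", "value": guid}],
    }
```

In a real import_stage the document would come from
harvest_object.content, and the resulting dict would be passed to the
package_create or package_update action.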

Hope this helps,


[1] https://github.com/ckan/ckanext-spatial/blob/master/ckanext/spatial/harvesters/waf.py
[2] https://github.com/ckan/ckanext-spatial/blob/master/ckanext/spatial/harvesters/waf.py#L203-L219
[3] https://github.com/ckan/ckanext-dcat/blob/master/ckanext/dcat/harvesters.py#L268

On 25 February 2014 04:45, Stephen Barton <svbarton at ucdavis.edu> wrote:
> Hi, I am trying to make a harvester extension for a generic Web Accessible
> Folder (WAF).  I have reviewed the documentation for the ckanext-harvest
> extension (that harvests other CKAN instances) and the ckanext-spatial
> extension (that harvests spatial metadata from WAFs), but it's not clear how
> to modify the code of ckanharvester.py (or the spatial harvester code
> waf.py) for a generic WAF.  I could not find anything on the discussion
> archive.
> https://lists.okfn.org/mailman/listinfo/ckan-discuss
> The info on this page addresses writing a custom harvester, but it's not
> sufficient for me.
> https://github.com/ckan/ckanext-harvest#the-harvesting-interface
> Thanks
