[ckan-dev] avoid harvesting duplicated datasets from different instances
mane moshref
many_yammy at yahoo.com
Tue Oct 1 16:27:23 UTC 2019
Hello all,
I would like to harvest two different CSW instances:
- National scale:
http://gdk.gdi-de.org/gdi-de/srv/ger/csw?service=CSW&request=GetCapabilities
- State scale:
http://geoportal.bayern.de/csw/bvv?service=CSW&request=GetCapabilities
The tricky part is that, in theory, all the state data are already harvested by the national instance; in practice, however, this is not the case.
So I need to harvest both instances, but I do not want to store/harvest the duplicated datasets. This means that the "gmd:fileIdentifier", which uniquely identifies a record and is identical in both instances, should be checked before the datasets are copied into the CKAN database.
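Below is a minimal sketch of that check, assuming the fileIdentifier values have already been extracted from each harvested record. `filter_new_records` and the shared `seen` set are hypothetical names, not part of ckanext-spatial or ckanext-harvest; in a real setup `seen` would have to be filled from the identifiers already stored in CKAN.

```python
from typing import Iterable, Iterator, Set, Tuple


# Hypothetical helper, not part of ckanext-spatial or ckanext-harvest.
def filter_new_records(
    records: Iterable[Tuple[str, str]], seen: Set[str]
) -> Iterator[Tuple[str, str]]:
    """Yield (file_identifier, source) pairs whose identifier is not in `seen`.

    `seen` is updated in place, so sharing one set across several harvest
    sources drops a record from a later source if an earlier one had it.
    """
    for file_identifier, source in records:
        key = file_identifier.strip()
        if key in seen:
            # Duplicate gmd:fileIdentifier: already harvested from another source.
            continue
        seen.add(key)
        yield key, source


# Usage: harvest the national source first, then the state source.
seen: Set[str] = set()
national = [("e0eddd10-007a-11e0-be74-0000779eba3a", "national")]
bavaria = [
    ("e0eddd10-007a-11e0-be74-0000779eba3a", "bavaria"),  # same record as national
    ("bavaria-only-record", "bavaria"),  # hypothetical state-only identifier
]
kept = list(filter_new_records(national, seen)) + list(
    filter_new_records(bavaria, seen)
)
print(kept)
```

Because the national source is processed first here, a shared record is attributed to it; reversing the order would keep the Bavarian copy instead.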
Look at these two examples:
- National instance:
http://gdk.gdi-de.org/gdi-de/srv/ger/csw?service=CSW&request=GetRecordById&elementSetName=full&service=CSW&version=2.0.2&OutputSchema=csw:IsoRecord&id=e0eddd10-007a-11e0-be74-0000779eba3a
- State (Bavaria):
http://geoportal.bayern.de/csw/bvv?service=CSW&request=GetRecordById&elementSetName=full&service=CSW&version=2.0.2&OutputSchema=csw:IsoRecord&id=e0eddd10-007a-11e0-be74-0000779eba3a
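The identifier can be read from a GetRecordById response such as the two above. This is a minimal sketch using only the Python standard library, assuming the ISO 19139 `gmd` and `gco` namespaces; `fetch` and `file_identifiers` are hypothetical helper names, and the inline `SAMPLE` is a trimmed-down response written for illustration, not real output from either server.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.request import urlopen

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def fetch(url: str) -> bytes:
    """Download a CSW GetRecordById response (not called in the demo below)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read()


def file_identifiers(xml_bytes: bytes) -> List[str]:
    """Return the gmd:fileIdentifier of every gmd:MD_Metadata in the document."""
    root = ET.fromstring(xml_bytes)
    ids = []
    # iter() also matches the root element itself, so this works whether or
    # not the record is wrapped in a csw:GetRecordByIdResponse element.
    for md in root.iter("{%s}MD_Metadata" % NS["gmd"]):
        node = md.find("gmd:fileIdentifier/gco:CharacterString", NS)
        if node is not None and node.text:
            ids.append(node.text.strip())
    return ids


SAMPLE = b"""<csw:GetRecordByIdResponse
    xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:MD_Metadata>
    <gmd:fileIdentifier>
      <gco:CharacterString>e0eddd10-007a-11e0-be74-0000779eba3a</gco:CharacterString>
    </gmd:fileIdentifier>
  </gmd:MD_Metadata>
</csw:GetRecordByIdResponse>"""

print(file_identifiers(SAMPLE))
```

Running `file_identifiers(fetch(url))` on both example URLs should return the same identifier, which is what makes it usable as a cross-source duplicate key.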
Now my question is: is there any way to avoid harvesting redundant datasets?
For example, by adding a condition in the harvest source configuration, or by adding a line directly to the relevant part of the code?
Best regards
Mani