[ckan-dev] Harvesting data catalogs - proposal for datamap.json

Dmitry Kachaev dmitry.kachaev at gmail.com
Fri Mar 1 17:52:56 UTC 2013


Hi everyone,
I'm thinking about ways to improve data.gov. It was recently announced that
data.gov is moving to the CKAN platform:
http://ckan.org/2013/02/04/us-data-gov-to-use-ckan/

A sensible approach seems to be a federated data.gov that harvests and
indexes the various agencies' data catalogs into the main data.gov
catalog/index.

Currently, AFAIK, the CKAN harvester is essentially fed by hand with a list
of CKAN API endpoints to harvest. What we were thinking of is introducing an
automated approach to building such an index.

Here is a quick and dirty description:

Every agency/organization that runs one or more data catalogs will create a
single, easily discoverable file, datamap.json, that lists information about
its data catalog APIs/endpoints for harvesting. The file will be placed in
the root of the agency website, similar to robots.txt/sitemap.xml, e.g.
agency.gov/datamap.json

Example of the datamap.json file:
{
    "data-catalogs": [
        {
            "api-name": "data-json",
            "version": "v1.0",
            "endpoint": "http://data.mcc.gov/raw/index.json",
            "contact": "MCC Open Data Initiative",
            "email": "opendata at mcc.gov"
        },
        {
            "api-name": "ogc-csw",
            "version": "v2.0.2",
            "endpoint": "http://geo.data.gov/geoportal/csw/discovery?Request=GetCapabilities&Service=CSW&Version=2.0.2",
            "contact": "Geo Spatial One Stop Team",
            "email": "onestop at fgdc.gov"
        },
        {
            "api-name": "socrata-api",
            "version": "v1.0",
            "endpoint": "http://explore.data.gov/api/",
            "contact": "Data.gov team",
            "email": "contact at data.gov"
        }
    ]
}
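
To make the format concrete, here is a minimal sketch of how a harvester might read such a file. The function name parse_datamap and the choice of which keys are mandatory are my assumptions for illustration only; nothing here is an existing CKAN API.

```python
import json

# Assumption: these three keys are mandatory; "contact" and "email" are optional.
REQUIRED_KEYS = ("api-name", "version", "endpoint")


def parse_datamap(text):
    """Parse datamap.json text and return its list of catalog entries."""
    doc = json.loads(text)
    catalogs = doc.get("data-catalogs") if isinstance(doc, dict) else None
    if not isinstance(catalogs, list):
        raise ValueError('expected a top-level "data-catalogs" list')
    for position, entry in enumerate(catalogs):
        if not isinstance(entry, dict):
            raise ValueError("entry %d is not a JSON object" % position)
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise ValueError(
                "entry %d is missing: %s" % (position, ", ".join(missing))
            )
    return catalogs


# A one-entry sample taken from the example above.
SAMPLE = """
{
    "data-catalogs": [
        {
            "api-name": "data-json",
            "version": "v1.0",
            "endpoint": "http://data.mcc.gov/raw/index.json"
        }
    ]
}
"""

for entry in parse_datamap(SAMPLE):
    print(entry["api-name"], entry["endpoint"])
```

Rejecting a malformed file outright keeps the sketch short; a real harvester might prefer to skip bad entries and log them.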

This approach would make it possible to enumerate all .gov websites, build
an index of all data catalog endpoints, and then harvest them in a unified
way (using the CKAN harvester with extra plugins for different types of
catalogs, such as Socrata or geo catalogs supporting the CSW standard).
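
The enumeration step could be sketched roughly as follows. The names datamap_url and build_index, the use of plain http, and the fetch callable are all assumptions made for illustration; the actual download and the hand-off to harvester plugins are deliberately left out.

```python
import json
from urllib.parse import urlunsplit


def datamap_url(domain):
    """Build the well-known datamap.json URL for a domain (http is assumed)."""
    return urlunsplit(("http", domain, "/datamap.json", "", ""))


def build_index(domains, fetch):
    """Group catalog endpoints by api-name.

    fetch(url) is supplied by the caller and returns the file's text,
    or None when the site does not publish a datamap.json.
    """
    index = {}
    for domain in domains:
        text = fetch(datamap_url(domain))
        if text is None:
            continue  # no datamap.json on this site
        try:
            doc = json.loads(text)
        except ValueError:
            continue  # not valid JSON; skip this site
        catalogs = doc.get("data-catalogs", []) if isinstance(doc, dict) else []
        for entry in catalogs:
            if not isinstance(entry, dict):
                continue
            if "api-name" not in entry or "endpoint" not in entry:
                continue  # incomplete entry; skip it
            index.setdefault(entry["api-name"], []).append({
                "domain": domain,
                "endpoint": entry["endpoint"],
                "version": entry.get("version"),
            })
    return index


# Offline demonstration: an in-memory dict stands in for the web.
FAKE_WEB = {
    "http://agency.gov/datamap.json": json.dumps({
        "data-catalogs": [
            {"api-name": "socrata-api", "version": "v1.0",
             "endpoint": "http://explore.data.gov/api/"},
        ]
    }),
}

index = build_index(["agency.gov", "other.gov"], FAKE_WEB.get)
for api_name, endpoints in index.items():
    print(api_name, [e["endpoint"] for e in endpoints])
```

Grouping by api-name means each harvester plugin (CKAN, Socrata, CSW) would only need to look up its own key to get the list of endpoints it can handle.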

What are your thoughts on this approach? Are we reinventing the wheel? Is
this the right place to ask this question?

Thanks,
Dmitry

Dmitry Kachaev
voice: (202) 527-9423
twitter: @kachok
mail: dmitry.kachaev at gmail.com