[datacatalogs] Status of data catalog metadata standards
james at opennorth.ca
Sat Jan 25 00:10:23 UTC 2014
On 2014-01-24, at 6:33 PM, Philip Ashlock wrote:
> On Fri, Jan 24, 2014 at 5:33 PM, James McKinney <james at opennorth.ca> wrote:
> > http://schema.org/Dataset derives from DCAT.
> It seems like just about everything in this space other than proper DCAT is "derived" from DCAT, yet is still different
In Schema.org, Dataset is a subclass of CreativeWork, so it inherits all of the CreativeWork properties (the same goes for DataCatalog and DataDownload). In any case, I don't think there are conflicting differences between Schema.org and DCAT. If any other spec conflicts with DCAT, then that's a problem.
> > DCAT is a W3C Recommendation, which is as stable and finalized as anything goes in W3C.
> It looks like language clarifying its status as such was just added a week ago. This is reassuring even if such recentness feels a little counter to the notion of "stability" ;) http://www.w3.org/TR/2014/REC-vocab-dcat-20140116/diff-20131217.html
To be promoted to "Recommendation", a spec needs to be stable. The change in document status simply confirms that this is the case.
> > In terms of adoption, you'll find an incomplete list here http://www.w3.org/2011/gld/wiki/DCAT_Implementations
> I suspect there's a lot missing there. Seems like https://github.com/okfn/ckanext-dcat could probably be added right?
Yes, the purpose of that list is simply to provide evidence as part of the W3C process. I note that ckanext-dcat is already included: see the row with the words: "Simple Data Catalog Interoperability Proposal. A working implementation of this proposal for CKAN catalogs is being developed within the context of the PublicData.eu project."
> > To my knowledge, dataprotocols.org is not about data catalog metadata. It addresses other problems like CSV on the web, etc.
> Right. I think I meant http://spec.datacatalogs.org
I believe that describes an API that delivers data in DCAT format. If you're looking for API specs, both Socrata and OKF have proposals.
> As it is now it seems as if data.gov would need to support three specifications:
> 1. The Project Open Data schema for interoperability within the federal government
> 2. The Schema.org Datasets schema for search engines
> 3. "Pure" DCAT for everything else, eg via https://github.com/okfn/ckanext-dcat
> Does that sound right?
I'm not sure if any search engine actually uses Schema.org's Dataset and related classes, but, if any do, then you should use Schema.org. If you just want to mark up HTML semantically, you can use RDFa with DCAT terms, which is documented at <http://project-open-data.github.io/metadata-resources/>. I last edited that page to get the RDFa up-to-date. I'm not sure if the Schema.org terms are up-to-date.
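To make the RDFa option concrete, here is a rough sketch in Python of what marking up a dataset with DCAT terms might look like. The `prefix`, `typeof`, and `property` attributes are standard RDFa 1.1, and the two namespaces are the published DCAT and Dublin Core Terms ones. The function name `dataset_rdfa`, the choice of HTML elements, and the sample values are assumptions of mine for illustration, not something taken from the Project Open Data page.

```python
from html import escape

# RDFa 1.1 prefix declarations for the DCAT and Dublin Core Terms namespaces.
DCAT_PREFIXES = "dcat: http://www.w3.org/ns/dcat# dct: http://purl.org/dc/terms/"


def dataset_rdfa(title, description, keywords):
    """Render one dataset as an HTML fragment annotated with DCAT terms.

    Hypothetical helper: the element choices (div/h2/p/span) are arbitrary;
    only the RDFa attributes carry the semantics.
    """
    parts = [f'<div prefix="{DCAT_PREFIXES}" typeof="dcat:Dataset">']
    parts.append(f'  <h2 property="dct:title">{escape(title)}</h2>')
    parts.append(f'  <p property="dct:description">{escape(description)}</p>')
    for kw in keywords:
        parts.append(f'  <span property="dcat:keyword">{escape(kw)}</span>')
    parts.append("</div>")
    return "\n".join(parts)


# Made-up sample values, purely for illustration.
html_fragment = dataset_rdfa(
    "Bus stops", "Locations of bus stops <2014>", ["transit", "gis"]
)
print(html_fragment)
```

The same HTML could carry Schema.org terms instead (or as well) by changing the vocabulary and property names; the surrounding markup would not need to change.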
The POD schema is, as far as I understand, basically DCAT with a few extra properties. It would be straightforward to write a JSON-LD context that maps the JSON terms to their RDF URIs, making a semantic link between a POD JSON file and the DCAT vocabulary. All I'm saying is that there aren't three entirely different specs that need to be supported; there's a lot of overlap.
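As a minimal sketch of the JSON-LD context idea, the Python below maps a handful of JSON keys to DCAT and Dublin Core Terms URIs and resolves them by hand. The key names (`title`, `description`, `keyword`, `modified`, `identifier`, `accessURL`, `accessLevel`) are my assumption of what a POD JSON file contains, and the mapping is illustrative rather than an official POD context. The `expand_term` and `expand_record` helpers are hypothetical and only handle simple `prefix:suffix` values; a real JSON-LD processor does much more.

```python
import json

# Published namespace URIs for DCAT and Dublin Core Terms.
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"

# Illustrative context (an assumption, not an official POD artifact):
# each JSON key is mapped to a compact prefix:suffix form of an RDF URI.
POD_CONTEXT = {
    "@context": {
        "dcat": DCAT,
        "dct": DCT,
        "title": "dct:title",
        "description": "dct:description",
        "keyword": "dcat:keyword",
        "modified": "dct:modified",
        "identifier": "dct:identifier",
        "accessURL": "dcat:accessURL",
    }
}


def expand_term(term, context):
    """Resolve a JSON key to a full URI; return None if the key is unmapped."""
    ctx = context["@context"]
    value = ctx.get(term)
    if value is None:
        return None
    prefix, sep, suffix = value.partition(":")
    if sep and prefix in ctx:
        return ctx[prefix] + suffix
    return value  # already a full URI


def expand_record(record, context):
    """Rewrite a record's keys as full URIs, dropping keys the context lacks."""
    expanded = {}
    for key, val in record.items():
        uri = expand_term(key, context)
        if uri is not None:
            expanded[uri] = val
    return expanded


# Made-up record; "accessLevel" stands in for a POD-specific extra property
# that this small context does not map, so it is dropped here.
sample = {"title": "Example dataset", "keyword": ["transit"], "accessLevel": "public"}
print(json.dumps(expand_record(sample, POD_CONTEXT), indent=2))
```

Any extra POD properties would need their own URIs in the context to survive expansion; the keys that overlap with DCAT need nothing more than a mapping like this one.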