[ckan-discuss] API For Package Name

Wed Dec 1 15:13:39 GMT 2010

I wonder if (some of) this discussion should be on ckan-dev? It's all
got rather technical for me (which is not to say it's too technical
for others!).

*Definitely* think it's important that this goes to one of the CKAN
lists, but not sure which list is for what?

What do folks think?

On Wed, Dec 1, 2010 at 11:25 AM, William Waites <ww at eris.okfn.org> wrote:
> * [2010-11-28 23:12:33 +0000] John Bywater <john.bywater at appropriatesoftware.net> wrote:
>
> ] >Yes, the RDF/JSON is basically used to avoid needing to parse RDF.
> ] >Apart from that it is just another serialisation, the data model stays
> ] >the same.
> ] >
> ]
> ] You mean parse RDF/XML? Does RDF/JSON not have exactly the same
> ] complexity? How is it adequate otherwise? Questions for myself perhaps...
>
> Yes, I meant parsing RDF/{XML,N3,NT}. RDF/JSON is equivalent but it is
> natively JSON so a JSON program can just eval() it.
>
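
As a rough illustration of the point above, here is a minimal sketch of reading RDF/JSON with nothing but a JSON parser. The document and its example.org URIs are made up for illustration; the subject -> predicate -> list-of-objects layout follows the RDF/JSON serialisation. In Python, json.loads() plays the role of eval() and avoids executing arbitrary code.

```python
import json

# A tiny, made-up RDF/JSON document: subject -> predicate -> list of objects.
RDF_JSON = """
{
  "http://example.org/package/demo": {
    "http://purl.org/dc/terms/title": [
      {"type": "literal", "value": "Demo package", "lang": "en"}
    ],
    "http://purl.org/dc/terms/license": [
      {"type": "uri", "value": "http://example.org/licenses/demo"}
    ]
  }
}
"""


def iter_triples(doc):
    """Yield (subject, predicate, object_dict) for every statement."""
    for subject, predicates in doc.items():
        for predicate, objects in predicates.items():
            for obj in objects:
                yield subject, predicate, obj


# An ordinary JSON parser is all that is needed; no RDF parser is involved.
graph = json.loads(RDF_JSON)
triples = list(iter_triples(graph))

for s, p, o in triples:
    print(s, p, o["type"], o["value"])
```

The data model is unchanged: each object still says whether it is a URI or a literal, so nothing is lost relative to the other serialisations.
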
> ] Would any other tools be able to take advantage of it? It seems to be
> ] similar to CSV. CSV also doesn't "have a concept of" URLs. I've always
> ] liked the idea of using URLs as identifiers in the API, but I'm just
> ] trying to identify exactly what we gain by introducing URLs into the data
> ] formats.
>
> I guess it means that third parties can write statements about them
> because there is a way to refer to them with the URI. So merely by
> using URIs it means others can write RDF. A good example of this is the
> licenses. If the license field were a URI it means that I could go
> and, for example, try to model the license independently and it would
> still join up.
>
> ] Sounds good! At the same time, it occurs to me that many of the other
> ] interfaces I've seen over the last few days present the different
> ] content types at the same locations. I can't remember seeing one that
> ] redirects to a different domain. Is that a common or expected thing
> ] to do?
>
> Good question. There's definitely nothing wrong with redirecting to a
> different location, but it might be unusual. One circumstance where
> that happens commonly is where people use services like
> http://purl.org/ which are purely redirect services. There are
> definitely ways we could work around this, one is by putting the RDF
> generation stuff directly in CKAN, another is by playing games with
> the web server config to hook certain URLs before they get to the
> CKAN application.
>
> ] I don't know (think you might have told me once...) what exactly the
> ] cohesive mechanism is behind the RDF service. Whatever it is, perhaps we
> ] could at least consider calling the service from CKAN's API controllers
> ] and have the RDF content returned directly? We could also support the
> ] extensions .json .rdf .n3 so content type can be specified in the locator.
>
> The way it is implemented now, there is a nightly cron job that crawls
> the API and writes out flat files. There is a very simple little
> content autonegotiation script that handles serving out the different
> serialisations. I would like to eventually change this to listen to
> the queue or RSS feed and only regenerate the ones that have changed...
>
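
To make the "content autonegotiation" step concrete, here is a minimal sketch of how a script might pick one of the pre-generated flat files from an Accept header. This is not the actual script; the media-type table, the default, and the function names are assumptions for illustration only.

```python
# Hypothetical mapping from media type to the flat-file extension on disk.
EXTENSIONS = {
    "application/rdf+xml": ".rdf",
    "text/n3": ".n3",
    "application/json": ".json",
}
DEFAULT_TYPE = "application/rdf+xml"  # assumed fallback


def parse_accept(header):
    """Return acceptable media types from an Accept header, best first."""
    choices = []
    for position, part in enumerate(header.split(",")):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        quality = 1.0
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Sort by descending quality, then by order of appearance.
            choices.append((-quality, position, media_type))
    return [media_type for _, _, media_type in sorted(choices)]


def choose_file(package_name, accept_header):
    """Pick the flat file whose serialisation best matches the request."""
    for media_type in parse_accept(accept_header or ""):
        if media_type in EXTENSIONS:
            return package_name + EXTENSIONS[media_type], media_type
    return package_name + EXTENSIONS[DEFAULT_TYPE], DEFAULT_TYPE


print(choose_file("demo", "text/n3;q=0.9, application/rdf+xml;q=0.5"))
```

Wildcards such as */* are not handled here; they simply fall through to the default. The same table could also serve explicit .json/.rdf/.n3 extensions in the URL.
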
> One reason to keep this outside of ckan is the extensions of which
> more below.
>
> ] >It already does: http://semantic.ckan.net/sparql and above links...
> ]
> ] That's really great. I hadn't visited those pages before. You'll have to
> ] explain to me how it works. I'm guessing (remembering?) there is a triple
> ] store and you update it from CKAN's API?
>
> Thanks :)
>
> The cron job above uses rdflib to build the RDF representation. In
> the catalogue.data.gov.uk case it just uses a memory store as
> temporary storage before writing out the flat files. These get tarred
> up and rsynced to the same machine as semantic.ckan.net and then dumped
> into the virtuoso store for querying. For semantic.ckan.net it just
> uses the virtuoso store instead of the memory store so there is no
> need for the extra step. When you request an individual .rdf or .n3
> file it just comes off the disk though.
>
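
For anyone wondering what "build the RDF representation and write out flat files" might look like, here is a minimal sketch. The real cron job uses rdflib; this version deliberately uses only the standard library and emits N-Triples by hand. The package fields (name, title, tags), the base URI and the choice of DCAT/DC terms are assumptions for illustration, not the actual ckanrdf mapping.

```python
DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(text):
    """Quote a string as a plain N-Triples literal."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n")
                   .replace("\r", "\\r"))
    return '"%s"' % escaped


def package_to_ntriples(base_uri, package):
    """Describe one package dict (as fetched from the API) as N-Triples."""
    subject = "<%s%s>" % (base_uri, package["name"])
    lines = ["%s <%s> <%sDataset> ." % (subject, RDF_TYPE, DCAT)]
    if package.get("title"):
        lines.append("%s <%stitle> %s ."
                     % (subject, DCT, literal(package["title"])))
    for tag in package.get("tags", []):
        lines.append("%s <%skeyword> %s ." % (subject, DCAT, literal(tag)))
    return "\n".join(lines) + "\n"


# Hypothetical package record; a crawler would fetch this from the API.
demo = {"name": "demo", "title": 'A "demo" package', "tags": ["example"]}
ntriples = package_to_ntriples("http://example.org/package/", demo)
print(ntriples, end="")
```

Writing the returned string to a file named after the package would give the flat file that is later served from disk or loaded into the store.
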
> ] >There's always room for improvement of course, most immediately
> ] >separating out extension descriptions (e.g. so that Richard can
> ] >generate voiD separately and have it pulled into the store) and doing
> ] >something similar for the other instances, but I think CKAN already
> ] >has the proverbial 5 stars.
> ] >
> ]
> ] That's great. What do you mean by "separating out extension descriptions"?
>
> Right. So there is some generic information about each package that we
> can express with DCAT/DC and friends. Then there are the packages in
> the LODCloud group. They have extra metadata that only makes sense for
> RDF datasets (sparql endpoints, example resources, counts of triples
> and links to other datasets, etc). There is a vocabulary for this,
> called voiD [0]. A void:Dataset could be understood like a subclass of
> a dcat:Dataset (I'm not sure if this has been explicitly declared to
> be so, there was some discussion about it).
>
> So now, the ckanrdf scripts [1] generate the DCat stuff and look at
> the extras and tags and also generate voiD if they look like they are
> RDF-related. This is fine as far as it goes, but sooner or later it
> will lead to a maintenance nightmare. What we really want is the
> curators of groups, who are the ones who have a much better idea what
> their extras conventions are and how they might map to RDF to be able
> to generate these extra bits of description themselves, and have a
> mechanism for contributing them back.
>
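
One way to picture "separating out extension descriptions" is a set of small per-group describers that the generic script knows nothing about. The sketch below keeps everything in one Python process for brevity, whereas the thread imagines these living outside CKAN and possibly in other languages. The group name and the extras keys (sparql_endpoint, triples) are hypothetical; void:sparqlEndpoint and void:triples are voiD properties.

```python
# Registry of per-group describers, contributed by each group's curators.
DESCRIBERS = {}


def describer(group):
    """Decorator that registers a describer function for a group."""
    def register(func):
        DESCRIBERS.setdefault(group, []).append(func)
        return func
    return register


@describer("lodcloud")  # hypothetical group name
def describe_void(package):
    """Map assumed LOD-specific extras onto voiD properties."""
    extras = package.get("extras", {})
    statements = []
    if "sparql_endpoint" in extras:
        statements.append(("void:sparqlEndpoint", extras["sparql_endpoint"]))
    if "triples" in extras:
        statements.append(("void:triples", extras["triples"]))
    return statements


def extra_descriptions(package):
    """Collect statements from every describer for the package's groups."""
    statements = []
    for group in package.get("groups", []):
        for func in DESCRIBERS.get(group, []):
            statements.extend(func(package))
    return statements


demo = {
    "name": "demo",
    "groups": ["lodcloud"],
    "extras": {"sparql_endpoint": "http://example.org/sparql", "triples": "1000"},
}
print(extra_descriptions(demo))
```

The generic DCAT description stays untouched; a package outside any registered group simply gets no extra statements.
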
> Another example is data.gov.uk. Our generic dataset describing script
> shouldn't need to know the (surprisingly intricate) details of how to
> determine and reference a UK public body, so we want a separate
> process to do that.
>
> Yet another is the library linked data group that will probably end up
> with a bunch of library data in rdf related extras and tags that are
> really irrelevant to the other datasets.
>
> By separating these things out we also enable the community to take on
> some of the work for the datasets that they care about and to do so in
> whichever programming idiom they are most comfortable (maybe they like
> PHP or Ruby more than Python).
>
> [0] http://vocab.deri.ie/void
> [1] http://bitbucket.org/ww/ckanrdf
>
> ] Is the separate "semantic" hostname particularly desirable? If so, would
> ] it be desirable to have a "semantic." companion for each CKAN site?
>
> In general I don't really think so. It would be nice to have them
> together in some way. We definitely want to arrange so that all their
> catalogues end up in one triplestore so you can easily do queries
> across them but that's a slightly separate issue. We could just have
> some space within semantic.ckan.net/{ca,de,ie,...}/..., that would be one
> option. Putting it directly in {ca,de,ie,...}.ckan.net means we have
> to manage harvesting of community provided descriptions within those
> installations in some way though...
>
> ] If RDF was returned by the API, it could be returned by resources such as:
> ]
> ] http://catalogue.data.gov.uk/api
> ]
> ] What's '{"version": "1"}' in RDF? :-)
>
> Strictly speaking, this bit of JSON is underspecified. "version 1" of
> what? It seems obvious if you (as a human) look at that URI and break
> it apart, do some background research and discover that the site runs
> CKAN and CKAN has an API and infer that that's probably what that bit
> of JSON refers to (and not the version of CKAN for instance)...
>
> There is some way of expressing software package versions (see DOAP
> which you may be familiar with from PyPI).
>
> So maybe you have something like,
>
> <http://catalogue.data.gov.uk/api> :supports ckan:APIv1 .
>
> ckan:API rdfs:label "CKAN API";
>         foaf:homepage <http://ckan.org/wiki/page/describing/etc>;
>         dcterms:hasVersion ckan:APIv1.
>
> ckan:APIv1 rdfs:label "CKAN APIv1";
>         foaf:homepage <http://ckan.org/wiki/page/describing/v1>.
>
> (I made up the :supports predicate, there might be an existing one
> that is good to use or we might have to invent our own, and we would
> need to map the ckan: prefix to a namespace where we describe CKAN)
>
> ] Whatever the service architecture, given there are so many possibilities
> ] in between that appear to offer little but agony, we're in great shape.
> ] I think we could very usefully document these different interfaces (the
> ] Web Interface, the Semantic Interface, and the Domain Model Interface)
> ] as a coherent multi-channel provision. I know the Web UI package details
> ] page presents together links to package resources in different
> ] formats. But we could make something more of the range of different
> ] service capabilities. Now that we've identified the rather different
> ] worlds each addresses, perhaps we could document the different
> ] engineering purposes?
> ]
> ] Or am I just catching up with what everybody already knows? :-)
>
> No, certainly as always there is a general lack of documentation for
> all of this. It's discoverable in a "follow your nose" sense but we
> really should work to make it more obvious. We really do need to work
> out our conventions for the other CKAN instances in terms of minting
> URIs and decide on things like moving some of the logic up into the
> webserver config, etc.
>
> Cheers,
> -w
>
> --
> William Waites
> http://eris.okfn.org/ww/foaf#i
> 9C7E F636 52F6 1004 E40A  E565 98E3 BBF3 8320 7664
>

-- 
Jonathan Gray

Community Coordinator
The Open Knowledge Foundation
http://blog.okfn.org

http://twitter.com/jwyg
http://identi.ca/jwyg