[ckan-discuss] API For Package Name

Sun Nov 28 23:12:33 GMT 2010

Hi Will,

William Waites wrote:
> * [2010-11-28 14:00:20 +0000] John Bywater <john.bywater at appropriatesoftware.net> wrote:
> ]
> ] I also found a message from Jeni Tennison, which started a long thread:
> ] http://www.mail-archive.com/public-lod@w3.org/msg04086.html
> ] 
> ] [...]

I'd like to find out whether or not Jeni got what she was looking for.

> ] 
> ] So, looking at all this from a distance, there appear to be two poles: 
> ] RDF for RDF-enabled developers and RDF-enabled generic tools, and (for 
> ] the sake of simplicity, let's assume) domain driven design for "normal" 
> ] developers who are writing "normal" applications.
> 
> Yes, the RDF/JSON is basically used to avoid needing to parse RDF. 
> Apart from that it is just another serialisation, the data model stays
> the same.
> 

You mean parse RDF/XML? Does RDF/JSON not have exactly the same 
complexity? How is it adequate otherwise? Questions for myself perhaps...

> That's why I made http://bnb.bibliographica.org/isbn for the Wikimedia DE
> folks, it's lossy but easy to understand.
> 

Very nice!

> ] 1. A domain model is a model of behaviour-state. An RDF model is a model 
> ] of state (RDF models behaviour as state). That is, a Domain model 
> ] constitutes its own changes of state, whereas an RDF model expects 
> ] something external to constitute changes of state.
> 
> There has been some work recently that could introduce some notion of
> state transitions -- it's tied to provenance:
> 
>       http://iswc2010.semanticweb.org/accepted-papers/241
> 
> so to express the state change from t0 -> t2 you write down the
> SPARQL SELECT/INSERT/DELETE statements that describe the transition
> (more or less). You do the writing down in RDF. So the transition
> behaviour itself becomes state in the "code is data" sense.
> 
> I could imagine (not saying this would be a good idea) carrying around
> snippets of python code for operating on the data in the data itself
> to express more or less the same thing. But this is maybe a bit "out
> there". 
> 

Very interesting!

> ] 2. If CKAN presents JSON with absolute URLs where there are currently 
> ] invariant UUIDs (in Version 2, or variable names in Version 1), which 
> ] existing tools would be able to undertake traversals? For example, would 
> ] Googlebot (or anything else in operation today) treat URLs in JSON as 
> ] links, and follow them?
> 
> I'm not sure but I don't think Googlebot will follow them. However for
> the existing spiders there are always the html pages that contain
> parallel linkages that they can crawl.
> 

Would any other tools be able to take advantage of it? It seems to be 
similar to CSV. CSV also doesn't "have a concept of" URLs. I've always 
liked the idea of using URLs as identifiers in the API, but I'm just 
trying to identify exactly what we gain by introducing URLs into the data 
formats.

As a lower limit, perhaps we can say that what is good for the CSV goose 
must be good for the JSON gander? Beyond that, I'm wondering who 
benefits. Apart from getting five stars for the API. :-)
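One possible gain, sketched very roughly: a generic client could discover
what to fetch next without knowing how CKAN builds its locators from
UUIDs or names. The package document and the example.org URLs below are
made up for illustration, and find_links is a hypothetical helper, not
something in CKAN.

```python
import json
from urllib.parse import urlparse


def find_links(value):
    """Yield every string in a decoded JSON value that looks like an
    absolute http(s) URL, walking nested objects and arrays."""
    if isinstance(value, str):
        try:
            parts = urlparse(value)
        except ValueError:
            return  # not parseable as a URL, so not a link
        if parts.scheme in ("http", "https") and parts.netloc:
            yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from find_links(item)
    elif isinstance(value, list):
        for item in value:
            yield from find_links(item)


# Hypothetical package representation using URLs instead of UUIDs.
EXAMPLE = """
{
  "name": "bas-clamp",
  "groups": ["http://example.org/api/rest/group/science"],
  "resources": [{"format": "csv", "url": "http://example.org/data.csv"}],
  "notes": "no link here"
}
"""

links = list(find_links(json.loads(EXAMPLE)))
```

Note that the client still has to guess that a string is a link, which
is the same position a tool reading a CSV file would be in.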

Let's try to keep this discussion open?

> ] 3. Can we make use of the HTTP 'Accept' header? We could continue to 
> ] support DDD with application/json and introduce support for RDF with 
> ] application/rdf+xml. We could discriminate with content negotiation.
> 
> Yes this is a good idea. There's already some less automatic linkage
> in the HTML header and a "human click on this if you want RDF" link in
> the page. Autoneg would also be trivial to implement -- "requested
> some RDF? 303 -> semantic.ckan.net".
> 

Sounds good! At the same time, it occurs to me that many of the other 
interfaces I've seen over the last few days present the different 
content types at the same locations. I can't remember seeing one that 
redirects to a different domain. Is that a common or expected thing to do?

I don't know (think you might have told me once...) what exactly the 
cohesive mechanism is behind the RDF service. Whatever it is, perhaps we 
could at least consider calling the service from CKAN's API controllers 
and have the RDF content returned directly? We could also support the 
extensions .json .rdf .n3 so content type can be specified in the locator.

It's just a thought, we wouldn't *necessarily* need to swamp up the CKAN 
codebase. :-)
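A minimal sketch of how the two ideas (the Accept header and the
extensions in the locator) might combine, assuming the extension wins
when both are present and JSON stays the default. The function names,
the example paths and the text/n3 media type for N3 are assumptions for
illustration, not existing CKAN code.

```python
# Extension to media type. application/json and application/rdf+xml are
# the types named in the thread; text/n3 for N3 is an assumption.
EXTENSIONS = {
    ".json": "application/json",
    ".rdf": "application/rdf+xml",
    ".n3": "text/n3",
}
DEFAULT = "application/json"


def parse_accept(header):
    """Return the media types in an Accept header, highest q-value first.
    Entries with q=0 (explicitly not acceptable) are dropped."""
    prefs = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        # Sort by descending q, then by order of appearance.
        prefs.append((-q, position, media_type))
    return [media_type for neg_q, _, media_type in sorted(prefs) if neg_q < 0]


def choose_content_type(path, accept_header=""):
    """Pick a response media type from the locator extension, falling
    back to the Accept header, then to the JSON default."""
    for extension, media_type in EXTENSIONS.items():
        if path.endswith(extension):
            return media_type
    supported = set(EXTENSIONS.values())
    for media_type in parse_accept(accept_header):
        if media_type in supported:
            return media_type
        if media_type == "*/*":
            return DEFAULT
    return DEFAULT
```

With this, a request for a locator ending in .n3 gets N3 whatever the
header says, and a request with "Accept: application/rdf+xml" and no
extension gets RDF/XML from the same location, with no redirect.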

> ] 4. Where does DCat fit in? Could DCat be the data format used to 
> ] represent a package entity for clients that prefer response content type 
> ] of application/rdf+xml? Would DCat be able to represent a package 
> ] register? Or should we use other RDF elements for that? What about the 
> ] other objects of the model, such as groups?
> 
> DCat is just a vocabulary used in RDF. It represents some elements of
> a package registry, but other vocabularies are used as well. For an
> example, see http://semantic.ckan.net/package/bas-clamp.n3 and 
> http://semantic.ckan.net/catalogue.n3 (note, turtle or N3 will be much
> easier for you as a human to read. Pay little attention to RDF/XML
> unless you are a computer). 
> 

Wow, that's really neat. I haven't looked at this before.

> ] 5. Can CKAN support RDF-enabled developers and RDF-enabled generic tools 
> ]  as a viewpoint on its domain model? The good thing is that the list of 
> ] use cases for the semantic web is very short, and there ought to be (in 
> ] the language of domain driven design) a cohesive mechanism for it. "The 
> ] collection of Semantic Web technologies (RDF, OWL, SKOS, SPARQL, etc.) 
> ] provides an environment where application can query that data, draw 
> ] inferences using vocabularies, etc." That is, apart from fixing the 
> ] content type, we can at least imagine that CKAN could support queries.
> ] http://www.w3.org/standards/semanticweb/data
> 
> It already does: http://semantic.ckan.net/sparql and above links...
> 

That's really great. I hadn't visited those pages before. You'll have to 
explain to me how it works. I'm guessing (remembering?) there is a triple 
store and you update it from CKAN's API?
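For anyone else who has not tried the endpoint, a rough sketch of how a
client might build a query request, assuming the endpoint follows the
standard SPARQL protocol (the query text goes in a "query" parameter on
a GET request). The query is a generic one that makes no assumption
about the vocabularies in the store, and sparql_get_url is a
hypothetical helper.

```python
from urllib.parse import urlencode

# Endpoint mentioned in the thread.
ENDPOINT = "http://semantic.ckan.net/sparql"

# A generic query: the first ten triples, whatever they happen to be.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def sparql_get_url(endpoint, query):
    """Build the GET URL for a SPARQL query, URL-encoding the query text
    into the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
```

The resulting URL can be fetched with any HTTP client; which result
formats the endpoint returns is not something I have checked.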

> There's always room for improvement of course, most immediately
> separating out extension descriptions (e.g. so that Richard can
> generate voiD separately and have it pulled into the store) and doing
> something similar for the other instances, but I think CKAN already
> has the proverbial 5 stars.
> 

That's great. What do you mean by "separating out extension descriptions"?

Is the separate "semantic" hostname particularly desirable? If so, would 
it be desirable to have a "semantic." companion for each CKAN site? If 
RDF was returned by the API, it could be returned by resources such as:

http://catalogue.data.gov.uk/api

What's '{"version": "1"}' in RDF? :-)

Whatever the service architecture, given there are so many possibilities 
in between that appear to offer little but agony, we're in great shape. 
I think we could very usefully document these different interfaces (the 
Web Interface, the Semantic Interface, and the Domain Model Interface) 
as a coherent multi-channel provision. I know the Web UI package details 
package presented together links to package resources in different 
formats. But we could make something more of the range of different 
service capabilities. Now that we've identified the rather different 
worlds each addresses, perhaps we could document the different 
engineering purposes?

Or am I just catching up with what everybody already knows? :-)

Great discussion, thanks a lot for it. Let's also continue the 
discussions about CKAN package extras, but maybe on ckan-dev?

Best wishes,

John.