[ckan-discuss] API For Package Name

Sun Nov 28 14:00:20 GMT 2010

Hi Richard,

Richard Cyganiak wrote:
> On 24 Nov 2010, at 15:28, John Bywater wrote:
>> The API is not very hypermedia-driven at the moment. […] Perhaps we 
>> could go right back to the start, and look at the package register? At 
>> the moment it returns a list of package IDs (or package names in API 
>> Version 1). I'm detecting that you'd slightly prefer a list of 
>> absolute URLs. :-)
> 
> Yes!
> 

Thought so. ;-)

>> Please forgive me, but recently I have been unable to decide whether 
>> or not the IDs can be treated as relative URLs, with the locator of 
>> package register (somehow) as the base URL? What do you think about 
>> that? Is there a definitive answer?
> 
> Good question. Some representation formats, such as HTML (and, in fact, 
> RDF!), are designed explicitly for hypermedia, and the format specifies 
> which parts of the message are URLs and what base URL they are resolved 
> against. That makes the use of relative URLs straightforward, which is 
> good for keeping message size down. JSON, for all its advantages, 
> unfortunately doesn't know a URL from a string.
> 

Thanks for saying that.

How do we define hypermedia? From what I can see, a hypermedia system 
doesn't need to involve URLs, but rather there needs to be: a series of 
resources where any resource can reference any other resource; an 
independent means of accessing representations of referenced resources; 
and a capability to infer references to resources from within a 
representation of a resource.

If so, CKAN+ckanclient appears already to be a hypermedia system.

Also, isn't RESTfulness more generic than either the Web or RDF?

That is, a RESTful API doesn't need to conform to the Web. A Web UI is 
RESTful for Web clients. The CKAN API is RESTful for ckanclients. The 
Web, RDF, and CKAN share common principles (the principles of REST).

They all use HTTP. But where the Web uses HTML, RDF uses RDF/XML (or 
RDF/JSON, or something like that), and CKAN uses its own JSON schema.

>> Wikipedia says, "If it is likely that the client will want to access 
>> related resources, these should be identified in the representation 
>> returned, for example by providing their URIs in sufficient context, 
>> such as hypertext links." There are identifiers. Are we missing the 
>> "sufficient context", or is that provided by the published resource 
>> locator templates? I really don't know. I've seen some discussions 
>> about it being okay given that the locator templates are published. 
>> But I wasn't totally convinced. :-)
>>
>> So, if we prefix each with the package registry locator, then the 
>> message size goes up, but probably to no more than double. So that's 
>> okay? And given the deliberate redundancy, it may be more susceptible 
>> to compression than the average message.
> 
> I think you summed up the trade-off very well. If URLs are included in 
> the messages, then less context has to be hardcoded into clients -- I 
> wouldn't need to tell the client that the URL for retrieving a package 
> representation is obtained by concatenating the returned package ID to 
> "http://ckan.net/api/rest/package/". That would simplify the clients, 
> and make them more resilient against future change on the server (such 
> as a move to a different domain!).
> 
> On the downside, message size increases.
> 
> Personally, I have a high tolerance for redundancy in messages. A 
> side-effect of prolonged exposure to RDF ;-)
> 

Thanks for the compliments. I'm not necessarily averse to that. The word 
I should perhaps have selected is not redundancy but rather repetition? 
Repetition is not redundant if it's necessary. :-)
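
To put rough numbers on the size question, here is a minimal sketch in 
Python. It assumes a made-up register of 100 packages with generated 
UUIDs (not real CKAN data) and uses zlib as a stand-in for whatever 
compression the HTTP layer might apply. The sizes() helper is only for 
illustration.

```python
import json
import uuid
import zlib

# Prefix taken from the example URL discussed in this thread.
PREFIX = "http://ckan.net/api/rest/package/"

# Hypothetical register: 100 deterministic, made-up package UUIDs.
ids = [str(uuid.uuid5(uuid.NAMESPACE_URL, "package-%d" % n)) for n in range(100)]
urls = [PREFIX + package_id for package_id in ids]


def sizes(items):
    """Return (raw, compressed) byte counts for a JSON list of strings."""
    raw = json.dumps(items).encode("utf-8")
    return len(raw), len(zlib.compress(raw))


ids_raw, ids_packed = sizes(ids)
urls_raw, urls_packed = sizes(urls)

# How much bigger the URL list is, before and after compression.
raw_ratio = urls_raw / ids_raw
packed_ratio = urls_packed / ids_packed

print("raw bytes:        ids=%d urls=%d ratio=%.2f" % (ids_raw, urls_raw, raw_ratio))
print("compressed bytes: ids=%d urls=%d ratio=%.2f" % (ids_packed, urls_packed, packed_ratio))
```

The prefix is 33 characters and a UUID is 36, so the uncompressed list 
stays under double the size. Because the prefix repeats verbatim in every 
entry, the compressed overhead should come out noticeably smaller than 
the raw overhead.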

> A good compromise might be to use URLs relative to the API base URL 
> "http://ckan.net/api/rest/" when referring to resources within messages. 
> So we'd have "package/this" and "package/that" and "group/foo" and 
> "tag/bar" etc.
> 

Yes, it might be. Or is that a half-way house that would please nobody?
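
For what it's worth, here is a minimal sketch in Python of how a client 
might resolve such relative references. It assumes the API base URL from 
your example and uses only urljoin from the standard library. The 
resolve() helper and the list of references are hypothetical, not 
anything ckanclient does today.

```python
from urllib.parse import urljoin

# Assumed API base, taken from the example in this thread. The trailing
# slash matters: without it, "rest" is treated as the last path segment
# and is replaced rather than extended.
API_BASE = "http://ckan.net/api/rest/"


def resolve(reference, base=API_BASE):
    """Resolve a reference found in a message against the API base URL."""
    return urljoin(base, reference)


# Relative references as they might appear inside a JSON message.
references = ["package/this", "package/that", "group/foo", "tag/bar"]
resolved = [resolve(reference) for reference in references]

for url in resolved:
    print(url)
```

The client still has to know the base URL, but only that one value would 
change if the server moved to a different domain. An absolute URL passed 
to resolve() comes back unchanged, so a client written this way would 
cope with either style of message.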

>> Is that the sort of thing you'd like to see? We could make a list.
>>
>> The API is versioned, so we could develop all this into Version 3.
> 
> +1!
> 

Great. Let's see what we can do. There's a lot to consider.

I've been reading more about RDF. With its abstract data model, the data 
formats (e.g. "RDF/XML"), and the libraries and tools, RDF appears to be 
a hypermedia system par excellence. (The only fracture seems to be the 
multitude of offerings which appear as attempts perhaps to win fame by 
making RDF a bit more approachable. In doing so perhaps they only 
compound the difficulties it presents? I'm not sure.)

I was also reading about the relation between linked data and RDF. 
Whereas linked data seems originally to have been conceived in terms of 
RDF (or at least to expect usage of RDF), the 5-stars of linked open 
data clearly do not mandate RDF (you can use CSV and still have five stars).

I was also looking at how JSON appears within RDF and linked data. There 
is RDF/JSON and JSON-LD. There is also (for example) JRON and probably a 
dozen other offerings arrayed across a spectrum of possibility. None 
appear to have achieved pre-eminence or even ubiquity. Therefore none of 
them appear to be especially desirable (at least at the moment for a 
"normal" developer).

I also found a message from Jeni Tennison, which started a long thread:
http://www.mail-archive.com/public-lod@w3.org/msg04086.html

<quote>
As part of the linked data work the UK government is doing, we're 
looking at how to use the linked data that we have as the basis of APIs 
that are readily usable by developers who really don't want to learn 
about RDF or SPARQL.

One thing that we want to do is provide JSON representations of both RDF 
graphs and SPARQL results. I wanted to run some ideas past this group as 
to how we might do that.

To put this in context, what I think we should aim for is a pure 
publishing format that is optimised for approachability for normal 
developers, *not* an interchange format. RDF/JSON [1] and the SPARQL 
results JSON format [2] aren't entirely satisfactory as far as I'm 
concerned because of the way the objects of statements are represented 
as JSON objects rather than as simple values. I still think we should 
produce them (to wean people on to, and for those using more generic 
tools), but I'd like to think about producing something that is a bit 
more immediately approachable too.
</quote>

So, looking at all this from a distance, there appear to be two poles: 
RDF for RDF-enabled developers and RDF-enabled generic tools, and (for 
the sake of simplicity, let's assume) domain driven design for "normal" 
developers who are writing "normal" applications.

I should admit that my head nearly exploded trying to match up RDF with 
domain driven design. Metaphorically speaking, I had to switch 
everything off for a while and allow the heat to dissipate. I was left 
with the impression that there's a gross impedance mismatch between 
domain driven design and RDF. That is to say:

1. A domain model is a model of behaviour-state. An RDF model is a model 
of state (RDF models behaviour as state). That is, a domain model 
constitutes its own changes of state, whereas an RDF model expects 
something external to constitute changes of state.

2. The scope of RDF is the World; vocabularies are fashioned to 
represent factual aspects of the World (even if such a fact pertains to 
an actual fiction). The scope of a domain model is a circumscribed 
domain; the rest of the World is always already out of scope. Domain 
driven design fashions worlds from domains. The little worlds of domain 
driven design are contrived and immanent fictions, but they are always 
based on facts.

3. Domain driven design (from the standpoint of RDF) is therefore highly 
parochial ("you should use URLs instead of local names so everybody 
knows what you mean"), whereas RDF (from the standpoint of domain driven 
design) is highly static ("you should collocate behaviour with state so 
there is a rhizome of coherent objects that does useful work").

4. RDF can incorporate domain driven design as a way of fashioning a new 
vocabulary, but without its behaviour not even instantiation can happen.

5. Domain driven design can incorporate RDF as a ready-made data model 
for an infinite domain, but behaviour would need to be reconstituted 
from a state representation of behaviour (software is also data).

So I wondered:

1. Can we at once make CKAN conform with the 5-stars of linked open data 
by using URLs for identifiers and enhance the experience for normal 
developers? That is, would a better domain model be obtained by using 
URLs instead of the locally typed and scoped identifiers that are 
contextualized by (and therefore receive meaning from) the domain model? 
CKAN would then have five stars (it currently has three stars?).

2. If CKAN presents JSON with absolute URLs where there are currently 
invariant UUIDs (in Version 2, or variable names in Version 1), which 
existing tools would be able to undertake traversals? For example, would 
Googlebot (or anything else in operation today) treat URLs in JSON as 
links, and follow them?

3. Can we make use of the HTTP 'Accept' header? We could continue to 
support DDD with application/json and introduce support for RDF with 
application/rdf+xml. We could discriminate with content negotiation.

4. Where does DCat fit in? Could DCat be the data format used to 
represent a package entity for clients that prefer response content type 
of application/rdf+xml? Would DCat be able to represent a package 
register? Or should we use other RDF elements for that? What about the 
other objects of the model, such as groups?

5. Can CKAN support RDF-enabled developers and RDF-enabled generic tools 
as a viewpoint on its domain model? The good thing is that the list of 
use cases for the semantic web is very short, and there ought to be (in 
the language of domain driven design) a cohesive mechanism for it. "The 
collection of Semantic Web technologies (RDF, OWL, SKOS, SPARQL, etc.) 
provides an environment where application can query that data, draw 
inferences using vocabularies, etc." That is, apart from fixing the 
content type, we can at least imagine that CKAN could support queries.
http://www.w3.org/standards/semanticweb/data
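
On question 3, here is a rough sketch in Python of how the server side 
might discriminate on the 'Accept' header. It is not how CKAN does it 
today. The function names are hypothetical, the parsing is deliberately 
simplified (it reads only the q parameter), and it assumes that a 
missing header or a tie falls back to application/json.

```python
# The two representations discussed above; the first is the assumed default.
SUPPORTED = ["application/json", "application/rdf+xml"]


def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header value."""
    entries = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when the client does not state it
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unreadable q value as "not acceptable"
        entries.append((media_type, q))
    return entries


def quality(offered, entries):
    """Quality the client gives one offered type; the most specific match wins."""
    main_type = offered.split("/")[0]
    for pattern in (offered, main_type + "/*", "*/*"):
        for media_type, q in entries:
            if media_type == pattern:
                return q
    return 0.0


def choose_content_type(header, supported=SUPPORTED):
    """Pick the supported type the client prefers, or None if none is acceptable."""
    if not header or not header.strip():
        return supported[0]  # no Accept header: any type is acceptable
    entries = parse_accept(header)
    best, best_q = None, 0.0
    for offered in supported:
        q = quality(offered, entries)
        if q > best_q:  # strict comparison, so ties keep the earlier type
            best, best_q = offered, q
    return best


print(choose_content_type("application/rdf+xml"))
print(choose_content_type("application/rdf+xml;q=0.9, application/json;q=0.5"))
print(choose_content_type("*/*"))
```

Existing clients that send application/json (or no header at all) would 
see no change, while an RDF-enabled client could ask for 
application/rdf+xml at the same locator. A result of None would 
presumably map to a 406 response.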

Best wishes,

John.

PS Can we do it? Yes we CKAN! :-)
