[ckan-discuss] CKAN package ID debate

Luke Closs luke.closs at socialtext.com
Mon Mar 29 22:13:23 BST 2010


Hey folks,

Some thoughts inline.

On 2010-03-23, at 11:11 AM, David Read wrote:

> We're debating changes to the CKAN API to refer primarily to Package
> IDs instead of Package Names. Read on below and please do contribute
> your thoughts.
> 
> David
> 
> Users can interact with CKAN packages through the REST API by
> referring to packages by name. Names don't change much, but we do want
> to support mutability of this field, and so we're looking at using IDs
> to refer to packages in the API instead, since these definitely don't
> change, even when we start syncing packages across multiple CKAN
> instances.
> 
> Examples of current use of package name in API:
> Asking the API for a list of packages: ['aiddata-china', 'naptan',
> 'water-voles-uk']
> Read a package: api/rest/package/aiddata-china returns
> "{'name':'aiddata-china', 'title':'Aid data for China', ...}"
> Search returns a list of matching packages: ['aiddata-china',
> 'naptan', 'water-voles-uk']
> 
> Although the 'title' field is best for human reading, you may want to
> change the package's name for a few reasons. It may appear in a URL
> somewhere and, for various reasons, may need to reflect the content.
> e.g. 'water-voles-2006-09' may be better as 'water-voles' when it
> becomes clear that the dataset will be updated in future years. Also
> we may want to change 'osm' to 'open-street-map' to disambiguate when
> another package with those initials comes along, or they change their
> name to 'OpenMap' because of a legal dispute with OSM Inc, and are
> keen to change all references.
> 
> But there are advantages of using names in the API:
> * more human readable
> * aligns CKAN (and datapkg) with apt-get and CPAN, although I get the
> impression those essentially don't allow module names to change

In fact, with CPAN, once a package is uploaded, it's there.  If you want to change the name, you make and upload a new package.

I like the name-based URIs as they look so nice and are easy for people to grok.  But I recently ran into issues with a bulk import of a whole bunch of packages.  The name field was too short for the names I wanted to use, and the set of acceptable characters was very limiting.

> I think we want to therefore switch to using IDs. Dealing in names as
> well is a 'nice to have' and kept perhaps for backwards compatibility.

Yes, I would like to see us keep the name-based URI alongside the ID-based URI.  If someone fetches a JSON name URI that matches multiple packages, another option would be to return a JSON array. That's not too much of a burden for clients to code against, and clients probably wouldn't run into it in 98% of usage.
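
To make the "return an array" idea concrete, here is a rough sketch in Python. It is not how CKAN does lookups; the in-memory PACKAGES list, the pairing of IDs with names, and the find_by_id / find_by_name helpers are all made up for illustration.

```python
# Hypothetical in-memory store. The pairing of IDs with names is invented
# for illustration; two packages deliberately share the same name.
PACKAGES = [
    {"id": "0d9ea8d59be5", "name": "aiddata-china"},
    {"id": "44758e5a0f9c", "name": "water-voles-uk"},
    {"id": "a1b2c3d4e5f6", "name": "water-voles-uk"},
]


def find_by_id(package_id):
    """Return the single package with this ID, or None if there is none."""
    for package in PACKAGES:
        if package["id"] == package_id:
            return package
    return None


def find_by_name(name):
    """Return every package with this name as a list (possibly empty)."""
    return [package for package in PACKAGES if package["name"] == name]


print(find_by_id("0d9ea8d59be5"))
print(find_by_name("water-voles-uk"))
```

An ID lookup can only ever give one package, while a name lookup always gives a list, so a client handles the rare duplicate-name case with the same code path as the common one.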

> It's relatively simple to allow users to specify an ID instead of
> names in requests (whilst accepting either). The question is whether
> we return names, IDs or both.

I would prefer to fetch the list of all packages as a JSON array of hashes, where each hash has the id, name and title (and maybe 1 or 2 other things such as last modified or maintainer) as a "minimal" representation of that resource.
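
As a rough illustration of that "minimal" representation, assuming each package is a plain dict that has at least id, name and title (the MINIMAL_FIELDS tuple, both function names, the sample ID pairing and the 'notes' field are hypothetical, not part of the CKAN API):

```python
import json

# Fields kept in the "minimal" representation; this choice is an assumption.
MINIMAL_FIELDS = ("id", "name", "title")


def minimal(package):
    """Keep only the minimal fields of a full package dict."""
    return {field: package[field] for field in MINIMAL_FIELDS}


def package_list_json(packages):
    """Serialise a list of packages as a JSON array of minimal hashes."""
    return json.dumps([minimal(package) for package in packages])


# Illustrative data only: the ID/name pairing and 'notes' are made up.
full_packages = [
    {
        "id": "0d9ea8d59be5",
        "name": "aiddata-china",
        "title": "Aid data for China",
        "notes": "dropped from the minimal form",
    },
]
print(package_list_json(full_packages))
```

Extra fields such as last modified or maintainer could be added to MINIMAL_FIELDS without changing the shape clients code against.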

> So here are my suggested options:
> 
> Option A - Use new URLs that include an API version number. Users
> accessing this new version of the API get back package IDs. e.g.
> /api/rest/2.0/package returns ['0d9ea8d59be5', '44758e5a0f9c', ...]
> We could implement API versioning as suggested in the first answer
> here: http://stackoverflow.com/questions/389169/best-practices-for-api-versioning

This is fine, but I suggest you use a plain integer that increases with each API version instead of 2.0 (e.g. just '2').
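
Here is a small sketch of what parsing an integer-versioned URL could look like. The exact layout (/api/rest/<version>/package/<ref>) and the rule that a missing version means version 1 are my assumptions, loosely following the /api/rest/2.0/package example above; parse_path is a hypothetical helper, not CKAN code.

```python
import re

# Assumed layout: /api/rest/[<integer version>/]package[/<id or name>]
_PATH = re.compile(
    r"^/api/rest/(?:(?P<version>\d+)/)?package(?:/(?P<ref>[^/]+))?$"
)


def parse_path(path):
    """Split a request path into (api_version, package_ref).

    A path without a version segment is treated as version 1 (assumption),
    so existing unversioned URLs would keep working.
    """
    match = _PATH.match(path)
    if match is None:
        raise ValueError("unrecognised API path: %r" % path)
    version = int(match.group("version")) if match.group("version") else 1
    return version, match.group("ref")


print(parse_path("/api/rest/package/aiddata-china"))
print(parse_path("/api/rest/2/package/0d9ea8d59be5"))
```

With an integer there is no question of how '2.0' compares to '2.10', and a dotted version like /api/rest/2.0/package is simply rejected by this pattern.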

> Option B - User specifies an option for the return format if he wants
> IDs instead of package names. This could be a URL parameter or HTTP
> header option, although not particularly RESTful.

I say give them both id and name in a hash.

> Option C - Break back compatibility and just return IDs. We are still
> sort of in beta and may not have many API users.

As an API user I'm totally fine with helping co-evolve the API and making backwards-incompatible changes at this point.  Now's the time to do this kind of thing IMO.

If IDs become the main thing then please remove some of the constraints on name (e.g. length).

Cheers,
Luke

