[ckan-discuss] Multiple package schemas

Richard Cyganiak richard at cyganiak.de
Thu Oct 7 16:57:22 BST 2010

On 7 Oct 2010, at 09:23, David Read wrote:
> Richard's guidance doesn't contradict any of our core field guidance,

That's deliberate.

> apart from in these cases:
> * he gives more specific instructions for a couple of the resource
> fields the format field has suggested values like
> "application/rdf+xml" which is in fact two pieces of data - the
> purpose of the download (e.g. the application, an example, meta-info,
> download_page) and the format itself. These would be better in
> separate columns.

Rufus has stated at some point that the content of the format field  
should be an Internet Media Type [1], and he encouraged the use of  
made-up “pseudo-types” like “api/search”. So I blame the idea on him ;-)

I agree that having a “format” field (with media type as value where  
possible) and a separate “purpose” or “type” field with values such as  
“Download”, “Example”, “Schema”, “Documentation”, “API” would be good.
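
To make that split concrete, here is a minimal sketch of a resource record with the two fields kept apart. The `Resource` class, its attribute names and the example URL are hypothetical, not CKAN's actual model; only the purpose values come from the list above.

```python
from dataclasses import dataclass

# Purpose values taken from the list suggested above.
PURPOSES = {"Download", "Example", "Schema", "Documentation", "API"}


@dataclass
class Resource:
    """Hypothetical resource record; not CKAN's actual model."""
    url: str
    format: str   # an Internet Media Type where possible
    purpose: str  # one of PURPOSES

    def __post_init__(self):
        # Reject purposes outside the agreed vocabulary.
        if self.purpose not in PURPOSES:
            raise ValueError(f"unknown purpose: {self.purpose!r}")


# An example file served as RDF/XML: the purpose no longer has to be
# encoded inside the format value.
example = Resource(
    url="http://example.org/sample.rdf",
    format="application/rdf+xml",
    purpose="Example",
)
```

With this shape, a client could filter on `format` alone to find machine-readable downloads, and on `purpose` alone to find documentation or examples.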

> * he suggests adding a number of tags according to the properties of
> the package. I think these would be better stored as extra fields,

Again, things like the “format-rdf” tag were already widely used on  
CKAN before we started, so again I don't accept the blame ;-)

> I think he (and others) have chosen tags over extra fields, because tags
> are easier to browse/search on CKAN.

That's not the main reason. I think the main reasons for choosing tags  
over custom fields are:

1. Tags are more “lightweight”. Coining a new custom field can be a  
bit scary, because it feels like we might perhaps be “polluting” the  
space of field names. Tags are free-form, so there is less concern  
about coining new ones.

2. There is no way (as far as I can see) to check if a given custom
field name has already been used elsewhere, so if I use a “format” or
“topic” custom field I don't know if I'm stepping on someone else's toes.

3. Working with custom fields is quite awkward because of the
three-fields limitation in the form.

4. Custom fields are single-value, so you can't say "format"="dc"
*and* "format"="foaf".
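
Point 4 can be illustrated with a small sketch. It assumes, purely for illustration, that custom fields behave like a key-to-value mapping and tags like a set; the tag names "format-dc" and "format-foaf" are made up by analogy with "format-rdf".

```python
# Assumption: custom (extra) fields act like a mapping with one value per key.
extras = {}
extras["format"] = "dc"
extras["format"] = "foaf"  # replaces "dc"; the first value is lost

# Assumption: tags act like a set, so several values can coexist.
tags = set()
tags.add("format-dc")
tags.add("format-foaf")
```

After this runs, `extras` holds only the last value assigned to "format", while `tags` holds both entries.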

I'm not sure what this implies for the design of CKAN, just sharing  

> If we resolved these two points, I think the LOD use case would
> suggest a schema that just describes extra fields.

Not quite. I think that some things can't be solved with just extra
fields:

1. Pre-defined values for the format field of resources. This is very  
important. This field is the basis for any kind of automated access to  
the data package; free-form text just doesn't cut it. Some of the  
formats that are commonly used in the LOD realm are virtually unknown  
elsewhere, so the values would have to be per-schema I think.

2. Positioning of custom fields. The most obviously missing field is  
“author homepage”. You wouldn't believe how many LOD packages have a  
homepage URL stuck behind the author name, or in the email field.  
Having an “author homepage” custom field half a screen down from the  
“author name/email” fields doesn't feel like it would solve this; the  
custom field would have to be located close to the name/email fields.

These are the biggies I think. Everything else could perhaps be done  
via extra fields.
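
As a rough sketch of what per-schema pre-defined values for the format field (point 1) could look like: the schema names "lod" and "default", the `ALLOWED_FORMATS` table and the `validate_format` helper are all hypothetical, and the value lists are only examples.

```python
# Hypothetical per-schema vocabularies for the resource format field.
ALLOWED_FORMATS = {
    "lod": {"application/rdf+xml", "text/turtle", "api/search"},
    "default": {"text/csv", "application/json"},
}


def validate_format(schema: str, value: str) -> bool:
    """Return True if `value` is a pre-defined format for `schema`."""
    return value in ALLOWED_FORMATS.get(schema, set())


# A media type from the schema's list passes; free-form text does not.
ok = validate_format("lod", "application/rdf+xml")
rejected = validate_format("lod", "RDF file")
```

The point of keeping the vocabulary per schema is that formats common in one community need not clutter the choices offered to everyone else.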


> David
> On 6 October 2010 22:53, Tim McNamara <paperless at timmcnamara.co.nz> wrote:
>> On 7 October 2010 06:58, Richard Cyganiak <richard at cyganiak.de> wrote:
>>> On 6 Oct 2010, at 18:17, David Read wrote:
>>>> Excellent point. Yes, maybe we want a 'schema' to merely define
>>>> specific 'extra' fields, with their validation and later their
>>>> display. Then you could have a package having several 'schemas' quite
>>>> simply. The core package fields then wouldn't be affected by any of
>>>> this.
>>> But 'schemas' still might want to modify the behaviour of some of the
>>> core fields:
>>> - add a note underneath the field
>>> - provide a selection of choices for the resource format field
>>> - provide a number of checkboxes to add specific tags with special
>>> meanings
>>> - ...
>> Would this level of flexibility be desirable? It may make things very
>> difficult to build applications on the basis of CKAN's packages if they
>> have different structures. I prefer the idea of a common set of
>> information that is fixed with possible extensions. I think there should
>> be a strong community push to keep to the common set unless there are
>> compelling reasons (necessity) to add an extension.
>> Tim.
