[ckan-discuss] Multiple package schemas

David Read david.read at okfn.org
Thu Oct 7 17:19:31 BST 2010


Apologies, I didn't mean to suggest you were doing anything wrong - I
meant to say that in those two cases you *specialise* the use of the
field, rather than go against any guidance. CKAN has been pretty
free-form so far, to let people establish use cases. We have an aim to
guide people more and more as usage becomes clearer, and constrain the
form to get more codified input, to bring us towards our aim of
automatable use of the metadata. And looking at what you've done with
our existing structures is very useful!

You highlight some very useful things about extra fields that we can
improve our UI on:
* a button to add more 'extra' fields
* autocomplete in extra field name, so the user can see existing names
and be guided to them, or be confident in creating a new one
and as I mentioned before:
* better faceted browse / search by extra field keys.

But I'm a bit stumped about how to achieve "format"="dc" *and*
"format"="foaf" in extra fields. Could we allow multiple extra fields
with the same key? This is once again another RDF precedent...
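
To make the repeated-key idea concrete, here is a minimal sketch in plain
Python. It assumes extras are held as a list of (key, value) pairs instead
of a single-valued mapping; `group_extras` is a hypothetical helper, not
anything that exists in CKAN, and the "topic" entry is made up for
illustration.

```python
from collections import defaultdict


def group_extras(pairs):
    """Collect (key, value) pairs into key -> list of values.

    Values for a repeated key are kept in the order they were entered.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


# The same key may appear more than once, as in the "format" example.
extras = [("format", "dc"), ("format", "foaf"), ("topic", "people")]
grouped = group_extras(extras)
```

Storing pairs keeps the db representation simple, and a search by key can
still match either value.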

Predefined values for extra fields would be included in the extra
field schema. As with the other current form fields, the value field
could be represented by a combo-box, date field, check-box, multiple
input boxes etc. However you input this field, the form will
translate it into a corresponding text value that is actually stored
in the db.
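
A rough sketch of what such an extra field schema and its translation to
stored text might look like. Everything here is an assumption for
illustration: the schema layout, the widget names, the "published" and
"links_verified" fields and the "skos" choice are hypothetical, and
`to_stored_text` is not a CKAN function.

```python
import datetime

# Hypothetical schema: field name -> widget type, plus allowed values
# where the field has predefined choices.
EXTRA_FIELD_SCHEMA = {
    "format": {"widget": "combo-box", "choices": ["dc", "foaf", "skos"]},
    "published": {"widget": "date"},
    "links_verified": {"widget": "check-box"},
}


def to_stored_text(field, form_value):
    """Translate a form input into the text value stored for the field."""
    spec = EXTRA_FIELD_SCHEMA[field]
    widget = spec["widget"]
    if widget == "combo-box":
        # Only the predefined values are accepted.
        if form_value not in spec["choices"]:
            raise ValueError("%r is not a predefined value for %r"
                             % (form_value, field))
        return form_value
    if widget == "date":
        # Dates are stored as ISO 8601 text.
        return form_value.isoformat()
    if widget == "check-box":
        return "true" if form_value else "false"
    raise ValueError("unknown widget %r" % widget)
```

For example, `to_stored_text("published", datetime.date(2010, 10, 7))`
gives the text "2010-10-07", and an unlisted format value is rejected.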

As for ordering these extra fields to get the author_url field near to
the author and author_email fields, I wonder if this problem could be
solved instead by making all of these fields compulsory? If we are
going to allow multiple 'extra field schemas' per package, then the
fields would have to be grouped on the package edit page, to make
sense I think.
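
The grouping could be sketched roughly as follows, assuming each schema is
just a named list of field names and a package's extras are a simple
mapping. `group_extras_by_schema`, the "lod" schema name and the "triples"
and "licence_note" fields are hypothetical.

```python
def group_extras_by_schema(extras, schemas):
    """Split a package's extras into one form section per schema.

    Fields not claimed by any schema end up in a final "other" section.
    """
    sections = []
    claimed = set()
    for schema_name, field_names in schemas.items():
        # Show every field the schema defines, blank if not yet filled in.
        fields = [(name, extras.get(name, "")) for name in field_names]
        sections.append((schema_name, fields))
        claimed.update(field_names)
    leftover = [(k, v) for k, v in extras.items() if k not in claimed]
    if leftover:
        sections.append(("other", leftover))
    return sections


schemas = {"lod": ["author_url", "triples"]}
extras = {"author_url": "http://example.org/", "licence_note": "see site"}
sections = group_extras_by_schema(extras, schemas)
```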


On 7 October 2010 16:57, Richard Cyganiak <richard at cyganiak.de> wrote:
> On 7 Oct 2010, at 09:23, David Read wrote:
>> Richard's guidance doesn't contradict any of our core field guidance,
> That's deliberate.
>> apart from in these cases:
>> * he gives more specific instructions for a couple of the resource
>> fields. The format field has suggested values like
>> "application/rdf+xml" which is in fact two pieces of data - the
>> purpose of the download (e.g. the application, an example, meta-info,
>> download_page) and the format itself. These would be better in
>> separate columns.
> Rufus has stated at some point that the content of the format field should
> be an Internet Media Type [1], and he encouraged the use of made-up
> “pseudo-types” like “api/search”. So I blame the idea on him ;-)
> I agree that having a “format” field (with media type as value where
> possible) and a separate “purpose” or “type” field with values such as
> “Download”, “Example”, “Schema”, “Documentation”, “API” would be good.
>> * he suggests adding a number of tags according to the properties of
>> the package. I think these would be better stored as extra fields,
> Again, things like the “format-rdf” tag were already widely used on CKAN
> before we started, so again I don't accept the blame ;-)
>> I think he (and others) have chosen tags over extra fields, because tags
>> are easier to browse/search on CKAN.
> That's not the main reason. I think the main reasons for choosing tags over
> custom fields are:
> 1. Tags are more “lightweight”. Coining a new custom field can be a bit
> scary, because it feels like we might perhaps be “polluting” the space of
> field names. Tags are free-form, so there is less concern about coining new
> ones.
> 2. There is no way (as far as I can see) to check if a given custom field
> name has already been used elsewhere, so if I use a “format” or “topic”
> custom field I don't know if I'm stepping on someone else's toes.
> 3. Working with custom fields is quite awkward because of the three-fields
> limitation in the form.
> 4. Custom fields are single-value, so you can't say "format"="dc" *and*
> "format"="foaf"
> I'm not sure what this implies for the design of CKAN, just sharing
> experience.
>> If we resolved these two points, I think the LOD use case would
>> suggest a schema that just describes extra fields.
> Not quite. I think that some things can't be solved with just extra fields:
> 1. Pre-defined values for the format field of resources. This is very
> important. This field is the basis for any kind of automated access to the
> data package; free-form text just doesn't cut it. Some of the formats that
> are commonly used in the LOD realm are virtually unknown elsewhere, so the
> values would have to be per-schema I think.
> 2. Positioning of custom fields. The most obviously missing field is “author
> homepage”. You wouldn't believe how many LOD packages have a homepage URL
> stuck behind the author name, or in the email field. Having an “author
> homepage” custom field half a screen down from the “author name/email”
> fields doesn't feel like it would solve this; the custom field would have to
> be located close to the name/email fields.
> These are the biggies I think. Everything else could perhaps be done via
> extra fields.
> Richard
>> David
>> On 6 October 2010 22:53, Tim McNamara <paperless at timmcnamara.co.nz> wrote:
>>> On 7 October 2010 06:58, Richard Cyganiak <richard at cyganiak.de> wrote:
>>>> On 6 Oct 2010, at 18:17, David Read wrote:
>>>>> Excellent point. Yes, maybe we want a 'schema' to merely define
>>>>> specific 'extra' fields, with their validation and later their
>>>>> display. Then you could have a package having several 'schemas' quite
>>>>> simply. The core package fields then wouldn't be affected by any of
>>>>> this.
>>>> But 'schemas' still might want to modify the behaviour of some of the
>>>> core fields:
>>>> - add a note underneath the field
>>>> - provide a selection of choices for the resource format field
>>>> - provide a number of checkboxes to add specific tags with special
>>>> meanings
>>>> - ...
>>> Would this level of flexibility be desirable? It may make things very
>>> difficult to build applications on the basis of CKAN's packages if they
>>> have different structures. I prefer the idea of a common set of
>>> information that is fixed with possible extensions. I think there should
>>> be a strong community push to keep to the common set unless there are
>>> compelling reasons (necessity) to add an extension.
>>> Tim.
