[okfn-help] International development data on CKAN

Jonathan Gray jonathan.gray at okfn.org
Wed Dec 9 13:24:22 GMT 2009


David,

Many thanks for your reply. This is exactly what I was thinking. We'll
probably continue to develop the database externally (on Google Docs
for now!), but it would be great to make sure it can be imported into CKAN.

It also seemed like a good test of CKAN's support for arbitrary
metadata (e.g. development-specific fields). In the longer term there is
potential for CKAN to be the main home for a registry of open data on
international development, and it's interesting to think how far it is
built to support other domain-specific querying functionality.
E.g. could we have a 'country' plugin built on ISO standards? Also, in
the longer term, will CKAN aim to support information such as which
fields are available in a given dataset, or are there no plans to
make it this fine-grained? Basically, the question is how flexible CKAN is
going to be, and how far it will be possible to drill down and do
domain-specific querying.

Another question going forward: do we keep the 'main' version of our
database on CKAN and push it to another site to publish, or keep the
main version on another site and pull it into CKAN as necessary?
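
To make the second option concrete (main version kept elsewhere, key
points pulled into CKAN as needed, roughly what David suggests below),
here is a minimal Python sketch. It only builds and compares records; it
makes no CKAN API calls. The IDD field names (`id`, `title`, `license`,
`tags`, `countries`, `period`) and the package keys are hypothetical
illustrations, not CKAN's documented schema.

```python
import json
from typing import Any


def to_ckan_package(record: dict[str, Any]) -> dict[str, Any]:
    """Derive the 'key points' of a hypothetical IDD record as a CKAN-style package.

    Keys on both sides are illustrative assumptions, not CKAN's documented schema.
    """
    return {
        "name": record["id"],
        "title": record["title"],
        # License given by its text name rather than a numeric ID.
        "license": record["license"],
        "tags": sorted(record.get("tags", [])),
        # Extras values are free text, so structured values are JSON-encoded.
        "extras": {
            "countries": json.dumps(sorted(record.get("countries", []))),
            "temporal_coverage": record.get("period", ""),
        },
    }


def names_to_pull(idd_records: list[dict[str, Any]],
                  ckan_packages: dict[str, dict[str, Any]]) -> list[str]:
    """Names of packages whose CKAN copy is missing or differs from the external main version."""
    stale = []
    for record in idd_records:
        wanted = to_ckan_package(record)
        if ckan_packages.get(wanted["name"]) != wanted:
            stale.append(wanted["name"])
    return stale


# Hypothetical IDD record; the main copy lives outside CKAN (e.g. in the spreadsheet).
record = {
    "id": "cambodia-oda",
    "title": "ODA to Cambodia",
    "license": "Creative Commons Attribution",
    "tags": ["oda", "cambodia"],
    "countries": ["KH"],
    "period": "2000/2008",
}
print(names_to_pull([record], {}))  # ['cambodia-oda']
```

Each name returned would then be created or updated in CKAN through the
API; pushing the other way just swaps which side is treated as
authoritative.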

Jonathan

On Wed, Dec 9, 2009 at 12:14 PM, David Read <david.read at okfn.org> wrote:
> Ben,
>
> I'd be interested in what Rufus has to say on this, because this is
> quite fundamental to where CKAN is going, but here are my thoughts.
>
> The idea of the metadata in CKAN is to help users find datasets and to
> help linking of datasets. I think many of the fields you have here are
> useful in going into CKAN, and perhaps others are best left associated
> with the data. Here are some examples, based on your example dataset.
>
> I can envision someone finding it because they were looking for some
> data to do with development finance, or to do with Cambodia, or just
> recent data, for example.
>
> Browsing metadata in CKAN, they might see the opportunity to examine
> the link between ODA events in Cambodia and political events, and
> propose plotting the amount of ODA money against time alongside
> events in Cambodia from Microfacts. So in CKAN it is useful to see the
> temporal coverage of the ODA data, that the fields contain budget
> information (rather than just a vague description), that the data can be
> got at in XML (not something difficult like PDF) and that the license
> is compatible with Microfacts.
>
> You have fields to do with provenance and a lot of
> development-specific information, which is great, but not the sort of
> thing that CKAN is best at indexing. I can see someone wanting to find
> all datasets which are 'humanitarian aid' but not 'development aid' or
> 'compliant with DAC standards', which can all be tagged and textually
> searched in CKAN, but I expect you would want a custom development
> search for these fields.
>
> If my assumptions are reasonable, then it suggests to me that you
> should have your customised IDD database/website, with each record
> having the key points synced into a CKAN record. Does anyone else want
> to comment?
>
> David
>
>
> 2009/12/9 Ben Harden <b.e.harden.03 at cantabgold.net>:
>> Hi David, Rufus,
>>
>> Thanks for the feedback on the questions Jonathan posted. For reference,
>> here is a link to the first draft of the IDD fields.
>>
>> http://spreadsheets.google.com/ccc?key=0AnHh6dpmBwS7dFVlcFZzWV9yVG8tUURRckVDMVo3Q2c&hl=en
>>
>> This'll still need some cleaning up and standardization, but it's a start!
>> Not sure yet which is going to be the best option (building in CKAN or
>> making the database consistent with CKAN), so we will probably need some
>> more assistance on this point in the near future...
>>
>> Thanks, all the best,
>>
>> Ben
>>
>> David Read wrote:
>>>
>>> Ben,
>>>
>>> Pleased to meet you! It sounds excellent to get the IDD onto CKAN.
>>> Here are some pointers (see below), but feel free to ask more. All the
>>> best,
>>>
>>> David
>>>
>>>
>>>>
>>>> It would be great if this could be done either via CKAN (and then
>>>> published on an external website with basic querying functionality),
>>>> or at least published in a form that meant copies of the profiles
>>>> could go onto CKAN.
>>>>
>>>
>>> Yes, do link up with our API to get data into and out of CKAN. If
>>> you've not already, see: http://ckan.net/api/
>>>
>>>
>>>>
>>>>  * Currently we have several fields that it would be good to import
>>>> into CKAN preserving some structure (rather than just adding them to a
>>>> free text field). I understand CKAN can now support arbitrary metadata.
>>>> Is this the same as the key/value pairs?
>>>>
>>>
>>> Yes, see the 'extras' field in a package. The key and value are free
>>> text, so structure the value field as you see fit.
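
Because both key and value in 'extras' are free text, one option for
keeping some structure is to serialise values such as country lists or
date ranges as JSON. A minimal Python sketch, assuming the extras are
handled as a plain dict of strings; the key names `countries` and
`temporal_coverage` and the use of ISO 3166-1 alpha-2 codes are
illustrative choices, not a CKAN convention.

```python
import json


def encode_extras(structured: dict) -> dict[str, str]:
    """Turn structured metadata into free-text extras by JSON-encoding each value."""
    return {key: json.dumps(value, sort_keys=True) for key, value in structured.items()}


def decode_extras(extras: dict[str, str]) -> dict:
    """Recover structured values; fall back to the raw string if it is not valid JSON."""
    decoded = {}
    for key, value in extras.items():
        try:
            decoded[key] = json.loads(value)
        except json.JSONDecodeError:
            # Plain free text, e.g. a value typed by hand into the web form.
            decoded[key] = value
    return decoded


# Hypothetical metadata for one IDD profile: ISO 3166-1 alpha-2 codes and a date range.
profile = {
    "countries": ["KH", "LA", "VN"],
    "temporal_coverage": {"start": "2000", "end": "2008"},
}
extras = encode_extras(profile)
print(extras["countries"])  # ["KH", "LA", "VN"]
assert decode_extras(extras) == profile
```

One caveat of this choice: a hand-typed value that happens to be valid
JSON (such as `2008`) comes back as a number rather than a string.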
>>>
>>>
>>>>
>>>> If so can we have more than the
>>>> three that come up on web interface?
>>>>
>>>
>>> You can use as many as you like. In the web interface, when you run
>>> out of fields, hit 'preview' to get some more.
>>>
>>>
>>>>
>>>> It would be great to use this
>>>> functionality for our data profiles. For example, we have fields with
>>>> dates and associated country names - which are probably fairly generic
>>>> fields. Some have many country names associated with them. Perhaps we
>>>> could use appropriate ISO values?
>>>>
>>>
>>> We're currently evaluating Ordnance Survey ontologies for
>>> geography in the UK (given the current focus on UK government datasets),
>>> but it's not clear yet what to use abroad.
>>>
>>> Rufus mentioned you might be interested in temporal fields and it
>>> certainly looks like we'll have some for the government data.
>>>
>>>
>>>>
>>>>  * In some areas changes to our model would be quite simple. E.g. we
>>>> currently have a 'contact information' field. We could create a
>>>> separate email address field which could correspond to the 'owner
>>>> email' field or 'maintainer email' field in CKAN. What is the
>>>> distinction between owner and maintainer here?
>>>>
>>>
>>> Our intention is that 'Author' is the original creator of the data. If
>>> the Maintainer is different to the Author then supply Maintainer too.
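
David's rule for these fields can be written as a small helper. A
minimal Python sketch; the keys `author`, `author_email`, `maintainer`
and `maintainer_email` are assumed to match CKAN's package fields, and
the names and addresses in the example are hypothetical.

```python
from typing import Optional


def contact_fields(author: str, author_email: str,
                   maintainer: Optional[str] = None,
                   maintainer_email: Optional[str] = None) -> dict[str, str]:
    """Contact fields for a package: the Author (original creator of the data)
    always, and the Maintainer only when it is someone other than the Author."""
    fields = {"author": author, "author_email": author_email}
    if maintainer and maintainer != author:
        fields["maintainer"] = maintainer
        if maintainer_email:
            fields["maintainer_email"] = maintainer_email
    return fields


# Hypothetical example: data created by one body, maintained by the IDD project.
print(contact_fields("Data Publisher", "data@example.org",
                     "IDD project", "idd@example.org"))
```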
>>>
>>>
>>>>
>>>>  * Regarding the license field, we currently have 'OKD compliant' as a
>>>> Yes/No field. It would probably be better to use something that
>>>> corresponds to the drop-down menu in CKAN. Would a number be best here?
>>>> If so, is there a list that we can use to link numbers to items in the menus?
>>>>
>>>
>>> There is an ID for every license (license_id is there but undocumented
>>> in the API), but I fear it's best not to make a dependency on that. So
>>> I suggest just using the text name of the license.
>>>
>>>
>>>>
>>>>  * Generally, I wonder whether it would be worth looking to existing
>>>> standards and guidance for this. Especially where fields may be
>>>> generic. It would be great to ensure the fields in our profiles comply
>>>> with standards, where standards exist. I wonder whether this has been
>>>> thought about in relation to government data? Should we be looking to
>>>> Dublin Core? Are there other metadata standards we should examine?
>>>>
>>>
>>> A few weeks ago we copied ckan.net packages to an RDF store and you
>>> can look at the ontologies used here:
>>>
>>> http://api.talis.com/stores/ckan/meta?about=http%3A%2F%2Fckan.net%2Fpackage%2Frdf%2F32000-naples-florida-businesses-kml
>>> This is still experimental, so things could well change soon.
>>>
>>
>>
>



-- 
Jonathan Gray

Community Coordinator
The Open Knowledge Foundation
http://www.okfn.org


