[okfn-help] International development data on CKAN
Jonathan Gray
jonathan.gray at okfn.org
Thu Dec 17 23:20:40 GMT 2009
To follow this up: it looks like we've decided to create the data on Google
Docs, then import it into CKAN. The question remains whether we should
(i) effectively fork (copy on CKAN + copy on external website)
(ii) keep primary (latest) copy on CKAN and push to external website
(iii) keep primary (latest) copy on external site and push to CKAN
Have we done this kind of thing before with CKAN package data? Any thoughts?
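Whichever direction we pick, the push step in options (ii) and (iii) comes down to working out which fields differ between the two copies. A minimal sketch in Python, assuming both copies can be read as flat dictionaries. The function name and the sample field names are made up for illustration and are not part of CKAN:

```python
_MISSING = object()  # sentinel so a field that is absent differs from one set to None

def fields_to_push(primary, secondary):
    """Return the fields of `primary` that are absent or different in `secondary`."""
    return {
        key: value
        for key, value in primary.items()
        if secondary.get(key, _MISSING) != value
    }

# Hypothetical records: the primary copy has a license filled in, the other does not.
primary = {"title": "ODA to Cambodia", "license": "Example Open License", "country": "KH"}
secondary = {"title": "ODA to Cambodia", "license": "", "country": "KH"}
changes = fields_to_push(primary, secondary)
print(changes)
```

Only the changed fields would then need to be sent to the secondary copy. Under option (i) nothing is pushed, and the same comparison would just show how far the two copies have drifted.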
On Wed, Dec 9, 2009 at 1:24 PM, Jonathan Gray <jonathan.gray at okfn.org> wrote:
> David,
> Many thanks for your reply. This is exactly what I was thinking. We'll
> probably continue to develop the database externally (on Google Docs
> for now!) but would be great to make sure it can be imported to CKAN.
> Also thought it would be a good test of CKAN's support for arbitrary
> metadata. (E.g. development specific stuff.) In longer term there is
> potential that CKAN could be main home for registry of open data on
> international development, and it's interesting to think how much it is
> built to give support to other domain-specific querying functionality.
> E.g. could we have a 'country' plugin built on ISO standards? Also, in
> the longer term will CKAN aim to support information such as what
> fields are available in a given dataset - or are there no plans to
> make it this fine-grained? Basically question is how flexible CKAN is
> going to be - and how much it will be possible to drill down and do
> domain specific querying.
> Another question moving forward is do we have 'main' version of our
> database on CKAN and push to another site to publish - or do we have
> main version on another site and pull to CKAN as necessary?
> Jonathan
> On Wed, Dec 9, 2009 at 12:14 PM, David Read <david.read at okfn.org> wrote:
>> Ben,
>> I'd be interested in what Rufus has to say on this, because this is
>> quite fundamental to where CKAN is going, but here are my thoughts.
>> The idea of the metadata in CKAN is to help users find datasets and to
>> help linking of datasets. I think many of the fields you have here are
>> useful in going into CKAN, and perhaps others are best left associated
>> with the data. Here are some examples, based on your example dataset.
>> I can envision someone finding it because they were looking for some
>> data to do with development finance, or to do with Cambodia, or just
>> recent data, for example.
>> Browsing metadata in CKAN they might see the opportunity to examine
>> the link between ODA events in Cambodia and political events and
>> propose plotting amount of money in ODA against time and include
>> events in Cambodia from Microfacts. So in CKAN it is useful to see the
>> temporal coverage of the ODA data, that the fields contain budget
>> information (rather than just a vague description) and that the data can be
>> got at in XML (not something difficult like PDF) and that the license
>> is compatible with Microfacts.
>> You have fields to do with provenance and a lot of
>> development-specific information, which is great, but not the sort of
>> thing that CKAN is best at indexing. I can see someone wanting to find
>> all datasets which are 'humanitarian aid' but not 'development aid' or
>> 'compliant with DAC standards', which can all be tagged and textually
>> searched in CKAN, but I expect you would want a custom development
>> search for these fields.
>> If my assumptions are reasonable then it suggests to me that you
>> should have your customised IDD database / website, with each record
>> having the key points synced into a CKAN record. Does anyone else want
>> to comment?
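To illustrate syncing only the key points of each record, here is a minimal sketch, assuming an IDD record is a flat dictionary. The split into core and extra fields, and every field name below, are assumptions for illustration rather than CKAN's actual schema:

```python
# Hypothetical split: fields that help people find and link the dataset.
CORE_FIELDS = ("name", "title", "url", "license")
EXTRA_FIELDS = ("temporal_coverage", "country", "format")

def to_ckan_record(idd_record):
    """Copy only the discovery-oriented fields; the rest stays in the IDD database."""
    record = {f: idd_record[f] for f in CORE_FIELDS if f in idd_record}
    record["extras"] = {f: idd_record[f] for f in EXTRA_FIELDS if f in idd_record}
    return record

idd_record = {
    "name": "oda-cambodia",
    "title": "ODA to Cambodia",
    "license": "Example Open License",
    "temporal_coverage": "2000-2008",
    "country": "KH",
    "format": "XML",
    "provenance_notes": "Compiled from donor reports",  # stays in the IDD database
}
ckan_record = to_ckan_record(idd_record)
print(ckan_record)
```

The development-specific fields never leave the IDD side, so the custom search can evolve without touching the CKAN record.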
>> David
>> 2009/12/9 Ben Harden <b.e.harden.03 at cantabgold.net>:
>>> Hi David, Rufus,
>>> Thanks for the feedback on the questions Jonathan posted. For reference,
>>> here is a link to the first draft of the IDD fields.
>>> http://spreadsheets.google.com/ccc?key=0AnHh6dpmBwS7dFVlcFZzWV9yVG8tUURRckVDMVo3Q2c&hl=en
>>> This'll still need some cleaning up and standardization, but it's a start!
>>> Not sure which is going to be the best option yet (building in CKAN or
>>> making the database consistent with CKAN) - will probably need some more
>>> assistance on this point in the near future...
>>> Thanks, all the best,
>>> Ben
>>> David Read wrote:
>>>> Ben,
>>>> Pleased to meet you! It sounds excellent to get the IDD onto CKAN.
>>>> Here are some pointers (see below), but feel free to ask more. All the
>>>> best,
>>>> David
>>>>> It would be great if this could be done either via CKAN (and then
>>>>> published on an external website with basic querying functionality),
>>>>> or at least published in a form that meant copies of the profiles
>>>>> could go onto CKAN.
>>>> Yes, do link up with our API to get data into and out of CKAN. If
>>>> you've not already, see: http://ckan.net/api/
>>>>> * Currently we have several fields that it would be good to import
>>>>> into CKAN preserving some structure (rather than just adding to free
>>>>> text field). I understand CKAN can now support arbitrary metadata. Is
>>>>> this the same as the key/value pairs?
>>>> Yes, see the 'extras' field in a package. The key and value are free
>>>> text. Structure the value field as you see fit.
>>>>> If so can we have more than the
>>>>> three that come up on web interface?
>>>> You can use as many as you like. In the web interface, when you run
>>>> out of fields, hit 'preview' to get some more.
>>>>> It would be great to use this
>>>>> functionality for our data profiles. For example, we have fields with
>>>>> dates and associated country names - which are probably fairly generic
>>>>> fields. Some have many country names associated with them. Perhaps we
>>>>> could use appropriate ISO values?
>>>> We're currently evaluating using Ordnance Survey ontologies for
>>>> geography in the UK (with the current focus on UK government datasets)
>>>> but it's not clear yet what to use abroad.
>>>> Rufus mentioned you might be interested in temporal fields and it
>>>> certainly looks like we'll have some for the government data.
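Since the key and value of an 'extras' entry are both free text, a field with several countries needs some convention for packing the list into one string. A minimal sketch, assuming ISO 3166-1 alpha-2 codes joined by spaces. The convention and the helper names are assumptions, not something CKAN prescribes:

```python
def pack_countries(codes):
    """Upper-case the codes, drop blanks and duplicates, sort, and join with spaces."""
    return " ".join(sorted({c.strip().upper() for c in codes if c.strip()}))

def unpack_countries(value):
    """Split a packed value back into a list of codes."""
    return value.split()

# Hypothetical extras for one package; KH is Cambodia, VN is Vietnam.
extras = {
    "country": pack_countries(["kh", "VN", "kh "]),
    "temporal_coverage": "2000-2008",
}
print(extras["country"])
```

Keeping the packed value sorted and upper-cased means the same set of countries always produces the same string, which makes plain text search on the field more predictable.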
>>>>> * In some areas changes to our model would be quite simple. E.g. we
>>>>> currently have a 'contact information' field. We could create a
>>>>> separate email address field which could correspond to the 'owner
>>>>> email' field or 'maintainer email' field in CKAN. What is distinction
>>>>> between owner/maintainer here?
>>>> Our intention is that 'Author' is the original creator of the data. If
>>>> the Maintainer is different to the Author then supply Maintainer too.
>>>>> * Regarding license field, we currently have 'OKD compliant' as
>>>>> Yes/No field. Would probably be better to use something to correspond
>>>>> with the drop-down menu in CKAN. Would a number be best here? If so,
>>>>> is there a list that we can use to link numbers to items in the menus?
>>>> There is an ID for every license (license_id is there but undocumented
>>>> in the API), but I fear it's best not to make a dependency on that. So
>>>> I suggest just using the text name of the license.
>>>>> * Generally, I wonder whether it would be worth looking to existing
>>>>> standards and guidance for this. Especially where fields may be
>>>>> generic. It would be great to ensure the fields in our profiles comply
>>>>> with standards, where standards exist. I wonder whether this has been
>>>>> thought about in relation to government data? Should we be looking to
>>>>> Dublin Core? Are there other metadata standards we should examine?
>>>> A few weeks ago we copied ckan.net packages to an RDF store and you
>>>> can look at the ontologies used here:
>>>> http://api.talis.com/stores/ckan/meta?about=http%3A%2F%2Fckan.net%2Fpackage%2Frdf%2F32000-naples-florida-businesses-kml
>>>> This is still experimental, so things could well change soon.
> --
> Jonathan Gray
> Community Coordinator
> The Open Knowledge Foundation
> http://www.okfn.org
Jonathan Gray
Community Coordinator
The Open Knowledge Foundation