[ckan-discuss] NZ CKAN Instance - Ideas

Glen Barnes glen at opengovt.org.nz
Wed Jun 16 02:30:53 BST 2010


Hi David,

A few notes below.
 
On 15/06/2010, at 9:45 PM, David Read wrote:

> Glen,
> 
> Many thanks for sending this feedback on CKAN as a whole from your
> experience in NZ. It is incredibly useful and I think we need to sift
> through this carefully. Here's my take - please do feed back and we
> can work on some concrete improvements.

No problem. I tried to pack a bunch into the email to get it on record so we can move these things forward. Some notes below.

> 
> Groups - Specifying department and sub-department can easily be
> achieved in the 'extra' fields, and that's what we do with other
> countries' government data. But it's interesting that you thought the
> 'groups' option would be way to do this, and suggests to me that we
> should do more to expose ways to browse and search by department and
> sub-department in the UI.

I guess I looked at groups that way because they are a top-level item in the site. Of course using the metadata fields would be best, and we can draw up a list of the metadata fields we need for NZ and build these in. I'm not sure whether there is a defined set of these? I've been getting into some OpenStreetMap bulk uploading recently, and although their system of deciding on tags is not perfect, the idea of working on common sets of tags across countries is quite good. For example, I mention below a feature for showing where data is being used. We could come up with a common set of tags:

external_use:=zoodle
external_use:url=http://www.zoodle.co.nz
external_use:description=NZ Government data is used to show property information on a map and combined with other proprietary data
external_use:organisation=Zoodle Limited
external_use:screenshot(s)=http://some/url
external_use:thumbnail=http://some/url
external_use:related_package(s)=nz_school_boundaries
external_use:related_package(s)=nz_rental_statistics

external_use:=zoodle
external_use:url=http://www.zoodle.co.nz
external_use:description=NZ Government data is used to show property information on a map and combined with other proprietary data
external_use:organisation=Zoodle Limited
external_use:screenshot(s)=http://some/url
external_use:thumbnail=http://some/url
external_use:related_package(s)=nz_linz_cadastral
external_use:related_package(s)=nz_rental_statistics

Of course the UI for this could make it nice and easy to add these relationships.
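
As a rough sketch of how tags like these could be read back into one record per use: the Python below is illustration only (CKAN itself is written in Python, but this does not use any CKAN API). The function name parse_external_use and the "name" key used for the bare "external_use:=..." line are made up for the example, and repeated keys such as related_package(s) are simply collected into lists.

```python
from collections import defaultdict

PREFIX = "external_use:"


def parse_external_use(lines):
    """Group 'external_use:key=value' lines into a single record.

    Hypothetical helper: repeated keys accumulate into lists, and the
    bare 'external_use:=zoodle' line (empty key) is stored under 'name'.
    """
    record = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line.startswith(PREFIX):
            continue  # skip anything that is not an external_use tag
        # Split on the first '=' only, so URLs containing '=' stay intact.
        key, _, value = line[len(PREFIX):].partition("=")
        record[key or "name"].append(value.strip())
    return dict(record)


tags = [
    "external_use:=zoodle",
    "external_use:url=http://www.zoodle.co.nz",
    "external_use:related_package(s)=nz_school_boundaries",
    "external_use:related_package(s)=nz_rental_statistics",
]
use = parse_external_use(tags)
print(use["name"])
print(use["related_package(s)"])
```

Keeping every value in a list means a use that touches several packages needs no special handling, at the cost of single-valued keys like url also being wrapped in a list.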

> 
> Theming - We have recently introduced ways to allow CKAN visual
> templates to be customisable - see http://wiki.okfn.org/ckan/doc/theme

The wiki is down right now - I'll take a look at it when it is back up. I guess one fundamental question is whether you want the CKAN database to be consumer facing or not. If so then I'm happy to help out with a bit of direction in terms of some of the UX issues.

> 
> Jargon ('package') - We see a core advantage of CKAN is the concept of
> data packaging (the 'Debian of data' concept), so perhaps we need to
> explain that. Maybe we should rename from 'Packages' to 'Data
> Packages'? Or have a side-bar 'What is a Package of data?' linking to
> explanations.

I just don't think you are going to get people to change their thinking on what a data set is. Take these two Google searches:

http://www.google.co.nz/search?hl=en&client=safari&rls=en&&sa=X&ei=cCIYTJ7nFZGxcYLLkewK&ved=0CBYQBSgA&q=open+government+data+set&spell=1

http://www.google.co.nz/search?hl=en&client=safari&rls=en&q=open+government+package&aq=f&aqi=&aql=&oq=&gs_rfai=

We are used to talking about open data and data sets. "Package" is a foreign term to most people, and while the underlying idea of versioning, etc. is sound, I just don't think that in the long term it will help the uptake of the data (see SEO below - using the term "data set" will also increase findability).

Check out this post on Ubuntu, particularly items 3 and 6: http://design.canonical.com/2010/06/when-new-users-first-encounter-ubuntu-5-show-stoppers/

> 
> New package form - Yes I agree we can add more explanation text. And
> we're talking about having some customisations for different
> sorts/sources of data. We have already made it easy for an instance to
> heavily customise the form (see http://ca.ckan.net/package/new for
> example). But ckan.net being central, it needs to cater for all sorts
> of data and help the user. It would be great to have you involved in
> designing this going forward.

No problem in being involved. Maybe to keep the codebase modular we could add form templates and some configuration items on an instance-by-instance basis. We can then share these when it makes sense.
 
> 
> Link to uses - A package has many associations. Of course plenty can
> already be added as resources: mirrors of data, scraped versions,
> derived data, SPARQL endpoints added on. You're quite right that a
> 'use' of the data would be good to link to. Property data aggregation
> site is one sort, people are plotting geo data on a map, visualising
> in other ways, combining with other datasets, writing news articles
> about the data, commenting on the methods etc. The data.gov.uk site is
> trying this out with per-package comments and wiki and I don't see
> much use of these particular features so far. Indeed their email list
> has seemed by far the most effective way to pool interesting and
> related information.

Right - anywhere we have a version of the data that is not the canonical source could be a use for this. I was thinking of something more structured, as above. This is mainly for the politicians, so they can have case studies to point to: "I opened up data set x and now small businesses save n hours a month, which saves them $40 million a year." They want/need this information to justify opening more data. We can accelerate our cause if we can prove benefits. Also, for people who are remixing, it gives them some SEO love and another outlet for promoting what they are doing.

> 
> Cost of datasets - I'm not sure we've thought much about this - we
> seem more focussed on open data. I'm not sure that advertising costs
> of data on CKAN will drive them down. In the UK, the meme of Freedom
> of Information, lobbying by Tim Berners-Lee and new economists seem to
> be the most effective.

We have the problem that some data is theoretically 'open' (whatever that means) but there are costs associated with it. Again we can store this as metadata and have a customised dataset view page which formats this how we like it.

> 
> Search engine optimisation - excellent - we're keen to improve on
> this. I've created a ticket to collect these ideas and get it done:
> http://knowledgeforge.net/ckan/trac/ticket/350

Nice - I'm by no means an SEO expert but I'll try and get a friend to do one of his SEO audits on the site and give some feedback.

> 
> Harvesting other Catalogues - We've worked a great deal on tools to do
> this, with the API, getdata scripts, spreadsheet importer, changeset
> mechanism etc. Several batches of meta data from other sources have
> gone in. Going forward we need to work out metadata to target
> importing, how best to synchronise changes and feed back corrections.
> 
> Data migration of NZ data - is this all done by Tim McNamara now or do
> we need to discuss this more?

Yeah. A single import has been done. We have to migrate my existing cat.open.org.nz dataset, deduplicate, and set up regular syncs between data.govt.nz and NZ CKAN. Maybe the first step is to get a dev/staging environment set up for an NZ instance. I want to get things working on that first before switching cat.open.org.nz over.
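
To make the dedup step a bit more concrete, here is a minimal sketch in plain Python. It assumes each catalogue entry is a dict with a "title" field and that a lower-cased, whitespace-collapsed title is a good enough matching key. Both are assumptions for illustration, not how either catalogue actually stores records, and the names normalise and merge_catalogues are made up.

```python
def normalise(record):
    """Assumed matching key: lower-cased title with whitespace collapsed."""
    return " ".join(record["title"].lower().split())


def merge_catalogues(primary, secondary):
    """Return all primary records plus secondary records not already present.

    Records from `primary` win when two entries share a normalised title.
    """
    seen = {normalise(r) for r in primary}
    merged = list(primary)
    for record in secondary:
        key = normalise(record)
        if key not in seen:
            seen.add(key)
            merged.append(record)
    return merged


# Made-up example records, for illustration only.
existing = [{"title": "NZ School Boundaries", "source": "catalogue A"}]
incoming = [
    {"title": "nz  school boundaries", "source": "catalogue B"},
    {"title": "NZ Rental Statistics", "source": "catalogue B"},
]
merged = merge_catalogues(existing, incoming)
print([r["title"] for r in merged])
```

In practice title matching alone will miss some duplicates and wrongly merge others, so a real sync would probably want to match on source URL or an identifier as well.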




