[ckan-discuss] NZ CKAN Instance - Ideas

Glen Barnes glen at opengovt.org.nz
Sat Jun 12 00:38:02 BST 2010

Hi All,

I had a great chat with Jonathan Gray last week about moving the NZ Open Data Catalogue to the CKAN platform. He suggested I post to the list outlining some of the ideas and thinking behind where I would like to see the platform go so that it meets our needs. Below is a somewhat rambling set of ideas. It is by no means exhaustive, but it may spur some discussion. I would be interested in feedback both for and against anything below.

Open Data Catalogue Background

I have been interested in open data for a while, as I was always hamstrung when trying to work on projects by the lack of data, the price, or an inability to actually find the data I wanted. My main interest is in using open data to improve efficiencies for companies and individuals. For example, my day job is working for Zoodle.co.nz, where we try to aggregate many datasets into a single site for property information. This saves users time, as they do not have to visit the Ministry of Education, the Department of Building and Housing and the local council website to find the information they are after (well, that is the theory: some of the datasets we want are not available, or are way too expensive right now to give away to our users for free).

Last year a lot of people were talking about setting up some form of catalogue, but nothing had been done, or the discussions seemed to be based around standards, metadata, etc. My thought was that if we got something up, at least it would spark some interest. On that note I spent about 20 hours putting together something in WordPress and launched http://cat.open.org.nz/. It has served its initial purpose of getting people interested, including within government (we now have a data.govt.nz and we continue to work with them to improve things).

Now it's time to step things up a notch and make the catalogue more useful. WordPress being, well, WordPress, it is not exactly the right platform, especially if you are not a PHP programmer, as we need some pretty specific things to make it a better catalogue. I originally looked at the Sunlight Foundation's http://nationaldatacatalog.com/ code base, as it was Ruby/Rails, which I have a little bit of experience with, and the design was quite clean. I've now realised I can't really do this alone, and the example sites coming out of CKAN are starting to look really nice (an important part of the end-user experience in my mind).

So, given the above, here are my ideas about what I would like to see in the CKAN codebase going forward:

Nested Groups

At the moment the 'groups' functionality only has one level (http://www.ckan.net/group/). I was thinking that we would use the groups feature to split out our datasets into departments like we have already - http://cat.open.org.nz/category/official_source/. I've built the ODC on the premise that it covers every organisation covered by the Official Information Act and the Local Government Official Information Act. We have these nested as follows:

- Central Government

-- Public Service Department
-- Crown Agents, Autonomous Crown Entities, Crown Entity Companies, Trusts
-- DHBs
-- Crown Research Institutes
-- Reserve Bank of New Zealand
-- Non Public Service Departments
-- Office of Parliament
-- Education Institutions and Wananga
-- State-Owned Enterprises

- Local Government

-- City and District Councils
-- Regional Councils and Territorial Authorities
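
As a rough illustration of what nested groups could look like as a data structure, here is a minimal Python sketch. The `Group` class and its methods are hypothetical and are not part of the CKAN group model; only the group names come from the list above.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Group:
    """Hypothetical nested group (an assumption, not CKAN's own model)."""
    name: str
    children: List["Group"] = field(default_factory=list)
    parent: Optional["Group"] = field(default=None, repr=False)

    def add(self, name: str) -> "Group":
        """Create a child group under this one and return it."""
        child = Group(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> List[str]:
        """Names from the top-level group down to this one."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))

    def walk(self) -> Iterator["Group"]:
        """Yield this group and every descendant, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


central = Group("Central Government")
dhbs = central.add("DHBs")
central.add("Crown Research Institutes")
local = Group("Local Government")
local.add("City and District Councils")

print(" > ".join(dhbs.path()))
print([g.name for g in central.walk()])
```

A parent pointer on each group keeps breadcrumb-style paths cheap to build, while `walk` gives a way to list every dataset holder under a top-level group.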


Theming

I would like to be able to do some (basic?) theming of the site. These are the things I would like to do:

- Custom design of the home page. I prefer the more basic 'web app' style of home pages with less data, cleaner interface.
- Ability to pull in an RSS feed on the homepage for news items. I don't think the catalogue has to add blog features but it would be good if we could pull in articles from our main blog to display on the home page.
- Naming of items - I don't think the public knows what a package is, so I would use terms like "dataset" throughout. Things like "Register a new package" would probably be better worded as "Add a new dataset".
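
For the RSS idea, a minimal sketch of pulling the latest items out of a feed might look like the following. It parses an embedded sample feed with Python's standard library rather than fetching a real one; the feed contents, the `latest_items` helper and the example.org links are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed; a real deployment would fetch the blog's RSS instead.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item><title>First post</title><link>http://example.org/1</link></item>
    <item><title>Second post</title><link>http://example.org/2</link></item>
    <item><title>Third post</title><link>http://example.org/3</link></item>
  </channel>
</rss>"""


def latest_items(rss_text, limit=2):
    """Return the first `limit` items of an RSS 2.0 feed as dicts."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.findall("./channel/item")[:limit]:
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return items


for entry in latest_items(SAMPLE_RSS):
    print(entry["title"], entry["link"])
```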

Submitting new datasets

I like the 'add a new dataset' page here - http://www.ckan.net/package/new. It may be a little bit intimidating for a few people, as the form is quite long. Maybe we can look at redesigning the form at some point to add some instructions, look at how we select a license, and change some of the wording to be more human ("Add row to table" -> "Add another download format"). Also, different types of datasets will have different metadata, and it would be good to have the form adapt to the different types.
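
One way to picture a form that adapts to the dataset type is a small per-type field table, sketched below. The dataset types and field names are purely hypothetical and are not taken from CKAN or from the existing catalogue.

```python
# Hypothetical field lists; every name here is an illustrative assumption.
COMMON_FIELDS = ["title", "description", "license", "download_url"]

EXTRA_FIELDS_BY_TYPE = {
    "geospatial": ["projection", "coverage_area"],
    "statistics": ["time_period", "update_frequency"],
}


def form_fields(dataset_type):
    """Common fields plus any extras defined for this dataset type."""
    return COMMON_FIELDS + EXTRA_FIELDS_BY_TYPE.get(dataset_type, [])


print(form_fields("geospatial"))
print(form_fields("something-else"))
```

Unknown types simply fall back to the common fields, so a new kind of dataset can still be submitted before anyone defines extra metadata for it.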

Linking to uses

One of the key things that people from within government want is concrete examples of where this data is being used. I'm really keen on having the ability for people to add links to sites where the data is being used (commercial and non-profit). The user could add a title, link, description, contact and optional screenshot (we could auto-populate the screenshot by pulling from the linked site). The user could link this site to one or more datasets. Building up these case studies makes it a great one-stop shop for searching for uses of the information. We could then generate a pretty nice report for each department showing what datasets they have, etc.
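
To make the idea concrete, here is a minimal sketch of a "use" record linked to one or more datasets, plus a per-department report. The `Use` class, the `uses_by_department` helper and all sample names are hypothetical placeholders, not an existing CKAN feature.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Use:
    """A site or application that uses one or more datasets (hypothetical)."""
    title: str
    link: str
    description: str
    contact: str
    datasets: List[str]
    screenshot: str = ""  # optional


def uses_by_department(uses: List[Use], dataset_department: Dict[str, str]):
    """Group use titles by department, then by dataset."""
    report = defaultdict(lambda: defaultdict(list))
    for use in uses:
        for dataset in use.datasets:
            department = dataset_department[dataset]
            report[department][dataset].append(use.title)
    return {dept: dict(by_dataset) for dept, by_dataset in report.items()}


# Placeholder data for illustration only.
dataset_department = {"dataset-a": "Department X", "dataset-b": "Department Y"}
uses = [
    Use("Example site", "http://example.org/", "Demo", "someone",
        ["dataset-a", "dataset-b"]),
    Use("Another site", "http://example.com/", "Demo", "someone else",
        ["dataset-a"]),
]

print(uses_by_department(uses, dataset_department))
```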

Getting Access to Datasets

The key problems I have run into are a) where do we get the data from and b) how much does it cost. A lot of government data is available but not published anywhere online, and quite often has a cost associated with it. I'm really keen to get this on the catalogue so people can see where councils are charging. During my latest attempt to get some council data it was going to cost me $40,000/year, and when I made an LGOIA request to see how much they make from this dataset it was only $30,000/year in total.

I'm guessing that we can do this using custom fields but on a catalogue by catalogue basis we will want to think about the metadata we want to collect and format the add dataset form accordingly. Again I guess this is some form of config/theming issue.
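
As a sketch of how access and cost information could sit in custom fields, the example below stores them as key/value extras on each record and lists the datasets that carry a charge. The field names (`published_online`, `annual_cost_nzd`) and the records are hypothetical; the 40000 value simply echoes the quote mentioned above.

```python
# Hypothetical custom fields stored as string key/value extras.
datasets = [
    {"name": "example-free",
     "extras": {"published_online": "yes", "annual_cost_nzd": "0"}},
    {"name": "example-charged",
     "extras": {"published_online": "no", "annual_cost_nzd": "40000"}},
    {"name": "example-no-extras"},
]


def charged_datasets(records):
    """Return (name, cost) pairs for datasets with a non-zero cost, highest first."""
    charged = []
    for record in records:
        raw_cost = record.get("extras", {}).get("annual_cost_nzd", "0")
        cost = int(raw_cost or 0)  # treat missing or empty values as free
        if cost > 0:
            charged.append((record["name"], cost))
    return sorted(charged, key=lambda pair: pair[1], reverse=True)


print(charged_datasets(datasets))
```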


SEO

We've been pretty successful at SEO without even really trying (see http://www.google.co.nz/search?client=safari&rls=en&q=auckland+google+transit+feed&ie=UTF-8&oe=UTF-8&redir_esc=&ei=dsYSTOzJLs2eceuZiI8I as an example). This to me is key. If we are to make data available it has to be findable, which is the main reason for a catalogue. There are probably things we should be doing on CKAN, like using slugged URLs (http://www.ckan.net/package/ascoe -> http://www.ckan.net/package/ascoe/atmospheric-chemistry-studies-in-the-oceanic-environment) and setting the H1 tag correctly ("Atmospheric Chemistry Studies in the Oceanic Environment" in the example above). Some basic SEO 101 on-page optimisations.
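
A slugged URL of the kind described above could be generated along these lines. This is only a sketch: `slugify` and `slugged_path` are hypothetical helpers, not CKAN functions, and the URL layout is the one suggested in the example.

```python
import re
import unicodedata


def slugify(title):
    """Lower-case the title and join its alphanumeric runs with hyphens."""
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


def slugged_path(name, title):
    """Build a path of the form /package/<name>/<slug-of-title>."""
    return "/package/{}/{}".format(name, slugify(title))


print(slugged_path(
    "ascoe", "Atmospheric Chemistry Studies in the Oceanic Environment"
))
```

Keeping the short package name in the path means the slug is decorative: the page can still be looked up by name even if the title, and therefore the slug, changes later.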

Harvesting other Catalogues

At the moment we don't harvest data from any other catalogues, but I do want to start by getting access to the data.govt.nz dataset (they used ours as a base for theirs when they set it up) and using these external catalogues as the canonical versions, if that makes sense (augmented by local information that they may not want to share, like contact names and pricing).
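
The "canonical record plus local additions" idea can be sketched as a simple merge, shown below. The field names and the `merge_record` helper are assumptions for illustration; no real harvesting API is used here.

```python
# Fields that only exist locally and should survive a re-harvest (hypothetical).
LOCAL_ONLY_FIELDS = ("contact_name", "pricing")


def merge_record(harvested, local):
    """Take the harvested record as canonical and add local-only fields."""
    merged = dict(harvested)
    for key in LOCAL_ONLY_FIELDS:
        if key in local:
            merged[key] = local[key]
    return merged


harvested = {"title": "Example dataset", "url": "http://example.org/data"}
local = {
    "title": "Old local title",      # ignored: the external copy is canonical
    "contact_name": "A. Person",     # kept: local-only information
    "pricing": "free",
}

print(merge_record(harvested, local))
```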

Data Migration

As mentioned above, the catalogue is in WordPress right now, so it will have to be migrated to the CKAN format. The database is WordPress formatted with some custom form plugins, so it is readable, but hooking up the tables takes a little bit of work to figure out the right keys to join on. When we get to the migration step I can give people a copy. I don't want to publish this anywhere on the net, as it does have email addresses of people in some of the tables. Let me know via a direct email if you want to have a look at it.
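
For anyone unfamiliar with the WordPress layout, the sketch below shows the usual join between the core `wp_posts` and `wp_postmeta` tables (on `wp_postmeta.post_id = wp_posts.ID`), using an in-memory SQLite database with made-up rows. It is an assumption that the custom form plugins store their values in `wp_postmeta`; the meta keys and the output shape are hypothetical too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down versions of the two core WordPress tables, with made-up rows.
conn.executescript("""
CREATE TABLE wp_posts (
    ID INTEGER PRIMARY KEY, post_title TEXT, post_type TEXT, post_status TEXT
);
CREATE TABLE wp_postmeta (
    meta_id INTEGER PRIMARY KEY, post_id INTEGER, meta_key TEXT, meta_value TEXT
);
INSERT INTO wp_posts VALUES (1, 'Example dataset', 'post', 'publish');
INSERT INTO wp_posts VALUES (2, 'Draft dataset', 'post', 'draft');
INSERT INTO wp_postmeta VALUES (1, 1, 'download_url', 'http://example.org/data.csv');
INSERT INTO wp_postmeta VALUES (2, 1, 'licence', 'example-licence');
""")


def load_datasets(connection):
    """Join published posts to their meta rows and fold the meta into a dict."""
    rows = connection.execute("""
        SELECT p.ID, p.post_title, m.meta_key, m.meta_value
        FROM wp_posts AS p
        LEFT JOIN wp_postmeta AS m ON m.post_id = p.ID
        WHERE p.post_status = 'publish'
        ORDER BY p.ID, m.meta_id
    """)
    datasets = {}
    for post_id, title, key, value in rows:
        record = datasets.setdefault(post_id, {"title": title, "extras": {}})
        if key is not None:  # posts without meta rows still get a record
            record["extras"][key] = value
    return list(datasets.values())


print(load_datasets(conn))
```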

Glen Barnes
New Zealand Open Data Catalogue

