[ckan-discuss] 262 New Zealand, 70 Australian datasets happily in CKAN

Tim McNamara paperless at timmcnamara.co.nz
Mon Jun 14 10:36:36 BST 2010

On 14 June 2010 20:17, Rufus Pollock <rufus.pollock at okfn.org> wrote:

> On 13 June 2010 03:46, Tim McNamara <paperless at timmcnamara.co.nz> wrote:
> > Hi all,
> >
> > Just a heads up that every dataset from http://data.australia.gov.au &
> > http://data.govt.nz is now also found at CKAN
> Amazing work Tim. BTW, if you'd be happy to, we'd love to store a copy
> of your scripts in our ckanext repo under, say, "nz" and "australia":
> <http://knowledgeforge.net/ckan//ckanext>


I did notice the two that are in there when I checked out the ckan client, which
was after I had implemented my own client via httplib2 :/

I have some tidying up to do of the code, including full reST docstrings etc.
In fact, I would like to see if I can change the two scripts to become
config files. Hopefully, in a similar manner to a unittest runner, a central
module will simply follow the instructions in each of the config files it
finds and send info to CKAN/file/wherever. If I don't make any significant
progress, I'll just send the two scripts through.
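
As a rough illustration of the config-file idea, here is a minimal sketch in
Python. Everything in it is an assumption made for the example: the `.ini`
layout, the `[source]` section with `name`/`url`/`target` keys, and the
`load_sources`/`run_all` helpers are hypothetical and do not exist in ckanext.

```python
import configparser
from pathlib import Path


def load_sources(config_dir):
    """Read every *.ini file in config_dir and return one dict per source.

    The [source] section and its keys are a hypothetical layout.
    """
    sources = []
    for path in sorted(Path(config_dir).glob("*.ini")):
        # interpolation=None so that "%" characters in URLs are kept as-is.
        parser = configparser.ConfigParser(interpolation=None)
        parser.read(path, encoding="utf-8")
        if not parser.has_section("source"):
            continue  # skip files that are not source definitions
        section = parser["source"]
        sources.append({
            "name": section.get("name", path.stem),
            "url": section["url"],
            # Where to send the results; "ckan" and "file" are assumed values.
            "target": section.get("target", "ckan"),
        })
    return sources


def run_all(config_dir, handlers):
    """Dispatch each source to the handler registered for its target."""
    results = []
    for source in load_sources(config_dir):
        handler = handlers[source["target"]]
        results.append(handler(source))
    return results
```

The central module only needs a mapping of target names to callables, so
adding a new country would mean dropping in another config file rather than
writing another script.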

> (If you happen to be using python mercurial easiest way may be to fork
> from our mirror on bitbucket: <http://bitbucket.org/okfn/ckanext>, add
> changes and then we'll pull)

I tend to use Bazaar/Launchpad for most of my work. I quite enjoy using hg
too though. This won't be a problem.

> > Generate format-xls/format-pdf/etc tags depending on the file types.
> Even better you could add this format info (if you haven't already) to
> the resources (they now have a format attribute) and we're sort of
> deprecating use of tags for specifying formats in favour of these
> dedicated fields.

Yes, I'm sending this through. At the moment, I'm just extracting the text
that's on the host's website. So, NZ data is often classified as format:
"Spreadsheet".[1] I am tempted to do an HTTP HEAD request for each of the
resources & investigate the Content-Type header.
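
A minimal sketch of that HEAD-request idea, using only the Python standard
library. The `MIME_TO_FORMAT` table and both function names are hypothetical,
and the labels are just examples rather than values CKAN requires.

```python
import urllib.request

# Hypothetical mapping from MIME type to a short format label.
MIME_TO_FORMAT = {
    "application/vnd.ms-excel": "xls",
    "application/pdf": "pdf",
    "text/csv": "csv",
    "application/xml": "xml",
    "text/xml": "xml",
    "text/html": "html",
}


def format_from_content_type(content_type):
    """Turn a Content-Type header value into a format label, or None."""
    if not content_type:
        return None
    # Drop parameters such as "; charset=utf-8" and normalise case.
    mime = content_type.split(";", 1)[0].strip().lower()
    return MIME_TO_FORMAT.get(mime)


def guess_format(url, timeout=10):
    """Issue a HEAD request and map the Content-Type header to a label."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return format_from_content_type(response.headers.get("Content-Type"))
```

Servers sometimes report a generic type such as application/octet-stream, in
which case this returns None and the text scraped from the host's website
could remain the fallback.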

Re tags, noted.

> > Licencing information is not yet being sent to CKAN in the format it wants.
> > I have included the original text, but getting a machine to guess what the
> > licence is and then match it to the id codes of licences that are already
> > accepted seems like a task for when I need to procrastinate. Basically, the
> > full licence details appear in the details section down the bottom of each
> > page, but not in the info box on the top right.
> The question of how we deal with licenses effectively going forward is
> an interesting one. Several people have already suggested a dedicated
> free text field for license info in addition to the enumeration.
> Personally, I feel if it is free text you may as well put that info in
> the notes ...

By free text, I mean "Creative Commons - Attribution 2.5 Australia

One problem is that there are so many CC licences, given that each country
has its own version. Perhaps the global CKAN instance could have
CC-BY-Country and CC-BY-Unported. That would imply that people using the
data would need to refer to the specific local licence, or take the global

> > None of the files have hashes. I am reluctant to add my own hashes to the
> > downloaded files, because I can't ensure their authenticity as a third
> > party.
> Very reasonable, though at this point I think it will need to be
> 3rd parties who add them -- or even a bot we use to go through
> nightly. (Perhaps we then add a suitable notice on the site). One
> reason this is useful is not just checking authenticity but being able
> to tell when files have been updated (but name kept the same).
> Rufus
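
A minimal sketch of how a nightly bot might use a hash to notice that a file
has changed while keeping the same name. The choice of SHA-1 and both function
names are assumptions for illustration; nothing here is an existing CKAN
feature.

```python
import hashlib


def sha1_of_stream(stream, chunk_size=65536):
    """Return the hex SHA-1 digest of a binary file-like object."""
    digest = hashlib.sha1()
    # Read in chunks so that large downloads need not fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def has_changed(stream, previous_hash):
    """True if the content no longer matches the hash recorded earlier."""
    return sha1_of_stream(stream) != previous_hash
```

A hash recorded this way only says the bytes differ from the previous run; it
says nothing about whether the file is what the publisher intended.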

Cheers Rufus, thanks for the response.

[1] Example: http://www.ckan.net/package/immigration-new-zealand-statistics
[2] http://www.ckan.net/package/australia-nsw-crime-data