[okfn-discuss] CKAN spam

Rufus Pollock rufus.pollock at okfn.org
Mon Jul 13 19:08:28 UTC 2009


Alan, Jonathan,

To follow up from a few weeks ago ...

2009/6/23 Jonathan Gray <jonathan.gray at okfn.org>:
> On Tue, Jun 23, 2009 at 3:25 PM, Alan
> Jenkins<alan-jenkins at tuffmail.co.uk> wrote:
>> I think most sites have multiple download files.

Yes, that's my impression too. As yet, most data does not come in the
all-in-one download form that software has, where there is usually a
single tar.gz/zip/exe (or at least one per platform).

>> My impression is that the links on CKAN are for human consumption.
>> Usually when visiting the main URL for a site, it's not immediately
>> obvious where to find the page with download information.
>>
>> If I wanted to download a large dataset, I would want to visit the
>> download page on the appropriate site first.  It would be a nice way to
>> check that the CKAN info (e.g. license) was sane and up-to-date, that
>> the site was still live, what size file it was, etc.

I think there is some ambiguity there at the moment as to whether the
download link should be a specific file or a page with download info.
I think you are probably right that where there is no single canonical
download file/link we should have a link to the page. I note that even
with software there is quite a lot of support now for automagically
extracting download links from a page and using those to get the data
onto your machine automatically.
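
To make the link-extraction idea concrete, here is a rough sketch in
Python using only the standard library. It is an illustration, not how
CKAN or any existing tool does it: the extension list, the class name
and the function name are all made up for this example, and a real page
may need smarter heuristics than matching on file extension.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: these extensions are an illustrative guess at what counts
# as a "download" link; they do not come from CKAN.
DATA_EXTENSIONS = (".zip", ".tar.gz", ".tgz", ".csv", ".xml")


class DownloadLinkParser(HTMLParser):
    """Collect absolute URLs of <a href> links that look like data files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL, then test only
        # the path so that query strings do not hide the extension.
        absolute = urljoin(self.base_url, href)
        if urlparse(absolute).path.lower().endswith(DATA_EXTENSIONS):
            self.links.append(absolute)


def extract_download_links(html, base_url):
    """Return candidate download URLs found in an HTML download page."""
    parser = DownloadLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

Given the HTML of a download page and the URL it was fetched from,
extract_download_links returns the candidate file URLs in page order. A
package record could then keep the human-readable page as its main link
and use a list like this only for automated fetching.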

>> My opinion is that direct download URLs are only appropriate in a few
>> limited settings, e.g. in a "pull request" email for a GIT repository,
>> or in a how-to document.  I don't think CKAN is one of these cases, but
>> of course you are free to decide otherwise :-).

Right. I guess I'm wondering whether "data/content" will go the way of
software (multiple files in source but a single bundle to download) or
whether it will remain multi-file. At least for the present we've
clearly got to support the multi-file option ...

> This is true. One of our long term goals with CKAN is to develop
> something like an 'apt-get' for open knowledge - in which case the
> direct download links would be useful. We're also hoping that where
> there are multiple links, we can introduce support for multiple
> download URLs, or scripts to grab or scrape the relevant files. I
> think we'd also like to mirror versions of open datasets and to
> encourage people to link to 'cleaned up', linked or other versions of
> the data...

Yes, we have a "work-in-progress" in the form of our data package
script: http://www.okfn.org/datapkg

[...]

Rufus



