[ckan-discuss] Questions about CKAN as data repository

Rufus Pollock rufus.pollock at okfn.org
Wed Nov 9 18:59:53 GMT 2011

On 9 November 2011 15:36, Adrian Pohl <adrian.pohl at okfn.org> wrote:
> Hello,
> I've seen with interest that for some time now you can upload data to
> thedatahub.org. Obviously the Data Hub now acts as a data repository.
> I would like to have a bit more information about this step to be able
> to answer questions people ask me about it. We talked about it in
> yesterday's openbiblio meeting regarding the special use case that
> thedatahub.org could be used for saving bibliographic datasets for the
> BibServer project and collected some questions:
> * How big a dataset can be uploaded to CKAN?

At present I think the size limit for uploading a single file is
500 MB, but we could raise this pretty easily.

> * When and what data should be uploaded to CKAN? When should a dataset
> that is stored elsewhere just be registered? Are there some
> recommendations/reflections about this anywhere?

We don't have particular recommendations, but I note that we have been
hard at work on what we call "archiving" capability. If enabled, this
will automatically back up/cache a copy of a remote file to our local
storage so that if the remote file disappears we can still provide it
(this obviously needs to be somewhat configurable, as there are some
massive files around and we don't want to cache e.g. simple HTML pages).
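
As a rough illustration of the "somewhat configurable" part, the decision
could look something like the sketch below. This is a hypothetical example,
not the actual CKAN archiving code: the function name, the 500 MB cap and
the skipped content type are all assumptions made for the illustration.

```python
# Hypothetical sketch only; names and thresholds are assumptions,
# not taken from the CKAN code base.
MAX_ARCHIVE_BYTES = 500 * 1024 * 1024   # assumed size cap for cached copies
SKIP_CONTENT_TYPES = {"text/html"}      # assumed: do not cache plain HTML pages


def should_archive(content_type, size_bytes,
                   max_bytes=MAX_ARCHIVE_BYTES,
                   skip_types=SKIP_CONTENT_TYPES):
    """Return True if a remote resource looks worth caching locally."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    mime = (content_type or "").split(";")[0].strip().lower()
    if mime in skip_types:
        return False
    if size_bytes is None:
        # Unknown size: be conservative and skip it.
        return False
    return 0 < size_bytes <= max_bytes
```

A site could then raise the cap or add further content types to skip
without touching the rest of the logic.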

> * Is there a post about the CKAN change to a repository?

There's a post about the release of this extension for CKAN:


We actually deployed this at the same time on thedatahub for a test
period but didn't officially announce it. It has been in use since and
has been completely stable (the backend storage is actually Google Storage).

> Generally, I think there are some problems with how changes like the
> name (CKAN/the Data Hub) or this move to being a repository are
> communicated. I think, even people who are quite interested in the
> development (like me) get confused by it.

Yes :-) and we hear you. We're planning to get much better at this. We
already have a regular weekly update about CKAN the software. We're
planning a blog for thedatahub where we can post updates specifically
about it and about interesting datasets, and we plan to mail users (now
that we have user email addresses) about major changes like this.

