[ckan-dev] CKAN and Civic Knowledge Data Bundles

Eric Busboom eric at clarinova.com
Wed Jun 13 21:00:15 UTC 2012


CKAN Developers, 

I'm working on a related project that would benefit from and can contribute to CKAN, so I'd like to introduce myself and the project.  

Short story: We're building a data warehouse for public data, and part of the project is a data package format that is similar to the CKAN DataPackage. I'd like to trade experience on the two formats. 

The project, Civic Knowledge, is creating data warehouses for public datasets, with an early focus on investigative journalists. By "data warehouse" we mean the formal definition: a large database with a structure that is specifically designed for reporting and analysis, in the style of Inmon and Kimball. (We're mostly Kimballites.)

The promise of this project is that journalists can visit the Civic Knowledge website and immediately start issuing queries on a wide range of linked public datasets, quickly answering questions such as "is the rate of nursing home violations correlated with the average income of an area, after controlling for crime rate?"
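
To make "after controlling for" concrete: one standard way to pose that kind of question is a first-order partial correlation. The sketch below is only an illustration of the statistic, not something the warehouse does today. The numbers are invented toy values (not real data) and the function names are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Invented toy values for four areas; NOT real data.
crime_rate = [1.0, 2.0, 3.0, 4.0]
violations = [2.0, 1.0, 2.0, 5.0]
income = [12.0, 9.0, 16.0, 13.0]

raw = pearson(violations, income)
controlled = partial_correlation(violations, income, crime_rate)
```

In this toy data the raw correlation between violations and income is about 0.33, but it vanishes once crime rate is held fixed, which is exactly the kind of distinction the example question is after.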

Users will also be able to get direct access to the database -- they will get the server, username and password for a Postgres database -- and can hook up Tableau, QGIS, Navicat, or any other reporting tool that can connect to a Postgres database. 

Here is the corporate-speak overview: 

	 http://www.clarinova.com/civic-knowledge-overview

To support this project we are also creating a data format. Our format has some particular requirements, which include being able to break up a single dataset into multiple partitions. Just one of the 9 US Census datasets is about 80GB, unmanageably large for most users, so the dataset gets partitioned into about 2,000 files, requiring special features to manage. 
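
As a rough illustration of the partitioning idea (a sketch, not the actual bundle implementation): 80 GB spread over about 2,000 files averages roughly 40 MB per file. The key fields ("state", "table") and the naming scheme below are hypothetical, chosen only to show how rows might be grouped into partitions.

```python
from collections import defaultdict


def average_partition_mb(total_gb, partition_count):
    """Average partition size in MB, using 1 GB = 1000 MB."""
    return total_gb * 1000 / partition_count


def partition_rows(rows, key_fields):
    """Group rows (dicts) by the values of the chosen key fields."""
    partitions = defaultdict(list)
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        partitions[key].append(row)
    return dict(partitions)


def partition_name(dataset, key):
    """Build a file-friendly partition name (hypothetical scheme)."""
    return dataset + "-" + "-".join(str(part).lower() for part in key)


# Invented sample rows, only to exercise the functions.
rows = [
    {"state": "CA", "table": "P001", "value": 10},
    {"state": "CA", "table": "P001", "value": 12},
    {"state": "NV", "table": "P001", "value": 7},
]
parts = partition_rows(rows, ["state", "table"])
names = sorted(partition_name("census2000", key) for key in parts)
```

Each partition can then be written, indexed, and fetched on its own, so a user who only needs one state never has to touch the whole dataset.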

The requirements and design documents for the Data Bundles are here: 

	http://www.clarinova.com/bundles

However, despite the differences in requirements, it would be quite sensible to provide a way to convert our bundles into CKAN packages. This would result in benefits for both our projects: 

	* It would make available many high value datasets in the CKAN format. 
	* It would allow users to access Civic Knowledge data via CKAN APIs and search functions. 
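
A minimal sketch of what such a conversion might look like. The bundle layout here ("identity", "title", "summary", "partitions") is made up for illustration and is not the bundle spec; the output uses the usual CKAN package fields (name, title, notes, resources). The base URL is a placeholder.

```python
import re


def slugify(text):
    """Lowercase the text and keep only characters CKAN allows in a name."""
    slug = re.sub(r"[^a-z0-9_-]+", "-", text.lower())
    return slug.strip("-")


def bundle_to_ckan_package(bundle, base_url):
    """Map a (hypothetical) bundle description onto a CKAN package dict."""
    root = base_url.rstrip("/")
    return {
        "name": slugify(bundle["identity"]),
        "title": bundle["title"],
        "notes": bundle.get("summary", ""),
        # One CKAN resource per bundle partition.
        "resources": [
            {
                "name": part["name"],
                "url": root + "/" + part["file"],
                "format": part.get("format", ""),
            }
            for part in bundle["partitions"]
        ],
    }


# Invented example bundle; field names and values are assumptions.
bundle = {
    "identity": "Census 2000 SF1",
    "title": "US Census 2000, Summary File 1",
    "summary": "Toy example of a partitioned bundle.",
    "partitions": [
        {"name": "ca-p001", "file": "census2000/ca-p001.csv", "format": "CSV"},
        {"name": "nv-p001", "file": "census2000/nv-p001.csv", "format": "CSV"},
    ],
}
package = bundle_to_ckan_package(bundle, "http://example.org/bundles/")
```

The resulting dict could then be handed to whatever CKAN client or API call is used to create packages; that step is left out here because it depends on the CKAN version and deployment.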

As we work on the design, I'd like to keep track of developments on the CKAN package spec and post updates of our spec, staying open to places where the two can be harmonized. 

I'm very open to comments and suggestions, so please let me know what you think, 

thanks, 

eric. 

--------------------------------------------------------------------------------------------------
Eric Busboom, CEO, Clarinova                                               (858) 386-4134














