[ckan-discuss] Introducing Odessa, a CKAN metadata repository

Brian Lee Yung Rowe rowe at muxspace.com
Tue Oct 22 19:10:03 BST 2013


Hello,

I'm a relative newcomer to the open data world. I have a background in quantitative analysis (mostly in finance) and am a professor in a graduate program for data analytics. I've been watching the growth of open data with interest, as the potential for conducting useful and interesting research with all this data is compelling. With so many repositories offering data, finding data is no longer the challenge it used to be. Instead, the challenges I've had with actually using the data are twofold: transforming datasets into analysis-friendly formats, and transforming indices into a common format so that datasets can be joined together. I have been focused on the second challenge, to make it easier for my students to perform analysis without getting bogged down in data cleaning and manipulation.

This is the motivation for Odessa, a CKAN instance with a corresponding R package. In essence, each dataset registered on Odessa is a package containing a reference to actual data plus a metadata file that describes the indices available in the dataset. I use a combination of regular expressions and a graph construction to make automatic inferences about how to join datasets together. The idea is that you tell the library which datasets to connect, and it figures out how to do it. The secondary idea is that only one person needs to create a metadata file for a given dataset and then anyone in the community has access to it via the shared platform.
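To make the inference idea a bit more concrete, here is a rough Python sketch of the general technique. It is not the Odessa R package's code or API. The format names, regular expressions, conversion functions, and the `join` helper are all hypothetical and only illustrate how regex-based format detection and a graph of conversions could combine to work out a join automatically.

```python
import re
from collections import deque

# Hypothetical index formats, each recognised by a regular expression.
FORMATS = {
    "date_iso": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
    "date_us": re.compile(r"^\d{2}/\d{2}/\d{4}$"),
    "year": re.compile(r"^\d{4}$"),
}

# Hypothetical conversion graph: (source format, target format) -> function.
CONVERTERS = {
    ("date_us", "date_iso"): lambda s: f"{s[6:10]}-{s[0:2]}-{s[3:5]}",
    ("date_iso", "year"): lambda s: s[:4],
}


def infer_format(values):
    """Return the first format whose regex matches every value, else None."""
    for name, pattern in FORMATS.items():
        if values and all(pattern.match(v) for v in values):
            return name
    return None


def conversion_path(src, dst):
    """Breadth-first search over the conversion graph.

    Returns the list of functions that turns src into dst, or None.
    """
    queue = deque([(src, [])])
    seen = {src}
    while queue:
        fmt, path = queue.popleft()
        if fmt == dst:
            return path
        for (a, b), fn in CONVERTERS.items():
            if a == fmt and b not in seen:
                seen.add(b)
                queue.append((b, path + [fn]))
    return None


def join(left, right):
    """Join two {index: value} dicts by converting left's keys to right's format."""
    src = infer_format(list(left))
    dst = infer_format(list(right))
    if src is None or dst is None:
        raise ValueError("could not infer an index format")
    path = conversion_path(src, dst)
    if path is None:
        raise ValueError(f"no conversion from {src} to {dst}")
    joined = {}
    for key, value in left.items():
        for fn in path:
            key = fn(key)
        if key in right:
            joined[key] = (value, right[key])
    return joined


prices = {"10/22/2013": 1.5, "10/23/2013": 2.0}   # US-style date index
events = {"2013-10-22": "launch"}                 # ISO date index
result = join(prices, events)
```

In this sketch the caller only names the two datasets. The index formats are detected from the keys, and the conversion chain is found by searching the graph, so a new format needs just one regex and one edge to become joinable with everything already reachable.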

The overall framework is at alpha-level maturity, but I thought it would be helpful to get some first impressions and feedback on how useful people think this idea is and what sorts of features and functionality are of interest. The CKAN instance is located at http://odessa.zatonovo.com/ and the R package is at https://github.com/zatonovo/odessa. I plan to add a corresponding Python implementation as well as more inference capabilities. I'm actively looking for case studies to work on, so if you have any thoughts, please let me know.

Warm Regards,
Brian Rowe


P.S. I hope this isn't too presumptuous, but I will be in London/Cambridge Wednesday through Sunday if anybody in that neck of the woods is interested in discussing any of this in person.

