[ckan4rdm] Short introduction of project EDaWaX
h.bunke at zbw.eu
Fri Apr 19 08:15:56 UTC 2013
We had a conversation with Mark Wainwright and Velichka Dimitrova
recently about our project and our plans for using CKAN for it.
Mark suggested that we should join this list and write a few
introductory words, which I am glad to do here.
I'm working as a developer at the German 'Leibniz Information
Centre for Economics' (ZBW, formerly known as 'German National
Library of Economics', http://zbw.eu/inded-e.html). In
cooperation with several partners we are conducting a project
called EDaWaX ('European Data Watch Extended').
In short, EDaWaX is looking for ways to publish and curate
research data in economics. Our focus is on publication-related
data, meaning especially the data that authors of journal papers
have used for their articles. One objective of the project is the
development of a data archive for journals using an integral
approach. For more information on the project please have a look at
our website and blog: http://www.edawax.de, and here esp. the
As a first step we'll try to set up a pilot application that
demonstrates some features we think such a data archive should
have. And that's the part where CKAN comes in. CKAN will be the
central part of our scenario.
We decided on CKAN after evaluating several
software packages, namely Dataverse (which we are using for
archiving datasets provided by authors of 'Economics', an Open
Access E-Journal which I'm also responsible for), Nesstar (used
by many research data centres in Germany) and CKAN. We only had
a few common but fundamental criteria for the software:
- Open Source
This is a fundamental principle for us, but there are
also practical reasons for this. We want to be able to
modify and extend the software, and we would like to
share our extensions.
- API (reading and writing)
This is quite important since we don't want the archive
to be a 'silo'. We want to be able to program our own
user interface, for example, and to provide integration
packages for other systems (CMS or journal software).
- Simple User Interface
We are mainly targeting authors and editorial offices who
don't have the time, resources or know-how to learn and use
complicated UIs and workflows. This is also important
for lowering the barriers to publishing research data.
- RDF metadata representation
We are aware that this might be a somewhat avant-garde
criterion. But for us as a scientific library it is
important, and we predict that a general, linkable and
machine-readable metadata interface will become more and
more important in the near future, so that our research
data can be used and adopted as widely as possible.
Looking at these criteria you will quickly see why we have
chosen CKAN. Another reason (not the decisive one, of course)
was that CKAN is written in Python, which is also my programming
language of choice. :-) So far we can see only one
argument against CKAN: it is not as focused on research
data as, for example, Dataverse. Hopefully, we can contribute a
bit to changing that.
So, these are our plans for the next six months or so.
1. install CKAN as the centre of our demo scenario
2. do some UI tweaks (layout, theme, CSS etc.)
3. develop a CKAN extension for integration of the metadata
schema provided by da|ra (an extended DataCite schema, so to
say; you can find the schema files here (in German):
I've already started working on this, and we intend to
publish and open-source it as 'ckanext-dara' on PyPI and
GitHub as soon as it proves stable.
4. develop a demo webapp that uses the CKAN API for searching
and writing to our CKAN instance. I've already
implemented a rough demo for this based on the Pyramid
framework and ckanclient (a Python client for the CKAN API),
and it works very well. There have been some issues
with ckanclient related to file upload, and I'm glad I
could contribute some minor fixes for that.
5. develop a third-party app add-on that uses the CKAN API.
This will be done for Plone, which is the base of the
above-mentioned E-Journal 'Economics'
(http://www.economics-ejournal.org). It should mainly be
a test case for the usability of CKAN for editorial offices.
Editors of 'Economics' have some experience with
Dataverse (and are not always happy with it) so we do
have a very good setting here. Generally we consider the
integration in third-party systems to be very important
for the acceptance of CKAN as a repository for
publication-related research data. Users should not be
bothered with having to use two (or even more) different
systems for data and text. This approach gives the
maximum integration of data and articles. Dataverse,
for example, will develop such functionalities for OJS
(Open Journal Systems), presumably within the next two
years. CKAN has something of a head start here due to its
great API, but I think we need to popularise CKAN in this area.
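
To illustrate the searching and writing mentioned in step 4, here is a minimal sketch. It is not the Pyramid/ckanclient demo itself: it uses only the Python standard library and assumes that the CKAN instance exposes the version 3 action API (`package_search` and `package_create`). The base URL, the API key and the dataset fields are hypothetical placeholders. The functions only build the HTTP requests; sending them would be a matter of passing them to `urllib.request.urlopen`.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical base URL of a CKAN instance; not taken from the post above.
CKAN_URL = "http://ckan.example.org"


def build_search_request(query, rows=10, base_url=CKAN_URL):
    """Build a GET request for the package_search action (reading)."""
    params = urlencode({"q": query, "rows": rows})
    return Request(f"{base_url}/api/3/action/package_search?{params}")


def build_create_request(name, title, api_key, base_url=CKAN_URL):
    """Build a POST request for the package_create action (writing).

    Writing needs an API key, which is sent in the Authorization header.
    """
    body = json.dumps({"name": name, "title": title}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": api_key,
    }
    return Request(
        f"{base_url}/api/3/action/package_create",
        data=body,
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    # Nothing is sent over the network here; the requests are only inspected.
    search = build_search_request("unemployment panel data")
    print(search.get_method(), search.full_url)

    create = build_create_request(
        name="demo-dataset",          # hypothetical dataset name
        title="Demo dataset",         # hypothetical title
        api_key="not-a-real-key",     # placeholder, replace with a real key
    )
    print(create.get_method(), create.full_url)
```

A third-party system such as a CMS add-on could issue the same two kinds of request, which is what makes the API suitable for integration outside of CKAN's own user interface.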
So much for the moment. I hope you got the rough picture.
Needless to say, we'd love to stay in contact with OKFN, the
CKAN community and, of course, other institutions using CKAN for
research data. We hope that we can play our part in making
CKAN a viable solution for managing research data.
It would be great to hear from you. Any comments on our project
and the plans for the CKAN implementation are very much
appreciated.
Dr. Hendrik Bunke
ZBW - Deutsche Zentralbibliothek für Wirtschaftswissenschaften
--Innovative Informations- und Publikationstechnologien--
Tel.: +49 40 42834 454 (Hamburg) OR +49 421 7940430 (homeoffice)