[ckan4rdm] Short introduction of project EDaWaX

Hendrik Bunke h.bunke at zbw.eu
Fri Apr 19 08:15:56 UTC 2013

Dear all,

we recently had a conversation with Mark Wainwright and Velichka
Dimitrova about our project and our plans for using CKAN in it.
Mark suggested that we join this list and write a few
introductory words, which I am glad to do here.

I'm working as a developer at the German 'Leibniz Information
Centre for Economics' (ZBW, formerly known as 'German National
Library of Economics', http://zbw.eu/inded-e.html). In
cooperation with several partners we are conducting a project
called EDaWaX ('European Data Watch Extended').

In short, EDaWaX is looking for ways to publish and curate
research data in economics. Our focus is on publication-related
data, meaning especially the data that authors of journal papers
have used for their articles. One objective of the project is the
development of a data archive for journals using an integral
approach. For more information on the project please have a look
at our website and blog: http://www.edawax.de, and there
especially the /about section.

As a first step we'll try to set up a pilot application that
demonstrates some features we think such a data archive should
have. And that's the part where CKAN comes in. CKAN will be the
central part of our scenario.

We came to the decision for CKAN after evaluating several
software packages, namely Dataverse (which we are using for
archiving datasets provided by authors of 'Economics', an Open
Access E-Journal which I'm also responsible for), Nesstar (used
by many research data centres in Germany) and CKAN. We only had
a few common but fundamental criteria for the software:

    -   Open Source
        This is a fundamental principle for us, but there are
        also practical reasons for this. We want to be able to
        modify and extend the software, and we would like to
        share our extensions.

    -   API (reading and writing)
        This is quite important since we don't want the archive
        to be a 'silo'. We want to be able to program our own
        user interface, for example, and to provide integration
        packages for other systems (CMS or journal software).

    -   Simple User Interface
        We are mainly targeting authors and editorial offices who
        don't have the time, resources and know-how to learn and
        use complicated UIs and workflows. This is also important
        for lowering the barriers to publishing research data.

    -   RDF metadata representation
        We are aware that this might be a somewhat avant-garde
        criterion. But for us as a scientific library it is
        important, and we predict that it will become more and
        more important in the near future to have a general,
        linkable and machine-readable metadata interface, so our
        research data can be used and adopted as widely as possible.

Looking at these criteria you will see very quickly why we have
chosen CKAN. Another reason (not the decisive one, of course)
was that CKAN is written in Python, which is also my programming
language of choice. :-) So far we can only see one
argument against CKAN: it is not as focused on research
data as, for example, Dataverse. Hopefully, we can contribute a
bit to changing that.

So, these are our plans for the next six months or so.

    1.  install CKAN as the centre of our demo scenario
    2.  do some UI tweaks (layout, theme, CSS etc.)

    3.  develop a CKAN extension for integration of the metadata
        schema provided by da|ra (an extended DataCite schema, so
        to say; you can find the schema files here (German):
        I've already started working on this, and we intend to
        publish and open-source it as 'ckanext-dara' on PyPI and
        GitHub as soon as it proves stable.

    4.  develop a demo webapp that uses the CKAN API for searching
        and writing to our CKAN instance. I've already
        implemented a rough demo for this based on the Pyramid
        framework and ckanclient (Python client for the CKAN API),
        and it works very well. There have been some issues
        with ckanclient related to file upload, and I'm glad I
        could contribute some minor fixes for that.

    5.  develop a third-party app add-on that uses the CKAN API.
        This will be done for Plone, which is the base of the
        above-mentioned E-Journal 'Economics'
        (http://www.economics-ejournal.org). It should mainly be
        a test case for the usability of CKAN for editorial offices.
        Editors of 'Economics' have some experience with
        Dataverse (and are not always happy with it) so we do
        have a very good setting here. Generally we consider the
        integration in third-party systems to be very important
        for the acceptance of CKAN as a repository for
        publication-related research data. Users should not be
        bothered with having to use two (or even more) different
        systems for data and text. This approach gives the
        maximum of integration for data and articles. Dataverse,
        for example, will develop such functionalities for OJS
        (Open Journal System) presumably within the next two
        years. CKAN has kind of a head start here due to its
        great API, but I think we need to popularise CKAN in this
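
To make item 4 a bit more concrete, here is a minimal sketch of
searching and writing through CKAN's Action API. It is not our actual
demo code: it uses only the Python standard library instead of
ckanclient, and the instance URL, the API key and the helper names
(build_search_request, build_create_request, call) are hypothetical
placeholders. Depending on how a CKAN instance is configured,
package_create may require more fields than the two shown here.

```python
import json
import urllib.parse
import urllib.request

# Assumptions for illustration only: a hypothetical CKAN instance
# and a placeholder API key.
CKAN_URL = "http://ckan.example.org"
API_KEY = "replace-with-your-api-key"


def build_search_request(query, rows=10, base_url=CKAN_URL):
    """Build a GET request for the 'package_search' action."""
    params = urllib.parse.urlencode({"q": query, "rows": rows})
    url = f"{base_url}/api/3/action/package_search?{params}"
    return urllib.request.Request(url)


def build_create_request(name, title, base_url=CKAN_URL, api_key=API_KEY):
    """Build a POST request for the 'package_create' action."""
    payload = json.dumps({"name": name, "title": title}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/3/action/package_create",
        data=payload,
        headers={
            "Content-Type": "application/json",
            # CKAN expects the user's API key in this header.
            "Authorization": api_key,
        },
        method="POST",
    )


def call(request):
    """Send a prepared request and return the 'result' part of the reply."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["result"]


if __name__ == "__main__":
    # Nothing is sent here; we only show what would be requested.
    search = build_search_request("trade data", rows=5)
    print(search.get_method(), search.full_url)
    create = build_create_request("demo-dataset", "Demo dataset")
    print(create.get_method(), create.full_url)
```

Building the requests separately from sending them keeps the sketch
testable without a running CKAN instance; passing one of the request
objects to call() would perform the actual API call.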

So much for the moment. I hope you got the rough picture.
Needless to say, we'd love to stay in contact with OKFN, the
CKAN community and, of course, other institutions using CKAN for
research data. We hope that we can take our part in making
CKAN a viable solution for managing research data.

It would be great to hear from you. Any comments on our project
and the plans for the CKAN implementation are very much welcome.

Best regards,

Dr. Hendrik Bunke
ZBW - Deutsche Zentralbibliothek für Wirtschaftswissenschaften
--Innovative Informations- und Publikationstechnologien--
Tel.: +49 40 42834 454 (Hamburg) OR +49 421 7940430 (homeoffice)
