[ckan4rdm] Short introduction of project EDaWaX

Mon Apr 22 13:38:31 UTC 2013

Hi Hendrik,

Thanks for such a detailed and informative post to the discussion list!
It's really interesting to learn about your new project. I have a few
comments inline below...

On 19/04/2013 09:15, "Hendrik Bunke" <h.bunke at zbw.eu> wrote:

>Dear all,
>
>we had a conversation with Mark Wainwright and Velichka Dimitrova
>recently about our project and the plans for using CKAN for it.
>Mark suggested that we should join this list and write some
>introductory words, which I will gladly do hereby.
>
>I'm working as a developer at the German 'Leibniz Information
>Centre for Economics' (ZBW, formerly known as 'German National
>Library of Economics', http://zbw.eu/inded-e.html). In
>cooperation with several partners we are conducting a project
>called EDaWaX ('European Data Watch Extended').
>
>In short, EDaWaX is looking for ways to publish and curate
>research data in economics. Our focus is on publication-related
>data, meaning especially the data that authors of journal papers
>have used for their articles. One objective of the project is the
>development of a data archive for journals using an integral
>approach. For more information on the project please have a look
>at our website and blog: http://www.edawax.de, and there esp. the
>/about section.

This looks very interesting. I also understand why you've posted to this
discussion list, given your interests in curation and archiving of open
data, which I think forms the background of other members of this list.

>    
>As a first step we'll try to set up a pilot application that
>demonstrates some features we think such a data archive should
>have. And that's the part where CKAN comes in. CKAN will be the
>central part of our scenario.
>
>We came to the decision for CKAN after evaluating several
>software packages, namely Dataverse (which we are using for
>archiving datasets provided by authors of 'Economics', an Open
>Access E-Journal which I'm also responsible for), Nesstar (used
>by many research data centres in Germany) and CKAN. We only had
>a few common but fundamental criteria for the software:
>
>    -   Open Source
>        this is a fundamental principle for us, but there are
>        also practical reasons for this. We want to be able to
>        modify and extend the software, and we would like to
>        share our extensions.
>
>    -   API (reading and writing)
>        This is quite important since we don't want the archive
>        to be a 'silo'. We want to be able to program our own
>        user interface, for example, and to provide integration
>        packages for other systems (CMS or journal software).
>
>    -   Simple User Interface
>        we are mainly targeting authors and editorial offices who
>        don't have time, resources and 'knowhow' to learn and use
>        complicated UIs and workflows. This is also important
>        for lowering the barriers for publishing research
>        data.
>
>    -   RDF metadata representation
>        we are aware that this might be a somewhat avant-garde
>        criterion. But for us as a scientific library it is
>        important and we predict that it will be more and more
>        important in the near future to have a general, linkable
>        and machine readable metadata interface, so our research
>        data can be used and adopted most widely.
>
>Looking at these criteria you will see very quickly why we have
>chosen CKAN. Another reason --not the decisive one, of course--
>was that CKAN is written in Python, which is also my programming
>language of choice. :-) So far we can only see one
>argument against CKAN: it is not so much focused on research
>data, like for example Dataverse. Hopefully, we can contribute a
>bit to change that.

You're right that it's not so much focused on 'research data', although
personally, I am beginning to frame it a bit differently:

What makes research data different to open government data? I think (I
could be wrong!) that the research data community has a lot of overlap
with the institutional repository community (often it is the same person
performing both tasks) and that 'research data' implies a curatorial
approach to data as practised by archivists and librarians. I suspect this
is different to the way that CKAN is being deployed in the government
sector. I don't know for sure, but the impression I get is that the effort
by public sector open data enthusiasts deploying CKAN has not yet
addressed the long-term curatorial functions of archiving datasets. The
data published on data.gov.uk, for example, is already archived elsewhere.
CKAN is not being used as the primary archival tool, but rather as a
discovery/publishing tool. I think this discussion list is interested in
how CKAN can be more than that.

So when you say that CKAN is not focused on 'research data', are you
suggesting that CKAN has yet to address the curatorial needs of managing
open data over the long term?

I should say that in my evaluation of CKAN, this seems to be something I
keep coming back to. The driving motivations for the development of CKAN
are not coming from the archives/librarian professions, which is where
'research data management' seems to now sit.

>
>So, these are our plans for the next six months or so.
>
>    1.  install CKAN as the centre of our demo scenario
>        
>    2.  do some UI tweaks (layout, theme, CSS etc.)
>
>    3.  develop a CKAN extension for integration of the metadata
>        schema provided by da|ra (an extended DataCite schema, so
>        to say). You can find the schema files here (in German):
>        http://www.da-ra.de/de/fuer-datenzentren/daten-registrieren/doi-und-metadaten/
>        I've already started working on this, and we intend to
>        publish and open-source it as 'ckanext-dara' on PyPI and
>        GitHub as soon as it proves stable.
>
>    4.  develop a demo webapp that uses the CKAN API for searching
>        and writing to our CKAN instance. I've already
>        implemented a rough demo for this based on the Pyramid
>        framework and ckanclient (Python client for CKAN API),
>        and it just works very well. There have been some issues
>        with ckanclient related to file upload, and I'm glad I
>        could contribute some minor fixes for that
>        (https://github.com/okfn/ckanclient/commits/master).
>
>    5.  develop a third-party app add-on, that uses the CKAN API.
>        This will be done for Plone, which is the base of the
>        above mentioned E-Journal 'Economics'
>        (http://www.economics-ejournal.org). It should mainly be
>        a testcase for usability of CKAN for editorial offices.
>        Editors of 'Economics' have some experience with
>        Dataverse (and are not always happy with it) so we do
>        have a very good setting here. Generally we consider the
>        integration in third-party systems to be very important
>        for the acceptance of CKAN as a repository for
>        publication-related research data. Users should not be
>        bothered with having to use two (or even more) different
>        systems for data and text. This approach gives the
>        maximum of integration for data and articles. Dataverse,
>        for example, will develop such functionalities for OJS
>        (Open Journal System) presumably within the next two
>        years. CKAN has kind of a head start here due to its
>        great API, but I think we need to popularise CKAN in this
>        respect.
>
>So much for the moment. I hope you got the rough picture.
>Needless to say, we'd love to stay in contact with OKFN, the
>CKAN community and, of course, other institutions using CKAN for
>research data. We hope that we can take our part in making
>CKAN a viable solution for managing research data.

Does your roadmap contradict what I suggest above? :-) Point 3 refers to a
standards-based development of the CKAN metadata schema; point 4 refers to
discovery and ingest of datasets; point 5 refers to integration (and
workflow?).
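
For list readers less familiar with the API that point 4 relies on, here
is a minimal sketch of what "searching and writing" against a CKAN
instance could look like. It is not Hendrik's demo and does not use
ckanclient; it only builds requests for the CKAN Action API
(package_search and package_create) with the Python 3 standard library.
The base URL, the API key and the helper names are all hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical base URL; replace with a real CKAN instance.
CKAN_URL = "http://ckan.example.org"


def search_request(query, rows=10, base_url=CKAN_URL):
    """Build a GET request for the Action API's package_search."""
    params = urlencode({"q": query, "rows": rows})
    return Request(f"{base_url}/api/3/action/package_search?{params}")


def create_request(dataset, api_key, base_url=CKAN_URL):
    """Build a POST request for the Action API's package_create."""
    body = json.dumps(dataset).encode("utf-8")
    return Request(
        f"{base_url}/api/3/action/package_create",
        data=body,  # supplying a body makes urllib issue a POST
        headers={"Authorization": api_key, "Content-Type": "application/json"},
    )


def unwrap(response_text):
    """Return the 'result' of an Action API reply, or raise on failure."""
    reply = json.loads(response_text)
    if not reply.get("success"):
        raise RuntimeError(reply.get("error"))
    return reply["result"]
```

Nothing is sent over the network here. To actually run a search you
would pass the request to urllib.request.urlopen() and feed the decoded
response body to unwrap().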

Will the integration work create additional workflows for data curation
via OJS and Plone?

If so, this is similar to what we've done at Lincoln, where we use the
CKAN APIs to incorporate CKAN into a curatorial deposit workflow that
retrieves a DataCite DOI and deposits metadata into Eprints, which is the
canonical record of institutional research outputs and data.
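
To give a flavour of the metadata step in such a workflow, here is a
rough sketch (not our actual Lincoln code) that maps a CKAN package
dict onto the core mandatory DataCite properties: identifier, creator,
title, publisher and publication year. The function name, the choice of
'author' as creator and of 'metadata_created' as the source of the year
are my assumptions for illustration, and the DOI in the usage note is a
placeholder.

```python
def to_datacite(package, doi, publisher):
    """Map a CKAN package dict to core DataCite fields (illustrative only)."""
    # Assumption: take the year from CKAN's ISO 'metadata_created' timestamp.
    year = (package.get("metadata_created") or "")[:4]
    record = {
        "identifier": doi,
        "creators": [package.get("author") or ""],
        "title": package.get("title") or package.get("name", ""),
        "publisher": publisher,
        "publicationYear": year,
    }
    # Refuse to produce a record with empty mandatory fields.
    missing = [key for key, value in record.items() if not value or value == [""]]
    if missing:
        raise ValueError("missing mandatory fields: " + ", ".join(missing))
    return record
```

For example, to_datacite(pkg, "10.5072/example", "Example Publisher")
would return a dict ready to be serialised, and would raise ValueError
for a package that has no author or creation date.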

As you can probably tell, given the requirements that came out of our
CKAN4RDM workshop in February
(http://orbital.blogs.lincoln.ac.uk/2013/02/27/ckan-for-rdm-workshop/),
I'm inclined to think that as well as specific development relating to the
RDM domain, our community would benefit from the development in CKAN of
common curatorial features and workflows, and in turn that would benefit
CKAN as a generic data management system, too.

Cheers
Joss

>
>It would be great to hear from you. Any comments on our project
>and the plans for the CKAN implementation are very much
>appreciated.
>
>best regards
>hendrik
>
>
>-- 
>Dr. Hendrik Bunke
>ZBW - Deutsche Zentralbibliothek für Wirtschaftswissenschaften
>--Innovative Informations- und Publikationstechnologien--
>Tel.: +49 40 42834 454 (Hamburg) OR +49 421 7940430 (homeoffice)
>http://zbw.eu
>