[ckan4rdm] Workflows and DOIs / CKAN as research data repository

Aaron McGlinchy McGlinchyA at landcareresearch.co.nz
Tue Feb 10 21:14:06 UTC 2015


Hi, regarding the workflow and DOI work: my organisation has indicated to CKAN/OKFN that we will put some resources ($) towards developing this. We don't have the capacity in-house to handle it at present, so we have contracted some support time from CKAN/OKFN.

I would be interested in a description of how the DOI extension (https://github.com/NaturalHistoryMuseum/ckanext-doi) works in practical terms, from the point of view of a user (data uploader) and/or an admin/sysadmin.

We are registered to issue DataCite DOIs, and CKAN support staff have begun a first step: changing the metadata stored in CKAN to align better with the DataCite metadata requirements (not all of them, just the mandatory ones plus one or two others).  This also lines up with the metadata in an Excel add-in tool that we are working on. We have basically taken over the DataUp work that began at the California Digital Library and has since ceased there. The tool allows QA, recording of metadata, and deposit of data from Excel to CKAN in tab-separated value and/or Excel format, and integrates with requesting a DOI.
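
To make the metadata alignment concrete, here is a minimal sketch of a pre-flight check that a dataset record can fill the DataCite properties that have long been mandatory (identifier, creator, title, publisher, publication year; later schema versions also require a resource type, so check the version you target). The DataCite property names are real, but the CKAN-side keys ("doi", "publisher", "publication_year") and the function name are assumptions for illustration, not the fields ckanext-doi or our instance actually uses.

```python
# Mandatory DataCite properties mapped to dataset keys.
# The right-hand keys are hypothetical; adjust them to your own CKAN schema.
DATACITE_MANDATORY = {
    "identifier": "doi",
    "creator": "author",
    "title": "title",
    "publisher": "publisher",
    "publicationYear": "publication_year",
}


def missing_datacite_fields(dataset):
    """Return the DataCite properties that the dataset dict cannot fill."""
    missing = []
    for datacite_name, ckan_key in DATACITE_MANDATORY.items():
        value = dataset.get(ckan_key)
        # Treat absent keys and blank strings as missing.
        if value is None or str(value).strip() == "":
            missing.append(datacite_name)
    return missing


if __name__ == "__main__":
    record = {"title": "Soil cores 2014", "author": "A. Researcher"}
    print(missing_datacite_fields(record))
```

A check like this could run before a DOI is requested, so that incomplete records are sent back to the uploader rather than rejected at registration time.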

For the workflow (https://github.com/ckan/ideas-and-roadmap/issues/108) we have said we can help pay for part of the work to keep it progressing, and to help beta test it.  Indications were that our support would see them work on it for inclusion in release 2.4.

Perhaps, if others are interested, we could collectively resource this work in full ($ and/or programmer time) to get it done ASAP?

Regards

Aaron McGlinchy
Research Data Manager
Landcare Research NZ
http://datastore.landcareresearch.co.nz


From: Florian May [mailto:florian.wendelin.mayer at gmail.com]
Sent: Wednesday, 11 February 2015 8:06 a.m.
To: CKAN for Research Data Management
Subject: Re: [ckan4rdm] using CKAN as research data repository


Stefan,
Sorry, I didn't mean to imply that using an existing field would be inappropriate!

John, the rules you describe are exactly what django-fsm provides. Having transitions between well-defined states with optional gate checks proved to be extremely versatile.
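
For anyone who hasn't used a state machine for this, a rough sketch of the idea in plain Python follows. It deliberately does not use the django-fsm API. The state names, the "workflow_state" key, the gate checks and the function names are all made up for illustration.

```python
class TransitionNotAllowed(Exception):
    """Raised when a transition is undefined or its gate check fails."""


# (source state, target state) -> gate check taking the dataset and the user.
# All states and checks here are hypothetical examples.
TRANSITIONS = {
    ("draft", "submitted"): lambda dataset, user: bool(dataset.get("title")),
    ("submitted", "approved"): lambda dataset, user: user.get("sysadmin", False),
    ("submitted", "rejected"): lambda dataset, user: user.get("sysadmin", False),
    ("rejected", "draft"): lambda dataset, user: True,
}


def transition(dataset, target, user):
    """Move the dataset to the target state if the transition exists and its gate passes."""
    source = dataset.get("workflow_state", "draft")
    gate = TRANSITIONS.get((source, target))
    if gate is None:
        raise TransitionNotAllowed(f"no transition {source} -> {target}")
    if not gate(dataset, user):
        raise TransitionNotAllowed(f"gate check failed for {source} -> {target}")
    dataset["workflow_state"] = target
    return dataset


if __name__ == "__main__":
    ds = {"title": "Threatened species survey"}
    transition(ds, "submitted", {"sysadmin": False})
    transition(ds, "approved", {"sysadmin": True})
    print(ds["workflow_state"])
```

Two-tier approval or escalation would just be more entries in the transition table, each with its own gate.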

Cheers,
Florian
On 10/02/2015 10:11 pm, "John Erickson" <erickj4 at rpi.edu> wrote:
This continues to be an interesting discussion.

Note that short of having a full-fledged workflow engine, one way to
"approximate" an approval workflow is to implement event-triggered
rules; an example would be, "If a user publishes an object with
attributes (a OR b OR c) but they don't have attributes (x OR y),
de-publish the object and notify someAdmin at theirDomain.org"
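
A rule of that shape could be sketched roughly as below. This is only an illustration, not CKAN code: the function name, the attribute lists and the notify callback are hypothetical, and the dataset is assumed to be a plain dict with a boolean "private" flag.

```python
def apply_publish_rule(dataset, trigger_attrs, required_attrs, notify):
    """De-publish a public dataset that has a trigger attribute but no required one.

    Returns True if the rule fired, False otherwise.
    """
    # Attributes count as present only when they carry a truthy value.
    present = {key for key, value in dataset.items() if value}
    has_trigger = bool(present & set(trigger_attrs))
    has_required = bool(present & set(required_attrs))
    is_public = not dataset.get("private", True)
    if is_public and has_trigger and not has_required:
        dataset["private"] = True  # de-publish
        notify(f"Dataset {dataset.get('name', '?')} was de-published pending review")
        return True
    return False


if __name__ == "__main__":
    messages = []
    ds = {"name": "example", "private": False, "a": "some value"}
    fired = apply_publish_rule(ds, ["a", "b", "c"], ["x", "y"], messages.append)
    print(fired, ds["private"], messages)
```

In practice the function would be called from whatever hook fires when a dataset is created or updated, and notify would send the email to the admin.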

We've done that on other platforms (esp. Drupal) to implement a
kind-of approval workflow for posting certain content to certain
pages, etc.

Likely, before CKAN gets a workflow engine, it needs rules ;)

John

On Tue, Feb 10, 2015 at 8:49 AM, Stefan Oderbolz
<stefan.oderbolz at liip.ch> wrote:
> Hi Florian,
>
> I'm well aware that workflows can be quite complicated. The "status"
> field on the "dataset" is currently the only element that comes even
> close to a workflow in CKAN.
> I would very much appreciate it if this functionality were
> implemented in CKAN core.
>
> - Stefan
>
> On Tue, Feb 10, 2015 at 1:43 PM, Florian May
> <florian.wendelin.mayer at gmail.com> wrote:
>> Marta, Stefan,
>>
>> Authentication might involve workflows which are more complex than updating
>> one field - think two-tier approval, rejection, escalation and such.
>> Django's finite state machine extension has served me well in implementing
>> such approvals in a Django project of ours. Maybe a similar plugin exists
>> for Pylons?
>>
>> Cheers,
>> Florian
>>
>> On 10/02/2015 6:49 pm, "Marta Hoffman-Sommer" <m.hoffman-sommer at icm.edu.pl>
>> wrote:
>>>
>>> Hi Stefan,
>>> Thanks a lot for your suggestions. If we come up with some solution, I'll
>>> let you know here on the list.
>>> Regards,
>>> Marta
>>>
>>>
>>> On 2015-02-10 at 11:14, Stefan Oderbolz wrote:
>>>>
>>>> Hi Marta,
>>>>
>>>> this is one of those features that is still missing in CKAN core. What
>>>> you basically want is some kind of workflow for a dataset to be
>>>> published and/or updated.
>>>> It's on the CKAN roadmap
>>>> (https://github.com/ckan/ideas-and-roadmap/issues/108) but not yet
>>>> implemented.
>>>>
>>>> At the moment you could either try to solve it in a separate extension
>>>> or try to "hack" this feature using the status field (i.e. set all new
>>>> datasets to status "private", and only allow admins to set it to
>>>> "public"). But I'm not saying this is easy to implement. I already
>>>> tried to do this twice and gave up because of the complexity of getting
>>>> the details right.
>>>>
>>>> Best regards Stefan
>>>>
>>>> On Tue, Feb 10, 2015 at 10:00 AM, Marta Hoffman-Sommer
>>>> <m.hoffman-sommer at icm.edu.pl> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Has anyone tried to use CKAN in such a way that all datasets created by
>>>>> users need to be approved by a sysadmin before becoming publicly
>>>>> visible (so that we could, e.g., prevent users from posting
>>>>> non-scientific data)? How could this be done? Is there any possibility
>>>>> of configuring CKAN to do this, or would we need to write an extension?
>>>>> I would be grateful for any comments on this.
>>>>> Best,
>>>>> Marta
>>>>>
>>>>> -------
>>>>> Marta Hoffman-Sommer
>>>>> Open Science Platform
>>>>> ICM University of Warsaw
>>>>> http://pon.edu.pl
>>>>>
>>>>>
>>>>> On 2015-01-16 at 10:24, Marta Hoffman-Sommer wrote:
>>>>>
>>>>>
>>>>> Thanks a lot for the hints - this is great help. We are now looking at
>>>>> your extensions in detail, together with my colleagues. The DOI
>>>>> extension should work for us, I think, and that's very important (we're
>>>>> aware that we will need a contract for this). I'll be watching for your
>>>>> embargo extension. And if we stick to our decision of using CKAN (which
>>>>> we most likely will), then I will certainly have more questions.
>>>>>
>>>>> Best,
>>>>> Marta
>>>>>
>>>>>
>>>>>
>>>>> On 2015-01-16 at 05:03, Florian May wrote:
>>>>>
>>>>> Marta, Ben,
>>>>>
>>>>> this is highly interesting! I'm using CKAN to archive all the research
>>>>> datasets without a proper home (dedicated data warehouse).
>>>>>
>>>>> +1 on embargoing! We run one instance with all datasets set to "public"
>>>>> inside our well-protected intranet, and one completely separate (because
>>>>> I feared accidental leaks) instance facing outside. As we work with
>>>>> sensitive data about threatened species, our data release process is
>>>>> painfully manual and goes across several desks, so there's no automation
>>>>> yet. It would be great to have some sort of auditable data release
>>>>> sign-off, possibly triggering a push to the external site.
>>>>>
>>>>> Great to hear about ckanext-doi! I'll have to have a chat with the
>>>>> Australian National Data Service, who offered to mint DOIs for us,
>>>>> about whether that offer would extend to a modified ckanext-doi.
>>>>>
>>>>> Ben, FYI our colleagues at the WA Museum have just adopted
>>>>> CollectiveAccess for their collection data management:
>>>>>
>>>>> http://www.gaiaresources.com.au/collectiveaccess-powerful-flexible-collection-management/
>>>>>
>>>>> Cheers,
>>>>> Florian
>>>>>
>>>>> On Fri, Jan 16, 2015 at 12:23 AM, Ben Scott <ben at benscott.co.uk> wrote:
>>>>>>
>>>>>> Hi Marta -
>>>>>>
>>>>>> We're using CKAN as a repository for our research and collections data
>>>>>> here at the Natural History Museum, London: http://data.nhm.ac.uk/
>>>>>>
>>>>>> 1) Embargoing datasets - this is on our roadmap and a high priority, so
>>>>>> we should be writing an extension for this soon.
>>>>>>
>>>>>> 2) Batch upload - we've built data import pipelines using Spotify's
>>>>>> Luigi framework (https://github.com/spotify/luigi) and the CKAN API.
>>>>>> It's very specialised for our collections database though, and not
>>>>>> implemented as an extension - but it might be useful
>>>>>> (https://github.com/NaturalHistoryMuseum/ke2mongo).
>>>>>>
>>>>>> 3) We've written an extension for assigning DataCite DOIs -
>>>>>> https://github.com/NaturalHistoryMuseum/ckanext-doi (you will need a
>>>>>> contract with DataCite / their national representative to be able to
>>>>>> mint DOIs).
>>>>>>
>>>>>> Cheers,
>>>>>> Ben
>>>>>>
>>>>>> -----------------------------------
>>>>>> Data Portal Lead Architect
>>>>>> Biodiversity Informatics,
>>>>>> Natural History Museum,
>>>>>> London
>>>>>> +44 (0) 207 942 4277
>>>>>>
>>>>>> On 15 Jan 2015, at 13:57, Marta Hoffman-Sommer
>>>>>> <m.hoffman-sommer at icm.edu.pl> wrote:
>>>>>>
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> We're planning an open research data repository which will serve the
>>>>>>> whole scientific community in Poland, and we're seriously considering
>>>>>>> using CKAN for this purpose. I was wondering if some of you have
>>>>>>> already implemented CKAN as a stand-alone repository (not part of a
>>>>>>> data management system)? Is anybody aware of CKAN extensions that
>>>>>>> would enable (1) embargoing dataset release, (2) batch upload and
>>>>>>> editing of multiple files, or (3) DOI assignment and display? We have
>>>>>>> been searching for these on the web without success.
>>>>>>>
>>>>>>> Best,
>>>>>>> Marta
>>>>>>>
>>>>>>> --
>>>>>>> Marta Hoffman-Sommer
>>>>>>> Open Science Platform
>>>>>>> ICM University of Warsaw
>>>>>>> http://pon.edu.pl
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> ckan4rdm mailing list
>>>>>>> ckan4rdm at lists.okfn.org
>>>>>>> https://lists.okfn.org/mailman/listinfo/ckan4rdm
>
>
>
> --
> Liip AG  // Limmatstrasse 183 //  CH-8005 Zürich
> Tel +41 43 500 39 80 // GnuPG 0x7B588C67 // www.liip.ch



--
John S. Erickson, Ph.D.
Director of Operations, The Rensselaer IDEA
Deputy Director, Web Science Research Center (RPI)
<http://tw.rpi.edu> <erickj4 at rpi.edu>
Twitter & Skype: olyerickson