[ckan4rdm] Workflows and DOIs / CKAN as research data repository

Ben Scott ben at benscott.co.uk
Wed Mar 4 12:14:37 UTC 2015


Hi Aaron -

We’ve just released a new version of ckanext-doi, adding a few new features (DOI embargoing & a plugin interface for modifying metadata), and a much better readme explaining what the extension does. 

https://github.com/NaturalHistoryMuseum/ckanext-doi
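To illustrate what an embargo involves in practice, here is a minimal, generic Python sketch. It is not ckanext-doi's implementation. It assumes a hypothetical embargo_until date stored on each dataset: the descriptive metadata (and therefore a DOI landing page) stays visible, while the data files are withheld until that date.

```python
from datetime import date
from typing import Optional


def files_released(embargo_until: Optional[date], today: date) -> bool:
    """True once the (hypothetical) embargo date has been reached.

    A dataset without an embargo date is released immediately.
    """
    if embargo_until is None:
        return True
    return today >= embargo_until


def public_view(dataset: dict, today: date) -> dict:
    """Return what anonymous visitors may see of a dataset.

    Descriptive metadata stays visible, but the resource list is
    withheld while the embargo is in force.
    """
    view = dict(dataset)  # shallow copy: the stored record is left untouched
    if not files_released(dataset.get("embargo_until"), today):
        view["resources"] = []
    return view


record = {"title": "Bird survey", "embargo_until": date(2015, 6, 1),
          "resources": [{"url": "https://example.org/birds.csv"}]}
print(public_view(record, date(2015, 3, 4))["resources"])  # []
print(len(public_view(record, date(2015, 6, 1))["resources"]))  # 1
```

Keeping the metadata public during the embargo matters for DOIs, because a registered DOI should always resolve to a landing page.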

There are some features we’re planning to develop ourselves, which might have some overlap with what you’re after - see the roadmap section in the readme. If you have any questions or need more info about the extension, please do feel free to drop me a line.  

All the best,
Ben


On 10 Feb 2015, at 21:14, Aaron McGlinchy <McGlinchyA at landcareresearch.co.nz> wrote:

> Hi, regarding the workflow and DOI work: my organisation has indicated to CKAN/OKFN that we will put some resources ($) towards developing this (we don’t have the in-house capacity to handle it at present, so we have contracted some support time from CKAN/OKFN).
>  
> Could you describe how the DOI extension (https://github.com/NaturalHistoryMuseum/ckanext-doi) works in practical terms, from the point of view of a user (data uploader) and/or an admin/sysadmin?
>  
> We are registered to issue DataCite DOIs and have begun a first step (in progress, carried out by CKAN support staff): changing the metadata stored in CKAN to align better with the DataCite metadata requirements (not all of them, just the mandatory ones plus one or two others). This also lines up with the metadata in an Excel add-in tool that we are working on (we have basically taken over the DataUp work that began at the California Digital Library, which has since stopped working on it). The tool allows QA, recording of metadata, and deposit of data from Excel to CKAN in tab-separated value and/or Excel format, and integrates with requesting a DOI.
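For reference, DataCite Metadata Schema 3.x marks five properties as mandatory: Identifier, Creator, Title, Publisher and PublicationYear. The following is a minimal sketch of checking a CKAN-style dataset dict against them. The mapping is an assumption for illustration and does not reflect ckanext-doi or Landcare's actual schema: the doi key is a hypothetical custom field, and taking the publication year from metadata_created is only a rough stand-in.

```python
from typing import Dict, List

# Mandatory properties in DataCite Metadata Schema 3.x
MANDATORY = ["identifier", "creator", "title", "publisher", "publicationYear"]


def to_datacite(pkg: dict) -> Dict[str, str]:
    """Map a CKAN-style package dict onto the mandatory DataCite properties.

    The source keys (doi, author, title, organization, metadata_created)
    are assumptions for illustration; a real schema may differ.
    """
    org = pkg.get("organization") or {}
    created = pkg.get("metadata_created") or ""
    return {
        "identifier": pkg.get("doi") or "",   # hypothetical custom field
        "creator": pkg.get("author") or "",
        "title": pkg.get("title") or "",
        "publisher": org.get("title") or "",
        "publicationYear": created[:4],       # "2015-02-10T..." -> "2015"
    }


def missing_fields(record: Dict[str, str]) -> List[str]:
    """Return the mandatory properties that are still empty."""
    return [key for key in MANDATORY if not record.get(key)]


pkg = {"title": "Soil carbon survey", "author": "A. Researcher",
       "organization": {"title": "Landcare Research"},
       "metadata_created": "2015-02-10T21:14:00"}
print(missing_fields(to_datacite(pkg)))  # ['identifier']
```

A check like this could run before a DOI is requested, so that incomplete records are caught before they reach DataCite.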
>  
> For the workflow (https://github.com/ckan/ideas-and-roadmap/issues/108) we have said we can help pay for part of the work to keep it progressing, and help beta test it. The indication was that our support would allow them to work on it for inclusion in the 2.4 release.
>  
> Perhaps, if others are interested, we could collectively resource this work in full ($ and/or programmer time?) to get it done ASAP?
>  
> Regards
>  
> Aaron McGlinchy
> Research Data Manager
> Landcare Research NZ
> http://datastore.landcareresearch.co.nz
>  
>  
> From: Florian May [mailto:florian.wendelin.mayer at gmail.com] 
> Sent: Wednesday, 11 February 2015 8:06 a.m.
> To: CKAN for Research Data Management
> Subject: Re: [ckan4rdm] using CKAN as research data repository
>  
> Stephan,
> Sorry I didn't mean to imply that using an existing field would be inappropriate!
> 
> John, the rules you describe are exactly what django-fsm provides. Having transitions between well-defined states with optional gate checks has proved extremely versatile.
> 
> Cheers, 
> Florian
> 
> On 10/02/2015 10:11 pm, "John Erickson" <erickj4 at rpi.edu> wrote:
> This continues to be an interesting discussion.
> 
> Note that short of having a full-fledged workflow engine, one way to
> "approximate" an approval workflow is to implement event-triggered
> rules; an example would be, "If a user publishes an object with
> attributes (a OR b OR b) but they don't have attributes (x OR y),
> de-publish the object and notify someAdmin at theirDomain.org"
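A minimal Python sketch of such an event-triggered rule follows. It assumes a hypothetical publish hook that passes the published object as a dict and accepts a notify callback. Neither is an existing CKAN API, and the attribute names are placeholders.

```python
from typing import Callable, Dict, Set


def check_publish(obj: Dict, trigger: Set[str], required: Set[str],
                  notify: Callable[[str], None]) -> Dict:
    """If a published object has any trigger attribute but none of the
    required ones, return a de-published copy and notify an admin."""
    attrs = set(obj.get("attributes", []))
    if obj.get("published") and attrs & trigger and not attrs & required:
        obj = {**obj, "published": False}
        notify(f"De-published {obj['id']}: missing one of {sorted(required)}")
    return obj


sent = []  # stands in for an email to the admin
result = check_publish({"id": "ds1", "published": True, "attributes": ["a"]},
                       trigger={"a", "b"}, required={"x", "y"},
                       notify=sent.append)
print(result["published"])  # False
print(sent)  # ["De-published ds1: missing one of ['x', 'y']"]
```

Objects that carry a required attribute, or none of the trigger attributes, pass through unchanged.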
> 
> We've done that on other platforms (esp. Drupal) to implement a
> kind of approval workflow for posting certain content to certain
> pages, etc.
> 
> Likely, before CKAN gets a workflow engine, it needs rules ;)
> 
> John
> 
> On Tue, Feb 10, 2015 at 8:49 AM, Stefan Oderbolz
> <stefan.oderbolz at liip.ch> wrote:
> > HI Florian,
> >
> > I'm well aware that workflows can be quite complicated. The "status"
> > field on the "dataset" is currently the only element that comes even
> > close to a workflow in CKAN.
> > I would very much appreciate it if this functionality were
> > implemented in CKAN core.
> >
> > - Stefan
> >
> > On Tue, Feb 10, 2015 at 1:43 PM, Florian May
> > <florian.wendelin.mayer at gmail.com> wrote:
> >> Marta, Stefan,
> >>
> >> Authentication might involve workflows which are more complex than updating
> >> one field - think two-tier approval, rejection, escalation and such.
> >> Django's finite state machine extension has served me well implementing such
> >> approvals in a Django project of ours; maybe a similar plugin exists for
> >> Pylons?
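The two-tier approval described above can be sketched as a small finite state machine in plain Python. The sketch follows the spirit of django-fsm's gated transitions but does not use django-fsm or any Pylons library. The state names, actions and role checks are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Dataset:
    name: str
    state: str = "draft"
    history: List[Tuple[str, str, str]] = field(default_factory=list)


Gate = Callable[[Dataset, str], bool]


def anyone(ds: Dataset, user: str) -> bool:
    return True


def is_curator(ds: Dataset, user: str) -> bool:
    return user.startswith("curator:")  # hypothetical role check


def is_admin(ds: Dataset, user: str) -> bool:
    return user.startswith("admin:")  # hypothetical role check


# (action, current state) -> (next state, gate that must pass)
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, Gate]] = {
    ("submit", "draft"): ("pending_review", anyone),
    ("approve", "pending_review"): ("pending_signoff", is_curator),  # tier 1
    ("publish", "pending_signoff"): ("published", is_admin),         # tier 2
    ("reject", "pending_review"): ("draft", is_curator),
    ("reject", "pending_signoff"): ("draft", is_admin),
}


def fire(ds: Dataset, action: str, user: str) -> None:
    """Apply an action if it is defined for the current state and its gate passes."""
    key = (action, ds.state)
    if key not in TRANSITIONS:
        raise ValueError(f"cannot {action} from state {ds.state!r}")
    target, gate = TRANSITIONS[key]
    if not gate(ds, user):
        raise PermissionError(f"{user} may not {action} {ds.name}")
    ds.history.append((user, ds.state, target))  # simple audit trail
    ds.state = target


ds = Dataset("soil-survey")
fire(ds, "submit", "alice")
fire(ds, "approve", "curator:bob")
fire(ds, "publish", "admin:carol")
print(ds.state)  # published
```

Keeping the transitions in one table makes it easy to add states such as escalation later without touching the dispatch logic.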
> >>
> >> Cheers,
> >> Florian
> >>
> >> On 10/02/2015 6:49 pm, "Marta Hoffman-Sommer" <m.hoffman-sommer at icm.edu.pl>
> >> wrote:
> >>>
> >>> Hi Stefan,
> >>> Thanks a lot for your suggestions. If we come up with some solution, I'll
> >>> let you know here on the list.
> >>> Regards,
> >>> Marta
> >>>
> >>>
> >>> On 2015-02-10 at 11:14, Stefan Oderbolz wrote:
> >>>>
> >>>> Hi Marta,
> >>>>
> >>>> this is one of those features that is still missing in CKAN core. What
> >>>> you basically want is some kind of workflow for a dataset to be
> >>>> published and/or updated.
> >>>> It's on the CKAN roadmap
> >>>> (https://github.com/ckan/ideas-and-roadmap/issues/108) but not yet
> >>>> implemented.
> >>>>
> >>>> At the moment you could either try to solve it in a separate extension
> >>>> or try to "hack" this feature using the status field (i.e. set all new
> >>>> datasets to status "private", and only allow admins to set it to
> >>>> "public"). But I'm not saying this is easy to implement. I already
> >>>> tried to do this twice and gave up because of the complexity of getting
> >>>> the details right.
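Here is a minimal sketch of the rule Stefan describes, written as plain Python checks rather than as real CKAN auth functions. The private key mirrors CKAN's private-dataset flag. The function names and the is_sysadmin argument are hypothetical.

```python
def prepare_new_dataset(data: dict) -> dict:
    """Force every newly created dataset to start out private."""
    return {**data, "private": True}


def may_update(old: dict, new: dict, is_sysadmin: bool) -> bool:
    """Allow any edit, except that only sysadmins may make a dataset public."""
    becoming_public = old.get("private", True) and not new.get("private", True)
    return is_sysadmin or not becoming_public


draft = prepare_new_dataset({"name": "survey", "private": False})
print(draft["private"])  # True
print(may_update(draft, {**draft, "private": False}, is_sysadmin=False))  # False
print(may_update(draft, {**draft, "private": False}, is_sysadmin=True))  # True
```

This covers only the core rule. It leaves out the harder details mentioned above, such as notifying reviewers or handling edits to datasets that are already public.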
> >>>>
> >>>> Best regards Stefan
> >>>>
> >>>> On Tue, Feb 10, 2015 at 10:00 AM, Marta Hoffman-Sommer
> >>>> <m.hoffman-sommer at icm.edu.pl> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> Has anyone tried to use CKAN in such a way that all datasets created by
> >>>>> users need to be approved by a sysadmin before becoming publicly visible
> >>>>> (so that we could e.g. prevent users from posting non-scientific data)?
> >>>>> How could this be done? Is there any possibility of configuring CKAN to
> >>>>> do this, or would we need to write an extension?
> >>>>> I would be grateful for any comments on this.
> >>>>> Best,
> >>>>> Marta
> >>>>>
> >>>>> -------
> >>>>> Marta Hoffman-Sommer
> >>>>> Open Science Platform
> >>>>> ICM University of Warsaw
> >>>>> http://pon.edu.pl
> >>>>>
> >>>>>
> >>>>> On 2015-01-16 at 10:24, Marta Hoffman-Sommer wrote:
> >>>>>
> >>>>>
> >>>>> Thanks a lot for the hints - this is a great help. My colleagues and I are
> >>>>> now looking at your extensions in detail. The DOI extension should work for
> >>>>> us, I think, and that's very important (we're aware that we will need a
> >>>>> contract for this). I'll be watching for your embargo extension. And if we
> >>>>> stick to our decision to use CKAN (which we most likely will), then I will
> >>>>> certainly have more questions.
> >>>>>
> >>>>> Best,
> >>>>> Marta
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 2015-01-16 at 05:03, Florian May wrote:
> >>>>>
> >>>>> Marta, Ben,
> >>>>>
> >>>>> this is highly interesting! I'm using CKAN to archive all the research
> >>>>> datasets without a proper home (dedicated data warehouse).
> >>>>>
> >>>>> +1 on embargoing! We run one instance, with all datasets set to "public",
> >>>>> inside our well-protected intranet, and one completely separate instance
> >>>>> facing outside (separate because I feared accidental leaks). As we work
> >>>>> with sensitive data about threatened species, our data release process is
> >>>>> painfully manual and goes across several desks, so there's no automation
> >>>>> yet. It would be great to have some sort of auditable data release
> >>>>> sign-off, possibly triggering a push to the external site.
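An auditable sign-off of this kind could look roughly like the following Python sketch. Each required desk signs off, every sign-off is appended to a log with a UTC timestamp, and a push callback fires exactly once when all desks have signed. The desk names are hypothetical, and the push callback stands in for whatever copies the dataset to the outside-facing instance, which is not implemented here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Set, Tuple


@dataclass
class ReleaseRequest:
    dataset: str
    required: Set[str]               # desks that must sign off (hypothetical)
    push: Callable[[str], None]      # e.g. copy to the external instance
    log: List[Tuple[str, str, str]] = field(default_factory=list)
    released: bool = False

    def sign_off(self, desk: str, person: str) -> None:
        if desk not in self.required:
            raise ValueError(f"{desk} is not a required desk")
        stamp = datetime.now(timezone.utc).isoformat()
        self.log.append((stamp, desk, person))  # append-only audit record
        signed = {entry[1] for entry in self.log}
        if not self.released and signed >= self.required:
            self.released = True
            self.push(self.dataset)


pushed = []
req = ReleaseRequest("frog-surveys", {"science", "legal"}, pushed.append)
req.sign_off("science", "ann")
req.sign_off("legal", "ben")
print(pushed)  # ['frog-surveys']
```

Because the log is only ever appended to, it records who approved the release and when, even after the push has happened.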
> >>>>>
> >>>>> Great to hear about ckanext-doi! I'll have to ask the Australian National
> >>>>> Data Service, who offered to mint DOIs for us, whether that offer would
> >>>>> extend to a modified ckanext-doi.
> >>>>>
> >>>>> Ben, FYI our colleagues at the WA Museum have just adopted CollectiveAccess
> >>>>> for their collection data management:
> >>>>>
> >>>>> http://www.gaiaresources.com.au/collectiveaccess-powerful-flexible-collection-management/
> >>>>>
> >>>>> Cheers,
> >>>>> Florian
> >>>>>
> >>>>> On Fri, Jan 16, 2015 at 12:23 AM, Ben Scott <ben at benscott.co.uk> wrote:
> >>>>>>
> >>>>>> Hi Marta -
> >>>>>>
> >>>>>> We're using CKAN as a repository for our research and collections data
> >>>>>> here at the Natural History Museum, London: http://data.nhm.ac.uk/.
> >>>>>>
> >>>>>> 1) Embargoing datasets - this is on our roadmap and a high priority, so
> >>>>>> we should be writing an extension for this soon.
> >>>>>>
> >>>>>> 2) Batch upload - we've built data import pipelines using Spotify's
> >>>>>> Luigi framework (https://github.com/spotify/luigi) and the CKAN API.
> >>>>>> It's very specialised for our collections database, though, and not
> >>>>>> implemented as an extension - but it might be useful
> >>>>>> (https://github.com/NaturalHistoryMuseum/ke2mongo).
> >>>>>>
> >>>>>> 3) We've written an extension for assigning DataCite DOIs -
> >>>>>> https://github.com/NaturalHistoryMuseum/ckanext-doi (you will need a
> >>>>>> contract with DataCite or their national representative to be able to
> >>>>>> mint DOIs).
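For batch upload without a full pipeline, the CKAN action API can be called directly. The following minimal sketch uses only the Python standard library. It is not the NHM Luigi pipeline. It builds one POST to /api/3/action/package_create per dataset, with the API key sent in the Authorization header as CKAN 2.x accepts it. The site URL, API key and organisation name are placeholders.

```python
import json
import urllib.request
from typing import Iterable, List


def build_create_requests(site: str, api_key: str,
                          datasets: Iterable[dict]) -> List[urllib.request.Request]:
    """Build one package_create call per dataset dict (nothing is sent here)."""
    url = site.rstrip("/") + "/api/3/action/package_create"
    reqs = []
    for ds in datasets:
        reqs.append(urllib.request.Request(
            url, data=json.dumps(ds).encode("utf-8"), method="POST",
            headers={"Content-Type": "application/json",
                     "Authorization": api_key}))
    return reqs


def send(req: urllib.request.Request) -> dict:
    """Send one request and return CKAN's JSON response ("success", "result")."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


reqs = build_create_requests(
    "https://ckan.example.org", "MY-API-KEY",
    [{"name": "survey-2014", "title": "Survey 2014", "owner_org": "my-org"}])
print(reqs[0].full_url)  # https://ckan.example.org/api/3/action/package_create
```

Building the requests separately from sending them makes it easy to validate a whole batch first and then send it, for example with a retry around send().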
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Ben
> >>>>>>
> >>>>>> -----------------------------------
> >>>>>> Data Portal Lead Architect
> >>>>>> Biodiversity Informatics,
> >>>>>> Natural History Museum,
> >>>>>> London
> >>>>>> +44 (0) 207 942 4277
> >>>>>>
> >>>>>> On 15 Jan 2015, at 13:57, Marta Hoffman-Sommer
> >>>>>> <m.hoffman-sommer at icm.edu.pl> wrote:
> >>>>>>
> >>>>>>> Hi everyone,
> >>>>>>>
> >>>>>>> We're planning an open research data repository which will serve the
> >>>>>>> whole scientific community in Poland, and we're seriously considering
> >>>>>>> using CKAN for this purpose. I was wondering if any of you have already
> >>>>>>> implemented CKAN as a stand-alone repository (not part of a data
> >>>>>>> management system)? Is anybody aware of CKAN extensions that would
> >>>>>>> enable (1) embargoing dataset release, (2) batch upload and editing of
> >>>>>>> multiple files, or (3) DOI assignment and display? We have been
> >>>>>>> searching for these on the web without success.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Marta
> >>>>>>>
> >>>>>>> --
> >>>>>>> Marta Hoffman-Sommer
> >>>>>>> Open Science Platform
> >>>>>>> ICM University of Warsaw
> >>>>>>> http://pon.edu.pl
> >>>>>>>
> >>>>>>> _______________________________________________
> >>>>>>> ckan4rdm mailing list
> >>>>>>> ckan4rdm at lists.okfn.org
> >>>>>>> https://lists.okfn.org/mailman/listinfo/ckan4rdm
> >>>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Marta Hoffman-Sommer
> >>>>> Open Science Platform
> >>>>> ICM University of Warsaw
> >>>>> http://pon.edu.pl
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Marta Hoffman-Sommer
> >>>>> Open Science Platform
> >>>>> ICM University of Warsaw
> >>>>> http://pon.edu.pl
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>> --
> >>> Marta Hoffman-Sommer
> >>> Open Science Platform
> >>> ICM University of Warsaw
> >>> http://pon.edu.pl
> >>>
> >>
> >>
> >
> >
> >
> > --
> > Liip AG  // Limmatstrasse 183 //  CH-8005 Zürich
> > Tel +41 43 500 39 80 // GnuPG 0x7B588C67 // www.liip.ch
> 
> 
> 
> --
> John S. Erickson, Ph.D.
> Director of Operations, The Rensselaer IDEA
> Deputy Director, Web Science Research Center (RPI)
> <http://tw.rpi.edu> <erickj4 at rpi.edu>
> Twitter & Skype: olyerickson
> 
