[ckan4rdm] using CKAN as research data repository

John Erickson erickj4 at rpi.edu
Tue Feb 10 14:11:32 UTC 2015


This continues to be an interesting discussion.

Note that short of having a full-fledged workflow engine, one way to
"approximate" an approval workflow is to implement event-triggered
rules; an example would be, "If a user publishes an object with
attributes (a OR b OR c) but they don't have attributes (x OR y),
de-publish the object and notify someAdmin at theirDomain.org"
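
A minimal Python sketch of such an event-triggered rule, for illustration only. It is not a CKAN or Drupal API: `PublishedObject`, `Rule` and `on_publish` are hypothetical names, the notification is a caller-supplied callback, and "they" in the rule above is read as the publishing user.

```python
from dataclasses import dataclass
from typing import Callable, List, Set

@dataclass
class PublishedObject:
    # Hypothetical stand-in for a published item; not a CKAN or Drupal type.
    name: str
    attributes: Set[str]
    published: bool = True

@dataclass
class Rule:
    object_any: Set[str]  # the rule applies if the object has any of these...
    user_any: Set[str]    # ...and the publishing user has none of these
    notify: str           # hypothetical admin address to alert

    def matches(self, obj: PublishedObject, user_attrs: Set[str]) -> bool:
        return bool(obj.attributes & self.object_any) and not (user_attrs & self.user_any)

def on_publish(obj: PublishedObject, user_attrs: Set[str], rules: List[Rule],
               send_notice: Callable[[str, str], None]) -> List[Rule]:
    """Run after each publish event; de-publish and notify on the first matching rule."""
    fired = []
    for rule in rules:
        if obj.published and rule.matches(obj, user_attrs):
            obj.published = False
            send_notice(rule.notify, f"De-published {obj.name}: approval required")
            fired.append(rule)
    return fired
```

In practice the handler would be attached to whatever post-publish hook the platform offers.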

We've done that on other platforms (esp. Drupal) to implement a
kind-of approval workflow for posting certain content to certain
pages, etc.

Likely, before CKAN gets a workflow engine, it needs rules ;)

John

On Tue, Feb 10, 2015 at 8:49 AM, Stefan Oderbolz
<stefan.oderbolz at liip.ch> wrote:
> Hi Florian,
>
> I'm well aware that workflows can be quite complicated. The "status"
> field on the "dataset" is currently the only element that comes even
> close to a workflow in CKAN.
> I would very much appreciate it if this functionality were
> implemented in CKAN core.
>
> - Stefan
>
> On Tue, Feb 10, 2015 at 1:43 PM, Florian May
> <florian.wendelin.mayer at gmail.com> wrote:
>> Marta, Stefan,
>>
>> Authentication might involve workflows which are more complex than updating
>> one field - think two-tier approval, rejection, escalation and such.
>> Django's finite state machine extension has served me well for
>> implementing such approvals in one of our Django projects; maybe a similar
>> plugin exists for Pylons?
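
To illustrate the state-machine approach, here is a plain-Python sketch of a two-tier approval with rejection and escalation. It does not use django-fsm or any Pylons library; the state and action names are hypothetical.

```python
from typing import Dict, List, Tuple

# (current state, action) -> next state; names are hypothetical.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("draft", "submit"): "pending_review",
    ("pending_review", "approve"): "pending_final",  # first tier
    ("pending_review", "reject"): "draft",
    ("pending_review", "escalate"): "escalated",
    ("escalated", "approve"): "pending_final",
    ("escalated", "reject"): "draft",
    ("pending_final", "approve"): "published",       # second tier
    ("pending_final", "reject"): "draft",
}

class InvalidTransition(Exception):
    """Raised when an action is not allowed from the current state."""

class ApprovalWorkflow:
    def __init__(self, state: str = "draft") -> None:
        self.state = state
        # Audit trail of (from_state, action, to_state, actor) tuples.
        self.history: List[Tuple[str, str, str, str]] = []

    def fire(self, action: str, actor: str) -> str:
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise InvalidTransition(f"cannot {action!r} from {self.state!r}")
        new_state = TRANSITIONS[key]
        self.history.append((self.state, action, new_state, actor))
        self.state = new_state
        return new_state
```

Keeping the transitions in one table makes the allowed paths easy to review. A real deployment would also check that each actor may fire a given action (e.g. that the two tiers are signed by different people); that check is left out here.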
>>
>> Cheers,
>> Florian
>>
>> On 10/02/2015 6:49 pm, "Marta Hoffman-Sommer" <m.hoffman-sommer at icm.edu.pl>
>> wrote:
>>>
>>> Hi Stefan,
>>> Thanks a lot for your suggestions. If we come up with some solution, I'll
>>> let you know here on the list.
>>> Regards,
>>> Marta
>>>
>>>
>>> On 2015-02-10 at 11:14, Stefan Oderbolz wrote:
>>>>
>>>> Hi Marta,
>>>>
>>>> this is one of those features that is still missing in CKAN core. What
>>>> you basically want is some kind of workflow for a dataset to be
>>>> published and/or updated.
>>>> It's on the CKAN roadmap
>>>> (https://github.com/ckan/ideas-and-roadmap/issues/108) but not yet
>>>> implemented.
>>>>
>>>> At the moment you could either try to solve it in a separate extension
>>>> or try to "hack" this feature using the status field (i.e. set all new
>>>> datasets to status "private", and only allow admins to set it to
>>>> "public"). But I'm not saying this is easy to implement. I already
>>>> tried to do this twice and gave up because of the complexity of getting
>>>> the details right.
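
The policy behind this workaround can be sketched as two checks in plain Python. Assumptions: the "status" is modelled as CKAN's boolean `private` dataset flag, datasets are plain dicts, and `on_create`/`check_update` are hypothetical helpers. Wiring them into a real extension (for example through CKAN's `IPackageController` or `IAuthFunctions` plugin interfaces) is not shown.

```python
from typing import Any, Dict

Dataset = Dict[str, Any]

def on_create(pkg: Dataset, user_is_sysadmin: bool) -> Dataset:
    """Force datasets created by non-sysadmins to start out private."""
    result = dict(pkg)
    if not user_is_sysadmin:
        result["private"] = True
    return result

def check_update(old: Dataset, new: Dataset, user_is_sysadmin: bool) -> None:
    """Refuse to turn a private dataset public unless a sysadmin does it.

    A missing "private" key is treated as private (a conservative assumption).
    """
    going_public = old.get("private", True) and not new.get("private", True)
    if going_public and not user_is_sysadmin:
        raise PermissionError("only a sysadmin may make a dataset public")
```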
>>>>
>>>> Best regards Stefan
>>>>
>>>> On Tue, Feb 10, 2015 at 10:00 AM, Marta Hoffman-Sommer
>>>> <m.hoffman-sommer at icm.edu.pl> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Has anyone tried to use CKAN in such a way that all datasets created by
>>>>> users need to be approved by a sysadmin before becoming publicly visible
>>>>> (so that we could, e.g., prevent users from posting non-scientific data)?
>>>>> How could this be done? Is there any possibility of configuring CKAN to
>>>>> do this, or would we need to write an extension?
>>>>> I would be grateful for any comments on this.
>>>>> Best,
>>>>> Marta
>>>>>
>>>>> -------
>>>>> Marta Hoffman-Sommer
>>>>> Open Science Platform
>>>>> ICM University of Warsaw
>>>>> http://pon.edu.pl
>>>>>
>>>>>
>>>>> On 2015-01-16 at 10:24, Marta Hoffman-Sommer wrote:
>>>>>
>>>>>
>>>>> Thanks a lot for the hints - this is a great help. We are now looking at
>>>>> your extensions in detail, together with my colleagues. The DOI
>>>>> extension should work for us, I think, and that's very important (we're
>>>>> aware that we will need a contract for this). I'll be watching for your
>>>>> embargo extension. And if we stick to our decision to use CKAN (which we
>>>>> most likely will), then I will certainly have more questions.
>>>>>
>>>>> Best,
>>>>> Marta
>>>>>
>>>>>
>>>>>
>>>>> On 2015-01-16 at 05:03, Florian May wrote:
>>>>>
>>>>> Marta, Ben,
>>>>>
>>>>> this is highly interesting! I'm using CKAN to archive all the research
>>>>> datasets without a proper home (dedicated data warehouse).
>>>>>
>>>>> +1 on embargoing! We run one instance with all datasets set to "public"
>>>>> inside our well-protected intranet, and one completely separate (because
>>>>> I feared accidental leaks) instance facing outside. As we work with
>>>>> sensitive data about threatened species, our data release process is
>>>>> painfully manual and goes across several desks, so there's no automation
>>>>> yet. It would be great to have some sort of auditable data release
>>>>> sign-off, possibly triggering a push to the external site.
>>>>>
>>>>> Great to hear about ckanext-doi! I'll have to ask the Australian
>>>>> National Data Service, who offered to mint DOIs for us, whether that
>>>>> offer would extend to a modified ckanext-doi.
>>>>>
>>>>> Ben, FYI our colleagues at the WA Museum have just adopted
>>>>> CollectiveAccess for their collection data management:
>>>>>
>>>>> http://www.gaiaresources.com.au/collectiveaccess-powerful-flexible-collection-management/
>>>>>
>>>>> Cheers,
>>>>> Florian
>>>>>
>>>>> On Fri, Jan 16, 2015 at 12:23 AM, Ben Scott <ben at benscott.co.uk> wrote:
>>>>>>
>>>>>> Hi Marta -
>>>>>>
>>>>>> We're using CKAN as a repository for our research and collections data
>>>>>> here at the Natural History Museum, London - http://data.nhm.ac.uk/.
>>>>>>
>>>>>> 1) Embargoing datasets - this is on our roadmap and a high priority, so
>>>>>> we should be writing an extension for this soon.
>>>>>>
>>>>>> 2) Batch upload - we've built data import pipelines using Spotify's
>>>>>> Luigi framework (https://github.com/spotify/luigi) and the CKAN API.
>>>>>> It's very specialised for our collections database, though, and not
>>>>>> implemented as an extension - but it might be useful
>>>>>> (https://github.com/NaturalHistoryMuseum/ke2mongo).
>>>>>>
>>>>>> 3) We've written an extension for assigning DataCite DOIs -
>>>>>> https://github.com/NaturalHistoryMuseum/ckanext-doi (you will need a
>>>>>> contract with DataCite / their national representative to be able to
>>>>>> mint DOIs).
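
For comparison, a much simpler batch upload can talk to the CKAN Action API directly. This sketch is not the NHM Luigi pipeline: it builds one POST per record to `/api/3/action/package_create`, passing the API key in the `Authorization` header. The site URL, key and records are placeholders, and `send` should only be called against a real instance.

```python
import json
import urllib.request
from typing import Any, Dict, Iterable, Iterator

def package_create_request(site_url: str, api_key: str,
                           dataset: Dict[str, Any]) -> urllib.request.Request:
    """Build (without sending) a POST to the package_create action."""
    url = site_url.rstrip("/") + "/api/3/action/package_create"
    req = urllib.request.Request(url, data=json.dumps(dataset).encode("utf-8"), method="POST")
    req.add_header("Content-Type", "application/json")
    req.add_header("Authorization", api_key)
    return req

def batch_requests(site_url: str, api_key: str,
                   records: Iterable[Dict[str, Any]]) -> Iterator[urllib.request.Request]:
    """Yield one package_create request per record."""
    for record in records:
        yield package_create_request(site_url, api_key, record)

def send(req: urllib.request.Request) -> Dict[str, Any]:
    """Send a request and return the action's "result", raising on failure."""
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    if not body.get("success"):
        raise RuntimeError(body.get("error"))
    return body["result"]
```

Usage would look like `for req in batch_requests("https://ckan.example.org", "YOUR-API-KEY", records): send(req)`. Building requests separately from sending them keeps the batch easy to inspect or retry.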
>>>>>>
>>>>>> Cheers,
>>>>>> Ben
>>>>>>
>>>>>> -----------------------------------
>>>>>> Data Portal Lead Architect
>>>>>> Biodiversity Informatics,
>>>>>> Natural History Museum,
>>>>>> London
>>>>>> +44 (0) 207 942 4277
>>>>>>
>>>>>> On 15 Jan 2015, at 13:57, Marta Hoffman-Sommer
>>>>>> <m.hoffman-sommer at icm.edu.pl> wrote:
>>>>>>
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> We're planning an open research data repository which will serve the
>>>>>>> whole scientific community in Poland, and we're seriously considering
>>>>>>> using CKAN for this purpose. Have any of you already implemented CKAN
>>>>>>> as a stand-alone repository (not part of a data management system)?
>>>>>>> Is anybody aware of CKAN extensions that would enable (1) embargoing
>>>>>>> dataset release, (2) batch upload and editing of multiple files, or
>>>>>>> (3) DOI assignment and display? We have searched the web for these
>>>>>>> without success.
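
For point (1), the core of an embargo is a date check at display time. A minimal sketch, assuming a hypothetical `embargo_until` extra stored as an ISO date string alongside CKAN's `private` flag:

```python
from datetime import date
from typing import Any, Dict

def is_publicly_visible(pkg: Dict[str, Any], today: date) -> bool:
    """True if the dataset is public and any embargo date has passed."""
    if pkg.get("private", False):
        return False
    embargo = pkg.get("embargo_until")  # hypothetical field, e.g. "2015-06-01"
    if not embargo:
        return True
    # The dataset becomes visible on the embargo date itself.
    return today >= date.fromisoformat(embargo)
```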
>>>>>>>
>>>>>>> Best,
>>>>>>> Marta
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> ckan4rdm mailing list
>>>>>>> ckan4rdm at lists.okfn.org
>>>>>>> https://lists.okfn.org/mailman/listinfo/ckan4rdm
>
>
>
> --
> Liip AG  // Limmatstrasse 183 //  CH-8005 Zürich
> Tel +41 43 500 39 80 // GnuPG 0x7B588C67 // www.liip.ch



-- 
John S. Erickson, Ph.D.
Director of Operations, The Rensselaer IDEA
Deputy Director, Web Science Research Center (RPI)
<http://tw.rpi.edu> <erickj4 at rpi.edu>
Twitter & Skype: olyerickson


