[ckan-dev] Privacy of user activity in CKAN 2.0

Joss Winn jwinn at lincoln.ac.uk
Thu May 30 12:24:00 UTC 2013


Hi,

I originally posted this issue to the list and just want to say that *from
the point of view of the data re-user*, I wholeheartedly agree with Mark's
thoughts below! Keep it simple.

Some further thoughts:

All activity data around a dataset (i.e. its provenance) should be
recorded, regardless of the current public/private status of the dataset.
If CKAN is being used to ingest data at the point of creation (e.g. a
datastore, not simply a file store or catalogue), it should be able to keep a
full audit of actions relating to that data.

Publishing of the activity data should retrospectively and entirely follow
the public/private status of the dataset.

However, as an added option, *from the point of view of the data creator*,
it would be helpful to be able to mark certain activities around a dataset
as private to the organisation, as well as certain metadata fields. For
example, a notes field where sensitive information relating to the data
subjects (i.e. personal data) can be recorded.
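
To make the rules above concrete, here is a rough sketch in Python. It is
not CKAN's API: the Dataset and Activity classes, the org_only flag and the
visible_activities function are all hypothetical names made up for
illustration. It assumes every activity is always recorded, that visibility
is decided at read time from the dataset's current status, and that an
activity can optionally be marked private to the owning organisation.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass
class Dataset:
    # Hypothetical stand-in for a dataset, not a CKAN model class.
    id: str
    owner_org: str
    private: bool


@dataclass
class Activity:
    # Every action is recorded, whatever the dataset's status at the time.
    dataset_id: str
    description: str
    # Hypothetical optional flag: visible to the owning organisation only.
    org_only: bool = False


def visible_activities(
    activities: Iterable[Activity],
    datasets: Dict[str, Dataset],
    user_orgs: Set[str],
) -> List[Activity]:
    """Return the activities a user may see, judged by current dataset status."""
    visible = []
    for activity in activities:
        dataset = datasets.get(activity.dataset_id)
        if dataset is None:
            continue  # activity for an unknown dataset: show nothing
        is_member = dataset.owner_org in user_orgs
        # Non-members see nothing from a private dataset, and never see
        # activities marked as private to the organisation.
        if not is_member and (dataset.private or activity.org_only):
            continue
        visible.append(activity)
    return visible


# Example: one dataset with an ordinary and an organisation-only activity.
datasets = {"d1": Dataset(id="d1", owner_org="lincoln", private=True)}
log = [
    Activity("d1", "dataset created"),
    Activity("d1", "sensitive notes edited", org_only=True),
]
anonymous_view = visible_activities(log, datasets, set())
member_view = visible_activities(log, datasets, {"lincoln"})
```

Because nothing is decided at write time, flipping the dataset's private
flag changes what is shown for its whole history, which is the
"retrospectively and entirely" behaviour described above.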

I think this is extending the functionality of CKAN but in a way that will
make it more useful to more use cases (e.g. research data).

I suppose my point is that with the addition of the Datastore, I think the
potential use cases for CKAN have been significantly expanded. CKAN is now
a tool for data that is being actively created/modified/managed, rather
than just a publishing and discovery tool. This is very exciting but it
requires more discussion about the implications of this. I fear that the
Datastore invites new use cases for CKAN but the current permissions model
puts a block on them.

I'm still learning about the extent of what CKAN can do, so I apologise if
I have missed/misrepresented something :-)

Cheers
Joss


>
>Message: 1
>Date: Thu, 30 May 2013 11:30:26 +0100
>From: Mark Wainwright <mark.wainwright at okfn.org>
>Subject: Re: [ckan-dev] Privacy of user activity in CKAN 2.0
>To: CKAN Development Discussions <ckan-dev at lists.okfn.org>
>Message-ID:
>	<CAJhtavawixt8rRTgKzH3t=EHeQLkdE=g8w5jA5XfC8azogxOsA at mail.gmail.com>
>Content-Type: text/plain; charset=ISO-8859-1
>
>I don't see why CKAN needs to even have a concept of whether an
>activity is/was private, separate from whether the dataset is
>currently private.
>
>Activities should be hidden along with everything else when a dataset is made
>private. After all, what's the point of seeing activities to a dataset
>that you can't see? It is liable to cause confusion ('but this dataset
>doesn't exist') and most unlikely to be useful.
>
>Conversely, if a dataset is made public, everything else about the
>dataset is visible so why not the activities? For one thing, this
>could again lead to confusion when different users, looking at the
>same public dataset, could see different histories.
>
>An important thing in general is that the behaviour should be simple
>enough that users should be able to form a clear model of what's going
>on. For my tuppence, having activities with their own concept of
>public-ness, separate from the dataset's, is complex and subtle enough
>to break this.
>
>Mark
>
>On 30/05/2013, Sean Hammond <sean.hammond at okfn.org> wrote:
>>> My assumption as a naive user is that every edit goes in the activity
>>> stream, but that I am only ever shown activity relating to datasets
>>> I'm authorised to see (either because they're public, or because I'm
>>> in the relevant Organization). If a dataset is made public then so is
>>> its history. After all the history is really just more metadata.
>>
>> Right. IIRC, we wanted to have private activities for private datasets
>> as you describe, but we didn't have time to do that change (at the "last
>> second" before releasing 2.0, this was in January iirc! :) so we just
>> made a simpler change instead: no activities from private datasets.
>>
>> If and when we do have private activities for private datasets, then I'm
>> not sure about the question of what happens to private activities when
>> their dataset becomes public. Do the activities become public as well as
>> you say? Or do the activities remain private (but any further activities
>> that happen while the dataset is public are public)?
>>
>> I think the more conservative option, where private activities always
>> remain private, is maybe safer.
>>
>> The same question comes up when a public dataset becomes private: do the
>> dataset's public activities now become private and disappear from
>> activity streams? Or do public activities always remain public?
>>