[MyData & Open Data] distinctions between personal and open

Wed Jul 24 17:40:09 UTC 2013

Hi Javier,

The notion that "we can publish anything online if it's anonymised" is indeed wrong-headed, and anyone touting such ideas needs to be corrected immediately. I would be grateful if you would let us know of specific instances of such language, and we will intervene.

The sensitivity issue is very important and I agree that location data is particularly pernicious. I would happily participate in any attempts to haul mobile data companies into shared good practice; let me know if I can be of any assistance.

Best
Mark

Mark Elliot
Centre for Census and Survey Research
School of Social Sciences
University of Manchester
M13 9PL
t: 0161-275-4257
f: 0161-275-4722

From: javierruizorg at gmail.com [mailto:javierruizorg at gmail.com] On Behalf Of Javier Ruiz
Sent: 24 July 2013 17:16
To: Mark Elliot
Cc: stef; mydata-open-data at lists.okfn.org; Elaine Mackey
Subject: Re: [MyData & Open Data] distinctions between personal and open

I think the problem is that again and again we are told that you can publish anything online if it's "anonymised".

While people on this list may be aware of Paul Ohm (who is a lawyer and only presented other people's research), most policy makers are covering their ears and singing out loudly.

Besides the context, there are also specific data types that are more sensitive. Location is one such case: it is almost impossible to anonymise if you keep individual records. This has huge implications for mobile data reuse.

ORG is chasing UK mobile companies to agree some best practice for their reuse of personal data to produce aggregated data products and services. Besides a voluntary industry agreement, there are some important legal aspects that are typically overlooked. We will publish a report on this very soon.

On 24 Jul 2013 12:54, "Mark Elliot" <mark.elliot at manchester.ac.uk> wrote:

On 24 Jul 2013, at 11:13, "stef" <s at ctrlc.hu> wrote:

> On Wed, Jul 24, 2013 at 10:06:49AM +0000, Mark Elliot wrote:
>> So you really do not make a distinction between publishing full tax records on the internet with full identifiers
>> and a set of univariate aggregate tables held inside a safe data lab that only one person sees?
>
> that is a bit more than anonymization, that is also "not publishing" no?
>
>> Ohm, in short, is wrong because he fails to model the context.
>
> can you elaborate that? i think the reason is exactly, that he considers any
> data in a context, and not a vacuum.

Firstly, context is rather more than simply the publishing/not publishing distinction; you have to consider what data environment the data exists in. Something could be anonymised in one data environment but not in another. Clearly full publication is the most liberal environment and therefore requires the highest level of caution.

Secondly, Ohm's scenario is unrealistic. He uses differential privacy, which assumes that the intruder has all but one of the pieces of information in the data set and that all the information that he has is 100% convergent with that in the data.

Put this another way: Ohm's paper was rather like the guy who rushes into a room full of car designers and screams "everyone... listen up... we have just discovered that if you drive cars into a brick wall at 100 miles an hour then people get hurt, we have to stop building them!!!!"

> --
> pgp: https://www.ctrlc.hu/~stef/stef.gpg
> pgp fp: FD52 DABD 5224 7F9C 63C6  3C12 FC97 D29F CA05 57EF
> otr fp: https://www.ctrlc.hu/~stef/otr.txt

_______________________________________________
MyData-Open-Data mailing list
MyData-Open-Data at lists.okfn.org
http://lists.okfn.org/mailman/listinfo/mydata-open-data