[MyData & Open Data] distinctions between personal and open

Wed Jul 24 09:53:42 UTC 2013

Thanks Lancelot,

This is a useful example of a disclosure and of bad anonymisation which we can add to our list.

Mark Elliot
Centre for Census and Survey Research
School of Social Sciences
University of Manchester
M13 9PL
t: 0161-275-4257
f: 0161-275-4722

From: mydata-open-data-bounces at lists.okfn.org [mailto:mydata-open-data-bounces at lists.okfn.org] On Behalf Of Lancelot PECQUET (Will Strategy)
Sent: 24 July 2013 10:22
To: mydata-open-data at lists.okfn.org
Subject: Re: [MyData & Open Data] distinctions between personal and open

Hello,

Regarding what Sam calls "aggrodata", there is definitely a lot to be said.

A French example (for those who do not read French, I hope your favourite online translator
will give a decent result):
http://www.agoravox.fr/actualites/economie/article/exclusif-l-insee-brise-avec-google-131028

In brief, the French Statistical Institute (INSEE) had released some fiscal
"anonymous aggregated open data" without noticing that, by cross-matching
those data with geolocation, it was in many cases simple to de-anonymize
the information and obtain individual fiscal data (which is not supposed to be easily
accessible in France).
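
To make the mechanism concrete, here is a minimal sketch of how an aggregate can leak an individual value. It assumes the release is aggregated per small geographic cell; the cell ids, field names, figures and threshold below are all invented for illustration and are not INSEE's actual format. When a cell contains a single household, the "aggregate" is simply that household's figure, and geolocation tells you which household it is.

```python
# Hypothetical illustration: every name and number here is invented.

def risky_cells(cells, threshold=3):
    """Return ids of cells that aggregate fewer than `threshold` households."""
    return [c["cell_id"] for c in cells if c["households"] < threshold]

def implied_individual_values(cells):
    """For cells holding exactly one household, the total is that household's value."""
    return {c["cell_id"]: c["total_income"] for c in cells if c["households"] == 1}

cells = [
    {"cell_id": "A1", "households": 120, "total_income": 3_600_000},
    {"cell_id": "A2", "households": 1, "total_income": 45_000},
    {"cell_id": "A3", "households": 2, "total_income": 70_000},
]

flagged = risky_cells(cells)                # cells too small to hide anyone
exposed = implied_individual_values(cells)  # cells that disclose one household directly
```

A cell of two is nearly as bad: either household can subtract its own figure from the total to learn the other's.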

An investigative newspaper, "Le Canard enchaîné", revealed this
situation in February and the INSEE removed the "anonymous data".

Lancelot
On 23/07/2013 21:22, Sam Smith wrote:

From all the conversations, there's a clear split in interests. In both parts, we need better and clearer examples. You should be able to download your phone records in electronic form, but I should be able to stop you paying money to get data containing mine.

The two aspects:

1. mydata -- a single individual obtaining data about themselves from companies that hold it.

  - this is the midata programme in the UK run by the UK Government, and similar elsewhere

  - there are a bunch of questions about it, but those are mostly implementation.

It may be that this list wants to put together a set of principles and requirements for this.

Points made so far include:

  - machine readable

  - reuse must be at the discretion of the individual (whether apps or upload to a service)

  - copyright issues should be clear

Is anyone involved in the midata pilots? Is it worth OKF putting together a cohort of volunteers who are going to build interesting reuse projects? (Ideally ones that won't accidentally torpedo the whole programme, so tread a little carefully in public.)

Fundamentally, data given to an individual about that individual is subject to decisions by that individual.

myData/miData is solely about you getting your data; there is a wider version, which is getting some access to other people's data. This is an emerging area which got confused with the above, but which is much more important from an open data and a privacy perspective. It got named "aggrodata" in a conversation I had, and I've not seen a better name for it (got one?)

2. aggrodata

  - publication of customer/transaction data by companies in a way which is "deidentified".

  - this is what Everything Everywhere got caught doing to the police

  - this is what Telefonica do -- http://dynamicinsights.telefonica.com

  - Barclays too.

  - questions of proper anonymisation vs research

  - consent

  - access by individuals within the dataset

The commercial aspects of this are evolving much faster, and there should be a set of principles developed to discuss what it looks like. Some will be open data, some will not be, as it's data sold commercially -- although there is a strong case on consent for data to which an individual contributes to be accessible to them. There is a large question over what anonymous means, in a way which isn't followed by "oops". Tom Steinberg talked about this at Open Government Data Camp in London in 2010, in a way which is still relevant (video snarfled here: http://www.youtube.com/watch?v=eN0beOAvlGM - text: http://steiny.typepad.com/premise/2010/11/open-data-how-not-to-cock-it-up.html). Privacy International (my day job) has been trying to have some of that conversation for a while, and some thoughts have gone up (https://t.co/6M626HVxWR). This should be a public conversation, some of which can take place here, other parts in places of your choosing. What else should it include?

It's also not that new an area in other ways. Governments have been doing Statistical Disclosure Control on administrative and transactional data they release for decades. Some of this includes differential release - different detail to different audiences. There is a clear case for individuals to have their own control of this data. Gov "travel to work" data gets fiddled around Cheltenham, so that they don't accidentally reveal where all the spooks live. Do O2 do the same for people who leave their phone in their car as they can't take it into the office?
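
As a rough illustration of one Statistical Disclosure Control building block, here is a minimal sketch of primary small-cell suppression: any published count below a threshold is withheld. The function name, the threshold of 5 and the sample areas are assumptions made up for this example. Real SDC practice goes further (rounding, perturbation, secondary suppression so that withheld cells cannot be recovered from row totals), none of which is shown here.

```python
# Hypothetical sketch: threshold and area names are illustrative only.

def suppress_small_cells(counts, threshold=5):
    """Return a copy of `counts` with any value below `threshold` replaced by None."""
    return {area: (n if n >= threshold else None) for area, n in counts.items()}

travel_to_work = {"Area X": 240, "Area Y": 3, "Area Z": 17}
published = suppress_small_cells(travel_to_work)
```

Differential release could then be approximated by choosing a different threshold per audience, for example a stricter one for the public file than for vetted researchers.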

But fundamentally, for both, we need better examples. Have OKF approached any of the new ODI sponsors to look at what they can do in terms of Open? The most likely companies to do this well are the ODI early adopters.

Regards

Sam

--

@smithsam

