[MyData & Open Data] Study establishes that de-identification does work

Phil Booth phil at einsteinsattic.com
Mon Jun 16 21:03:07 UTC 2014


I’m not entirely sure who is dismissing de-identification as “generally ineffective” (lawyers? the media?); that seems like a straw man.

De-identification and pseudonymisation are just two of a number of technical approaches to treating data that may help mitigate certain risks. The problem arises when less technical people (e.g. politicians) interpret statements about them as meaning that individual-level linked data can be made ‘safe for ever’ merely by applying such techniques. And it would have to be for ever, because once the data is out there you can never get it back.

The 2012 paper again deals only with Sweeney’s very early work. Her research has covered a lot of ground, from a high-profile figure (Weld) in 1997 to average Joes identified from local press reports last year, as per my previous link. You can see Latanya herself talk about it here:

http://www.youtube.com/watch?v=N4HTHyduQzE – about 10 mins in

Of course, the US and UK situations differ, but over here, for example, we have already seen pharmaceutical marketers start to associate data derived from HES (Hospital Episode Statistics) with social media.

In reality, episodic health data is inherently identifiable by the very events and details of people’s lives contained within it. The identifiers, direct or quasi, are just one vector of attack, which is one reason why data should be properly de-identified or pseudonymised. But while such treatment is necessary, it is not sufficient. I would carefully examine the motives of anyone who asserts otherwise.

For example, given a bit of effort scraping Twitter, Facebook, etc. for birth announcements, and with access to linked, pseudonymised HES data (the billion-plus records that are already out there, not some hypothetical dataset), it would be relatively straightforward to start reading off the entire hospital histories of women who had given birth in an NHS hospital. You would certainly get overlaps or indeterminate matches, probably quite a few, but that would be cold comfort to the tens or hundreds of thousands (or more) whose hospital records would by then have been permanently compromised.
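To make the linkage attack described above a little more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the records are invented, the tuple layout (pseudonym, admission date, hospital, episode type) is illustrative and is not the real HES schema, and it simplifies by assuming a delivery episode is recorded on the baby’s date of birth. It joins scraped announcements to pseudonymised delivery episodes on two quasi-identifiers (date and hospital) and, where exactly one pseudonym matches, reads off every other episode filed under it.

```python
from collections import defaultdict

# Invented example data. The layout (pseudonym, admission date, hospital,
# episode type) is hypothetical and is NOT the real HES schema.
pseudonymised_episodes = [
    ("P001", "2013-03-02", "Hospital A", "delivery"),
    ("P001", "2011-07-19", "Hospital A", "appendicectomy"),
    ("P002", "2013-03-02", "Hospital B", "delivery"),
    ("P003", "2013-03-02", "Hospital B", "delivery"),
    ("P003", "2012-11-05", "Hospital C", "fracture"),
]

# Invented facts of the kind a public birth announcement might reveal:
# (mother's name, baby's date of birth, hospital).
announcements = [
    ("Alice", "2013-03-02", "Hospital A"),
    ("Beth", "2013-03-02", "Hospital B"),
]


def link(announcements, episodes):
    """Match each announcement to delivery episodes sharing its quasi-identifiers.

    Returns {name: sorted list of candidate pseudonyms}. One candidate is a
    unique match; several candidates is an overlap (indeterminate match).
    """
    index = defaultdict(set)
    for pseudonym, date, hospital, episode_type in episodes:
        if episode_type == "delivery":
            index[(date, hospital)].add(pseudonym)
    return {
        name: sorted(index.get((date, hospital), set()))
        for name, date, hospital in announcements
    }


def read_off_histories(matches, episodes):
    """For uniquely matched people, collect every episode under their pseudonym."""
    histories = {}
    for name, candidates in matches.items():
        if len(candidates) == 1:
            pseudonym = candidates[0]
            histories[name] = sorted(
                (date, episode_type)
                for p, date, _hospital, episode_type in episodes
                if p == pseudonym
            )
    return histories


matches = link(announcements, pseudonymised_episodes)
print(matches)
print(read_off_histories(matches, pseudonymised_episodes))
```

With this toy data, Alice matches a single pseudonym, so her unrelated appendicectomy episode is exposed as well; Beth matches two pseudonyms and stays indeterminate. The pseudonym itself is never “broken”: it is simply the key that lets one public fact unlock a whole history.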

 

Add in the maternity data and the GP record data intended to be extracted and linked under care.data, and it is people’s whole medical records that would be exposed, not just their hospital visits.

Technical measures alone are not sufficient to keep individuals anonymous unless the data is aggregated and treated to produce proper statistics. That being the case, and given that there are benefits to be derived from analysing the data, effort should (also) be focussed on the operational processes, governance, audit, etc. that might permit safe use of the data.
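As a rough illustration of the difference between releasing records and releasing proper statistics, here is a minimal Python sketch of one common treatment, small-cell suppression: only per-group counts are output, and any count below a threshold is withheld. The threshold of 10, the function name and the data are assumptions invented for this example. Real statistical disclosure control involves more than this; for instance, a suppressed cell can sometimes be recovered by subtracting the published cells from a published total.

```python
from collections import Counter

# Hypothetical minimum cell size; real disclosure-control rules vary and this
# value is an assumption for illustration only.
MIN_CELL_SIZE = 10


def aggregate_with_suppression(records, group_key, min_cell=MIN_CELL_SIZE):
    """Release only counts per group, reporting None for any count below min_cell.

    No individual-level row is returned; small groups, which are the easiest
    to re-identify, are withheld rather than published.
    """
    counts = Counter(group_key(record) for record in records)
    return {
        group: (count if count >= min_cell else None)
        for group, count in sorted(counts.items())
    }


# Invented records: (hospital, episode type).
records = [("Hospital A", "delivery")] * 12 + [("Hospital B", "delivery")] * 3
print(aggregate_with_suppression(records, group_key=lambda r: r[0]))
```

Here Hospital A’s count of 12 is released and Hospital B’s count of 3 is suppressed. Even so, choosing the threshold and deciding which tables may be published together is exactly the kind of governance and audit question that code alone does not settle.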

 

Which is an entirely different proposition from ‘safe data’.

Phil

From: Sally Deffor [mailto:sally.deffor at okfn.org] 
Sent: 16 June 2014 19:58
To: Phil Booth
Cc: mydata-open-data at lists.okfn.org
Subject: Re: [MyData & Open Data] Study establishes that de-identification does work

Interestingly, Dr Barth-Jones (in the second paper, http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2076397) acknowledges that those risks exist but makes a case (much like Dr Cavoukian) that de-identification works sufficiently well and should not be dismissed outright as generally ineffective.

Sally

On 16 June 2014 19:30, Phil Booth <phil at einsteinsattic.com> wrote:

Not quite. It is a report that deals with a particular selection of commonly quoted re-identification examples, points out some facts, and concedes that de-identification is complex, evolving and context-specific, has to be done properly by experts to be effective, and may reduce the utility of the data so treated.

Dr Cavoukian fails to address some of Sweeney’s more recent re-identification work (e.g. http://www.bloomberg.com/news/2013-06-05/states-hospital-data-for-sale-puts-privacy-in-jeopardy.html) and ducks the reality that there is an enormous amount of badly de-identified (or pseudonymised) data already out there, which may itself provide vectors for attack.


Phil

From: mydata-open-data [mailto:mydata-open-data-bounces at lists.okfn.org] On Behalf Of Sally Deffor
Sent: 16 June 2014 17:49
To: mydata-open-data at lists.okfn.org
Subject: [MyData & Open Data] Study establishes that de-identification does work

Study provides evidence that re-identification is a myth (http://www.privacybydesign.ca/index.php/paper/big-data-innovation-setting-record-straight-de-identification-work)

(http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2076397)

-- 

Sally Deffor
Open Data & Privacy Project Coordinator | skype:deffor.selase | @SDeffor | +44 (0)7774 734206
The Open Knowledge Foundation <http://okfn.org/>
Empowering through Open Knowledge
http://www.okfn.org | @okfn <https://twitter.com/OKFN> | OKF on Facebook <http://www.facebook.com/OKFNetwork> | Blog <http://blog.okfn.org/> | Newsletter <http://okfn.org/?s=Newsletter>

Have you bought your tickets <http://2014.okfestival.org/tickets/> to OKFestival yet? Join us in Berlin in July (15-17)!

See you at OKFestival <http://2014.okfestival.org/> 15-17 July 2014



