[@OKau] data science meetup sydney
budgetaus at hotmail.com
Wed Oct 21 02:08:39 UTC 2015
My latest blog post: https://openaus.net.au/blog/2015/10/21/data-science-for-social-good/

Data-Science for social good
October 21, 2015

This morning I attended a new data science meetup in the CBD. The organiser had contacted me about presenting OpenAus and mentioned that he thought the meetup might help foster collaborations between data scientists and those interested in social change.
The subject of the presentation at this morning’s meetup was privacy. As it turns out, the presentation has direct implications for open data. Earlier in the year, presenter and organiser Anthony Tockar had sparked a number of news articles when he used American taxi-cab metadata to track the taxi routes of a couple of celebrities.
In the US, data held by public companies is subject to FOI law and someone had made a request for taxi-cab data. Anthony used this data to demonstrate that it is possible to re-identify a single user in a huge data set. Anthony explains his reasons and methodology here and raises important issues for privacy, especially in the open data community.
The ability of the government or any other organisation to make data available for re-use relies on how successful we are as a society at anonymising that data. Ensuring privacy for individuals is one of the concerns of government agencies when deciding whether to release data. This issue doesn’t apply to all data sets, only those that deal with person-level data, such as those based on the Census that form the basis of so much ABS data. There are methods used by the ABS to anonymise data, but there is also concern that this is not always enough to prevent individuals from being identified through that data.
Anthony brought up the potential solution of ‘differential privacy’. This term describes an algorithm that can be applied to queries on data so that, when the data is accessed at a person level, the results are obscured enough (you get to set the level yourself) to prevent de-anonymisation. Aggregate queries are largely unaffected, so you can still interact with the data at an aggregate level (totals, averages etc.), but when you attempt to access the data at a person level, the algorithm slightly scrambles the results so that they are not accurate enough to work out which individual the data describes.
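A minimal sketch of the idea, assuming the Laplace mechanism (one common way of achieving differential privacy, and not necessarily the method covered in the presentation). The trip records, the `noisy_count` helper and the `epsilon` value are all made up for illustration. Random noise is added to every query answer: on a large aggregate the noise is negligible, while on a query that isolates one person it is as large as the answer itself.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centred on 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records, predicate, epsilon, rng):
    # Adding or removing one person changes a count by at most 1, so noise
    # with scale 1/epsilon is used. Smaller epsilon means more privacy and
    # noisier answers (this is the "level" you get to set).
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)

# Hypothetical data: 10,000 trips, exactly one of them to a rare address.
trips = [
    {"passenger_id": i, "dropoff": "CBD" if i % 2 == 0 else "Airport"}
    for i in range(10000)
]
trips[0]["dropoff"] = "Rare St"

# Aggregate query: true answer 5000, noise of a few units barely matters.
aggregate = noisy_count(trips, lambda t: t["dropoff"] == "Airport", 0.5, rng)

# Person-level query: true answer 1, noise is of the same size as the answer.
single = noisy_count(trips, lambda t: t["dropoff"] == "Rare St", 0.5, rng)

print(round(aggregate), round(single, 2))
```

With these assumed settings the aggregate answer stays within a handful of trips of the true total, whereas the person-level answer cannot reliably tell zero trips from one.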
You can read more about how this works at this blog post. The ensuing discussion about open data, privacy and the need for privacy law to keep up with our ability to create and store data is a pressing one. It also raises the question of why, when so much is now at stake in terms of protecting the privacy of individuals, the Office of the Information Commissioner (responsible for keeping abreast of and informing policy on such issues) is hanging on by a thread.

Privacy is a cost to business and there is not a lot of appetite to harden up rules in favour of protecting the rights of citizens. If we do not have a properly funded and independent Information Commissioner, it is hard to see how citizen rights and the issues affecting them are likely to be upheld.
Rosie Williams BA (Sociology)
________________________________________
NoFibs.com.au - Open Data Reporter | OpenAus - Founder and Developer