Magic and rocket science: de-identification is the new black

May 21, 2016, Anna Johnston

De-identification … it’s the latest buzzword.

With all the press it’s been getting recently, you could be forgiven for thinking that de-identification is the magic solution to all the privacy problems facing open data and Big Data projects.  But like other forms of magic, this may prove to be just an illusion.  Resolving privacy risks is easier said than done.

Increasingly our clients want advice on how to do data-matching, or release datasets under Open Data initiatives, or conduct Big Data analytics, in a privacy-protective manner.  Some are seeking to participate in cross-agency research projects; others are facing requests to hand over their data to the NSW Data Analytics Centre; while others are simply seeking to find policy or operational insights by leveraging their own data via business intelligence systems.  All are worried about the privacy risks.

There is big picture advice available, like the OAIC’s new guide on how the APPs apply to Big Data, and our own guide to resolving the ethical issues raised by data analytics.  But the one aspect of the discussion that I see causing the most angst is de-identification.

Is de-identification the answer?  Is it the same thing as anonymisation?  How do we even do it?

The Australian Privacy Commissioner Timothy Pilgrim recently described de-identification as “privacy’s rocket science – the technology that unlocks the potential of where we want to go, while protecting individual rights”.  But he also warned that just like space flight, “the risks of getting it wrong can be substantial and very public”.

Thud.  Ouch.  That’s the sound of over-excited data analysts falling back to earth.

As a society, we want privacy protection because it is the oil that lubricates trust, and without trust we cannot function.  The fear of being monitored and targeted for what we say or do has a chilling effect on our freedom of speech.  Public health outcomes cannot be realised if people don’t trust the anonymity of their health information; think of the clients of sexual health, mental health and substance abuse services in particular.  But we also want the full value of data to be realised.  If big data analytics can help find a cure for cancer, or prevent child abuse, we’re all for it.  Bring it on, we all say.

And for the organisation holding data, de-identification sounds like a magic solution, because if you can reach a state with your data where it is not possible for any individual to be identified or re-identified from the data, then it no longer meets the legal definition of “personal information”.  And that means you don’t have to comply with the Privacy Act when you collect, store, use or disclose that data.  Legal risks resolved, hooray, let’s all go home.

So de-identification seems to promise that we can have our cake and eat it too.  It’s the holy grail of data management.

BUT … and this is a big but … can true de-identification ever be achieved, without the utility of the data also being lost?

I have written before about how easily an individual’s identity, pattern of behaviour, physical movements and other traits can be extrapolated from a supposedly ‘anonymous’ set of data, published with good intentions in the name of ‘open data’, public transparency or research.  The examples are many: Netflix, AOL, the Thousand Genomes Project, the London bike-sharing scheme, Washington State health data, and my personal favourite, the NYC taxi data.

So should we throw in the towel, and give up on trying to pursue data analytics?  (Or even worse, give up on privacy?)  No, I don’t believe so.  I think we just need to get better at de-identification, because there is more than one way to skin this particular cat.

But we’re not going to get better at de-identification unless we understand it.  Privacy professionals should not be seduced by boffins who whisper techy sweet nothings in our ear like ‘SLK’ and ‘k-anonymity’, ‘differential privacy’ and ‘encryption’.  Instead, we need to better understand the language and the techniques involved in de-identification for ourselves, so that we can perform proper risk assessments, and know which privacy controls to apply when.

(For what it’s worth: SLKs, or statistical linkage keys, are codes used to link records about the same person across datasets with confidence, generated from details like their name, gender and date of birth. But an SLK works only as a pseudonym, so don’t even think about describing SLKs as offering true anonymity, or you’ll get a grumpy tweet from me.)
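To make that concrete, here is a minimal Python sketch of how an SLK-581-style key (the linkage key format used in Australian health and community services data collections) is typically constructed: selected letters of the name, plus date of birth and a sex code. The function and the padding convention are my illustration of the general approach, not a reference implementation:

```python
def slk581(family_name: str, given_name: str, dob_ddmmyyyy: str, sex_code: str) -> str:
    """Illustrative SLK-581-style linkage key: letters 2, 3 and 5 of the
    family name, letters 2 and 3 of the given name, then date of birth
    (DDMMYYYY) and a sex code ('1' male, '2' female). Names too short to
    supply a letter are padded with '2', per the usual convention."""
    def pick(name, positions):
        letters = [c for c in name.upper() if c.isalpha()]
        return "".join(letters[p - 1] if p <= len(letters) else "2" for p in positions)
    return pick(family_name, (2, 3, 5)) + pick(given_name, (2, 3)) + dob_ddmmyyyy + sex_code

print(slk581("Johnston", "Anna", "21051970", "2"))  # -> OHSNN210519702
```

Notice that anyone who knows a person’s name, sex and date of birth can regenerate exactly the same code, which is precisely why an SLK is a pseudonym, not anonymity.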

Privacy professionals need to better understand the relative merits and limitations of different de-identification techniques.  Open data advocates and data analysts need to develop deeper understanding of the full spectrum of privacy threats that can impact on individuals.  And we all need clearer guidance on how to balance data utility and data protection, within the scope of privacy law.
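As one example of what those merits and limitations look like in practice: k-anonymity simply means that every record in a released dataset shares its combination of quasi-identifiers (attributes like postcode, age band and sex) with at least k-1 other records. Here is a minimal sketch of measuring that property; the column names and rows are invented for illustration:

```python
from collections import Counter

# Invented sample of "de-identified" records: (postcode, age_band, sex)
rows = [
    ("2095", "30-39", "F"),
    ("2095", "30-39", "F"),
    ("2095", "30-39", "F"),
    ("2000", "40-49", "M"),
    ("2000", "40-49", "M"),
    ("2000", "50-59", "F"),  # a group of one: this person is unique
]

def k_anonymity(records):
    """Return k: the size of the smallest group of records sharing
    identical quasi-identifier values."""
    return min(Counter(records).values())

print(k_anonymity(rows))  # 1: the dataset is only 1-anonymous, so someone stands out
```

And there is the limitation in a nutshell: raising k to a respectable level usually means coarsening or suppressing data, which is exactly the utility trade-off this post is about.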

The UK’s Information Commissioner’s Office has a really useful Anonymisation Code of Practice, though at 108 pages it’s not a light read.  In the US, the National Institute of Standards and Technology has published a 54-page paper on de-identification which laments the absence of standards for testing the effectiveness of de-identification techniques.  And just this month, academics from the Berkman Center for Internet & Society at Harvard University produced a 107-page tome proposing “a framework for a modern privacy analysis informed by recent advances in data privacy from disciplines such as computer science, statistics, and law”.

But in the meantime I think we need a brief, lay person’s guide to de-identification.  A non-boffin’s set of crib notes, if you like.

Perhaps that will be my blog for another day.  Just as soon as I’ve mastered pulling a rabbit out of a hat.


Photograph (c) Shutterstock
