I had a dream last night – well, more of a nightmare really. I dreamt that my home had been burgled.
As I walked through my home, seeing possessions flung about but nothing obviously missing, I was thinking: what is there to steal anymore anyway? No-one wants scratched Alanis Morissette CDs or pre-loved Wiggles DVDs when you can just stream all the music and movies you want. Surely there is no market anymore for second-hand stereo systems. (In 1990 when my parents’ house was burgled, their Beta video player was left behind. Not stupid, those burglars.) And who needs a fell-off-the-back-of-a-truck TV when new ones cost less than a phone?
But then I realised that the burglars had discovered my diary. Urghhhh. Shudder. All my most private thoughts. Thoughts I would not share with my closest friends, let alone you, dear reader.
In this age of social media and Big Data, where we Instagram our food before eating it, tell the world about our relationship status via Facebook, ask Siri to write our text messages for us, and let the flashlight app on our phones know precisely where we have been, a personal diary may be the last vestige of privacy we have left. Which is why I woke from my nightmare feeling like I had been violated.
They’re funny things, diaries.
Unlike memoirs, which are written with a reader in mind, a personal diary is the one place where we can record our innermost thoughts and feelings, in absolute privacy. It’s a place where freedom of thought and freedom of expression can run wild. The diary is the perfect example of how privacy is an enabler of those other freedoms – even when there is precious little liberty to be found. While hiding from the Nazis in the Secret Annex, Anne Frank wrote in and of her diary that “The nicest part is being able to write down all my thoughts and feelings, otherwise I’d absolutely suffocate”.
But of course, sometimes private diaries become public, to the embarrassment of either the author or their colleagues. In 1992, former NSW Government Minister Terry Metherell’s habit of keeping a diary eventually led to the downfall of Premier Nick Greiner.
What stuck in my memory from that day was the response of Bob Carr, then NSW Opposition Leader, which was to claim that he had burned his diaries, while also seemingly contradicting himself by quoting the line made famous by Mae West: “Keep a diary and someday it’ll keep you”. (Which, in Bob Carr’s case, eventually came true some 22 years later, when he published his Diary of a Foreign Minister.)
More recently, Australian Attorney General George Brandis has been fighting to keep his Ministerial diary private. This involves a somewhat awkward stance, as he is the Minister in charge of both privacy and Freedom of Information laws.
But that’s not the only irony to be found in the tension between privacy and freedom of information. This year, ‘Right to Know Day’ was celebrated on 28 September, and Brandis marked the day with two announcements. The first was that while Timothy Pilgrim has been appointed as Australian Information Commissioner, neither the FOI Commissioner nor the Privacy Commissioner roles are to be separately filled.
The second was that amendments will be made to the Privacy Act to criminalise the re-identification of published ‘anonymous’ government data. This law reform proposal appeared to have come out of left field, until the next day it was revealed that academics from the University of Melbourne had been able to re-identify data published at the Federal Government’s data.gov.au website.
Released as open data by the Department of Health in August, the dataset included around 1 billion Medicare claims made between 1984 and 2014, by about 10% of the Australian population.
At the time, the Department said that a number of de-identification techniques had been applied, including “encryption, perturbation and exclusion of rare events”. However, using only publicly available information, Dr Vanessa Teague and her colleagues were able to decrypt the service provider ID numbers.
There is surely a risk that patients’ medical histories could be discovered as a result of knowing the identity of each provider. The Department stated that birthdates were replaced with year of birth, locations of the health services described only by the State or Territory, and the dates of each health service provided were “randomly perturbed to within 14 days of the true date”. However, imagine if you were one of the 1,500 or so people who downloaded this dataset before it was taken offline; and now imagine that you knew from other sources that a particular patient you were interested in saw a particular service provider on a particular date. (For example, you know your ex-girlfriend saw her GP on a particular date because you drove her to her appointment; or you know a celebrity saw a particular specialist because the paparazzi photographed them coming out of the surgery.) You could at least start to narrow down your search by finding all the patients with the correct year of birth who saw that health service provider within a 14-day window either side of the correct date. Depending on what other variables are evident from the data, from there you might just be able to identify which patient is the one you are interested in – and then link through to every other Medicare claim they made over 30 years, without even having to decrypt the patient number.
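To make that narrowing-down step concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Claim` fields, the tokens, the provider IDs and the five made-up records are hypothetical and do not reflect the real dataset's layout. It simply shows why a perturbed date, a year of birth and a known provider can be enough to isolate one encrypted patient number and then pull every other claim filed under it.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Claim:
    patient_token: str    # encrypted patient number (never decrypted here)
    provider_id: str      # provider ID, assumed already re-identified
    year_of_birth: int    # birthdate generalised to year only
    published_date: date  # service date after random perturbation


# Entirely synthetic records, invented for this example.
CLAIMS = [
    Claim("tok_A", "prov_1", 1975, date(2012, 3, 10)),
    Claim("tok_A", "prov_9", 1975, date(2013, 7, 2)),
    Claim("tok_B", "prov_1", 1975, date(2012, 5, 20)),
    Claim("tok_C", "prov_1", 1982, date(2012, 3, 1)),
    Claim("tok_D", "prov_2", 1975, date(2012, 3, 6)),
]


def candidate_patients(claims, provider_id, year_of_birth, true_date,
                       window_days=14):
    """Return tokens of patients who match what the attacker already knows.

    Published dates sit within `window_days` of the true date, so the
    search window extends that many days either side of it.
    """
    earliest = true_date - timedelta(days=window_days)
    latest = true_date + timedelta(days=window_days)
    return {
        c.patient_token
        for c in claims
        if c.provider_id == provider_id
        and c.year_of_birth == year_of_birth
        and earliest <= c.published_date <= latest
    }


def linked_history(claims, patient_token):
    """Return every claim filed under one (still encrypted) patient token."""
    return [c for c in claims if c.patient_token == patient_token]


# The attacker knows: born 1975, saw prov_1 on 5 March 2012.
candidates = candidate_patients(CLAIMS, "prov_1", 1975, date(2012, 3, 5))
if len(candidates) == 1:
    (token,) = candidates
    history = linked_history(CLAIMS, token)
    print(f"{token}: {len(history)} linked claims")
```

In a real dataset the first filter would usually leave more than one candidate, and further known facts (State, a second appointment, the type of service) would be needed to whittle the set down. The point of the sketch is only that the patient number never has to be decrypted for the linkage to work.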
Less than a week after that re-identification scare, the Australian Public Service Commission confirmed that data on 96,000 public servants was downloaded nearly 60 times before they withdrew the published dataset, after realising that identification codes for the employing agencies could potentially be used to identify the public servants who filled in their annual employment survey.
Resolving the tension between extracting the most value from government datasets (part of our ‘right to know’ as citizens) and protecting the privacy of the individuals to whom the data relates is no easy task. However, like many other commentators, I would suggest that criminalising the people who find re-identification vulnerabilities is not the best approach.
We should instead focus efforts on improving understanding of de-identification techniques (and re-identification risks) amongst privacy professionals and open data advocates, as well as the research community, so as to minimise the risk of these data breaches occurring in the first place.
Otherwise I expect that we will keep seeing data breaches like these. And unlike me and my fear of stolen diaries, the affected individuals won’t be able to wake up and think: ‘oh thank goodness, it was all just a bad dream’.
(FYI: If you would like to learn more about this topic, I will be joining Information Commissioner Timothy Pilgrim and other experts on a panel workshopping de-identification myths, realities and limits at the GovInnovate Summit next month.)