One of the key findings of the OAIC’s latest Australian Community Attitudes to Privacy Survey is that 62% of Australians are uncomfortable with their location being tracked through their mobile device or web browser. Our Kiwi cousins are remarkably similar: a 2016 survey by the New Zealand Privacy Commissioner asked what people found most ‘sensitive’, and 63% responded that they were sensitive about their physical location.
Indeed, across continents and cultures, the message is the same: sharing location data makes the majority of people feel “stressed, nervous or vulnerable, triggering fears of burglaries, spying, stalkers and digital or physical harm”.
And yet, website and app developers routinely collect location data. How do they get away with it? And can location data ever be considered de-identified?
The proliferation of location data
With the advent of mobile phones, telephony providers began to know where we were. With the shift to smartphones, that knowledge has spread well beyond just our phone providers; multiple smartphone apps use a mixture of GPS, Bluetooth and Wi-Fi signals to pinpoint our location whenever we carry our phones.
A global ‘sweep’ of more than 1,200 mobile apps by Privacy Commissioners around the world in 2014 found that three-quarters of all the apps examined requested one or more permissions, the most commonly requested being location. Disturbingly, 31% of apps requested information not relevant to the app’s stated functionality. A prominent example was a torch app which tracked users’ precise location and sold that data to advertisers.
More recently, a scan of 136 COVID-19-related apps for the 2020 Defcon security conference found that three-quarters asked for location data, even in apps whose stated functionality was simply to monitor the user’s symptoms.
(Given these findings, perhaps it is no surprise that the outbreak of COVID-19 also shifted perceptions of location data as a privacy risk over the course of 2020. In the OAIC’s community attitudes survey, location tracking was rated the fifth biggest privacy risk we face at the beginning of the year, but by April it had risen to third, ahead even of government surveillance.)
However it is not only apps we install on our mobile phones which can track our location. Bluetooth signals emitted by wearable devices can be collected by third parties; and venues such as shopping centres and airports (or, briefly, rubbish bins in London) use the MAC addresses broadcast by devices to detect how populations are moving within a space, and to identify repeat visitors.
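To make that mechanism concrete, here is a minimal sketch of how broadcast MAC addresses can be turned into stable per-device tokens and used to flag repeat visitors. The data, salt and field names are invented for illustration, and this is not any particular venue’s implementation; note that the operator never needs to know who the device belongs to.

```python
# Hypothetical sketch: flagging repeat visitors from Wi-Fi probe requests.
# Data and salt are illustrative only. Hashing a MAC address produces a
# pseudonym, not anonymity - the same device remains trackable over time.
import hashlib
from collections import defaultdict
from datetime import datetime

SALT = b"venue-specific-salt"

def device_token(mac: str) -> str:
    """Derive a stable per-device token from a broadcast MAC address."""
    return hashlib.sha256(SALT + mac.encode()).hexdigest()[:16]

# (timestamp, MAC address) sightings collected by in-venue sensors
sightings = [
    ("2020-08-01T09:12:00", "aa:bb:cc:dd:ee:01"),
    ("2020-08-01T09:40:00", "aa:bb:cc:dd:ee:02"),
    ("2020-08-08T10:05:00", "aa:bb:cc:dd:ee:01"),  # same device, a week later
]

visit_days = defaultdict(set)
for ts, mac in sightings:
    visit_days[device_token(mac)].add(datetime.fromisoformat(ts).date())

repeat_visitors = [token for token, days in visit_days.items() if len(days) > 1]
print(f"{len(repeat_visitors)} repeat visitor(s) detected")
```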
Bluetooth beacons can also be used to link online advertising to offline transactions. Having purchased MasterCard transaction data in the US to better tie offline purchases to online advertisements, Google offers advertisers the ability to see whether an ad click or video view results in an in-store purchase within 30 days. Connecting to the free wifi offered in Westfield shopping centres involves agreeing to terms and conditions which include linking the mobile device ID with the individual’s wifi use.
Location data is highly granular. One study suggested that just four points of geolocation data can be enough to uniquely identify 95% of individuals. Mark Pesce, a futurist, inventor and educator, speaking at the OAIC Business Breakfast for Privacy Awareness Week in 2015, described the geolocation data collected by and broadcast from our smartphones as “almost as unique as fingerprints”.
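As a toy illustration of that ‘unicity’ effect – with invented data, and without reproducing the cited study’s actual methodology – the sketch below checks how many location traces in a small dataset are pinned down by just four spatio-temporal points:

```python
# Toy illustration of 'unicity': how often a handful of spatio-temporal
# points matches exactly one trace in a dataset. Invented data, not the
# methodology or data of the study cited above.
import random

# each trace is a set of (hour_slot, location_cell) observations per person
traces = {
    "person_A": {(9, "cell_12"), (13, "cell_07"), (18, "cell_12"), (21, "cell_03")},
    "person_B": {(9, "cell_12"), (12, "cell_44"), (19, "cell_44"), (22, "cell_03")},
    "person_C": {(8, "cell_05"), (13, "cell_07"), (18, "cell_12"), (23, "cell_05")},
}

def is_unique(person: str, k: int = 4) -> bool:
    """Do k randomly chosen points from this person's trace match only them?"""
    sample = set(random.sample(sorted(traces[person]), k))
    matches = [p for p, trace in traces.items() if sample <= trace]
    return matches == [person]

unique = sum(is_unique(p) for p in traces)
print(f"{unique}/{len(traces)} traces uniquely identified by 4 points")
```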
Data showing where a person has been can reveal not only the obvious, like where they live and work or who they visit, but it may also reveal particularly sensitive information – such as if they have spent time at a church or a needle exchange, a strip club or an abortion clinic. Some app-makers claim they can even tell which floor of a building people are on.
A recent example is the analysis conducted by Singaporean company Near on the movements of workers at an abattoir in Melbourne, which was the centre of an outbreak during the first COVID-19 isolation period. Near claimed that it could track this small cohort of workers to specific locations including shops, restaurants and government offices. (Near uses “anonymous mobile location information” collected “by tapping data collected by apps” to provide insight into the precise movements of individuals, in order to offer advertisers “finer slices of audiences to reach highly qualified prospective customers”. Near boasts of having “the world’s largest data set of people’s behavior in the real-world” consisting of 1.6 billion ‘users’, across 44 countries, processing 5 billion events per day.)
This information can then be used to target individuals. For example, anti-abortion activists use geo-fencing to target online ads at women as they enter abortion clinics. Near has reported that it could target individuals with messaging about the Australian Government’s COVIDSafe app: “We can support app adoption, saying to someone you’ve been to a postcode or a high-risk area and encourage them to download the app. That’s quite easy to do”. This is despite the company’s claim that its data is “anonymized to protect privacy”.
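The targeting mechanism itself is simple. The sketch below shows a geofence trigger of the general kind ad platforms offer: it fires when a device’s reported coordinates fall within a set radius of a point of interest. The coordinates and radius are invented for illustration, and notably, nothing in it requires knowing whose device it is.

```python
# Minimal sketch of a geofence trigger: fire when a device's reported
# coordinates fall within a radius of a point of interest. Coordinates
# and radius below are illustrative only.
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2) -> float:
    """Great-circle distance between two points, in metres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(a))

GEOFENCE_CENTRE = (-37.8136, 144.9631)   # hypothetical point of interest
GEOFENCE_RADIUS_M = 100

def inside_geofence(lat: float, lon: float) -> bool:
    return haversine_m(lat, lon, *GEOFENCE_CENTRE) <= GEOFENCE_RADIUS_M

# a location ping reported by an SDK embedded in some unrelated app
if inside_geofence(-37.8133, 144.9629):
    print("device is inside the geofence - eligible for targeted messaging")
```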
None of these technologies – or their capacity to affect people’s private lives or autonomy – depends on the identifiability of the data subject. Nonetheless, digital platforms, publishers, advertisers, ad brokers and data brokers often claim to operate outside the reach of privacy laws because the data in which they trade is ‘de-identified’ or ‘anonymised’ or ‘non-personal’.
In response to such claims of protecting privacy through anonymity, the New York Times’ Privacy Project linked publicly available information about people in positions of power with a dataset of location data drawn from mobile phone apps. The dataset included 50 billion location pings from the phones of more than 12 million Americans in Washington, New York, San Francisco and Los Angeles. The result was highly invasive:
“We followed military officials with security clearances as they drove home at night. We tracked law enforcement officers as they took their kids to school. We watched high-powered lawyers (and their guests) as they traveled from private jets to vacation properties. … We wanted to document the risk of underregulated surveillance. …Watching dots move across a map sometimes revealed hints of faltering marriages, evidence of drug addiction, records of visits to psychological facilities. Connecting a sanitized ping to an actual human in time and place could feel like reading someone else’s diary.”
Harms caused by location data
A number of case studies illustrate how the public release of location data about individuals whose identity was unknown even to the data collector can enable groups or individuals to be singled out for targeting. In each case the dataset had purportedly been ‘de-identified’, but each release created the possibility of serious privacy harms including physical safety risks for some individuals in the dataset.
One disturbing recent example is the finding that publicly disclosed de-identified data about public transport cards used in the city of Melbourne could be used to find patterns showing young children travelling without an accompanying adult. Those children could be targeted by a violent predator as a result, without the perpetrator needing to know anything about the child’s identity.
In March 2014, the New York City Taxi & Limousine Commission released, in response to a freedom of information request, data recorded by taxis’ GPS systems. The dataset covered more than 173 million individual taxi trips taken in New York City during 2013. The applicant used the data to build a visualisation of a day in the life of a NYC taxi, and published the raw data online for others to use. It took computer scientist Vijay Pandurangan less than an hour to re-identify the vehicle and driver for all 173 million trips.

Postgraduate student Anthony Tockar then found that the geolocation and timestamp data alone could potentially identify taxi passengers. Using other public data, such as celebrity gossip blogs, he was able to determine where and when various celebrities got into taxis, and thus exactly where named celebrities went and how much they paid. Tockar also developed an interactive map showing the drop-off address of every taxi trip which had begun at a notorious strip club. The same could be done to identify the start or end point of every trip to or from an abortion clinic or a mosque, and to target the individuals living at the other address as a result – without ever needing to learn their identity.
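The speed of the vehicle re-identification, in particular, was reportedly due to the identifiers in the released data having been obscured with an unsalted hash over a small, structured set of possible values – a scheme that can be reversed by simple enumeration. The sketch below illustrates that class of attack using an invented identifier format, not the actual taxi data:

```python
# Sketch of reversing an unsalted hash over a small identifier space by
# enumeration. The identifier format is invented; the point is that hashing
# is not anonymisation when the set of possible inputs is tiny.
import hashlib
from itertools import product
from string import ascii_uppercase, digits

def md5_hex(s: str) -> str:
    return hashlib.md5(s.encode()).hexdigest()

# suppose identifiers are one letter followed by three digits, e.g. "A123"
lookup = {
    md5_hex(letter + "".join(ds)): letter + "".join(ds)
    for letter in ascii_uppercase
    for ds in product(digits, repeat=3)
}
print(f"built a lookup table covering all {len(lookup):,} possible identifiers")

# a 'de-identified' value as it might appear in a published dataset
hashed_id = md5_hex("B742")          # stand-in for a hash found in the data
print("recovered identifier:", lookup[hashed_id])
```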
The release of Strava fitness data in 2017 famously led to a student pointing out that the heat map could be used to locate sensitive military sites, because military personnel often jog routes just inside the perimeter of their base. Others have noted that the heat map highlighted patterns of road patrols out of military bases in combat zones including Afghanistan, Iraq and Syria. Further, a Strava user has explained how she discovered that her workout routes were accessible to (and commented on by) strangers, even though she had used the privacy settings in the app to prevent public sharing of her data or identity.
The focus should be on preventing harms, not on whether data is identifiable
Advertisers and others wishing to track people’s movements expend much effort convincing privacy regulators and consumers that their data is not identifying, and that there is therefore no cause for alarm. Their goal is to avoid identifying anybody, so that the activity can proceed unregulated by data privacy laws.
In fact, the real question both companies and governments should be asking is how to avoid harming anybody.
If the end result of an activity is that an individual can be individuated from a dataset – such that they could, at an individual level, be tracked, profiled, targeted, contacted, or subjected to a decision or action which affects them – that is a privacy harm which may need protecting against.
Treat location data as personal information
Privacy professionals reviewing the application of privacy laws to their apps, systems, databases and processes should treat with scepticism any claims that data has been ‘de-identified’ to the point that no individual is reasonably identifiable from the data.
Location data in particular is so rich, and so revealing of patterns of movement and behaviour, that notwithstanding the absence of direct identifiers like name or address, location data alone can often at least individuate, if not also identify, individuals.
Given that community sentiment suggests location data is considered highly ‘sensitive’ by a large majority of consumers, I suggest that any organisation holding or using location data would do well to treat all unit record level data as ‘personal information’, and apply the relevant privacy principles, regardless of whether de-identification techniques have already been applied.
This blog is an edited version of an article previously published in the Privacy Law Bulletin 17.6 (September 2020). Photograph © Shutterstock