Hey, before we start, can I just ask: are you male, female or other? Are you bristling at even being asked?
Collecting accurate data on gender can, when done appropriately, be a key way to ensure a product, program or policy is designed with gender differences in mind. Conversely, when data about gender is not collected, poor design can lead to damaging outcomes.
However, there are many instances where knowledge of someone’s gender is completely irrelevant to the circumstance at hand. Collecting it in those cases is not only an invasion of privacy; it can also increase the severity of harm caused if that personal information is misused, or in the event of a data breach.
Privacy harms, whether caused by data breaches, surveillance, or other invasions of privacy, do not impact everyone equally. While the focus of this piece is on gender, it’s important to always keep in mind the ways that gender intersects with other factors including race, disability, class, and sexuality.
So, read on to explore the friction between collecting gender data and enhancing privacy, and why it is essential that we consider gender when we assess privacy risks.
Language note: where I refer to ‘women’ I mean both trans and cisgender women. Trans women are women. Where it is necessary to differentiate that I am specifically talking about cis or trans women, I will make that distinction clear. While many of the issues in this piece are framed around women, they also often impact non-binary and gender non-conforming people in similar ways, at the same, if not higher, rates. However, there remains a lack of research into the intersection of privacy and gender non-conforming people, and I have chosen not to cast the experience of non-binary communities as the same as it is for women.
Privacy harms are not served equally
Women have been surveilled and policed for centuries, to the extent that until relatively recently they have been perceived as having no right to privacy when it came to their sexual life. Even now, we see particularly gendered invasions of privacy like doxing (malicious publication of someone’s personal details), stalking, and non-consensual sharing of intimate images.
Often, the harm caused by privacy loss, such as a data breach, disproportionately impacts those who are already part of a marginalised or vulnerable group, including women.
Let’s take a relatively recent, and local, example of a data breach to explore this point. In 2018, Public Transport Victoria (PTV) released a large dataset of de-identified records from Myki, Melbourne’s contactless smart card public transport ticketing system, covering around 15 million cards. Later that year, academics Vanessa Teague, Ben Rubinstein and Chris Culnane were able to re-identify themselves and others in the dataset. The Office of the Victorian Information Commissioner investigated, and found that PTV had failed to address the possibility that individuals in the dataset could be re-identified. (You can read more in OVIC’s investigation report.)
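To see how this kind of re-identification works in principle, here is a minimal sketch with entirely made-up data: a handful of trips known from outside the dataset (so-called auxiliary knowledge) can be enough to single out one pseudonymous card, and with it that person’s whole travel history. The column names and stops are illustrative, not the Myki schema.

```python
import pandas as pd

# Toy "de-identified" touch-on dataset: card IDs are pseudonyms, but the
# travel pattern attached to each pseudonym is intact. All values made up.
events = pd.DataFrame({
    "card_id": ["c1", "c1", "c2", "c2", "c3", "c3"],
    "stop": ["Flinders St", "Richmond", "Flinders St",
             "Footscray", "Richmond", "Footscray"],
    "datetime": pd.to_datetime([
        "2018-03-05 08:01", "2018-03-05 17:32",
        "2018-03-05 08:03", "2018-03-05 17:40",
        "2018-03-06 09:15", "2018-03-06 18:05",
    ]),
})

# Auxiliary knowledge: an attacker knows two trips their target took,
# e.g. from travelling with them or from a public social media post.
known_trips = [("Flinders St", "2018-03-05 08:01"),
               ("Richmond", "2018-03-05 17:32")]

# Keep only the cards consistent with every known trip.
candidates = set(events["card_id"])
for stop, when in known_trips:
    match = events[(events["stop"] == stop) &
                   (events["datetime"] == pd.Timestamp(when))]
    candidates &= set(match["card_id"])

# A couple of known trips is enough to isolate a single card, exposing
# the target's entire travel history under that pseudonym.
print(candidates)  # {'c1'}
```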
The point I want to make here is how we think about the impact of data breaches. Not everyone is affected equally.
According to the Australian Bureau of Statistics, cisgender women are, on average, more likely to use public transport than men. Women are also more likely to experience stalking than men, with approximately 1 in 6 cis women having experienced stalking since the age of 15 (compared to 1 in 15 cis men). On top of this, research conducted by WESNET, Women’s Legal Service NSW and Domestic Violence Resource Centre Victoria has found that perpetrators’ use of technology to facilitate their abuse of women is significant, and on the rise.
So, with those statistics in mind, the possible harms caused by the Myki data breach look a lot worse once we apply a gendered lens to the risk assessment. The likelihood of individuals being identified from the dataset and their patterns of behaviour analysed, and the risk of perpetrators using that data to inflict violence or harassment on victims, is much greater for women than for men.
While on the subject of statistics, research conducted by the OAIC showed that, compared with men, women are less likely to feel comfortable with location tracking, and significantly more likely to turn off GPS or location sharing on their mobile devices. Zeynep Tufekci found that men are three times more likely than women to include their contact details in their social media profiles, even after controlling for privacy and audience concerns, suggesting women are “seeking to avoid the risk of unwanted attention”.
The possible gendered privacy harms compound further when we look outside the gender binary. Trans and gender non-conforming people experience stigma and discrimination at high rates, and many make deliberate choices about to whom they disclose details of their gender identity or sex characteristics. Organisations wishing to collect data on gender need to consider very carefully the possible harm that could be caused should the personal information of gender diverse individuals be inappropriately or unlawfully accessed, used, or disclosed. In some cases, the very act of attempting to collect gender data inappropriately can itself cause unnecessary stress.
Sexist algorithms
The public and private sectors alike are increasingly incorporating, and in some cases relying upon, algorithmic systems, including machine learning and automated decision-making. The existence of bias in these kinds of systems is well documented, and the body of research into the area is growing. Here is just a small handful of examples:
- Amazon’s AI recruitment tool showed bias against women
- Virtual assistants reinforce gender stereotypes
- A Microsoft study found gendered connotations in word embeddings (which underpin many machine learning applications that rely on language processing; see the sketch after this list)
- Machine learning algorithm learns sexism from photographs
- Voice recognition systems struggle to understand women’s voices
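On the word-embedding point, the underlying measurement is easy to reproduce. Here is a minimal sketch using gensim’s pretrained word2vec vectors; the probe (comparing each word’s similarity to ‘he’ versus ‘she’) and the word list are illustrative simplifications, not a reconstruction of the Microsoft study itself.

```python
import gensim.downloader as api

# Pretrained word2vec vectors trained on Google News (large download on
# first use). The model name is from the gensim-data catalogue.
vectors = api.load("word2vec-google-news-300")

# A crude gender-association probe: how much closer is each occupation
# word to 'he' than to 'she'? Word choices are illustrative only.
for word in ["nurse", "engineer", "receptionist", "programmer"]:
    lean = vectors.similarity(word, "he") - vectors.similarity(word, "she")
    side = "leans male" if lean > 0 else "leans female"
    print(f"{word:>14}: {lean:+.3f} ({side})")
```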
The harm caused to women by these systems only increases for those who also intersect with other marginalised or minority identities, including in relation to race, disability, class and sexuality.
While upholding privacy cannot solve all the challenges associated with the use of algorithmic systems and their attendant risks of bias, discrimination or unfair outcomes, a robust Algorithmic Impact Assessment can go a long way toward ensuring that the personal information used as inputs to these systems has been tested for fairness and accuracy. If we take an expansive view of privacy, we can use privacy risk assessment as a tool to examine the power structures of these systems, and put safeguards in place to mitigate potential gendered and other discriminatory harms.
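To make that concrete, here is a minimal sketch of one screen such an assessment might include: comparing a system’s selection rates across gender groups against the common ‘four-fifths’ disparate impact rule of thumb. The data is hypothetical, and this is a red-flag screen, not a complete fairness test.

```python
import pandas as pd

# Hypothetical decisions from an automated system: one row per person,
# with self-reported gender and the system's yes/no outcome.
results = pd.DataFrame({
    "gender": ["woman", "woman", "woman", "man", "man", "man", "man"],
    "selected": [0, 1, 0, 1, 1, 0, 1],
})

# Selection rate per group, and each group's rate relative to the
# best-treated group (the 'four-fifths' disparate impact screen).
rates = results.groupby("gender")["selected"].mean()
ratios = rates / rates.max()
print(ratios)

# Any ratio well below 0.8 is a common red flag warranting deeper
# investigation; passing this screen does not prove the system is fair.
flagged = ratios[ratios < 0.8]
print("flagged groups:", list(flagged.index))
```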
Should we even collect gender?
We all know the drill about collection minimisation: only collect personal information that is necessary for a given purpose. But it often seems that many organisations go into a kind of autopilot at this step: yes of course we need name, date of birth, gender. Do you really, though? Collection of gender should not be the default, and it’s worth interrogating when it is actually necessary to know someone’s gender, and for what purpose.
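One lightweight practice, sketched below with made-up field names, is to keep a register recording the documented purpose for every field you collect, so that a gender field cannot slip in on autopilot.

```python
# A hypothetical collection register: every field must carry a documented
# purpose before it can appear on a form. Names and purposes are made up.
COLLECTION_REGISTER = {
    "email": "account login and password recovery",
    "date_of_birth": "legally required age verification",
    # "gender": intentionally absent, because no purpose has been shown.
}

def assert_field_justified(field_name: str) -> None:
    """Refuse to collect a field that has no documented purpose."""
    if field_name not in COLLECTION_REGISTER:
        raise ValueError(f"No documented purpose for collecting {field_name!r}")

assert_field_justified("email")  # fine
try:
    assert_field_justified("gender")  # collection is not the default
except ValueError as err:
    print(err)
```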
Herein lies another tension: it’s unfortunately not as simple as just not collecting gender data at all. In many cases, a lack of data on gender can cause its own form of harm. In Invisible Women, Caroline Criado Perez highlights the extent to which the world has been designed by and for cisgender men. From medical testing to safety designs and protective clothing, to the size of everyday appliances, Criado Perez emphasises the very real harm that occurs as a result of taking a ‘gender neutral’ approach which actually results in using the ‘standard male’ as the default. While Invisible Women is not without its flaws, and has been criticised for using a male/female binary which ignores other genders and sex variations, it does serve as a useful collection of evidence of how male-default thinking creates real-world problems for anyone who is not a cisgender man.
Collecting accurate gender data in order to ensure a policy, program, or product is designed in a way that meets the needs and experiences of people across all genders is really important. But it always needs to be balanced against the right to privacy, including consideration of when it is necessary and proportionate to know someone’s gender.
In a report specifically examining privacy and gender, the UN Special Rapporteur for Privacy suggests that, among other things, any requirement for individuals to provide sex/gender information should be:
- Relevant, reasonable and necessary as required by the law for a legitimate purpose
- Respectful of the right to self-determination of gender, and
- Protected against arbitrary or unwanted disclosure or threatened disclosure of such information.
The report also recognised that “privacy offers protection against gender-based violence, discrimination, and other harms that disproportionately affect women, intersex, and gender non-conforming individuals.”
Once an organisation decides it is indeed necessary to collect gender data, it must also consider carefully how to ask for gender identity in a respectful, inclusive and meaningful way. If you wish to collect accurate data (and meet the requirements of the data quality privacy principle!), then simply offering ‘male’ or ‘female’ options is not good enough.
Here is a non-exhaustive list of tips for organisations to consider when asking for gender details:
- Be really clear about what you are actually asking people for. For example, do you need to know someone’s sex assigned at birth for a specific medical purpose? Or do you need to understand someone’s gender identity in order to provide them with the correct services?
- Be careful not to confuse gender identity with sexual orientation
- Consider providing an option that enables people to self-determine their gender
- Include a consideration of gendered impacts when assessing and mitigating privacy risks, including the possible harms that could occur as a result of inappropriate disclosure of an individual’s gender identity
For more guidance, see this guide to collecting gender data inclusively from the Canberra LGBTIQ Community Consortium, or this one from Monash University.
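Putting those tips together, here is a sketch of what a more inclusive gender question might look like in a form definition. The wording of the options is illustrative only; test it against guidance like the resources above and with your own communities.

```python
from dataclasses import dataclass, field

# Illustrative option list: supports self-determination via free text
# and makes disclosure genuinely optional.
GENDER_OPTIONS = [
    "Woman",
    "Man",
    "Non-binary",
    "I use a different term (please specify)",
    "Prefer not to say",
]

@dataclass
class GenderQuestion:
    prompt: str = "What is your gender?"
    options: list = field(default_factory=lambda: list(GENDER_OPTIONS))
    free_text_allowed: bool = True   # honour self-described genders
    required: bool = False           # never force disclosure

# Sex assigned at birth, if genuinely needed (e.g. for a clinical
# purpose), belongs in a separate question and should never be inferred
# from, or conflated with, the gender identity answer above.
```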
The 2021 census has provided us with an example of what not to do. While there was an option for people to self-enter their gender in a free-text field, the ABS noted that those who chose the non-binary option would ultimately be randomly assigned a binary sex: male or female. What followed was outcry that this would not capture an accurate picture of the gender diversity in Australia, and in turn erase trans and gender diverse people. Further, while the inclusion of a free-text field was a welcome improvement to earlier iterations of the census, it was not an option on the paper form. This left trans and gender diverse people who wished to complete the form by hand, for reasons including ability and accessibility, with no choice but to misrepresent their gender.
The paper form is also widely regarded as the more privacy-enhancing option, which meant that many were left with a choice: the increased privacy protection of a paper form, or the ability to identify their gender in a way that is meaningful to them. Nobody should have to make that kind of choice. Given that gender diverse people continue to be subject to stigma and discrimination in Australia, the privacy of their personal information should be of utmost importance.
When in doubt, go back to basics
Long-established privacy considerations such as necessity and proportionality still go a long way when determining whether it is reasonable to collect gender data, and what you may wish to do with it. As with any other personal information, collection of gender information should never be the default. At the same time, organisations should take care to avoid applying ‘male-default thinking’ to their programs and projects. It is not acceptable to cite privacy as the rationale for avoiding the work of collecting inclusive gender data and ensuring that outcomes do not adversely impact people who fall outside the ‘male standard’. Regardless of whether gender data is collected, it is always important to consider the impacts on women, as well as trans and gender diverse people, when assessing privacy risk.