Gender Bias & Artificial Intelligence: Questions and Challenges

By Deborah Oliveros

  • Abstract
  • Introduction
  • A (Unnecessarily) Gendered Representation of AI
  • A ‘Knowing’ Subject
  • Replicating Human Biases
  • Challenges of Imbedded Biases in AI
  • Possible Approaches to Addressing Bias in AI
  • Conclusion
  • Bibliography


  • Abstract

This essay analyzes the different ways in which bias is embedded in AI, machine learning, and deep learning, addressing the challenges from a design perspective to understand how these systems succeed in some respects and fail in others. Specifically, it focuses on gender bias and on the gendered representation of technology in mass media and marketing, which not only obstructs our understanding of how humans interact with these technologies but also replicates historical biases present in society. Lastly, this analysis presents possible approaches to addressing gender bias in AI.

  • Introduction

Over the last couple of years, prominent figures and big companies in Silicon Valley have participated in a public debate about the benefits and risks of artificial intelligence and machine learning and their possible consequences for humanity. Some embrace technological advancement openly, advocating for an unrestricted environment and arguing that limits would hinder progress and innovation in the field. Others offer warnings similar to those of a dystopian sci-fi film; they argue that artificial intelligence could be an existential threat to humanity if, or more likely when, machines become smarter than humans. Although the prospect of a world controlled by machines like the ones presented in The Matrix (Wachowski sisters, 1999), Blade Runner (Scott, 1982), and 2001: A Space Odyssey (Kubrick, 1968) is unsettling and borderline terrifying, there are more urgent and realistic questions to address about AI and its impact on society.

2001: A Space Odyssey (Stanley Kubrick, 1968)

Machines learn based on sets of data that humans ‘feed’ them. If machines are learning how to imitate human cognitive processes, what kind of human behaviors, social constructions, and biases are these machines picking up and replicating based on the data provided?

There is a long history of cases in which technology has been designed with unnecessary, deterministic biases built into it: the famous low bridges in New York that prevented minorities from using public transportation to go to the beach; the long-perpetuated ‘flesh’ labelling of crayon colors, as well as band-aids, paint, and ballerina shoes; and the famous case of Kodak’s Shirley Cards, used by photo labs to calibrate skin tones, shadows and light during the printing process of color film, which made it impossible to print the facial expressions and details of people with darker skin, among others.

Kodak Shirley card, 1974

We could hardly expect anything other than the replication of these patterns when it comes to artificial intelligence and machine learning. In this case, both the design of the technology and the data sets that we feed into the machines are primary factors. There is a systemic, systematic, racist, sexist, gendered, class-oriented bias (among other axes of discrimination) embedded in most data collected by humans, and those patterns and principles are picked up and replicated by the machines by design. Therefore, instead of erasing divisions through objectivity in decision making, this process is exacerbating inequality in the workplace, the legal and judicial systems, and other spaces of public life in which minorities interact, making that inequality even more difficult to escape.

The data fed to the machines is diverse: images, text, audio, etc. The decision of what data is fed to the machine and how to categorize it is entirely human. Based on this, the system will build a model of the world that it accepts as a unique and stable reality. That is, only what is represented in the data has meaning attached to it, leaving no room for other ways of ‘being’ in the world. For example, a facial recognition system trained on data in which the successful candidates for a job position are overwhelmingly white men will struggle to pick up others who don’t fit into those categories.

Police departments have also used data-driven systems to assess the probability of crime occurring in different areas of a city and, as discussed above, this data is polluted with systemic racism and class discrimination against minorities. Therefore, the immediate consequence is over-policing of low-income areas and under-policing of wealthy neighborhoods. This creates and perpetuates a biased cycle but, more importantly, it creates an illusion of objectivity and shifts responsibility from the human to the machine. As Crawford says, “predictive programs are only as good as the data they are trained on, and that data has a complex history” (Crawford 2016, June 26).
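The biased cycle described above can be made concrete with a deliberately simplified sketch. Everything in it is an assumption made for illustration: the two areas, the starting record counts, the shared `true_rate`, and the rule that patrols are allocated in proportion to past records. It does not describe any real police system.

```python
def simulate_patrols(initial_records, true_rate, rounds=10, patrols=100):
    """Hypothetical feedback loop: patrols follow past records, and new
    incidents are only recorded where patrols are present.

    Both areas share the same true_rate, so any difference in the result
    comes from the starting data, not from actual behavior.
    """
    records = list(initial_records)
    for _ in range(rounds):
        total = sum(records)
        # More past records means more patrols sent to that area.
        allocation = [patrols * r / total for r in records]
        # Each patrol records incidents at the same underlying rate.
        records = [r + a * true_rate for r, a in zip(records, allocation)]
    total = sum(records)
    return [r / total for r in records]


# Area 0 starts with more records only because it was policed more heavily.
shares = simulate_patrols(initial_records=[60, 40], true_rate=0.1)
print([round(s, 2) for s in shares])  # [0.6, 0.4]
```

Under these assumptions the initial 60/40 split never corrects itself, even though both areas have identical underlying rates: the system keeps confirming the data it was given.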

  • A (Unnecessarily) Gendered Representation of Technology

It is challenging to analyze how we perceive something that is invisible to us, not only physically but also cognitively. Two aspects need to be taken into account to get to the root of why the general public does not fully understand how these systems work: the lack of transparency from companies, which do not reveal how these systems make data-driven decisions because of intellectual property and market competition; and the gendered marketing of these technologies to users, in combination with a gendered representation in pop culture media that is not only inaccurate but misleading. Let’s start by addressing the latter.

For decades, visually mediated spaces of representation, such as sci-fi movies and TV, have delved into topics of technology and sentient machines. Irit Sternberg states that these representations tend to ‘gender’ artificial intelligence as female and rebellious: “It goes back to the mother of all Sci-Fi, “Metropolis” (Lang, 1927), which heavily influenced the futuristic aesthetics and concepts of innovative films that came decades later. In two relatively new films, “Her” (Jonze, 2013) and “Ex-Machina” (Garland, 2014), as well as in the TV-series “Westworld” (2016), feminism and AI are intertwined.” (Sternberg 2018, October 8).

Ex Machina (Garland, 2014)

These depictions present a gendered power struggle between AI and humans, which is at times problematic and at others empowering: “In all three cases, the seductive power of a female body (or voice, which still is an embodiment to a certain extent) plays a pivotal role and leads to either death or heartbreak” (Ibid). This personification of AI invites the viewer to associate the technology with a power struggle that already exists in our own historical context, which in turn makes it difficult for the general public to go beyond the superficial explanations of a regular tech news article that fails to address how these technologies work at a conceptual level. In those cases, the over-generalizing, paranoid headline seems to be catchier than an informative analysis. On the other hand, the level of agency represented in a female-gendered AI offers the imagined possibility that, through technology, systematic patriarchal oppression can be challenged and overcome by the oppressed.

In spite of these manifestations of gender roles combined with AI, the reality is far from empowering: gender discrimination in algorithms is present in many spaces of social life. Even more problematic are the non-fictional representations of technology, in particular AI, as gendered.

AIs are marketed with feminine identities, names and voices. Examples such as Alexa, Siri and Cortana demonstrate this: even though they offer male identities, the fact that the default setting is female speaks loudly. Another example is the female humanoid robot Sophia, developed by Hanson Robotics in Hong Kong, built as a representation of a slender white woman with no hair (enhancing her humanoid appearance) and, inexplicably, with makeup on her lips, eyes and eyebrows. Sophia is the first robot to receive citizenship of any country (Saudi Arabia); it was also named the United Nations Development Programme’s first-ever Innovation Champion, making it the first non-human to be given any United Nations title.

Sophia The Robot.

These facts are mind-boggling. As Sternberg asks, “why is it that a feminine humanoid is accepted as a citizen in a country that would not let women get out of the house without a guardian and a hijab?” (Sternberg 2018, October 8). What reaction do engineers and builders assume the female presence and identification generates during the human-machine interaction?

Sternberg says that fictional and real decisions to choose feminine characters are replicas of gender relations and social constructs that already exist in our society: “does giving a personal assistant feminine identity provide the user (male or female) with a sense of control and personal satisfaction, originating in the capability to boss her around?” (Ibid). As a follow-up question, is that what we want the machines to learn and replicate?

  • A ‘Knowing’ Subject

Artificial intelligence (and machine learning and deep learning as subcategories) is built and designed to acquire and process human knowledge and improve its decisions on categorization over time.

Gary Marcus says, “Deep learning systems are most often used as classification system in the sense that the mission of a typical network is to decide which of a set of categories (defined by the output units on the neural network) a given input belongs to. With enough imagination, the power of classification is immense; outputs can represent words, places on a Go board, or virtually anything else. In a world with infinite data, and infinite computational resources, there might be little need for any other technique” (p. 4).

However, the data in our world is never infinite and does not necessarily have a definite and unchanging meaning or interpretation, which limits the scope of AI and machine learning and their accuracy in representing the reality of said world: “Instead, systems that rely on deep learning frequently have to generalize beyond the specific data that they have seen, whether to a new pronunciation of a word or to an image that differs from one that the system has seen before, and where data are less than infinite, the ability of formal proofs to guarantee high-quality performance is more limited” (Ibid).

As stated before, these systems will know what we teach them, and the nature of that knowledge and the power dynamics surrounding it are inherently problematic. Early feminist theorists and social critics raised questions about how knowledge informs the identity and ‘world view’ of the ‘knowing subject’, offering contrasting takes on gender, class and racial determinism while also presenting the possibility of “un-situated gender-neutral knowledge (“a view from nowhere”) or lack thereof” (Sternberg 2018, October 8).

Critics also pointed out how ambitious projects designed around mastering expertise and knowledge about a topic might be tainted in said ‘expertise’, taking into consideration the origin of the ‘expert’ knowledge being fed to the machines: “the role of the all-male-all-white-centuries-old-academia in defining what knowledge is valuable for a machine to master and what is expertise altogether” (Ibid).

All of these characteristics have to be put in conversation with the fact that we are at the very early stages of AI. However, even in its infancy, AI and machine learning are already impacting the way we function as a society, not only in technology but in social life, health, the military and employment as well.

  • Replicating Human Biases

A group of researchers from Princeton University and the University of Bath conducted a study testing whether applying machine learning to ordinary human language results in human-like semantic biases. For this experiment, the authors replicated a set of historically known biased dichotomies of different terms, “using a […] purely statistical machine-learning model trained on a standard corpus of text from the Web.” (Caliskan, A, Bryson, JJ & Narayanan, A 2017, p. 2). “Our results (fig. 1) indicate that text corpora [the bodies of text on which the model was trained] contain recoverable and accurate imprints of our historic biases, whether morally neutral as towards insects or flowers, problematic as towards race or gender, or even simply veridical, reflecting the status quo distribution of gender with respect to careers or first names” (Ibid).

They tested various dichotomous terms that are considered systematically stereotypical, demonstrating that the terms carry an underlying historical and contextual meaning that the machine might not process as such but nonetheless replicates: “these results lend support to the distributional hypothesis in linguistics, namely that the statistical contexts of words capture much of what we mean by meaning. Our work suggests that behavior can be driven by cultural history embedded in a term’s historic use. Such histories can evidently vary between languages” (p. 9).
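To give a sense of how such associations can be measured, here is a minimal sketch in the spirit of the study: a word's association is taken as its mean cosine similarity to one set of attribute words minus its mean similarity to another set. The two-dimensional vectors and the word choices below are invented for illustration; they are not real embeddings and do not reproduce the authors' data or results.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)


def association(word, attrs_a, attrs_b, vectors):
    """Mean similarity to attribute set A minus mean similarity to set B."""
    mean_a = sum(cosine(vectors[word], vectors[a]) for a in attrs_a) / len(attrs_a)
    mean_b = sum(cosine(vectors[word], vectors[b]) for b in attrs_b) / len(attrs_b)
    return mean_a - mean_b


# Made-up vectors (an assumption): a real test would load vectors
# learned from a large text corpus.
toy_vectors = {
    "engineer": [0.9, 0.1],
    "nurse": [0.1, 0.9],
    "he": [1.0, 0.0],
    "she": [0.0, 1.0],
}

eng = association("engineer", ["he"], ["she"], toy_vectors)
nurse = association("nurse", ["he"], ["she"], toy_vectors)
print(round(eng, 2), round(nurse, 2))  # 0.88 -0.88
```

A positive score means the word sits closer to the first attribute set, a negative score closer to the second. The model has no notion of stereotype; it only reports the geometry that the text left behind.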

(Fig. 1) Caliskan, A, Bryson, JJ & Narayanan, A (2017)

Machine learning technologies are already being used in many contexts in which these biases deeply impact minorities, and specifically women. They are used for resume screening, where they produce prejudiced outcomes shaped by cultural stereotypes about gender and professions. Another study, from Carnegie Mellon University, found that women were less likely than men to be shown ads on Google for highly paid jobs (Amit Datta, Michael Carl Tschantz, Anupam Datta, 2015).

Karen Hao, writing for the MIT Technology Review, looks at a study by Muhammad Ali and Piotr Sapiezynski at Northeastern University that analyzed how variations in ads affect the audience each ad reaches. Unsurprisingly, the decision of who is shown each ad is biased.

Hao says, “bias occurs during problem framing when the objective of a machine-learning model is misaligned with the need to avoid discrimination. Facebook’s advertising tool allows advertisers to select from three optimization objectives: the number of views an ad gets, the number of clicks and amount of engagement it receives, and the quantity of sales it generates. But those business goals have nothing to do with, say, maintaining equal access to housing. As a result, if the algorithm discovered that it could earn more engagement by showing more white users homes for purchase, it would end up discriminating against black users” (Hao 2019, February 4).
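The misalignment Hao describes can be sketched with a toy allocator. The group names, predicted click-through rates and winner-takes-all rule are all hypothetical; real ad delivery systems are far more complex and are not public. The point is only that an objective containing no fairness term has no reason to preserve equal access.

```python
def allocate_impressions(predicted_ctr, budget):
    """Give every impression to the audience with the highest predicted
    click-through rate: a pure engagement objective with no fairness term.
    """
    best = max(predicted_ctr, key=predicted_ctr.get)
    return {group: (budget if group == best else 0) for group in predicted_ctr}


# Hypothetical predictions that differ by a fraction of a percentage point.
predicted_ctr = {"group_a": 0.031, "group_b": 0.029}
delivery = allocate_impressions(predicted_ctr, budget=1000)
print(delivery)  # {'group_a': 1000, 'group_b': 0}
```

In this sketch a tiny difference in predicted engagement is enough for one group never to see the ad, and nothing in the code mentions either group's identity.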

However, Hao also explains that the problem cannot be reduced to a biased data issue: “bias can creep in long before the data is collected as well as at many other stages of the deep-learning process” (Hao 2019, April 5). She specifically refers to three stages:

  1. Framing the problem: the goal that the designer plans to achieve and its context might not take into account fairness or discrimination
  2. Collecting the data: “either the data you collect is unrepresentative of reality, or it reflects existing prejudices” (Ibid)
  3. Preparing the data: “selecting which attributes you want the algorithm to consider” (Ibid)

  • Challenges of Imbedded Bias in AI

Gary Marcus offers a very detailed critique of the field of AI in “Deep Learning: A Critical Appraisal”. His article is presented as an intentionally introspective snapshot of the current state of deep learning. It looks not only at how much has been accomplished but also at how the field has failed and at what that implies for different approaches to deep learning in the future.

He says, “deep learning currently lacks a mechanism for learning abstractions through explicit, verbal definition, and works best when there are thousands, millions or even billions of training examples, as in DeepMind’s work on board games and Atari. As Brenden Lake and his colleagues have recently emphasized in a series of papers, humans are far more efficient in learning complex rules than deep learning systems are (Lake, Salakhutdinov, & Tenenbaum, 2015; Lake, Ullman, Tenenbaum, & Gershman, 2016).” (p. 7)

As mentioned above, deep learning struggles to offer outputs that accurately reflect complex human concepts that are difficult to represent computationally, as a set of yes-or-no answers. These range from the translation of languages to more abstract concepts such as justice. Referring to my personal favorite example, open-ended natural language, Marcus says, “In a problem like that, deep learning becomes a square peg slammed into a round hole, a crude approximation when there must be a solution elsewhere.” (p. 15)

Another observation present in Marcus’ analysis refers to the approach of a real world taken as a ‘set in stone’ reality: “deep learning presumes a largely stable world, in ways that may be problematic: The logic of deep learning is such that it is likely to work best in highly stable worlds, like the board game Go, which has unvarying rules, and less well in systems such as politics and economics that are constantly changing.” (p. 13)

Not only are the world and our knowledge of it constantly changing, but our representation of that reality through data is, most of the time, inaccurate at best and skewed at worst. To what extent, and in what different ways, can we see the impact of such flawed outputs? Sternberg presents two aspects to take into consideration:

  • What exists in the data might be a partial representation of reality:

Even that partial representation might not be entirely accurate. For example, the previously mentioned case of Kodak’s film being unable to properly capture non-white skin tones has a counterpart in facial recognition systems. Other, more recent controversial cases involve systems mistaking pictures of Asian people as ‘blinking’ and identifying black people as gorillas. When an AI system is used by the police for decision-making, the social cost of a mistake is higher, and the results are more likely to be less accurate for minorities, since they were underrepresented and misrepresented in the data-set: “This also calls for transparency regarding representation within the data-set, especially when it is human data, and for the development of tests for accuracy across groups” (Sternberg 2018, October 8).
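A test for accuracy across groups, of the kind Sternberg calls for, can be as simple as breaking a single accuracy figure down by group. The sketch below is one possible version; the group labels and the sample records are invented for illustration.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """records: iterable of (group, true_label, predicted_label) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in records:
        total[group] += 1
        correct[group] += int(truth == predicted)
    return {group: correct[group] / total[group] for group in total}


# Hypothetical evaluation results for two groups.
sample = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 0),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 1, 1), ("group_b", 0, 1),
]
per_group = accuracy_by_group(sample)
print(per_group)  # {'group_a': 1.0, 'group_b': 0.5}
```

The overall accuracy here is 75%, a figure that hides the fact that the system is right every time for one group and only half the time for the other.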

  • Even if the data does represent reality quite truthfully, our social reality is not a perfectly-balanced and desired state that calls for perpetuation:

The gender and racial biases present in the binary terminology are, after all, based on statistics from the off-line world, well-documented in history. However, here Sternberg presents an optimistic perspective: “our social reality is not a perfectly-balanced and desired state that calls for perpetuation” (Ibid). This means that we give the data a deterministic character in this process, when these are not the ideas, concepts and human values we should be preserving or basing our technology on.

In that regard, Sternberg criticizes the absolute faith placed in the outcomes of these systems, which are regarded as more objective than humans: “Sexism, gender inequality and lack of fairness arise from the implementation of such biases in automation tools that will replicate them as if they were laws of nature, thus preserving unequal gender-relations, and limiting one’s capability of stepping outside their pre-defined social limits” (Ibid).

Marcus’ premise agrees with Sternberg’s, focusing more on the problem of thinking about deep learning as the only tool available for understanding and digitalizing the world when, in reality, this tool might not fit every problem we want to fix: “the real problem lies in misunderstanding what deep learning is, and is not, good for. The technique excels at solving closed-end classification problems… And some problems cannot, given real world limitations, be thought of as classification problems at all.” (p. 15)

What is most thought-provoking about Marcus’ article is the proposal to see deep learning as something other than a fixed box through which every human problem or thought must be filtered, and to understand that we have to develop other, hybrid ways of analyzing these problems beyond classification instead of forcing a “square peg” into a “round hole” (p. 15).

  • Possible Approaches to Addressing Bias in AI

Now we’ll address the remaining challenge presented a few sections above: it is difficult to analyze our perception of, and consequently our actions regarding, something that is invisible to us. The lack of transparency from companies, which do not reveal how these systems make data-driven decisions because of intellectual property and market competition, is one of the main reasons why we don’t have access to this knowledge.

However, even if companies were coerced into sharing this information with the general public or authorities, the reality is that artificial intelligence is not only extremely young but evolving as we speak. Therefore, it can be said that the use of these technologies is a process of creation and discovery at the same time. Based on what is public, engineers don’t fully know or understand how artificial intelligence learns and evolves over time. And whatever they do know, they are not willing to share because of the conditions of the market in which they operate.

Regarding the case of women not seeing ads for high-paying jobs, Crawford explains, “the complexity of how search engines show ads to internet users makes it hard to say why this happened — whether the advertisers preferred showing the ads to men, or the outcome was an unintended consequence of the algorithms involved. Regardless, algorithmic flaws aren’t easily discoverable: How would a woman know to apply for a job she never saw advertised? How might a black community learn that it was being overpoliced by software?” (Crawford 2016, June 26).

In terms of social actors that are invested in and can influence how these technologies are managed, governments, NGOs and other entities have a stake in the outcomes of artificial intelligence and machine learning. Unfortunately, in the previously described environment of limited information, they all pretty much operate ‘in the dark’ or, at least, at various levels of ‘darkness’.

A great example of how unprepared our politicians are to deal with this reality and to hold tech companies accountable took place a few months ago in the House Judiciary Committee. During the hearing of Google CEO Sundar Pichai, the members of the committee spent more time passive-aggressively asking embarrassingly ignorant questions, with a clear partisan tone, than asking urgent and appropriate questions about Google’s data policies and privacy practices. At one point, Pichai had to repeatedly explain that the iPhone was a product of Apple, a different company from Google, and the collective groan of humanity could be heard across the globe.

It is clear that regulation and outside audits are necessary to address the issue of gender bias in AI. However, it seems unlikely that anything even remotely close to a proposal will make its way to Congress anytime soon, let alone pass as a bill. Therefore, there is a need to find alternative ways in which actors can collaborate and share information towards the common goal of fixing and preventing the perpetuation of historical bias in AI. Evidently, those best placed to enact change are the engineers and companies themselves.

The authors of the language-based study from Princeton University and University of Bath offer: “we recommend addressing this through the explicit characterization of acceptable behavior. One such approach is seen in the nascent field of fairness in machine learning, which specifies and enforces mathematical formulations of non-discrimination in decision-making (19, 20). Another approach can be found in modular AI architectures, such as cognitive systems, in which implicit learning of statistical regularities can be compartmentalized and augmented with explicit instruction of rules of appropriate conduct (21, 22)” (Caliskan, A, Bryson, JJ & Narayanan, A 2017).
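One example of what a mathematical formulation of non-discrimination can look like is demographic parity, which asks that every group be selected at the same rate. The sketch below is only an illustration of that single criterion, chosen here as an assumption; it is not necessarily one of the formulations the authors cite, and the decisions in it are invented.

```python
def selection_rates(decisions):
    """decisions: iterable of (group, selected) pairs, selected being a bool."""
    counts = {}
    for group, selected in decisions:
        chosen, total = counts.get(group, (0, 0))
        counts[group] = (chosen + int(selected), total + 1)
    return {group: chosen / total for group, (chosen, total) in counts.items()}


def demographic_parity_gap(decisions):
    """Difference between the highest and lowest group selection rates.
    A gap of 0 means every group is selected at the same rate."""
    rates = selection_rates(decisions)
    return max(rates.values()) - min(rates.values())


# Hypothetical screening outcomes: 6 of 10 men and 3 of 10 women selected.
decisions = (
    [("men", True)] * 6 + [("men", False)] * 4
    + [("women", True)] * 3 + [("women", False)] * 7
)
gap = demographic_parity_gap(decisions)
print(round(gap, 2))  # 0.3
```

A system could be required to keep this gap below an agreed threshold. Demographic parity is only one of several competing definitions of fairness, and deciding which one applies is itself a human choice.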

However, how can a solution such as this one be enforced and regularly supervised? We need organizations that address issues of technology and human rights to serve as intermediaries between companies and civil society, as they have done since the creation of the internet.

If machines are going to replicate a human, what kind of human do we need them to be? This is a more present threat, one already underway, than a dystopian apocalypse in which humanity is decimated by its own creation, the old Frankenstein tale. As Kate Crawford wrote in the New York Times, the existential threat of a world overtaken by machines rebelling against humans might be frightening to the white male elite that dominates Silicon Valley, “but for those who already face marginalization or bias, the threats are here” (Crawford 2016, June 26).


  • Conclusion

Gender bias in AI, machine learning, and deep learning is the result of the replication by design of a deeply systemic, systematic, racist, sexist, gendered, class-oriented bias (among other axes of discrimination) embedded in most data collected by humans. Instead of erasing divisions through objectivity in decision making, this process is exacerbating inequality in the workplace, the legal and judicial systems, and other spaces of public life in which minorities interact. This happens in combination with an inaccurate and gendered representation of technology both in pop culture media and in marketing, making it more difficult for the general public to become aware of and understand how these technologies work and their impact on our day-to-day lives. Bias can be introduced in the process by how the problem is framed, how the data is collected, and what meanings are attributed to that data (Hao 2019, April 5). Fixing gender bias in AI is a complex issue that requires the participation of all stakeholders: the companies, the designers, the marketing teams, tech reporters, intermediary collective organizations that advocate for civil society, and politicians. However, the major challenges boil down to the lack of transparency about how these systems make decisions and the tendency to regard them as the only filter through which every abstract human problem can be solved.