Design/Ethical Implications of Explainable AI (XAI)


This paper will address the research question: what are the design and ethical implications of explainable AI? It will argue that there are three main reasons why XAI is necessary for user trust, pertaining to accountability and trustworthiness, liability and policy evaluation, and human agency and authority. The paper will use de-blackboxing methods and an analysis of current research and examples to uncover how XAI is defined, why it is necessary, and the major benefits of and criticisms against XAI models. With supporting evidence from Miller, Howe, and Sonenberg (2017), it will argue that defining explainability to include human explanation models drawn from cognitive psychology and the cognitive sciences will be significant to the development of XAI discourse.

Artificial intelligence applications that use neural networks can produce results (e.g., image classifications) with high accuracy but without explanation for human end users, and are therefore classified as black-box systems (Abdallat, 2019). Many articles claim that AI should be explainable but are not clear about how “explainable” is defined. This paper will de-blackbox explainable AI (XAI) by looking at how it is defined in AI research, why we need it, and specific examples of XAI models. Finally, it will address one of the major gaps in current XAI models by arguing that explainable AI research should adopt an interdisciplinary approach, building on frameworks of explanation from the social sciences (Miller, Howe, & Sonenberg, 2017).

XAI Types & Definitions

Opaque Systems

An opaque system's inner workings are invisible to the user. The system takes in information and outputs new information or predictions without clear evidence of why or how the output was chosen. When an algorithm cannot provide the programmer with the reasoning behind its decision-making process, this is considered a “black box” approach and the system is classified as opaque. Additionally, opaque systems often emerge when closed-source AI is licensed by an organization and therefore hidden from the public to protect intellectual property (Doran, Schulz, & Besold, 2017).

Interpretable Systems

An interpretable system is a transparent model that allows the user to understand how inputs are mathematically mapped to outputs. One example is a regression model, which is linear and uses weights to rank the importance of each feature in the mapping. Deep neural networks, by contrast, learn features through layers of non-linear transformations and therefore would not be considered interpretable models (Doran, Schulz, & Besold, 2017).
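The regression example above can be sketched in a few lines: the learned weights themselves are the explanation, since each weight states how much its feature moves the prediction. This is a minimal NumPy illustration with invented data and feature names, not code from any of the cited sources.

```python
import numpy as np

# Toy data: 100 samples, 3 features, with known true weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_weights = np.array([2.0, -0.5, 0.0])
y = X @ true_weights + rng.normal(scale=0.1, size=100)

# Fit a linear model by ordinary least squares.
weights, *_ = np.linalg.lstsq(X, y, rcond=None)

# The weights map inputs to outputs transparently: a unit change in a
# feature changes the prediction by exactly that feature's weight.
for name, w in zip(["feature_a", "feature_b", "feature_c"], weights):
    print(f"{name}: {w:+.2f}")
```

A user inspecting these three numbers can fully reconstruct any prediction by hand, which is exactly the property a deep network's stacked non-linearities lack.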

Comprehensible Systems

Comprehensible systems “emit symbols enabling user-driven explanations of how a conclusion is reached. These symbols (most often words, but also visualizations, etc.) allow the user to relate properties of the inputs to their output. The user is responsible for compiling and comprehending the symbols, relying on her own implicit form of knowledge and reasoning about them” (Doran, Schulz, & Besold, 2017).
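A comprehensible system in this sense need not be statistical at all. The sketch below (a hypothetical loan screener invented for illustration, not drawn from the sources) emits symbolic reasons alongside its decision, leaving the user to compile them into an explanation using her own knowledge:

```python
def screen_loan(income, debt, years_employed):
    """Return a decision plus symbols the user can interpret themselves."""
    reasons = []
    if debt / income > 0.4:
        reasons.append("debt-to-income ratio above 40%")
    if years_employed < 2:
        reasons.append("employment history under 2 years")
    decision = "deny" if reasons else "approve"
    # The system only emits the symbols; relating them to the outcome
    # ("I was denied because...") is the user's job.
    return decision, reasons

decision, reasons = screen_loan(income=50_000, debt=25_000, years_employed=1)
print(decision, reasons)
```

The emitted strings are the “symbols enabling user-driven explanations”: the system does not itself explain why a 40% ratio matters.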

Why Do We Need XAI?

The three main reasons we need XAI are as follows:

  1. Accountability + Trustworthiness
  2. Liability and Policy Evaluation
  3. Human Agency and Authority

Accountability, Liability and Policy Evaluation

Explainable AI is especially important in cases dealing with human health, safety, and liability. In these cases, it is ethical to hold someone accountable for incorrect or discriminatory outcomes. Additionally, explainability is a factor that can inform policy on whether AI should be incorporated into certain sensitive fields (Paudyal, 2019). For example, should a process like driving a motor vehicle be automated? These questions illuminate the importance of critical discourse that asks hard questions such as: what are we willing to sacrifice as a society for automation and convenience? In 2018, a self-driving car struck and killed a pedestrian in Tempe, Arizona (Paudyal, 2019). “Issues like who is to blame (accountability), how to prevent this (safety) and whether to ban self-driving cars (liability and policy evaluation) all require AI models used to make those decisions to be interpretable” (Paudyal, 2019). In this case, I argue that when the safety of the public is concerned, it is clear that XAI is necessary.


Trusting a neural network to make decisions has different implications depending on the task required. One of the strongest arguments for XAI comes from the medical domain. If a neural network is built to predict health outcomes for a patient (risk of cancer or heart disease) based on their records but cannot provide reasoning for its decision, is it ethical to trust it? The lack of transparency is a problem for the clinician, who wants to understand the model's process, as well as for the patient, who is interested in the proof and reasoning behind the prediction (Ferris, 2018). According to Ferris, empathy is a strong component of the patient-physician relationship that should be taken into account when implementing these systems. In the case of medical predictions, I argue that XAI is necessary to preserve patients' trust in their physicians. The point of predictive models and algorithms is to improve the experience of users (as well as the experience and knowledge of the experts). In the patient-physician relationship, trust should be prioritized, and XAI methods should be incorporated to support it.


Reversed Time Attention Model (RETAIN)

The RETAIN explanation model was developed at the Georgia Institute of Technology by Edward Choi et al. (2016). The model was designed to predict whether a patient is at risk for heart failure using patient history (including recorded events of each visit). It aims to address the performance vs. interpretability issue (discussed in the criticism section below). “RETAIN achieves high accuracy while remaining clinically interpretable and is based on a two-level neural attention model that detects influential past visits and significant clinical variables within those visits (e.g. key diagnoses). RETAIN mimics physician practice by attending the EHR data in a reverse time order so that recent clinical visits are likely to receive higher attention” (Choi et al., 2016).


By splitting the input into two recurrent neural networks, the researchers were able to use attention mechanisms to understand what each network was focusing on. The model was “able to make use of the alpha and beta parameters to output which hospital visits (and which events within a visit) influenced its choice” (Ferris, 2018).
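The two-level attention idea can be caricatured in a few lines. This is a drastically simplified NumPy sketch, not the published RETAIN code: in the real model the alpha and beta weights are produced by trained RNNs, whereas here they are hand-fixed purely to show how they localize an explanation to particular visits and variables.

```python
import numpy as np

# Three hospital visits (rows) and four clinical variables (columns),
# listed in reverse time order, as RETAIN processes them.
visits = np.array([
    [1.0, 0.0, 0.5, 0.0],   # most recent visit
    [0.2, 0.8, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.3],   # oldest visit
])

# alpha: visit-level attention (which visits mattered);
# beta: variable-level attention (which variables within a visit).
# In RETAIN these come from RNNs; here they are fixed for illustration.
alpha = np.array([0.6, 0.3, 0.1])           # sums to 1; recent visits weighted up
beta = np.array([0.9, 0.1, 0.7, 0.2])

# Contribution of each (visit, variable) pair to the risk score.
contributions = alpha[:, None] * beta[None, :] * visits
risk_score = contributions.sum()

# The explanation is simply the largest contributions.
visit_idx, var_idx = np.unravel_index(contributions.argmax(), contributions.shape)
print(f"risk={risk_score:.3f}, driven mostly by variable {var_idx} at visit {visit_idx}")
```

Because the score is a plain weighted sum, every prediction decomposes exactly into per-visit, per-variable contributions, which is what makes the model clinically interpretable.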

Local Interpretable Model-Agnostic Explanations (LIME)

Post-hoc models provide explanations after decisions have been made. The key concept in LIME is perturbing the inputs and analyzing the effect on the model's outputs (Ferris, 2018). LIME is model-agnostic, meaning the process can be applied to any model to produce explanations. By observing how the outputs change, one can infer which aspects of the input the model is focusing on. Ferris uses the example of CNN image classification to demonstrate how this method works in four steps.

Step 1. Begin with a normal image and use the black-box model to produce a probability distribution over the classes.

Step 2. Alter the image slightly (ex. hiding pixels), then run the black-box model again to determine what probabilities changed.

Step 3. Fit an explainable model (such as a decision tree) to the dataset of perturbations and probabilities to extract the key features that explain the changes. “The model is locally weighted (we care more about the perturbations that are most similar to the original image)” (Ferris, 2018).

Step 4. Output the features (in this case, pixels) with the greatest weights as the explanation (Ferris, 2018).
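The four steps above can be sketched with tabular data in place of an image. This is a minimal NumPy illustration of the perturb-weight-fit idea only, not the actual LIME library; the black-box model is an invented stand-in whose internals the explanation procedure never looks at.

```python
import numpy as np

rng = np.random.default_rng(0)

def black_box(x):
    # Stand-in opaque model: in truth only feature 0 matters,
    # but the explainer is not allowed to know that.
    return 1.0 / (1.0 + np.exp(-3.0 * x[..., 0]))

instance = np.array([1.0, 1.0])   # the prediction we want to explain

# Steps 1-2: perturb the instance and query the black box on each variant.
perturbations = instance + rng.normal(scale=0.5, size=(200, 2))
outputs = black_box(perturbations)

# Step 3: weight perturbations by proximity to the original instance...
distances = np.linalg.norm(perturbations - instance, axis=1)
sample_weights = np.exp(-distances**2)

# ...and fit a locally weighted linear surrogate to the (perturbation, output) pairs.
W = np.diag(sample_weights)
A = np.hstack([perturbations, np.ones((200, 1))])   # add an intercept column
coeffs = np.linalg.solve(A.T @ W @ A, A.T @ W @ outputs)

# Step 4: the surrogate's largest weight names the influential feature.
print("local feature weights:", coeffs[:2])
```

The surrogate assigns nearly all its weight to feature 0, correctly recovering what the black box attends to without ever opening it, which is exactly the model-agnostic property described above.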

Criticism and Challenges of XAI

  • Complexity Argument

There are a few major criticisms of explainable artificial intelligence to consider. First, neural networks and deep learning models are multi-layered and therefore complex and overwhelming to understand. One of the benefits of neural networks is their ability to store and classify large amounts of data in ways human brains could not. According to Paudyal, “AI models with good performance have around 100 million numbers that were learned during training” (2019). With this in mind, it is unrealistic to track and understand each layer and process of a neural network in order to find a valid source for an explanation.

G-Flops vs accuracy for various models | Image source: Paudyal, 2019

  • Performance vs. Explainability Argument

The second main criticism of XAI is that the more interpretable a model is, the more its performance lags. The ethical implication is that efficiency may take precedence over explanation, which could lead to accountability issues.

“Machine learning in classification works by: 1) transforming the input feature space into a different representation (feature engineering) and 2) searching for a decision boundary to separate the classes in that representation space (optimization). Modern deep learning approaches perform 1 and 2 jointly via hierarchical representation learning” (Paudyal, 2019).

Image Source: Paudyal, 2019
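The two steps in the quotation can be made concrete with the classic XOR problem: no linear boundary separates the classes in the raw input space, but a single engineered feature makes them trivially separable. This is a minimal sketch; the feature here is hand-chosen, whereas a deep network would learn an equivalent transformation jointly with the boundary.

```python
import numpy as np

# XOR: no straight line separates the two classes in (x1, x2) space.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# Step 1 (feature engineering): map the inputs to a new representation
# in which the classes pull apart. The formula is hand-picked.
feature = X[:, 0] * X[:, 1] - 0.5 * (X[:, 0] + X[:, 1])

# Step 2 (optimization, done by inspection here): a simple threshold
# now acts as the decision boundary in the new representation.
predictions = (feature < -0.25).astype(int)
print(predictions)  # matches y exactly
```

A deep network performs both steps at once, which is precisely why its internal representation is so hard to read back out as an explanation.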

Performance is the top concern in the advancement of the field; therefore, explainable models are not favored when performance is affected. This supports the need for non-technical stakeholders to be part of the conversation surrounding XAI (Miller, Howe, & Sonenberg, 2017). If the only people with a voice are those concerned with performance, the focus could shift to short-term outcomes rather than the longer-term implications for human agency, trustworthiness of AI, and policy.

An Alternative Method: Incorporating XAI Earlier in the Design Process

In contrast to most current XAI models, Paudyal argues that deciding whether an application needs explanation should happen early enough to be incorporated into the architectural design (2019).

Image Source: Paudyal, 2019

As an alternative to using simpler but explainable models with low performance, he proposes that (1) creators should learn what explanations are desired through consultation with stakeholders and (2) the architecture of the learning method should be designed to give intermediate results that pertain to these explanations (2019). This decision process requires an interdisciplinary approach, because defining and understanding what type of explainability a specific application needs will require discussion across disciplines (computer science, philosophy, cognitive psychology, sociology). “Explainable AI is more likely to succeed if researchers and practitioners understand, adopt, implement, and improve models from the vast and valuable bodies of research in philosophy, psychology, and cognitive science; and if evaluation of these models is focused more on people than on technology” (Miller, Howe, & Sonenberg, 2017). These disciplines should work together to discover which systems require explanation, and for what reasons, before implementation and testing begin. In the next section, I will de-blackbox this method further by discussing its limitations and illustrating it with an example.

Example & Limitations

Paudyal acknowledges that, under this method, different applications will require different explanations (a loan application vs. a face-identification algorithm, for instance). Although the method is not model-agnostic, it reflects the fact that complex systems cannot be explained with simple ‘one size fits all’ approaches. Addressing this challenge is important in order to develop realistic XAI models that build socio-political and ethical implications into the design.


The following case considers a system designed to incorporate explanations into an AI application that teaches sign language words. In a normal black-box application, the AI would identify an incorrect sign but would not be able to give feedback. Here, explanation is equivalent to feedback about what was wrong with the sign. Paudyal's research notes that “Sign Language Linguists have postulated that the way signs differ from each other is either in the location of signing, the movement or the hand-shape” (2019).

Image Source: Paudyal, 2019

With this information, AI models can be trained to focus on these three attributes (location of signing, movement, and hand-shape). When a new learner makes a mistake, the model can identify which mistake was made and provide appropriately specific feedback (Paudyal, 2019).

Image Source: Paudyal, 2019
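The attribute-level feedback described above can be sketched as follows. This is a hypothetical illustration: the sign dictionary and attribute values are invented, and a real system would predict each attribute with trained sub-models rather than receive it directly.

```python
# Reference attributes for each sign, following the three attributes
# linguists identify: location, movement, and hand-shape.
SIGN_DICTIONARY = {
    "thank_you": {"location": "chin", "movement": "outward", "hand_shape": "flat"},
    "please":    {"location": "chest", "movement": "circular", "hand_shape": "flat"},
}

def explain_attempt(sign, observed):
    """Compare an observed attempt against the reference sign and
    return feedback naming exactly which attribute was wrong."""
    reference = SIGN_DICTIONARY[sign]
    feedback = [
        f"{attr} should be '{expected}', not '{observed[attr]}'"
        for attr, expected in reference.items()
        if observed[attr] != expected
    ]
    return feedback or ["correct"]

# A learner signs "thank you" with the right location and movement
# but the wrong hand-shape; the explanation pinpoints that attribute.
attempt = {"location": "chin", "movement": "outward", "hand_shape": "fist"}
print(explain_attempt("thank_you", attempt))
```

Because the possible mistakes were enumerated at design time, the explanation falls out of the architecture for free, which is the core of Paudyal's argument.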

The main insight from this example is that AI models that incorporate the possible outcomes into the design of the application are easier to understand, interpret, and explain, because the designers know in advance what the application will be trained for. The example also supports the earlier point that the design will be specific to the application (this process is specific to a sign-language CNN).


This paper examined several issues arising from the lack of transparency in machine learning and the use of deep neural networks, specifically in scenarios where responsibility is hard to determine and analyze for policy. These challenges have driven efforts to create explainable methods and models. From there, another significant challenge emerged: defining explainability. Through the examples and cases discussed, it is clear that explainability will have different meanings depending on factors including the user's comprehension, background, and industry. For this reason, I argue (with support from Paudyal) that explainability should be discussed in the first stages of the design process. Doing so makes the process clearer and makes it easier to develop XAI from the beginning of application design rather than after the application is created. This brings authority and agency back into the hands of humans and addresses the argument that explainability degrades performance. Although incorporating explanation earlier in the design has limitations, it may ultimately lead to better design practices that do not focus only on short-term outcomes. Lastly, I close by arguing that explainability calls for interdisciplinary collaboration. “A strong understanding of how people define, generate, select, evaluate, and present explanations” is essential to creating XAI models that will be understood by users (and not just AI researchers) (Miller, 2017). Further research might explore the questions: who is defining XAI, who is XAI designed to appease, and why aren't experts in human explanation models at the forefront of approaching these questions?


References

Abdallat, A. J. (2019, February 22). Explainable AI: Why we need to open the black box. Forbes.

Choi, E., Bahadori, M. T., Kulas, J. A., Schuetz, A., Stewart, W. F., & Sun, J. (2016). RETAIN: An interpretable predictive model for healthcare using reverse time attention mechanism. arXiv:1608.05745 [cs].

Doran, D., Schulz, S., & Besold, T. R. (2017). What does explainable AI really mean? A new conceptualization of perspectives. arXiv:1710.00794 [cs].

Ferris, P. (2018, August 27). An introduction to explainable AI, and why we need it.

Miller, T. (2017). Explanation in artificial intelligence: Insights from the social sciences. arXiv:1706.07269 [cs].

Miller, T., Howe, P., & Sonenberg, L. (2017). Explainable AI: Beware of inmates running the asylum.

Paudyal, P. (2019, March 4). Should AI explain itself? Or should we design explainable AI so that it doesn't have to. Towards Data Science.

Samek, W., Wiegand, T., & Müller, K.-R. (2017). Explainable artificial intelligence: Understanding, visualizing and interpreting deep learning models. arXiv:1708.08296 [cs, stat].