Interpreting AI Algorithms: Possible or Not?

Jillian Wu

How to interpret algorithms is a rising problem, and governments, organizations, and individuals are all involved in it. Motivated by the GDPR, this article discusses whether all algorithms are explainable and understandable. It introduces different methods for interpreting AI programs and examines the black box that remains unexplained in certain algorithms.


In recent years, AI has developed extremely quickly. Combined with advances in Big Data (BD), Computer Vision (CV), Machine Learning (ML), and Deep Learning (DL), AI algorithms have made machines remarkably capable: they can drive cars, detect faces, and interact with humans. However, this rapid progress has brought many problems. Cases of algorithms involved in financial fraud, racism, and privacy violations have appeared all over the world. In China, it was reported that users get a cheaper deal when they order rides online from Android phones rather than Apple ones (Wu, 2021). A health-care algorithm provided dark-skinned patients with fewer services than white patients because it wrongly scored them as having lower health risks (Ledford, 2021). A self-driving Uber crashed into and killed a pedestrian (Gonzales, 2019). Such algorithms harm both individuals and society. People cannot see the principles by which these programs work, and the terminology of the AI domain is complicated and chaotic. As a result, people have begun to fear AI and to overstate its power. The reasons are complicated: big companies may intend to maintain their monopolies, media agencies may try to profit from alarm, and people hold many fantasies about the future of artificial intelligence (even though current AI is still in the weak-AI phase, far from the strong AI depicted in movies). How should such potential threats be handled? Governments can help, and the European Union acted first. It issued the General Data Protection Regulation (GDPR), which took effect in May 2018 and set out seven principles to regulate future AI, including the right to explanation of automated decisions (Vincent, 2019).

This article focuses on the explanation of automated decisions. With such regulation, people could interpret how AI algorithms work and why the data fed to AI systems perpetuate biases and discrimination. If a decision is explainable, it becomes easier to assess an algorithm's pros, cons, and risks, so that people can decide to what extent and on what occasions the algorithm can be trusted. Practitioners will also know what aspects they should work on to improve their algorithms.

What is Interpretability?

There is no universal definition of interpretability. According to Miller (2018), "interpretability is the degree to which a human can understand the cause of a decision." Kim et al. (2016) similarly state that "interpretability is the degree to which a human can consistently predict the model's result."

People used to absorb data from the real world instinctively and process it with their brains. Through these activities, humans could easily understand the world and make decisions.

However, as humanity has developed, the demand for data has increased significantly: data have become massive. Tools to collect and process them have followed, and humans now have computers. The process of interpretation has therefore changed: people first digitize the data and then use algorithms (the black box) to process them. Even though human beings are always the destination (Molnar, 2019), people are eager to know what happens inside the black box. What is its logic for making decisions? (Humans, after all, can explain their own reasoning.) People would prefer all algorithms to be interpretable. Nevertheless, for now, not all black boxes can be explained. Many models and applications are uninterpretable because of an insufficient understanding of tasks and targets; the better the modeler understands the mission of an algorithm, the better the algorithm will perform.

How to Interpret Algorithms?

Since algorithm interpretability is a relatively new topic, researchers have built different systems and standards for assessing it. This article introduces two classifications. The first, according to Kabul, distinguishes three stages of interpretation; it is friendlier to people who are not experts in AI and ML.

1) Pre-modeling

"Understanding your data set is very important before you start building models" (Kabul, 2017). Interpretability methods before modeling mainly involve data preprocessing and data cleansing. Machine learning is designed to discover knowledge and rules from data, and an algorithm will not work well if the modeler knows little about those data. The key to pre-modeling interpretation is to understand the data's distribution characteristics comprehensively, which helps the modeler anticipate potential problems and choose the most reasonable model or approach for the best possible solution. Data visualization is an effective pre-modeling interpretability method (Kabul, 2018). Some regard visualization as the last step of data mining, used to present analysis results; however, when a programmer starts an algorithm project, the data come first. It is necessary to build a sound understanding of the data through visualization, especially when the data volume is large or the dimensionality is high. With that understanding, the programmer can make far better decisions in the coding that follows.
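Such a pre-modeling check can be very simple. The sketch below profiles each column of a toy dataset (the column names and values are illustrative only) to surface missing values and value ranges before any model is built:

```python
# A minimal pre-modeling data check: summarize each column before building
# a model. The dataset and column names here are hypothetical.
def profile(rows, columns):
    """Return per-column count of missing values, min, and max."""
    summary = {}
    for i, name in enumerate(columns):
        values = [r[i] for r in rows if r[i] is not None]
        missing = sum(1 for r in rows if r[i] is None)
        summary[name] = {
            "missing": missing,
            "min": min(values) if values else None,
            "max": max(values) if values else None,
        }
    return summary

data = [
    [34, 52000.0],
    [29, None],      # a missing income value the modeler should notice
    [41, 61000.0],
]
print(profile(data, ["age", "income"]))
```

In practice a library such as pandas (`DataFrame.describe()`) or a plotting tool would do this at scale, but the goal is the same: know the data before choosing a model.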

2) Modeling

Kabul categorizes models as "white box (transparent) and black box (opaque) models based on their simplicity, transparency, and explainability" (Kabul, 2017). It is easier to interpret white-box algorithms such as decision trees than black-box algorithms such as deep neural networks, since the latter have many parameters. In a decision tree model, the movement of data can be clearly traced, so accountability can easily be built into the program. For example, Wikipedia's decision-tree article points out a conceptual error in the "Proceed" calculation of one of its example trees, relating to the calculation of "costs" awarded in a legal action (Wikipedia, n.d.); because the model is transparent, such an error can be spotted and located.
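The traceability of a decision tree can be shown with a small sketch. The rules below (a hypothetical loan-approval tree, not from any cited source) return not only a prediction but the exact decision path that produced it:

```python
# A hand-written decision tree (hypothetical loan-approval rules) showing
# why such models are "white box": every prediction comes with the exact
# decision path that produced it.
def predict_with_trace(income, debt):
    path = []
    if income >= 50000:
        path.append("income >= 50000")
        if debt < 10000:
            path.append("debt < 10000")
            return "approve", path
        path.append("debt >= 10000")
        return "review", path
    path.append("income < 50000")
    return "deny", path

decision, trace = predict_with_trace(income=60000, debt=5000)
print(decision, "via", " -> ".join(trace))
```

A learned tree (e.g. scikit-learn's `DecisionTreeClassifier` with `export_text`) offers the same property: every output can be audited branch by branch, which is exactly what a deep network does not provide.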


3) Post-modeling

Explanations at this stage are used to "inspect the dynamics between input features and output predictions" (Kabul, 2017). Because each model follows distinct rules, the interpretation methods here are model-specific. This stage is essentially the same as post-hoc interpretability, which is discussed further in the following section.

The second classification contains two groups of interpretability techniques and "distinguishes whether interpretability is achieved by restricting the complexity of the machine learning model (intrinsic) or by applying methods that analyze the model after training (post hoc)" (Molnar, 2019); each group can be divided further (Du et al., 2019). This classification is more technical in how it explains algorithms.

1) Intrinsic interpretability

Intrinsic interpretability combines interpretability with the algorithm itself: the self-explanatory structure is embedded in the model. It is simpler than the post-hoc kind and includes models such as decision trees and rule-based models (Molnar, 2019), which were explained in the previous section.

2) Post-hoc interpretability

Post-hoc interpretability is flexible: programmers can use any preferred method to explain different models, and the same model can receive multiple explanations. This yields three advantages: a) the explanatory models can be applied to different DL models; b) it can produce more comprehensive interpretations of a given learning algorithm; c) it can work with all forms of data, such as vectors (Ribeiro et al., 2016a). However, it has shortcomings as well. "The main difference between these two groups lies in the trade-off between model accuracy and explanation fidelity" (Du et al., 2019). Explaining by means of external models or structures is not only arduous but can lead to fallacies. A typical example is Local Interpretable Model-agnostic Explanations (LIME), a third-party model that explains DL algorithms by "training local surrogate models to explain individual predictions" (Ribeiro et al., 2016b). In LIME, the modeler perturbs the input data to analyze how the predictions change accordingly. For a diagnostic DL program, LIME may delete some data columns to see whether the results differ from human decisions: if the results change, the removed data may be vital for the algorithm, and vice versa. LIME can also be used for tabular data, text, and images, which has made it popular recently. Nonetheless, it is not perfect. Some argue that it only helps practitioners pick better data, and that the supervised-learning approach LIME relies on does not reveal how decisions are made or how decisions incentivize behaviors.
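The core idea behind LIME can be sketched without the library itself. Below, `black_box` is a stand-in for any opaque model's prediction function (not a real model): the instance is perturbed, the black box is queried, and a proximity-weighted linear model is fitted whose coefficients serve as the local explanation.

```python
import numpy as np

# Sketch of LIME's local-surrogate idea: explain ONE prediction of a
# black-box model by fitting a simple weighted linear model around it.
def black_box(X):
    # Stand-in for an opaque model: a nonlinear function of two features.
    return 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5 * X[:, 0] * X[:, 1]

def local_surrogate(instance, n_samples=500, scale=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # 1) Perturb the instance to create a local neighborhood.
    X = instance + rng.normal(0.0, scale, size=(n_samples, instance.size))
    y = black_box(X)
    # 2) Weight samples by proximity to the instance (closer = heavier).
    w = np.exp(-np.sum((X - instance) ** 2, axis=1) / (2 * scale ** 2))
    # 3) Fit a weighted linear model; its coefficients are the explanation.
    A = np.hstack([X, np.ones((n_samples, 1))])  # add intercept column
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[:-1]  # per-feature local importance

weights = local_surrogate(np.array([1.0, 2.0]))
print(weights)  # close to the local gradient, roughly [4.0, -1.5]
```

The real `lime` package wraps this recipe for tabular data, text, and images; the surrogate's coefficients are only locally faithful, which is precisely the fidelity trade-off the critics point to.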


Although methods for explaining ML programs are currently booming, it is still difficult to interpret some Deep Learning algorithms, especially Deep Neural Network (DNN) algorithms. This fact relates to the reframing of AI discourse: "machine autonomy" is not equal to human autonomy. Although designers set the patterns of an AI system, the AI becomes an entity that runs by rules which may behave unexpectedly when it encounters real problems. This does not mean the AI can determine where it goes by itself, but that it becomes an independent program if there is no intervention.

The Black Box in DNN Algorithms

After the EU released the GDPR, Pedro Domingos, a professor of computer science at the University of Washington, said on Twitter that "GDPR makes Deep Learning illegal." From his perspective, DL algorithms are unexplainable.

The black box is still there. In 2020, Twitter's image-cropping algorithm was found to be racist: it automatically favored white faces over black faces (Hern, 2020). Twitter soon apologized and released an investigation and improvement plan. However, in that investigation, the modelers stated that their "analyses to date haven't shown racial or gender bias" (Agrawal & Davis, 2020), which means they did not figure out what led to the bias; they cannot tell where the potential harm comes from. Going forward, they intend to change the design principle to "what you see is what you get" (Agrawal & Davis, 2020). In other words, they are giving up the unexplainable algorithm and choosing an intrinsically interpretable design. This is not the only example. According to the ACLU, Amazon Rekognition shows a strong bias on race issues. Although Amazon responded that the ACLU had misused and misrepresented its algorithm, "researchers at MIT and the Georgetown Center on Privacy and Technology have indicated that Amazon Rekognition is less efficient at identifying people who are not white men" (Melton, 2019).

All of those cases involved DL algorithms. The black box in DNN algorithms comes from the way they work: they imitate the human brain by building artificial neurons and connecting them in networks of several layers, so that the learning algorithm can develop recognition of what it has learned and processed (Alpaydin, 2016). "They are composed of layers upon layers of interconnected variables that become tuned as the network is trained on numerous examples" (Dickson, 2020). The theory of DL algorithms is not difficult to explain; in this video, the host thoroughly describes its principles and introduces simple DL algorithms.
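Those "layers upon layers of interconnected variables" can be made concrete with a tiny forward pass. The network below is a minimal two-layer sketch with random, untrained weights (the sizes and inputs are illustrative only), yet even here a prediction already flows through dozens of interacting parameters:

```python
import numpy as np

# A minimal two-layer feed-forward network: each layer is a matrix of
# tunable weights followed by a nonlinearity. Real DNNs stack many such
# layers, which is what makes their decisions hard to trace.
rng = np.random.default_rng(42)
W1 = rng.normal(size=(4, 8))   # input (4 features) -> hidden (8 units)
W2 = rng.normal(size=(8, 2))   # hidden layer -> output (2 classes)

def forward(x):
    h = np.maximum(0.0, x @ W1)          # ReLU hidden layer
    logits = h @ W2
    exp = np.exp(logits - logits.max())  # softmax over class scores
    return exp / exp.sum()

probs = forward(np.array([0.5, -1.0, 0.3, 2.0]))
print(probs)  # two class probabilities summing to 1
```

This toy model has 48 weights; scaling the same structure to billions of weights is what turns an explainable formula into a black box.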

However, real cases are much more complex. In 2020, Microsoft released the largest DL language model to date, Turing Natural Language Generation (T-NLG), which contains 17 billion parameters. Other models, such as Nvidia's Megatron-LM and OpenAI's GPT-2, also contain over a billion parameters each. How those large algorithms use their parameters and combine them to make decisions is currently impossible to explain. "A popular belief in the AI community is that there's a tradeoff between accuracy and interpretability: At the expense of being uninterpretable, black-box AI systems such as deep neural networks provide flexibility and accuracy that other types of machine learning algorithms lack" (Dickson, 2020). This creates a vicious circle. People constantly build DNN algorithms to solve complicated problems, but they cannot clearly explain how their programs make decisions, so they then build new models to try to interpret the algorithms. These explanatory models for the DNN black box are themselves fiercely disputed. Professor Rudin thinks the approach is fundamentally flawed: the explanation model guesses instead of deducing. "Explanations cannot have perfect fidelity with respect to the original model. If the explanation was completely faithful to what the original model computes, the explanation would equal the original model, and one would not need the original model in the first place, only the explanation" (Rudin, 2019). It therefore remains hard to de-black-box DL algorithms. Moreover, the black box also exists in proprietary algorithms, like those mentioned above (Twitter's and Amazon's): companies hide their code to keep an edge over competitors (Dickson, 2020), which makes de-black-boxing unreachable. Although these companies are minding their own businesses, the potential risks are not automatically eliminated.


Interpretability (de-black-boxing) is needed by everyone. Companies need it to improve the quality of their algorithms so that they can make more profit. Individuals need it to ensure that their rights are not harmed and that they are treated equally. Governments need it to build more reliable institutions for people and society. Although there are many methods for interpreting algorithms, none can be used universally, and how to make all algorithms interpretable remains to be explored. Governments and corporations should think harder before using DL algorithms, and a consensus should be reached about the role algorithms play in shaping society.



Agrawal, P., & Davis, D. (2020, October 1). Transparency around image cropping and changes to come. Twitter Blog.

Alpaydin, E. (2016). Machine learning: The new AI. MIT Press.

Dickson, B. (2020, August 6). AI models need to be ‘interpretable’ rather than just ‘explainable.’ The Next Web.

Du, M., Liu, N., & Hu, X. (2019). Techniques for interpretable machine learning. Communications of the ACM, 63(1), 68–77.

Gonzales, R. (2019, November 7). Feds Say Self-Driving Uber SUV Did Not Recognize Jaywalking Pedestrian In Fatal Crash. NPR.

Hern, A. (2020, September 21). Twitter apologises for “racist” image-cropping algorithm. The Guardian.

Kabul, I. K. (2017, December 18). Interpretability is crucial for trusting AI and machine learning. The SAS Data Science Blog.

Kabul, I. K. (2018, March 9). Understanding and interpreting your data set. The SAS Data Science Blog.

Kim, B., Koyejo, O., & Khanna, R. (2016). Examples are not enough, learn to criticize! Criticism for Interpretability. Neural Information Processing Systems.

Ledford, H. (2021, May 8). Millions of black people affected by racial bias in health-care algorithms. Nature.

Melton, M. (2019, August 13). Amazon Rekognition Falsely Matches 26 Lawmakers To Mugshots As California Bill To Block Moves Forward. Forbes.

Miller, T. (2018). Explanation in Artificial Intelligence: Insights from the Social Sciences. ArXiv:1706.07269 [Cs].

Molnar, C. (2019). Interpretable machine learning. A Guide for Making Black Box Models Explainable.

Ribeiro, M. T., Singh, S., & Guestrin, C. (2016a). Model-Agnostic Interpretability of Machine Learning. ArXiv:1606.05386 [Cs, Stat].

Ribeiro, M. T., Singh, S., & Guestrin, C. (2016b). “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. ArXiv:1602.04938 [Cs, Stat].

Vincent, J. (2019, April 8). AI systems should be accountable, explainable, and unbiased, says EU. The Verge.

Wikipedia. (n.d.). Decision tree.

Wu, T. (2021, April 21). Apple users are charged more on ordering a cab than Android users. The Paper.