Pattern Recognition: The Foundations of AI/ML - Chirin Dirani

The readings and videos for this week add another level of understanding to the foundations of AI and ML. Learning about pattern recognition is another step toward deblackboxing computing systems and AI. According to Geoff Dougherty, pattern recognition means feeding many samples of an image into a program for analysis; the program should recognize a pattern specific to the input image and identify that pattern as a member of a category or class the program already knows. Because there are many categories or classes, we have to classify a particular image into a certain class, and this is what we call classification. The recognition process happens by training convolutional neural network algorithms (ConvNets) to help the program recognize the pattern. These ConvNets can be applied to many image recognition problems, such as recognizing handwritten text, spotting tumors in CT scans, monitoring traffic on roads, and much more. Dougherty emphasizes that pattern recognition “is used to include all objects that we might want to classify.” The materials for this class provide many case studies of the application of pattern recognition through ConvNets. I will start with Andrej Karpathy’s piece on how to take the best selfie, then elaborate on digital image analysis as I understood it from the Crash Course episode on computer vision, and end with the Crash Course video on using pattern recognition in Python code to read our handwriting.
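To make the idea of classification more concrete, here is a minimal, hypothetical sketch in Python (my own illustration, not taken from Dougherty): each sample has already been reduced to a small feature vector, and the program assigns a new sample to whichever already-known class it most resembles.

```python
import math

# Hypothetical toy data: each "image" has already been reduced to a
# two-number feature vector (e.g., average brightness, edge count).
training_data = {
    "handwritten_letter": [(0.2, 0.8), (0.3, 0.7), (0.25, 0.9)],
    "tumor_region":       [(0.9, 0.1), (0.8, 0.2), (0.85, 0.15)],
}

def centroid(points):
    """Average feature vector of one class."""
    return tuple(sum(vals) / len(vals) for vals in zip(*points))

def classify(sample):
    """Assign the sample to the class with the nearest centroid."""
    best_class, best_dist = None, float("inf")
    for label, points in training_data.items():
        dist = math.dist(sample, centroid(points))
        if dist < best_dist:
            best_class, best_dist = label, dist
    return best_class

print(classify((0.28, 0.75)))  # -> "handwritten_letter"
```

Real systems work with far richer features and learned decision boundaries, but the underlying decision, assigning a sample to one of several known classes, is the same.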

The first case is the interesting article by Andrej Karpathy, who tried to find what makes a perfect selfie by using convolutional neural networks (ConvNets). For Karpathy, ConvNets “recognize things, places and people in personal photos, signs, people and lights in self-driving cars, crops, forests and traffic in aerial imagery, various anomalies in medical images and all kinds of other useful things.” Karpathy introduces the basics of how convolutional neural networks work, but he focuses on his applied technique of using pattern recognition for digital image analysis. By training a ConvNet, the program was able to pick out the 100 best selfies. Although this is an ideal case study of pattern recognition using ConvNets, I finished the article with more questions than I had when I started it. The fact that we can feed ConvNets images and labels of whatever we like convinced me that these networks will learn to recognize whatever labels we want, which pushed me to question how objective or subjective these ConvNets really are. In other words, would the outputs change according to the gender, race, orientation, and motives of the human providing the inputs?

To bridge the gap left by Karpathy’s article in understanding convolutional neural network algorithms and how they work in decision making and pattern recognition (facial recognition here), I relied on the Crash Course episode on ML/AI. According to this video, the ultimate objective of ML is to use computers to make decisions about data. The decisions are made using algorithms that give computers the ability to learn from data and then decide. The decision process is called classification, and the algorithm that carries it out is called a classifier. To train machine learning classifiers to make good predictions, we need training data. Machine learning algorithms separate the labeled data with decision boundaries, and at this stage they work on maximizing correct classifications while minimizing wrong ones. A decision tree is one example of an ML technique; it divides the decision space into boxes. The algorithm that produces a decision can rely on statistics to make confident decisions, or it can have no origins in statistics at all, as with artificial neural networks, which are inspired by the neurons in our brains. Similar to brain neurons, artificial neurons receive inputs from other cells, process those signals, and then release their own signal to other cells. These cells form huge interconnected networks able to process complex information. Rather than chemical and electrical signals, artificial neurons take in numbers and release numbers. They are organized into layers connected by links, forming a network of neurons. There are three kinds of layers: an input layer, one or more hidden layers, and an output layer. There can be many hidden layers, which is where the term deep learning comes from. Finally, there are two kinds of AI. The first consists of algorithms that are sophisticated but not truly intelligent (weak or narrow AI), because they do one thing and are intelligent only at specific tasks such as finding faces or translating texts. The second kind is general-purpose AI (strong AI): algorithms that take in large amounts of information and learn faster than humans (reinforcement learning).
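To illustrate the layered structure the video describes, here is a minimal sketch of a tiny feed-forward network in Python. The weights are random placeholders (real networks learn them from labeled training data), but it shows how numbers flow from an input layer through a hidden layer to an output layer.

```python
import numpy as np

def sigmoid(x):
    """Squash each neuron's summed input into the range 0..1."""
    return 1.0 / (1.0 + np.exp(-x))

# A tiny, hypothetical network: 3 input neurons -> 4 hidden neurons -> 2 output neurons.
# Real networks learn these weights from labeled data; here they are random stand-ins.
rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(3, 4))   # links from the input layer to the hidden layer
W_output = rng.normal(size=(4, 2))   # links from the hidden layer to the output layer

def forward(inputs):
    """Each layer takes numbers in, weights and sums them, and passes numbers on."""
    hidden = sigmoid(inputs @ W_hidden)   # hidden layer activations
    output = sigmoid(hidden @ W_output)   # output layer, e.g. scores for two classes
    return output

scores = forward(np.array([0.5, 0.1, 0.9]))
print(scores)  # two numbers; the larger one is the predicted class
```

Training would consist of nudging these weights until the output scores place most of the labeled training examples on the correct side of the decision boundary.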

The second case concerns the image-analysis process. We feed an image into a program as input; once a face in the image is isolated, more specialized layers of computer vision algorithms can be applied to pinpoint facial landmarks. Facial landmarks capture the geometry of the face, such as the distance between the eyes or the size of the nose and lips. Emotion recognition algorithms can also interpret emotion, giving computers the ability to understand whether a face is happy, sad, or perhaps frustrated. Just as levels of abstraction are used in building complex computing systems, they are used for facial recognition: cameras (the hardware level) provide improved images; camera data is then used to train algorithms that crunch pixels to recognize a face; and the outputs of those algorithms are processed to interpret facial expressions.
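As a rough illustration of the first stage of this pipeline, isolating the face, the sketch below uses OpenCV’s pre-trained frontal-face detector. The file name is hypothetical, and the Haar cascade used here is a classical detector rather than a ConvNet, but the idea of finding the face first and then handing the cropped region to more specialized algorithms is the same.

```python
import cv2  # OpenCV; install with `pip install opencv-python`

# Load a photo (hypothetical file name) and convert it to grayscale,
# since the detector works on pixel intensities rather than color.
image = cv2.imread("photo.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# OpenCV ships a pre-trained frontal-face detector (a classical Haar cascade,
# not a ConvNet, but the pipeline idea is the same: isolate the face first).
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Each detection is a bounding box; more specialized models (facial landmarks,
# emotion recognition) would then be applied to these cropped face regions.
for (x, y, w, h) in faces:
    face_region = gray[y:y + h, x:x + w]
    print(f"Found a face at ({x}, {y}), size {w}x{h} pixels")
```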

The last case is the Crash Course video on programming ConvNets to recognize handwritten letters and convert them into typed text. Here, a language called Python is used to write the code. The issue is what Ethem Alpaydin called “the additional problem of segmentation,” that is, how to write code that figures out where one letter ends and another begins. In this case, the neural network is trained to recognize a pattern instead of memorizing a specific shape. To do so, the following steps should be implemented (a minimal code sketch follows the list):

  1. Create a labeled dataset to train the neural network, splitting the data into a training set and a testing set.
  2. Create a neural network configured with an input layer, some number of hidden layers, and an output layer that produces a number corresponding to its letter prediction.
  3. Train, test, and tweak the code until it is accurate enough.
  4. Scan handwritten pages and use the newly trained neural network to convert them into typed text.
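Here is a minimal sketch of steps 1 through 3, assuming the Keras library. The Crash Course project reads handwritten letters; as a stand-in, this sketch uses the MNIST handwritten digits that ship with Keras and a small fully connected network rather than a ConvNet, but the workflow (split the labeled data, build a layered network, then train and test it) is the same.

```python
import tensorflow as tf

# Step 1: a labeled dataset, already split into training and testing sets.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0   # scale pixel values to 0..1

# Step 2: a network with an input layer (28x28 pixels), one hidden layer,
# and an output layer with one neuron per possible character.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Step 3: train, then test; in practice you would tweak the layers and retrain
# until the accuracy is good enough.
model.fit(x_train, y_train, epochs=3)
print(model.evaluate(x_test, y_test))

# Step 4 (not shown): slice a scanned page into individual characters, which is
# the segmentation problem, and run each one through model.predict().
```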

In conclusion, and to re-emphasize what we said in previous classes, computer systems, AI, and ML are useful but cannot be intelligent like humans. It is all about understanding the layers of computing design. By understanding the process of pattern recognition today, we reveal another level in this system, and as Professor Irvine says, “There is no magic, no mysteries — only human design for complex systems.”

References:

Crash Course Computer Science, no. 34: Machine Learning & Artificial Intelligence

Crash Course Computer Science, no. 35: Computer Vision

Crash Course AI, no. 5: Training an AI to Read Your Handwriting

Ethem Alpaydin, Machine Learning: The New AI (Cambridge, MA: The MIT Press, 2016).

Geoff Dougherty, Pattern Recognition and Classification: An Introduction (New York: Springer, 2012).

Professor Irvine, Introduction to Computing Design Principles & AI/ML Design.