Before delving into the key issues and points from Karpathy’s article, we need to break pattern recognition down to its basics. Pattern recognition is a subset of machine learning: it is a process that lets computers learn from data and then use what they have learned to make predictions and decisions. This process consists of classifying data into categories separated by decision boundaries. The goal is to maximize correct classifications while minimizing errors. To do so it follows a step-by-step process, laid out most clearly in Dougherty’s reading and the image below. The entire method boils down to this:
- Sensing/Acquisition – uses a transducer such as a camera or microphone to capture signals (e.g., an image) with enough distinguishing features.
- Preprocessing – makes the data easier to segment, for example by turning each pixel into a number between 0 and 1 by dividing its 8-bit color value by 255, the largest value it can take.
- Segmentation – partitions a signal into regions that are meaningful for a particular task: the foreground, comprising the objects of interest, and the background, which is everything else. Region-based methods detect similarities; boundary-based methods detect discontinuities.
- Feature Extraction – measures features, the characteristic properties of the objects, whose values should be similar for objects in a particular class and different from the values for objects in another class (or from the background). Features can be continuous (numbers) or categorical (nominal, ordinal).
- Classification – assigns objects to categories based on their feature information: it evaluates the evidence presented and decides which class each object belongs to, depending on whether the values of its features fall inside or outside the tolerance of that class.
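To make the five steps concrete, here is a minimal Python sketch on a tiny 4x4 grayscale "image". It is only an illustration under stated assumptions: the function names, the brightness threshold, and the `min_area` rule are invented for this example and do not come from Dougherty or Karpathy.

```python
# Toy walk-through of the five steps. All names, values, and
# thresholds are illustrative assumptions, not a real vision system.

def acquire():
    # Sensing/Acquisition: stand-in for a camera; values are 0-255 brightness.
    return [
        [10, 12, 11, 9],
        [13, 200, 210, 10],
        [12, 205, 198, 11],
        [9, 10, 12, 10],
    ]

def preprocess(image):
    # Preprocessing: scale every pixel into the range 0..1.
    return [[pixel / 255 for pixel in row] for row in image]

def segment(image, threshold=0.5):
    # Segmentation (region-based): pixels brighter than the threshold are
    # foreground (True); everything else is background (False).
    return [[pixel > threshold for pixel in row] for row in image]

def extract_features(image, mask):
    # Feature extraction: two continuous features of the foreground region.
    foreground = [
        pixel
        for row, mask_row in zip(image, mask)
        for pixel, is_foreground in zip(row, mask_row)
        if is_foreground
    ]
    area = len(foreground)
    mean_brightness = sum(foreground) / area if area else 0.0
    return {"area": area, "mean_brightness": mean_brightness}

def classify(features, min_area=3):
    # Classification: a single decision boundary on one feature.
    return "object" if features["area"] >= min_area else "noise"

raw = acquire()
scaled = preprocess(raw)
mask = segment(scaled)
features = extract_features(scaled, mask)
label = classify(features)
print(features, label)
```

Each function hands its result to the next, which mirrors how the first four steps prepare the data before the last step makes the actual decision.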
I interpret the first four steps as preparing the data and the features that the algorithm will work with, and the final step as where the action occurs, in a simple and fast manner. Using a picture as an example, this happens by sending the data in each pixel through this process. Now that we know what pattern recognition consists of, we can examine Karpathy’s explanation of Convolutional Neural Networks (ConvNets), which are just another form of pattern recognition, specifically a type of classification method. Other methods include decision trees, random forests (which are just collections of decision trees), support vector machines, and other kinds of neural networks.
To understand ConvNets we should start with neural networks. Neural networks are organized in layers of linked units (neurons). Each unit takes a series of inputs and combines them to emit a signal as an output, and both inputs and outputs are represented as numbers. Between the input and output layers are hidden layers, where each unit sums its weighted inputs and then adds a bias. Next, an activation function (transfer function) is applied to that sum, performing a final mathematical modification to produce the unit’s output. The weights and biases are initially set to random numbers when a neural network is created, and then an algorithm trains the network using labeled examples from the training data. For a ConvNet the training likewise starts from scratch: the filters are initialized at random and then changed slightly, through a mathematical process, each time the system is told what the image actually is, e.g. a toad vs. a frog (supervised learning?). A ConvNet follows the same principle, but its weights are arranged as small filters that slide across the image, and it stacks many hidden layers to recognize complex objects and scenes. Using many layers in this way is termed deep learning.
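The weighted sum, bias, activation function, and training from random starting values can be sketched with a single artificial neuron in plain Python. This is a simplified illustration, not Karpathy's ConvNet: the two input features, the frog/toad labels, the sigmoid activation, and the learning rate are all assumptions chosen for the example.

```python
import math
import random

def sigmoid(z):
    # Activation (transfer) function: squashes any number into 0..1.
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, weights, bias):
    # Sum the weighted inputs, add the bias, then apply the activation.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

def train_step(inputs, target, weights, bias, learning_rate=0.5):
    # Nudge the weights and bias slightly toward the correct label.
    # For a sigmoid output with cross-entropy loss, the gradient with
    # respect to the weighted sum is simply (output - target).
    error = forward(inputs, weights, bias) - target
    new_weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
    new_bias = bias - learning_rate * error
    return new_weights, new_bias

# Weights and bias start as random numbers.
random.seed(0)
weights = [random.uniform(-1, 1) for _ in range(2)]
bias = random.uniform(-1, 1)

# Hypothetical labeled data: two made-up features per animal,
# label 1 = frog, label 0 = toad.
data = [
    ((0.9, 0.2), 1),
    ((0.8, 0.3), 1),
    ((0.2, 0.9), 0),
    ((0.1, 0.7), 0),
]

# Repetitive training: show every labeled example many times.
for _ in range(1000):
    for inputs, target in data:
        weights, bias = train_step(inputs, target, weights, bias)

for inputs, target in data:
    print(inputs, target, round(forward(inputs, weights, bias), 3))
```

A real network connects many such units in layers, and a ConvNet shares each unit's weights across image positions as a filter, but the loop of "predict, compare with the label, adjust slightly" is the same idea.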
Karpathy highlighted this through a practical example using selfies, which I found both amusing and enjoyable. I think the key point he raises, which is echoed in the other readings, is that pattern recognition is not 100% accurate. The choice of features shapes the feature space and the decision boundaries, and the results can be summarized in a confusion matrix that shows what the algorithm got right and wrong. One obstacle to accuracy is the “Curse of Dimensionality”: the more features we add to make the decisions more precise, the more complicated the classification becomes and the more training data it needs, and as such experts employ the K.I.S.S. method. However, we can train algorithms like ConvNets to be mostly right by identifying features and, through repetitive training, helping the algorithm gradually figure out what to look for; this I believe is termed supervised learning, or maybe reinforcement learning? In sum, a ConvNet is a form of pattern recognition used as a tool for machine learning that still has obstacles to overcome but is already being used to convert handwriting into text, spot tumors in CT scans, monitor traffic flows on roads, and guide self-driving cars. The possibilities are endless!
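As a small illustration of how a confusion matrix records what a classifier got right and wrong, here is a Python sketch. The frog/toad labels and the predictions are made up for the example and are not results from any of the readings.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    # Rows are the true classes, columns are the predicted classes.
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

# Hypothetical true labels and classifier outputs for six images.
actual = ["frog", "frog", "toad", "toad", "frog", "toad"]
predicted = ["frog", "toad", "toad", "toad", "frog", "frog"]
labels = ["frog", "toad"]

matrix = confusion_matrix(actual, predicted, labels)

# Correct answers sit on the diagonal of the matrix.
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(actual)

for label, row in zip(labels, matrix):
    print(label, row)
print("accuracy:", accuracy)
```

The off-diagonal cells are the mistakes, so the matrix shows not just how often the algorithm is wrong but which classes it confuses with each other.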
Questions:
Given the definitions of supervised vs. unsupervised learning, does that mean supervised learning is pattern recognition? Does that then mean unsupervised learning does not exist? If it does exist, what are some examples?
Where does reinforcement learning fall: under supervised or unsupervised learning?
Are features another term for biases and weights?
References:
Alpaydin, Ethem. 2016. Machine Learning: The New AI. MIT Press Essential Knowledge Series. Cambridge, MA: MIT Press. https://drive.google.com/file/d/1iZM2zQxQZcVRkMkLsxlsibOupWntjZ7b/view?usp=drive_open&usp=embed_facebook.
CrashCourse. 2017a. Machine Learning & Artificial Intelligence: Crash Course Computer Science #34. https://www.youtube.com/watch?v=z-EtmaFJieY&t=2s.
———. 2017b. Computer Vision: Crash Course Computer Science #35. https://www.youtube.com/watch?v=-4E2-0sxVUM.
———. 2019. How to Make an AI Read Your Handwriting (LAB): Crash Course AI #5. https://www.youtube.com/watch?list=PL8dPuuaLjXtO65LeD2p4_Sb5XQ51par_b&t=67&v=6nGCGYWMObE&feature=youtu.be.
“Dougherty-Pattern Recognition and Classification-an Introduction-2013-Excerpt-1-2.Pdf.” n.d. Google Docs. Accessed February 26, 2021. https://drive.google.com/file/d/1BT-rDW-mvnCOtUvvm-2xBwzF8_KJcKyI/view?usp=drive_open&usp=embed_facebook.
“What a Deep Neural Network Thinks about Your #selfie.” n.d. Accessed February 26, 2021. https://karpathy.github.io/2015/10/25/selfie/.