Face recognition and convolutional neural network
By Beiyue Wang, Shahin Rafikian and Kevin Ackermann
Face recognition is now very common in our lives. A growing number of smartphones replace passwords with face recognition to increase security. Law enforcement agencies are also using face recognition more and more often in routine policing: once criminals’ faces are captured by street cameras, the police can immediately compare those photos against one or more face recognition databases to attempt an identification. For instance, last year an AI security system was launched in almost every metro station in Shanghai to track hundreds of wanted criminals. The technology can scan photos from the national database and identify a person from at least 2 billion people in seconds. It was reported that within 3 months, the technology helped the police catch about 500 criminals.
Humans have always had the innate ability to recognize and distinguish between faces, yet computers have only recently shown the same ability. Face recognition needs to cope with different expressions, lighting conditions, and occlusions. From this week’s reading, we learn that much of this progress can be attributed to convolutional neural networks.
For me, it is very hard to fully understand this kind of technology, and I hope to learn more in class. According to the reading, a convolutional neural network is one in which the operation of each unit is considered to be a convolution, that is, a matching, of its input with its weight (Alpaydin, 2016). For instance, in a CNN, starting from pixels, we get to edges, then to corners, and so on, until we arrive at a representation of the whole image. The whole process relies on mathematics and statistics. The picture below presents the process of classification or recognition: sensing an image, pre-processing, segmenting the foreground from the background and labeling, feature extraction, post-processing, classification, and decision.
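To make the idea of convolution as “matching” concrete, here is a minimal Python sketch (not taken from the reading) that slides a small grid of weights over a toy image and records how well each patch matches it. The 4x5 image and the 3x3 edge kernel are made-up values for illustration; in a real CNN the kernel weights are learned during training rather than written by hand.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the map
    of match scores. Like most CNN libraries, this computes cross-correlation:
    the kernel is not flipped before sliding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    result = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the patch under the kernel: a high value means
            # the patch "matches" the pattern encoded by the weights.
            score = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            row.append(score)
        result.append(row)
    return result


# Toy 4x5 grayscale image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

# Hypothetical 3x3 vertical-edge detector: responds to dark-to-bright changes.
edge_kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, edge_kernel)
print(feature_map)  # [[3, 3, 0], [3, 3, 0]]
```

The nonzero scores appear only where the kernel’s window straddles the dark-to-bright boundary, which is the kind of edge detector the first layer of a CNN tends to learn. Deeper layers apply the same operation to these feature maps, which is how edges combine into corners and larger shapes.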
Indeed, face recognition brings us many benefits. However, it also has many shortcomings.
First, as the number of faces in a database grows, face recognition becomes more prone to error, because many people in the world look alike: the more likely it is that similar faces exist, the lower the matching accuracy. Studies have shown that face recognition is especially bad at recognizing minorities, young people, and women. In fact, my own cell phone often fails to recognize my face and unlock. In my view, solving the problem of accuracy still has a long way to go.
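The claim that accuracy falls as the database grows can be made concrete with a back-of-the-envelope calculation. If each comparison has a small false-match rate p, and the N comparisons in a search are treated as independent (a simplifying assumption; real errors are correlated, for example among look-alikes or under-represented groups), the chance that a search returns at least one wrong match is 1 - (1 - p)^N. The Python sketch below uses a hypothetical rate of one in a million.

```python
def prob_any_false_match(false_match_rate, database_size):
    """Probability that a probe face falsely matches at least one of
    `database_size` enrolled faces, assuming each comparison independently
    yields a false match with probability `false_match_rate`."""
    return 1 - (1 - false_match_rate) ** database_size


# Hypothetical per-comparison false-match rate of 1 in 1,000,000.
for n in (1_000, 100_000, 10_000_000):
    print(n, round(prob_any_false_match(1e-6, n), 4))
```

With this rate, the chance of a false match is roughly 0.1% for a thousand faces, about 9.5% for a hundred thousand, and over 99.99% for ten million, even though each individual comparison stays equally accurate.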
Second, a study purporting to infer criminality from a dataset of existing prisoners’ and non-prisoners’ faces has serious endogeneity problems: prison itself may accelerate aging, affect routine expressions, or even lead to a disproportionate risk of facial damage. Moreover, many critics have argued that training data consisting of prisoners’ faces is not representative of crime; rather, it represents which criminals have been caught, jailed, and photographed. Indeed, the way a classifier operates leaves it vulnerable to critiques of the representativeness of its training data (Pasquale, 2018).
Third, as we discussed in our last class, face recognition undoubtedly raises problems of discrimination. For example, if the police use machine learning to derive demographic patterns of criminals, they are likely to watch people who fit those patterns more closely than others, which reinforces discrimination.
Facial Recognition and the Inner Workings of Neural Networks
Have you ever stopped to think what it is that determines when you recognize a face? You could break down the constituent elements – eyes, ears, mouth and nose – but why is it that when you see a face, you can immediately recognize it as a face? What’s more, what are the minute details and changes that separate one face from another?
To explicitly tell a machine all of the rules and definitions that make one face unique would be challenging, time-consuming and virtually impossible. Enter convolutional neural networks. Using data pools of thousands of faces, the machine learning program begins to “learn” what differentiates one face from another. Then, once the machine has learned how to recognize a face, it can apply this knowledge to faces that it sees in the future.
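The contrast between writing rules by hand and learning them from examples can be sketched with the simplest trainable classifier, a perceptron. In the Python sketch below, each “face” is reduced to two made-up numeric features and labelled as one of two people; the features, data, and learning rate are hypothetical, and real face recognizers learn from raw pixels with far deeper networks. No rule about which feature matters is written down: the weights change only when the program gets a training example wrong.

```python
def train_perceptron(examples, epochs=20, lr=0.1):
    """Learn weights `w` and bias `b` from labelled examples
    [(features, label)], with label +1 or -1, using the perceptron rule."""
    n = len(examples[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, y in examples:
            prediction = predict(w, b, x)
            if prediction != y:
                # Nudge the decision boundary toward the misclassified example.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the boundary `x` falls."""
    activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if activation >= 0 else -1


# Hypothetical two-feature "faces": +1 is person A, -1 is person B.
faces = [
    ((0.9, 0.2), 1), ((0.8, 0.3), 1), ((0.85, 0.25), 1),
    ((0.2, 0.8), -1), ((0.3, 0.9), -1), ((0.25, 0.7), -1),
]
w, b = train_perceptron(faces)
print(predict(w, b, (0.88, 0.22)))  # 1, a face it never saw, labelled person A
```

The last line is the “apply this knowledge to faces that it sees in the future” step: the new example was not in the training data, yet the learned weights place it with person A.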
Let’s take Apple’s Face ID as an example to further illustrate how this process works. Face ID is a form of facial recognition built into the iPhone X and later models. The basic concept of Face ID is that the iPhone can recognize its owner’s face and then use that recognition as a password on the device. According to Apple, “Face ID uses advanced machine learning to recognize changes in your appearance. Wear a hat. Put on glasses. It even works with many types of sunglasses” (iPhone XS – Face ID).
To recognize a face, the first step is to “see,” or gather input. To do this, the iPhone projects 30,000 infrared dots onto a person’s face and then uses an infrared camera to take a picture of the resulting dot map. This facial map is sent to a chip in the iPhone that uses a neural network to perform machine learning. Basically, the chip views the patterns of dots that make up someone’s face and learns these dot maps in order to recognize the face. In other words, the chip learns to perform a task, recognizing a face, by analyzing training examples, which here come from the initial Face ID setup (Hardesty, 2017).
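The setup-then-unlock flow described above can be sketched as follows. This is not Apple’s implementation, which is not public: the sketch assumes a hypothetical neural network has already turned each infrared dot map into a short list of numbers (an “embedding”), averages the setup scans into a stored template, and unlocks when a new scan’s cosine similarity to that template passes a threshold. The 4-number embeddings, the 0.95 threshold, and the `adapt()` update rate are all made-up values for illustration.

```python
import math


def cosine_similarity(a, b):
    """1.0 means the two vectors point the same way; lower means less alike."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def enroll(embeddings):
    """Average several setup scans into one stored template."""
    n = len(embeddings)
    return [sum(values) / n for values in zip(*embeddings)]


def verify(template, probe, threshold=0.95):
    """Unlock only if the new scan is close enough to the stored template."""
    return cosine_similarity(template, probe) >= threshold


def adapt(template, probe, rate=0.1):
    """Hypothetical: after a successful unlock, move the template slightly
    toward the new scan so gradual changes in appearance keep matching."""
    return [(1 - rate) * t + rate * p for t, p in zip(template, probe)]


# Hypothetical embeddings from three setup scans of the owner's face.
setup_scans = [[1.0, 0.2, 0.1, 0.0], [0.9, 0.25, 0.1, 0.05], [1.05, 0.15, 0.12, 0.0]]
template = enroll(setup_scans)

owner = [0.95, 0.22, 0.1, 0.02]
stranger = [0.2, 0.9, 0.3, 0.4]
print(verify(template, owner), verify(template, stranger))  # True False
if verify(template, owner):
    template = adapt(template, owner)
```

The `adapt()` step is one simple way a system could follow gradual changes such as glasses or a new beard, in the spirit of Apple’s description; how Face ID actually updates its stored data is an assumption here.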
To clarify, a neural network, loosely inspired by the structure of neurons in the brain, can be described as the architecture of the machine, while machine learning is the method by which the network “learns” from data (Nielsen, 2015).
Computer vision is the “broad parent name for any computations involving visual content – that means images, videos, icons, and anything else with pixels involved” (Introduction to Computer Vision, 2018).
So, how exactly does a neural network learn and make decisions? Broadly speaking, imagine three sections of the neural network: an input layer, hidden layers, and an output layer. If we’re working with an image, each individual pixel might be a node in the input layer. In the case of the iPhone’s Face ID, we can assume that each infrared dot might be a node in the input layer (Machine Learning & Artificial Intelligence…, 2017). Each connection between nodes is assigned a weight. When input data enters the network, a node in a hidden layer multiplies each incoming value by the weight of its connection and adds the products together. Each node has a threshold: if the sum meets or exceeds it, the node “fires,” just like an actual neuron, and passes the sum on to the next layer. In this way, data fed into the input layer travels through the hidden layers “until it finally arrives, radically transformed, at the output layer” (Hardesty, 2017).
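The description of a single node can be written out directly. The Python sketch below follows Hardesty’s account: each node multiplies its inputs by the weights on its incoming connections, sums the products, and fires (passes the sum on) only if the sum reaches its threshold; otherwise it passes nothing (0). The three inputs, the two hidden nodes, and every weight and threshold are made-up numbers; real networks learn their weights during training and usually use smoother activation functions than this hard threshold.

```python
def layer_forward(inputs, weights, thresholds):
    """One layer of threshold nodes. weights[j][i] is the weight on the
    connection from input i to node j."""
    outputs = []
    for node_weights, threshold in zip(weights, thresholds):
        # Multiply each incoming value by its connection weight and add up.
        total = sum(w * x for w, x in zip(node_weights, inputs))
        # The node "fires" (passes its sum on) only if it reaches the threshold.
        outputs.append(total if total >= threshold else 0.0)
    return outputs


def network_forward(inputs, layers):
    """Feed the input through each (weights, thresholds) layer in turn."""
    signal = inputs
    for weights, thresholds in layers:
        signal = layer_forward(signal, weights, thresholds)
    return signal


pixels = [1.0, 0.5, 0.0]  # hypothetical input-layer values (e.g. three dots)
hidden_layer = ([[0.5, 0.5, 0.25], [-0.25, 1.0, 0.5]], [0.5, 0.5])
output_layer = ([[1.0, 1.0]], [0.5])

print(layer_forward(pixels, *hidden_layer))                   # [0.75, 0.0]
print(network_forward(pixels, [hidden_layer, output_layer]))  # [0.75]
```

The first hidden node’s weighted sum (0.75) clears its threshold and fires, while the second (0.25) does not, so only the first node’s signal reaches the output layer.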
As I was reading about how every human-technology interaction is a programmed computer function, I began to think about the ways in which future programmed interactions (with computers and beyond) could become both more personal and intelligent enough to adapt to our needs. But then I realized that our technology is already there: smart home devices keep a history of our interaction data for future predictions and voice recognition, mobile phone keyboards make word predictions for users, and streaming services recommend shows and movies to subscribers based on their viewing history. It is all data collected from user experiences. The Alpaydin reading can be put in dialogue with Karpathy’s blog post, in that convolutional neural networks and scanning algorithms are trained much as you would teach a child new actions and information. The more a child is allowed to learn and experience something in particular, the more familiar they become with it. Similarly, with the predictive text feature on the iPhone, the more text messages a user sends, the more data the iPhone stores in order to make smart predictions.
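The keyboard example can be illustrated with a deliberately simple model. Apple does not publish how its predictive text works, so the Python sketch below is a hypothetical stand-in (a bigram counter, not a neural network): every message the user sends adds to a table of which word followed which, and the suggested next word is simply the most frequent follower seen so far.

```python
from collections import Counter, defaultdict


class NextWordPredictor:
    """Toy predictive-text model: counts which word followed which in the
    messages it has seen and suggests the most frequent follower."""

    def __init__(self):
        self.followers = defaultdict(Counter)

    def learn(self, message):
        words = message.lower().split()
        for current, nxt in zip(words, words[1:]):
            self.followers[current][nxt] += 1

    def suggest(self, word):
        counts = self.followers.get(word.lower())
        if not counts:
            return None  # never seen this word, so no suggestion yet
        return counts.most_common(1)[0][0]


predictor = NextWordPredictor()
print(predictor.suggest("you"))  # None, since there is no history yet
for msg in ["see you soon", "see you tomorrow", "talk to you soon"]:
    predictor.learn(msg)
print(predictor.suggest("you"))  # soon
```

After three messages, “soon” has followed “you” twice and “tomorrow” once, so “soon” is suggested. Every additional message adds to the counts, which is the “more data, smarter predictions” effect the paragraph describes.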
Similar e-behavior (courtesy of algorithms designed to collect data and make such predictions) can be seen in the iPhone’s Face ID, where Apple’s technology is able to recognize the registered face in various states (e.g. bearded, with makeup, with a new hairstyle). It is worth exploring, however, whether similar convolutional neural network imaging processes could be applied to facial recognition technology to strengthen its capabilities even further. In regard to Alpaydin’s discussion of social media data, could the easy availability of images on the internet open a security gap in facial recognition? And with access to technologies such as 3D printing, it does not seem far-fetched that someone could break into a phone, or any other technology locked by facial recognition, by 3D-printing a face based on predictions of the owner’s face and head structure.
Karpathy, A. (2015, October 25). What a deep neural network thinks about your #selfie [Blog post]. Andrej Karpathy blog. Retrieved from http://karpathy.github.io/2015/10/25/selfie/
Alpaydin, E. (2016). Machine learning: The new AI. Cambridge, MA: The MIT Press.
Pasquale, F. (2018). When machine learning is facially invalid. Communications of the ACM. Retrieved from https://cacm.acm.org/magazines/2018/9/230569-when-machine-learning-is-facially-invalid/fulltext
Dougherty, G. (2012). Pattern recognition and classification: An introduction (excerpt: Chaps. 1-2). New York: Springer.
Hardesty, L. (2017, April 14). Explained: Neural networks. Retrieved February 6, 2019, from http://news.mit.edu/2017/explained-neural-networks-deep-learning-0414
Introduction to Computer Vision. (2018, April 2). Retrieved February 6, 2019, from https://blog.algorithmia.com/introduction-to-computer-vision/
iPhone XS – Face ID. (n.d.). Retrieved February 6, 2019, from https://www.apple.com/iphone-xs/face-id/
Machine Learning & Artificial Intelligence: Crash Course Computer Science #34 – YouTube. (2017). PBS Digital Studios. Retrieved from https://www.youtube.com/watch?v=z-EtmaFJieY
Nielsen, M. A. (2015). Neural Networks and Deep Learning. Retrieved from http://neuralnetworksanddeeplearning.com