Discussion Notes – Tianyi Zhao and Adey Zegeye
Case Study: What a Deep Neural Network thinks about your #selfie – Andrej Karpathy
ConvNet Training in Karpathy
- Data collection: a quick script gathered roughly 5 million images tagged with #selfie.
- A convolutional network trained for face detection then filtered these down to about 2 million images containing at least one face.
To decide if the selfie is good or bad:
- Ranked the images by the poster's number of followers
- Divided them into groups of 100 and sorted each group by number of likes
- Top 50 of each group = positive examples, bottom 50 = negative examples
- Trained a ConvNet on this binary split
- A ConvNet of this kind can be customized to whatever criteria its trainers demand.
- The dataset is large and varied; a more specific explanation of (or a detailed discussion about) the statistical preferences it encodes would help.
- The learned patterns should be carefully selected so that they generalize to any selfie.
- The accuracy of ConvNet depends on:
- How well and how precisely the trainer defines the patterns (labels) to be learned
- The choice of architecture and tooling (e.g., the VGGNet architecture; note that Caffe is a deep learning framework and ImageNet a dataset, not network architectures)
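The labeling steps above can be sketched in plain Python. This is a minimal illustration, not Karpathy's actual code: the record fields and the `label_selfies` helper are made up for the example.

```python
import random

# Hypothetical records: each image carries the poster's follower count
# and the image's like count (illustrative fields, not real data).
images = [
    {"id": i, "followers": random.randint(100, 10_000), "likes": random.randint(0, 500)}
    for i in range(1_000)
]

def label_selfies(images, group_size=100, keep=50):
    """Binary labels via Karpathy's heuristic: order images by the
    poster's follower count, then within each block of `group_size`
    rank by likes; top `keep` -> positive (1), bottom `keep` -> negative (0)."""
    ordered = sorted(images, key=lambda im: im["followers"])
    labeled = []
    for start in range(0, len(ordered) - group_size + 1, group_size):
        group = sorted(ordered[start:start + group_size],
                       key=lambda im: im["likes"], reverse=True)
        for im in group[:keep]:
            labeled.append((im["id"], 1))   # "good" selfie
        for im in group[-keep:]:
            labeled.append((im["id"], 0))   # "bad" selfie
    return labeled

labels = label_selfies(images)
```

Grouping by follower count before ranking by likes is what normalizes for audience size: an image is judged against peers with a similar number of followers, not against celebrity accounts.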
- ConvNets in NLP, for example in machine translation:
According to Ethem Alpaydin, neural machine translation ended the era of phrase-based statistical translation because it translates an entire sentence at a time rather than cutting it into words. Recurrent neural networks (RNNs) are prevalent in this field; however, ConvNets have gradually begun to replace RNNs in language translation. On the one hand, a ConvNet's computation can be fully parallelized on a GPU: the network computes all elements simultaneously, while an RNN operates in a strict left-to-right or right-to-left order (one word at a time), in which each word must wait until the network has finished the previous one. On the other hand, ConvNets process information hierarchically, making it "easier to capture complex relationships in the data" (Gehring & Auli, 2017).
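The contrast between the two computation patterns can be illustrated with a toy NumPy sketch. This is not Gehring & Auli's actual model; the shapes and random weights are invented purely to show the dependency structure.

```python
import numpy as np

T, d = 8, 4                      # sequence length, embedding size
x = np.random.randn(T, d)        # toy word embeddings
W = np.random.randn(3 * d, d)    # conv filter spanning a 3-word window
U = np.random.randn(d, d)        # RNN recurrence weights

# ConvNet view: each output position depends only on a fixed local
# window of the input, so all windows can be computed in one batched
# operation (on a GPU, effectively in parallel).
windows = np.stack([x[t - 1:t + 2].ravel() for t in range(1, T - 1)])
conv_out = np.tanh(windows @ W)  # (T-2, d); no step-to-step dependency

# RNN view: h[t] depends on h[t-1], forcing a strictly sequential loop.
h = np.zeros(d)
rnn_out = []
for t in range(T):
    h = np.tanh(x[t] + U @ h)    # must wait for the previous step
    rnn_out.append(h)
```

The `windows` stack is built in a Python loop here for clarity, but nothing in it depends on a previous output, which is exactly why the convolutional computation parallelizes while the recurrent one cannot.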
Case Analysis: Key Points and Issues
Although ConvNets can be very useful for picking up patterns in large amounts of data, one of the main issues is that they don't tell the whole picture. They can only select data based on set parameters and rules, which does not translate into human problem solving or decision making.
- A ConvNet is a large collection of filters that are applied on top of each other
- They will learn to recognize the labels that we give them
- Context is important when comparing what a neural network can do vs. a human brain
- In the Karpathy article, the 1/3-rule cropping problem resulted in the network choosing crops that were "logical" by its learned criteria but not accurate for the purposes of human perception (the way a human would judge whether a selfie is good or not)
- “What we lack in knowledge we make up for in data” – Alpaydin
- Still has limitations: the problem of binary (two-value) logic
- It is not always true/false or yes/no; we think and speak in more complex patterns
- In some of the selfies the ConvNet considered "good," the entire person is cropped out of the image. The network cannot use context the way humans do: it does not know or understand that selfies can be taken in different "moods," or recognize the patterns humans read in language and images
- "we live and think with multi-valued (not true or false, but 'all depends…') reasoning and multimodal (different kinds of statements: hypothetical, imaginary, contrary to fact, ironic, etc.)" – Irvine, 2019
- Cropped Selfies
- Interactive machines can replicate "intelligence" by copying, learning, and adjusting, but they are not in themselves "intelligent"
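The two-value-logic limitation can be made concrete: a classifier's final layer typically produces a graded confidence score, which a decision threshold then collapses into a binary "good"/"bad" label. A minimal sketch (the scores below are invented for illustration):

```python
import math

def sigmoid(z):
    """Squash a raw score into a confidence in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical final-layer scores for three selfies.
scores = {"selfie_a": 2.1, "selfie_b": 0.05, "selfie_c": -1.7}

for name, z in scores.items():
    p = sigmoid(z)
    verdict = "good" if p >= 0.5 else "bad"
    print(f"{name}: confidence {p:.2f} -> labeled '{verdict}'")
```

Note that a confidence of 0.51 and a confidence of 0.99 both collapse to "good": the thresholding step discards exactly the graded, "all depends" nuance the Irvine quote describes.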
Another main issue is the inaccurate language used to describe what a neural network CAN do. Neural networks don't think, so the network doesn't "think" your selfie is good or bad; it simply uses the information it is given, within a set of parameters, to decide whether the image is good or bad (using two-value logic). This language proves confusing without the background readings that explain how a neural network works in comparison to the human brain.
- The parameters that are set are very important and heavily influence accuracy
- This can lead to discrimination and ethical issues: who decides which features a classifier should be trained to extract?
Andrej Karpathy, “What a Deep Neural Network Thinks About Your #selfie,” Andrej Karpathy Blog (blog), October 25, 2015, http://karpathy.github.io/2015/10/25/selfie/.
Ethem Alpaydin, Machine Learning: The New AI. Cambridge, MA: The MIT Press, 2016.
Jonas Gehring and Michael Auli, "A Novel Approach to Neural Machine Translation," Facebook AI Research, May 9, 2017.
Geoff Dougherty, Pattern Recognition and Classification: An Introduction (New York: Springer, 2012). Excerpt: Chaps. 1-2.
Martin Irvine, Introduction: Topics and Key Concepts of the Course (Presentation), 2019.
Peter Wegner, “Why Interaction Is More Powerful Than Algorithms.” Communications of the ACM 40, no. 5 (May 1, 1997): 80–91.