Machine Learning (ML) and Deep Learning (DL) can be used to analyze a tremendous number of images, extract useful information from them and make decisions about them (Machine_Learning&Artificial_Intelligence, 2017). The same techniques power applications such as classifying e-mails, recommending videos, predicting diseases and recognizing handwriting ((LAB):CrashCourseAi#5, 2019). ML gives computers the ability to extract a high-level understanding from digital images (CrashCourseComputerScience#35, 2017).
Convolutional Neural Networks (ConvNets), the models discussed below, first appeared back in 1993, but they did not become widely used until 2012, thanks to advances in GPUs and the massive increase in dataset sizes (ImageNet, for example) (Karpathy, 2015).
A ConvNet takes an image as input (for example, a 256x256x3 image: 256x256 pixels with three color channels) and outputs a probability for each class; the class with the highest probability is chosen. At each layer, the ConvNet convolves its input with a set of filters, detecting features such as edges and colors (CrashCourseComputerScience#35, 2017). Deeper layers combine these into increasingly complex features. During training, the filters are initialized randomly and then adjusted until the network learns to match each image with its correct class (Karpathy, 2015). Training a deep network is more complicated and takes much more time than training traditional models, but the accuracy is much better, thanks to deep networks' ability to learn from massive amounts of data (ALPAYDIN, 2016).
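The convolution-then-classify pipeline described above can be illustrated with a tiny, dependency-free Python sketch. It assumes a single-channel 5x5 toy image instead of a 256x256x3 one, a hand-set vertical-edge filter (a real ConvNet learns its filters during training), global average pooling, and an invented two-class linear layer; none of this is Karpathy's actual network.

```python
import math

def conv2d(image, kernel):
    """'Valid' 2D convolution (strictly cross-correlation, as in most DL libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the kh x kw window anchored at (i, j).
            s = 0.0
            for di in range(kh):
                for dj in range(kw):
                    s += image[i + di][j + dj] * kernel[di][dj]
            row.append(s)
        out.append(row)
    return out

def relu(fmap):
    """Non-linearity applied after each convolution."""
    return [[max(0.0, v) for v in row] for row in fmap]

def softmax(scores):
    """Turn raw class scores into probabilities (numerically stable form)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 5x5 grayscale image: dark left part, bright right part (a vertical edge).
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-set vertical-edge filter (assumption: a trained ConvNet would learn this).
edge_filter = [[-1, 0, 1],
               [-1, 0, 1],
               [-1, 0, 1]]

feature_map = relu(conv2d(image, edge_filter))

# Global average pooling: collapse the feature map to a single number.
pooled = sum(map(sum, feature_map)) / (len(feature_map) * len(feature_map[0]))

# Hypothetical linear layer producing scores for two invented classes.
classes = ["edge", "no edge"]
scores = [2.0 * pooled, 1.0]
probs = softmax(scores)
predicted = classes[max(range(len(probs)), key=lambda k: probs[k])]
print(feature_map[0], predicted)  # [3.0, 3.0, 0.0] edge
```

The feature map responds only where the window straddles the dark-to-bright edge, and the softmax turns raw scores into probabilities that sum to one, so the highest-probability class becomes the prediction.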
Karpathy's ConvNet for Classifying Selfie Images
Karpathy followed these main steps to classify selfie images as good or bad:
- Data collection: Karpathy gathered 5 million images tagged with #selfie.
- Organizing the dataset: Karpathy reduced the collection to 1 million good and 1 million bad selfies. To decide which was which, he took into account factors such as how many people were likely to have seen the selfie, the number of likes, the number of followers and the number of tags. The images were ranked within groups of 100: the top-ranked half of each group was labeled as good selfies and the rest as bad ones.
- Training: Karpathy took a VGGNet model pre-trained on ImageNet and used Caffe to fine-tune it on the collected selfie dataset. The ConvNet tuned its filters so as to best separate good selfies from bad ones, a method known as supervised learning (Dougherty, 2013).
- Results: Karpathy ran the ConvNet on 50,000 selfies and examined the 100 it ranked best. Based on these results, he offered advice for taking a good selfie, such as being female, having the face occupy about 30% of the image, cutting off the forehead and showing long hair. He concluded that the style of the image was the key feature of a good selfie.
- Extensions: Karpathy also performed three further tasks. The first was ranking celebrities' selfies. Although the best selfies generally followed the patterns above, some of the top-ranked celebrity selfies contradicted them, for example by including men or having poor illumination. The second task was to apply the t-SNE algorithm, which takes the images and lays them out so that similar images (for example, close under the L2 norm) end up near each other, forming clusters. The resulting layout showed clusters such as selfies with sunglasses, full-body shots and mirror selfies. The third task was to find the best crop of a selfie. Karpathy took random crops of each image and fed them to the ConvNet, which scored them so that the best crop could be chosen. He found that the ConvNet prefers crops in which the head takes up about 30% of the image and the forehead is chopped off.
In some cases, the ConvNet selected rude crops. Karpathy then inserted a spatial transformer layer before the ConvNet and backpropagated into the six parameters that define an arbitrary affine crop. This extension did not work well: the optimization sometimes got stuck. He also tried constraining the transform, but that did not help either. The good news is that if the transform has only three bounded parameters, a global search becomes affordable (Karpathy, 2015).
- Availability: Anyone on Twitter can send a selfie to the @deepselfie bot created by Karpathy to have it analyzed and receive a score of how good it is.
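The labeling rule from the "Organizing the dataset" step can be sketched in Python. This is a hedged illustration, not Karpathy's code: the `followers` and `likes` field names are hypothetical, sorting by follower count to form groups of comparable reach is an assumption about how the factors were combined, and the view-count and tag-count factors are left out.

```python
import random

def label_selfies(images, group_size=100):
    """Split images into (good, bad) lists by ranking within groups.

    Assumptions for illustration: each image is a dict with hypothetical
    "followers" and "likes" keys; images are sorted by follower count so
    that each group of `group_size` images reached a similar audience;
    the more-liked half of each group is labeled good, the rest bad.
    Leftover images that do not fill a whole group are dropped.
    """
    by_reach = sorted(images, key=lambda img: img["followers"])
    half = group_size // 2
    good, bad = [], []
    for start in range(0, len(by_reach) - group_size + 1, group_size):
        group = by_reach[start:start + group_size]
        # Rank the group by likes, most-liked first.
        group.sort(key=lambda img: img["likes"], reverse=True)
        good.extend(group[:half])
        bad.extend(group[half:])
    return good, bad

# Usage with synthetic data (not Karpathy's dataset).
random.seed(0)
images = [{"id": i,
           "followers": random.randint(10, 10_000),
           "likes": random.randint(0, 500)} for i in range(1_050)]
good, bad = label_selfies(images)
print(len(good), len(bad))  # 500 500
```

Ranking only within groups of similar reach keeps a popular account's average selfie from automatically outranking a small account's excellent one, which is the point of controlling for audience size.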
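The crop search in the "Extensions" step, together with the remark that a transform with three bounded parameters can be searched globally, can be illustrated by a random search over square crops parameterized by (x, y, side). This is a sketch under stated assumptions, not Karpathy's implementation: `toy_score` is a hypothetical stand-in for the trained ConvNet that simply rewards crops in which a known head box covers about 30% of the crop, and `HEAD_BOX`, `random_crop_search` and `overlap_area` are illustrative names.

```python
import random

def overlap_area(crop, box):
    """Area of intersection between a square crop (x, y, side) and a box (x, y, w, h)."""
    x, y, side = crop
    bx, by, bw, bh = box
    ix = max(0.0, min(x + side, bx + bw) - max(x, bx))
    iy = max(0.0, min(y + side, by + bh) - max(y, by))
    return ix * iy

def random_crop_search(width, height, score_fn, n_samples=2000, min_scale=0.3, seed=0):
    """Global random search over square crops described by 3 bounded parameters.

    Each candidate is (x, y, side): side is a fraction in [min_scale, 1] of the
    shorter image side, and (x, y) is a top-left corner that keeps the crop
    inside the image. The best-scoring candidate is returned.
    """
    rng = random.Random(seed)
    short_side = min(width, height)
    best_crop, best_score = None, float("-inf")
    for _ in range(n_samples):
        side = rng.uniform(min_scale, 1.0) * short_side
        x = rng.uniform(0.0, width - side)
        y = rng.uniform(0.0, height - side)
        crop = (x, y, side)
        score = score_fn(crop)
        if score > best_score:
            best_crop, best_score = crop, score
    return best_crop, best_score

# Hypothetical head bounding box (x, y, w, h) in a 400x300 image.
HEAD_BOX = (150.0, 60.0, 100.0, 120.0)

def toy_score(crop):
    """Hypothetical stand-in for the ConvNet: prefer crops where the head covers about 30%."""
    coverage = overlap_area(crop, HEAD_BOX) / (crop[2] ** 2)
    return -abs(coverage - 0.30)

best_crop, best_score = random_crop_search(400, 300, toy_score)
```

Because each crop is described by only three bounded numbers, a few thousand random samples are enough to cover the space in this toy setting, so no gradient-based optimization is required.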
(LAB):CrashCourseAi#5. (2019). Retrieved from YouTube: https://www.youtube.com/watch?list=PL8dPuuaLjXtO65LeD2p4_Sb5XQ51par_b&t=67&v=6nGCGYWMObE&feature=youtu.be
ALPAYDIN, E. (2016). Machine Learning: The New AI. Cambridge, MA: MIT Press.
CrashCourseComputerScience#35. (2017). Retrieved from YouTube: https://www.youtube.com/watch?v=-4E2-0sxVUM
Dougherty, G. (2013). Pattern Recognition and Classification. New York: Springer Science+Business Media.
Karpathy, A. (2015). What a Deep Neural Network thinks about your #selfie. Retrieved 2020, from https://karpathy.github.io/2015/10/25/selfie/
Machine_Learning&Artificial_Intelligence. (2017). Machine Learning & Artificial Intelligence. Retrieved from YouTube: https://www.youtube.com/watch?v=z-EtmaFJieY&t=2s