I decided to explore how Google Assistant works because I am an Android user. I have a Samsung Galaxy S9 and had never used my virtual assistant, to the point that I had to Google how to turn it on. After playing around with it, I tested all the functions Google Assistant said it could do: search the Internet, schedule events and alarms, adjust hardware settings on the user’s device, show information from the user’s Google account, engage in two-way conversations, and much more!
Google Assistant can do all of this through its own natural language processing. From my understanding, it follows the same kind of logic that we’ve been learning over the last couple of weeks. The premise is this:
- Using a speech-to-text platform, Google Assistant first converts spoken language into text that the system can understand (Week 6; Crash Course #36). A quick rundown on speech recognition: using a spectrogram, spoken vowels and whole words are converted into frequencies. These frequencies are consistent for each vowel and create what is termed a phoneme. Knowing these phonemes, computers can convert speech into text, which is then broken down into data components identifiable through Unicode. (A rough sketch of the spectrogram step appears after this list.)
- Once it identifies the command or question, Google Assistant takes the user’s input and runs it through a complex neural network with multiple hidden layers. I’m unsure what specific type of neural network Google uses, but for a quick rundown on neural networks: there is an input layer, hidden layer(s), and an output layer connected through links like brain neurons. Algorithms learn from data sets in the hidden layers to create an output from the inputs given (Week 5; Machine Learning 3+4). (A toy example of this layered structure also follows the list.)
- Google goes through a different process for different inputs, depending on whether it is a command or a question, producing the output and any other required actions. It then uses speech synthesis, the reverse of the speech recognition process, to present the output to the user.
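To make the spectrogram idea concrete, here is a minimal sketch in Python (not anything from Google’s actual pipeline): it builds a synthetic vowel-like signal from two assumed formant frequencies and reads the dominant frequencies back off its spectrogram, the same signature-matching idea a recognizer uses to identify phonemes.

```python
# A minimal sketch of the spectrogram step in speech recognition.
# Assumption: real systems use recorded audio and learned acoustic models;
# here a synthetic "vowel" is built from two formant frequencies so the
# idea of frequencies mapping to phonemes is visible.
import numpy as np
from scipy import signal

SAMPLE_RATE = 16_000                      # samples per second, typical for speech
t = np.linspace(0, 1.0, SAMPLE_RATE, endpoint=False)

# Fake vowel: two formants (resonant frequencies) roughly like an "ah" sound.
formants = (730, 1090)                    # Hz, illustrative values
audio = sum(np.sin(2 * np.pi * f * t) for f in formants)

# Spectrogram: time on one axis, frequency on the other, energy as intensity.
freqs, times, spec = signal.spectrogram(audio, fs=SAMPLE_RATE)

# The frequencies carrying the most energy are the vowel's signature.
mean_energy = spec.mean(axis=1)
peaks, _ = signal.find_peaks(mean_energy, height=mean_energy.max() * 0.5)
print("Dominant frequencies (Hz):", freqs[peaks])
# A recognizer compares these against known phoneme signatures to emit text,
# which is then stored as ordinary Unicode characters.
```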
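And here is a toy version of the layered network from the second step. Since the patent doesn’t spell out Google’s architecture, this only shows the generic input → hidden → output shape with made-up intent labels, not their actual model.

```python
# A toy feed-forward network with one hidden layer: the input -> hidden -> output
# structure described above. Assumption: Google's real architecture is not
# public in detail, so this is only the generic shape, not their model.
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Layer sizes: a small feature vector in, three hypothetical intents out
# (e.g. "search", "set alarm", "adjust setting"; illustrative labels only).
n_in, n_hidden, n_out = 8, 16, 3

# "Links like brain neurons": weight matrices connecting the layers.
W1 = rng.normal(scale=0.1, size=(n_hidden, n_in))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_out, n_hidden))
b2 = np.zeros(n_out)

def forward(features):
    hidden = relu(W1 @ features + b1)     # hidden layer, where learning happens
    return softmax(W2 @ hidden + b2)      # output layer: probability per intent

query_features = rng.normal(size=n_in)    # stand-in for an encoded utterance
print("Intent probabilities:", forward(query_features))
```

The softmax at the end just turns the raw output numbers into probabilities, so the assistant can pick the most likely interpretation of what you said.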
*I appreciated the figures presented in the beginning and took time to understand them a little more; I think the best in terms of understanding are Fig. 1, Fig. 39, and Fig. 47 (I tried to paste them in my post but I don’t think it worked).
Some defining notes from Google’s patent:
- Google Assistant has various embodiments across computing devices that can work independently or interact with each other.
- The various embodiments Google Assistant can take on allow it to access, process, and/or otherwise utilize information from various devices, as well as store memory across these different embodiments.
- Google’s assistant adapts to its users by applying personal information, previous interactions, and physical context to provide more personalized results and improve efficiency.
- Using an active ontology and the adaptability to the user mentioned above, it can predict and anticipate the next text through an active input elicitation technique (a toy sketch of next-word prediction follows the note below).
*In short, Google Assistant knows a lot about us and is constantly gathering more data to improve its interface and understanding based on patterns.
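As a loose analogy for that last point about anticipating text, here is a toy bigram predictor in Python. It is nowhere near what the patent describes (no ontology, no personal or physical context), but it shows the basic idea of predicting the next word from a user’s past patterns; the example commands are invented.

```python
# A toy bigram model as a loose analogy for anticipating the next text.
# Assumption: the patent's active input elicitation is far richer; this
# only shows prediction from a user's past patterns.
from collections import Counter, defaultdict

history = [
    "set an alarm for seven",
    "set an alarm for eight",
    "set a timer for ten",
]

# Count which word tends to follow each word in the user's past commands.
following = defaultdict(Counter)
for sentence in history:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1

def predict_next(word):
    """Suggest the most common continuation seen so far."""
    candidates = following.get(word)
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("set"))    # -> "an" (seen twice, versus "a" once)
print(predict_next("for"))    # -> "seven" (ties broken by first occurrence)
```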
Compared to other virtual assistants like Apple’s Siri or Amazon’s Alexa, Google’s is more intelligent because it uses its own servers, which are capable of searching Google’s entire knowledge base for answers. However, Google’s assistant is not as smart as GPT-3, and I use the term smart loosely. GPT-3 is the most advanced natural language processing system on the planet. Developed by OpenAI and released last year, it is the closest thing humans have to a machine capable of producing coherent responses to almost any English-language task. It can do this because it has far more parameters to train and learn from: about 175 billion, versus GPT-2’s 1.5 billion. It really is just a bigger version of its predecessor, GPT-2, and thus has the same shortfalls that GPT-2 faced regarding comprehension and understanding.
There are a lot of metaphors out there for what GPT-3 is, and the one I like the most is that it is an improv actor. It can write articulate responses that mimic a coherent entity, but it does not understand the meaning behind the text it is writing. The lack of logic and reasoning is evident in its shortfalls regarding semantics and culture. I do not want to completely detract from this momentous step, but after further reading I agree with scientists that maybe a new approach is warranted; there comes a point when simply building the bigger thing will not work. The Computerphile video put it into an interesting context: if you want to get to space, you can’t just keep building bigger rockets; you have to re-approach the situation. I think this is the point we are at, especially when faced with issues arising from the amount of energy needed to run more training computations, as well as the inherent racist and sexist biases within the training data.
References: