Whether or not people have had direct experience with virtual assistants, many can describe the voice and even some of the quirky comebacks an assistant gives its users. This is largely due to the integration of virtual assistants into popular culture and media. However, the inner workings of virtual assistants are widely unknown, apart from the need for an internet connection to work properly. A closer look at the process shows that the user's speech is the input, which is converted into a sequence of frames. A deep neural network then processes these frames to assess the probability that the input matches existing sequence patterns, and produces an output. The output can be an answer to a question, an action, or a negative response indicating an error with the initial input.
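To make the pipeline described above more concrete, here is a minimal sketch in Python. It is not how any real assistant is implemented: the function names, the frame sizes, the threshold, and the single logistic unit standing in for a deep neural network are all assumptions chosen for illustration.

```python
import math


def split_into_frames(samples, frame_len=4, hop=2):
    """Slice a list of audio samples into overlapping fixed-length frames.

    frame_len and hop are illustrative values, not taken from any real system.
    """
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def frame_score(frame, weights, bias):
    """Toy stand-in for a neural network: one logistic unit scoring one frame."""
    z = sum(w * x for w, x in zip(weights, frame)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def respond(samples, weights, bias, threshold=0.5):
    """Average the per-frame probabilities and compare them to a threshold."""
    frames = split_into_frames(samples)
    if not frames:
        return "error: input too short"
    avg = sum(frame_score(f, weights, bias) for f in frames) / len(frames)
    # A positive output only occurs when the input looks enough like a known pattern.
    return "match" if avg >= threshold else "error: input not recognized"
```

Calling `respond` with a list of samples and hypothetical weights returns either a match or one of the two error strings, which mirrors the three kinds of output mentioned above in a very simplified way.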
It is interesting that, regardless of how advanced virtual assistants may seem, the input needs to follow an existing set of rules in order to produce a positive output. These rules are based on grammar as well as common phrases. The negative outputs have given developers room to creatively present the “personality” of the assistant. This is clearly demonstrated when an iPhone user asks Siri what zero divided by zero is, or when an Amazon Echo user asks Alexa where to buy a Google Home. The designed personalities of virtual assistants seem to be the main product differentiators across brands. As we know from several classes, the actual operations of these systems are the same; it is the brand that makes each virtual assistant seem unique.
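The idea of rule-bound input with a scripted “personality” for everything else can be sketched as a simple lookup. This is only an illustration under stated assumptions: the phrase tables, the labels, and the `handle` function are hypothetical and do not reflect how Siri or Alexa actually store their rules or replies.

```python
# Hypothetical phrases that map to a normal, "positive" output.
RULES = {
    "what time is it": "tell_time",
    "set a timer": "start_timer",
}

# Hypothetical phrases that have no real answer but get a scripted, branded reply.
SCRIPTED_REPLIES = {
    "what is zero divided by zero": "witty_reply",
}


def handle(utterance):
    """Return a (kind, label) pair for a transcribed utterance."""
    text = utterance.lower().strip(" ?!.")
    if text in RULES:
        return ("action", RULES[text])
    if text in SCRIPTED_REPLIES:
        # This branch is where a brand's designed personality would show up.
        return ("personality", SCRIPTED_REPLIES[text])
    return ("error", "not_understood")
```

For example, `handle("Set a timer")` yields an action, while `handle("What is zero divided by zero?")` falls through to the scripted reply. Only the contents of the second table would need to differ between brands for two assistants to feel distinct.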
Apple Machine Learning Journal (1/9, April 2018): “Personalized ‘Hey Siri’.”