How Siri Works

Natalie Guo

Q1: What is Siri?

A1: Siri is an Intelligent Virtual Assistant (IVA) or Intelligent Personal Assistant (IPA), or, in everyday terms, a chatbot.

Q2: What does Siri do?

A2: Siri can perform actions on the phone and provides a natural language interface driven by voice commands. It can also carry out remote instructions and work together with third-party apps to better meet users’ needs.

Q3: What techniques does Siri need to accomplish the tasks above?

A3: Speech recognition engine + advanced machine learning (ML) techniques + Convolutional Neural Networks (CNN) + Long Short-Term Memory (LSTM) networks + Knowledge Navigator + a text-to-speech voice based on deep learning.

Siri doesn’t actually “recognize our voice” or “understand our commands”; it translates the audio into digital data and text that it can process and match against its database. One possible approach is to treat the input as pieces of sound waves. Each cluster of sound may represent a specific word, and when the clusters are combined, they form a sentence. Inside the black box, a huge database of “voice wave” samples lets Siri select and learn which cluster corresponds to which natural-language meaning (How does Siri work?, 2011).
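To make the “match against a database” idea concrete, here is a toy sketch in Python. It is not how Siri actually works: the two word templates, the per-frame energy feature, and the helper names frame_energies and closest_word are all hypothetical. A digitized waveform is cut into frames, each frame is summarized by its average energy, and the resulting vector is compared with the stored templates by Euclidean distance.

```python
import math

# Hypothetical "database": per-frame energy templates for two words.
# The numbers are made up; real recognizers store far richer acoustic features.
TEMPLATES = {
    "yes": [0.1, 0.8, 0.9, 0.3],
    "no": [0.7, 0.6, 0.2, 0.1],
}

def frame_energies(samples, n_frames):
    """Split digitized samples into n_frames equal chunks; return the mean energy of each."""
    size = len(samples) // n_frames
    return [
        sum(s * s for s in samples[i * size:(i + 1) * size]) / size
        for i in range(n_frames)
    ]

def closest_word(features, templates):
    """Return the template label nearest to `features` by Euclidean distance."""
    return min(templates, key=lambda word: math.dist(features, templates[word]))

# Synthetic "utterance": a 100 Hz tone sampled at 8000 Hz whose loudness
# changes across four 800-sample frames (each frame holds whole periods).
amplitudes = [0.4, 1.3, 1.35, 0.8]
samples = [a * math.sin(2 * math.pi * 100 * n / 8000) for a in amplitudes for n in range(800)]

print(closest_word(frame_energies(samples, 4), TEMPLATES))  # yes
```

Nearest-template matching is the simplest possible version of “which cluster represents what”; practical systems replace the hand-made templates with statistical or neural models trained on large amounts of recorded speech.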

Then the Natural Language Processing (NLP) behind Siri, which is driven by ML techniques, takes over (please correct me if I’m wrong). Siri was built to pick up keywords and important phrases. On the text-to-speech side, the recorded speech is analyzed with a tool called Praat (developed by Paul Boersma and David Weenink at the University of Amsterdam and used by Nuance’s voice team). It takes the waveform, turns it into a spectrogram, and creates phonetic labels (marking individual speech sounds such as vowels), stress labels, and pitch labels, which help decide which segments of recorded speech get selected when Siri speaks.
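The waveform-to-spectrogram step can be sketched with a short-time discrete Fourier transform. This is a simplified illustration, not Praat’s actual implementation: it uses a naive DFT with no window function, and the function name spectrogram, the frame size, and the hop length are assumptions chosen for the example. Each row of the result is the magnitude spectrum of one short frame, so time runs down the rows and frequency across the columns.

```python
import cmath
import math

def spectrogram(samples, frame_size, hop):
    """Short-time magnitude spectra: one row per frame, one column per frequency bin.

    Naive O(N^2) DFT with no window function, kept simple for illustration.
    """
    rows = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        row = []
        for k in range(frame_size // 2 + 1):  # bins from 0 Hz up to the Nyquist frequency
            coeff = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            row.append(abs(coeff))
        rows.append(row)
    return rows

rate = 8000  # samples per second
tone = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(256)]
spec = spectrogram(tone, frame_size=64, hop=32)

peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), peak_bin * rate / 64)  # 7 1000.0
```

With a 64-sample frame at 8000 Hz, each frequency bin is 125 Hz wide, so the 1000 Hz test tone peaks in bin 8. Real analysis tools apply a window function (such as Hann) to each frame and use an FFT for speed; labeling vowels, stress, and pitch then works on top of this time-frequency picture.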

The article “Hey Siri: An On-device DNN-powered Voice Trigger for Apple’s Personal Assistant” explains this in detail. First, the DNN-powered (Deep Neural Network) voice trigger keeps a detector listening on the device itself, so it can hear the user say “Hey Siri” at any moment; it then computes a confidence score to decide whether you actually want to wake Siri up. The trigger works in two stages: one for detection and the other for checking.
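The detect-then-check idea can be sketched as follows. This is a hypothetical simplification, not Apple’s code: the two “models” are stand-in functions that return per-frame probabilities, the confidence score is simply the highest average over a short sliding window, and both thresholds are made-up values. The point is the structure: a cheap, permissive first stage screens all audio, and the stricter checker runs only when the first stage fires.

```python
from typing import Callable, List, Sequence

# Hypothetical thresholds for illustration only; Apple's real values are not given here.
DETECT_THRESHOLD = 0.5  # permissive first stage (cheap, always listening)
CHECK_THRESHOLD = 0.8   # stricter second stage (larger model, runs rarely)

Model = Callable[[Sequence[float]], List[float]]

def confidence(frame_probs: Sequence[float], window: int = 3) -> float:
    """Highest mean 'Hey Siri' probability over any run of `window` consecutive frames."""
    if len(frame_probs) < window:
        return 0.0
    return max(
        sum(frame_probs[i:i + window]) / window
        for i in range(len(frame_probs) - window + 1)
    )

def should_wake(frames: Sequence[float], detector: Model, checker: Model) -> bool:
    """Two-stage trigger: the checker only runs if the detector's score passes."""
    if confidence(detector(frames)) < DETECT_THRESHOLD:
        return False  # most audio stops here, so the larger model rarely runs
    return confidence(checker(frames)) >= CHECK_THRESHOLD

# --- Toy usage: stand-in "models" over made-up per-frame values ---
checker_calls = []

def small_model(frames: Sequence[float]) -> List[float]:
    # Cheap and a little over-eager: nudges every frame's score upward.
    return [min(1.0, x + 0.1) for x in frames]

def large_model(frames: Sequence[float]) -> List[float]:
    checker_calls.append(len(frames))  # record that the expensive stage ran
    return list(frames)

speech = [0.1, 0.9, 0.95, 0.9, 0.2]  # clear "Hey Siri"
noise = [0.1, 0.2, 0.1, 0.2]         # background sound
near_miss = [0.4, 0.5, 0.5, 0.4]     # similar-sounding phrase

for name, clip in [("speech", speech), ("noise", noise), ("near_miss", near_miss)]:
    print(name, should_wake(clip, small_model, large_model))
print("checker runs:", len(checker_calls))
```

Running it prints True for the speech clip and False for the other two, with the checker invoked only twice: the noise clip never gets past the first stage, while the near miss passes the detector but is rejected by the stricter check.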

Question: Siri seems to struggle when the user suddenly changes a command midway, or when it has to punctuate a run-on sentence that contains several subjects. Why does this happen, and what would we need to work on to make it better?

References:

Hey Siri: An On-device DNN-powered Voice Trigger for Apple’s Personal Assistant. (n.d.). Apple Machine Learning Research. Retrieved March 15, 2021, from https://machinelearning.apple.com/research/hey-siri

How does Siri work? (2011, December 20). [Video]. YouTube. https://www.youtube.com/watch?v=loOHmMFVJcE

Inside Nuance: the art and science of how Siri speaks. (2013, September 17). [Video]. YouTube. https://www.youtube.com/watch?v=wu-tIqnUUX8

This Is The Algorithm That Lets Siri Understand Your Questions | Mach | NBC News. (2017, June 28). [Video]. YouTube. https://www.youtube.com/watch?v=uE_WJTnqUwA