3 May 2019
The Real Siri Unpacked: Past, Present, and Future
In today’s tech-driven world, virtual assistants are becoming increasingly common in at-home technology. One assistant in particular, Siri, has transformed the structure and capability of cell phone services for Apple products and acts as an intermediary for all of your needs at the touch of a button. The rise and reputation of this technology inspired this research topic. Based on this, the research question I investigate is: ‘Does this type of technology, Siri in particular, pose a serious threat to your personal information, privacy, and overall data ethics?’ My main hypotheses are that Siri is a virtual assistant that is always listening to its ‘owner’ or ‘master voice,’ and that Siri technology is invasive with respect to the ownership, use, and distribution of personal data by Apple Inc.
One main approach to understanding and answering this question is a socio-historical analysis of Siri: how it evolved, what its initial goal and purpose were, the pros and cons of its advancements, and where the technology is headed in terms of redesign and further implications for the future. To be as inclusive as possible, I draw on current journal articles on this technology, websites on its history, community blogs for outside opinions, and insights from my previous blog posts in our course on virtual assistant technology in general. Drawing on content from these varied platforms makes the analysis informative and educational for de-black-boxing Siri and the future Siri is creating for the world.
A common reply to my research question would be: why does this matter, and why should we care? The answer is simple: your privacy is at risk when virtual assistants are used on a regular basis. For example, when a person uses Siri on an Apple device, it acts as a virtual contract that allows Siri to listen to and access information about the user, which Apple then stores to benefit its brand and to gain ownership of that data. I hypothesize that if more research were conducted on the purpose, function, ethics, and effects of Siri, there would be an increased desire for legal revisions and privacy regulations in the future.
In this paper, I address the origins of Siri, how it was refined and portrayed in media commercials, the mechanics it uses to perform its operations (algorithmic functions, cloud connectivity, natural language processing, and the deep neural network), and the controversy over its wake word as the technology continues to update in newer versions. All of these features were dense to de-black-box, but more can be understood when we view these components through a socio-technical lens alongside the history of how they came together. Safeguards for the future are necessary if we do not want virtual assistants to slowly start governing and controlling our day-to-day lives. The literature I gathered suggests that other researchers share my concerns and support the view that this technology poses a serious threat to our personal data, to how that data is used, and to our privacy when it listens without our full awareness. This raises ethical questions about Siri’s purpose and integration in our lives. Further legal revisions and safeguards would need to be implemented if we choose not to let artificial intelligence replace human discourse and interaction.
When it comes to Apple products, most people would assume Apple’s own engineers invented and patented their technology and features independently. For Siri, this was not the case. The idea of a voice assistant dates back to the 1980s, the research that produced Siri began around 2003, and Siri did not arrive on the iPhone until the iPhone 4S in 2011 (Cult of Mac, 2018, pg. 1). The 1980s were a decade of rapid technological innovation, and at that time a voice assistant was mentioned, ironically, as a neat feature Apple would eventually like to offer (Cult of Mac, 2018, pg. 1). A less well-known fact is that, once the project was under way, Steve Jobs reportedly put much of his final effort into Siri before his death in 2011, a contribution Apple honors in his memory. Some say it was his final gift toward improving the way cell phones were meant to be used.
As technology advanced, so did military and national security capabilities. Around 2003, the Defense Advanced Research Projects Agency, known as DARPA, “began working on an AI assistant project that would help military commanders deal with the overwhelming amount of data they received on a daily basis” (Cult of Mac, 2018, pg. 1). To test whether this prototype idea was possible, DARPA reached out to the “Stanford Research Institute for further input and testimonial research” (Cult of Mac, 2018, pg. 1). SRI agreed to join the proposal and tested further ideas to see what else the platform could handle. This is where the name ‘Siri’ comes from, and many people do not know that this is how and where the technology originated. “The SRI decided to create a spin-off called Siri, a phonetic version of their company name, and was launched in the app store in 2010.” (Cult of Mac, 2018, pg. 1). The most interesting part of this history is that Siri was first developed as its own standalone app, unattached to anything else. It was originally designed to handle tasks such as, “Order taxis through Taxi Magic [an early taxi-booking app], pull concert data from StubHub, movie reviews from Rotten Tomatoes, or restaurant reviews from Yelp” (Cult of Mac, 2018, pg. 1).
Impressed by this prototype, Apple decided to buy the app and partner with SRI in a “200 million-dollar deal” (Cult of Mac, 2018, pg. 1) that would change the game for Apple iPhones. Describing the work of Scott Forstall, then Apple’s senior vice president of iOS software, Cult of Mac reports that “Apple decided to partner with the same groups SRI did, including Wolfram Alpha, Wikipedia, Yelp, Weather Apps, Yahoo, NASDAQ, Dow, and local news stations for traffic and current time zones, and many more. Scott makes it very clear that Apple wanted this feature to be as accurate, fast, and hands-free as possible, also with a virtual voice that was friendly, respectful, and reliable” (Cult of Mac, 2018, pg. 1). This approach created a positive Apple user experience and made users feel confident in the phone’s capability. That was Apple’s initial goal, and it prioritized making Siri the most innovative phone technology to date.
After the acquisition, Apple’s first version of Siri could not originally speak back to the user. It provided answers through the partner services described above to give quick help or feedback on a question. To improve this, Apple built speech recognition and natural language processing (NLP) into newer versions of Siri. Speech recognition converts the user’s spoken words into text, taking into account pronunciation, pitch, and other voice cues, while NLP interprets the meaning and connotation of that text. In other words, Siri can understand what the user is saying and voice the correct response back in the appropriate style, language, and structure. This was the model for the iPhone 4S, which performed speech recognition and received a great deal of media attention when it was released. Speech recognition is explained further below, where the mechanics are unpacked.
The first commercial for Siri on the iPhone 4S showed real-world, hands-free scenarios with sample requests such as:
‘Can you reschedule my meeting to 5pm,’ ‘What is the weather in New York looking like for this weekend,’ ‘Call Mom,’ ‘Text back I will be there soon,’ ‘What time is it in Paris,’ and ‘How much is 60 euros in dollars.’ Requests like these rely on Siri’s partner services, such as Wolfram Alpha, Yelp, Wikipedia, and many more, to provide fast and accurate responses within seconds. The reaction to this new feature was so positive that Apple made it available on its other devices, such as the iPad and Mac computers (Cult of Mac, 2018, pg. 1). Siri is the fast, accessible medium between the user and all of its partner apps, and the first stop for answering these kinds of questions and tasks.
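One way to picture Siri as a ‘medium’ between the user and its partner services is as a router: once speech has been turned into text, the request is matched to an intent, and the intent is handed to a service that can answer it. The Python sketch below is a minimal illustration under that assumption only. It uses simple keyword patterns rather than the statistical NLP Siri actually relies on, and the intent names, patterns, and intent-to-service pairings are hypothetical, chosen to mirror the commercial’s sample questions rather than Apple’s real routing.

```python
import re

# Hypothetical intent -> partner service pairings, mirroring the commercial's examples.
# These are illustrative assumptions, not Apple's actual routing table.
SERVICE_FOR_INTENT = {
    "weather": "Weather",
    "world_clock": "Clock",
    "currency": "Wolfram Alpha",
    "call": "Phone",
    "restaurant": "Yelp",
}

# Hypothetical keyword patterns, checked in order against the lower-cased transcript.
INTENT_PATTERNS = [
    ("weather", re.compile(r"\bweather\b")),
    ("world_clock", re.compile(r"\bwhat time is it in\b")),
    ("currency", re.compile(r"\b(euros?|dollars?|yen)\b")),
    ("call", re.compile(r"^call\b")),
    ("restaurant", re.compile(r"\b(restaurant|dinner|lunch)\b")),
]


def route(transcript: str) -> tuple[str, str]:
    """Map a transcribed request to (intent, service); unmatched requests fall back."""
    text = transcript.lower().strip()
    for intent, pattern in INTENT_PATTERNS:
        if pattern.search(text):
            return intent, SERVICE_FOR_INTENT[intent]
    return "unknown", "none"


if __name__ == "__main__":
    for request in ["What is the weather in New York looking like for this weekend?",
                    "What time is it in Paris?",
                    "How much is 60 euros in dollars?",
                    "Call Mom"]:
        print(request, "->", route(request))
```

An ‘unknown’ result corresponds to replies like “I’m sorry, I don’t understand you.” A fixed keyword table breaks as soon as users phrase things differently, which is one reason real assistants learn these mappings from data instead.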
In the years since the 200 million-dollar deal, Siri’s NLP, data collection, and personalization functions have been developed further in newer iPhones. Each year, Apple is known for improving its devices with faster, smarter mechanics and AI functions. For iPhones this is a priority, since Apple’s phones have been among the most successful in the cell phone industry since Siri’s debut. In recent development, Siri has gained gendered voice options and has become more accurate and geographically diverse.
As Siri progressed with each new iPhone model, the voice responding to the user was noticeably female rather than male. This is controversial for virtual assistants in general: because they are artificial, it can be difficult to create a voice that is not perceived as one gender or the other. The female voice in earlier models received some backlash, and Apple has since added an option so that Siri does not have to be a ‘She’; it can also be a ‘He’. The problem traces back to the gender roles and norms of the 1950s, and, as one article observes, today’s assistants are “Ready to answer serious inquiries and deflect ridiculous ones. Though they lack bodies, they embody what we think of when we picture a personal assistant: a competent, efficient, and reliable woman. She gets you to meetings on time with reminders and directions, serves up reading material for commute, and delivers relevant information on the way, like weather and traffic, etc.” (The Real Reason, 2018, pg.1). In 2013, Apple stated that “both options are available for voice preference” (The Real Reason, 2018, pg.1). Small changes like this let Apple present Siri as the best virtual assistant on the market: not only smart but customizable for each user, with settings such as voice volume, voice gender, accent, and notification preferences.
As if Siri could not be any more personal, Apple’s cloud computing service has transformed it even further. Apple introduced iCloud in 2011 (Apple Privacy, 2019, pg.1). Apple’s press release stated, “Apple today introduced iCloud as a breakthrough set of free new cloud services that work seamlessly with applications on your iPhone, iPad, iPod touch, Mac, or PC to automatically and wirelessly store your content in iCloud and automatically and wirelessly push it to all your devices. When anything changes on one of your devices, all of your devices are wirelessly [updated] instantly” (Apple Privacy, 2019, pg.1).
Operations of the cloud include “cloud computing, cloud storage, cloud backups, or access to photos, documents, files, contacts, reminders, music, etc.” (Apple Privacy, 2019, pg.1). For Siri, the cloud makes it possible to perform NLP on remote servers and to store more of your data the more you use the device. As the press release emphasizes, the key is virtual, wireless, automatic service, which is easily accessible through Siri. Examples of requests on newer iPhone models include, “Siri, can you save my email in the cloud, can you add my song to my playlist, or can you save these documents in my work folder, Send this to the cloud, etc.”
A common follow-up question is: how do NLP and cloud computing work together? For Siri to work correctly, it must be used from your device, have an internet connection to reach other platforms, and have access to the cloud so it can store and process your data and get to know you better. As this happens, your data becomes personalized and is stored remotely. The cloud component is needed for your user profile to be built and processed.
The next question is: how does Siri actually listen to the user in order to work with the cloud, and what mechanics make this possible? The answer is complex, but it can be de-black-boxed. Historically, Apple improved this technology by having it perform speech recognition on speech patterns and sound waves, which are computed and interpreted through NLP and cloud computing, with the result sent back to the user.
Signals are constantly sent back and forth so that Siri can hear, understand, save, and respond to you. All of this happens within seconds, or sometimes milliseconds, depending on the complexity of your request. Because a hands-free experience is a priority, de-black-boxing cloud computing shows that Siri requires speech-to-text and text-to-speech capabilities and access to a deep neural network (DNN). This processing is done through the layers of the deep neural network, which are explained below.
From a design standpoint, Siri has many layers that must be understood. Once you activate Siri with the button or by saying “Hey Siri,” signals pass through cloud computing and the deep neural network, which record your question, determine the correct answer, render it as text, and present it back to you by voice. To make this as clear as possible, Apple’s own Siri team explains, “The ‘Hey Siri’ feature allows users to invoke Siri hands-free. A very small speech recognizer runs all the time and listens for just those two words. When it detects “Hey Siri”, the rest of Siri parses the following speech as a command or query. The “Hey Siri” detector uses a Deep Neural Network (DNN) to convert the acoustic pattern of your voice at each instant into a probability distribution over speech sounds. It then uses a temporal integration process to compute a confidence score that the phrase you uttered was “Hey Siri”. If the score is high enough, Siri wakes up” (Apple Hey Siri, 2019, pg.1). The picture below visualizes these layers and shows where the speech waves travel.
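The ‘temporal integration’ step in Apple’s description can be illustrated with a small dynamic-programming sketch. It assumes the DNN has already produced, for each short audio frame, a probability for each speech sound in the phrase; here those probabilities are simply supplied by hand. The code finds the best left-to-right alignment of the phrase’s sounds to the frames, averages the log-probabilities along that path into a confidence score, and wakes only if the score clears a threshold. The sound labels, the threshold value, and the assumption that the audio window has already been trimmed to roughly the length of the phrase are hypothetical simplifications, not Apple’s actual parameters.

```python
import math

# Hypothetical speech-sound units for the phrase "Hey Siri" (not Apple's real inventory).
PHRASE = ["h", "ey", "s", "ih", "r", "iy"]
# Hypothetical threshold: the best path must average at least probability 0.5 per frame.
THRESHOLD = math.log(0.5)
NEG_INF = float("-inf")


def confidence(frames: list[dict[str, float]]) -> float:
    """Average log-probability of the best left-to-right alignment of PHRASE to frames.

    Each frame maps a speech sound to its probability (a stand-in for DNN output).
    The path starts in the first sound at frame 0, may stay in a sound or advance
    to the next one at each new frame, and must end in the last sound.
    """
    best = [NEG_INF] * len(PHRASE)  # best[i]: best score of a path currently in sound i
    for t, frame in enumerate(frames):
        logp = [math.log(max(frame.get(u, 0.0), 1e-12)) for u in PHRASE]
        new = [NEG_INF] * len(PHRASE)
        for i in range(len(PHRASE)):
            if t == 0:
                prev = 0.0 if i == 0 else NEG_INF
            else:
                # Either stay in sound i or advance from sound i - 1.
                prev = max(best[i], best[i - 1] if i > 0 else NEG_INF)
            if prev > NEG_INF:
                new[i] = prev + logp[i]
        best = new
    if not frames or best[-1] == NEG_INF:
        return NEG_INF  # too few frames to fit the whole phrase
    return best[-1] / len(frames)


def wakes_up(frames: list[dict[str, float]]) -> bool:
    """Wake only if the integrated confidence score is high enough."""
    return confidence(frames) >= THRESHOLD


def make_frame(peak: str, p: float = 0.9) -> dict[str, float]:
    """Hypothetical DNN output: probability p on `peak`, the rest shared evenly."""
    rest = (1.0 - p) / (len(PHRASE) - 1)
    return {u: (p if u == peak else rest) for u in PHRASE}


if __name__ == "__main__":
    hey_siri = [make_frame(u) for u in PHRASE for _ in range(2)]  # two frames per sound
    noise = [{u: 1 / len(PHRASE) for u in PHRASE} for _ in range(12)]
    print(wakes_up(hey_siri))  # True
    print(wakes_up(noise))     # False
```

The clean ‘Hey Siri’ frames average log 0.9 per frame and wake the detector, while the uniform ‘noise’ frames average log(1/6) and do not. A real detector scans a continuous stream and tunes its threshold to trade false wake-ups against missed ones.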
For virtual assistants to deliver a fully hands-free experience, a wake word is required. For Siri, as mentioned, the wake phrase in newer models is ‘Hey Siri.’ The feature must be turned on in your phone’s settings, after which the phone is always listening for the wake phrase. To address the second part of the earlier question, the final step, voicing a response back to the user, happens in a layer called speech synthesis. This is a very interesting layer of Siri: once the initial question has been heard, understood, and processed through NLP and cloud computing, speech synthesis turns the answer into spoken audio (Apple Siri Voices, 2019, pg.1).
According to Apple’s Siri team, “Starting in iOS 10 and continuing with new features in iOS 11, we base Siri voices on deep learning. The resulting voices are more natural, smoother, and allow Siri’s personality to shine through” (Apple Siri Voices, 2019, pg.1). The picture below shows how text-to-speech synthesis is structured and how it operates. Starting from the left, text is the input; text analysis follows, then the prosody model, which handles rhythm and intonation; then signal processing begins with unit selection, which picks recorded speech units from a database, and waveform concatenation, which joins those units into continuous speech. This is where Siri’s predictive deep learning comes in: as each step of the model is applied, the system predicts which recorded units will best match the target sound and join smoothly with their neighbors (Apple Siri Voices, 2019, pg.1).
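Unit selection can be framed as a search problem: for each position in the target utterance there are several candidate recordings, each with a target cost (how far it is from the desired sound) and a join cost (how audibly it clashes with its neighbor). The sketch below finds the cheapest sequence with the Viterbi dynamic-programming algorithm, a standard way to solve this kind of search. It is a simplified toy under stated assumptions, not Apple’s system: each unit is described only by hypothetical pitch values, and the costs are plain pitch differences, whereas Apple’s article describes a deep mixture density network predicting acoustic targets from many features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """A recorded speech snippet in a hypothetical voice database."""
    name: str
    pitch: float        # average pitch of the snippet, in Hz
    start_pitch: float  # pitch at the snippet's first sample
    end_pitch: float    # pitch at the snippet's last sample


def target_cost(unit: Unit, target_pitch: float) -> float:
    """How far a unit is from the pitch the prosody model asked for."""
    return abs(unit.pitch - target_pitch)


def join_cost(left: Unit, right: Unit) -> float:
    """How audible the pitch jump is where two units are concatenated."""
    return abs(left.end_pitch - right.start_pitch)


def select_units(candidates: list[list[Unit]], targets: list[float]) -> tuple[list[Unit], float]:
    """Viterbi search for the candidate sequence with the lowest total cost."""
    cost = [target_cost(u, targets[0]) for u in candidates[0]]
    back: list[list[int]] = []  # back[k][j]: best predecessor of candidates[k + 1][j]
    for i in range(1, len(targets)):
        prev, new_cost, pointers = candidates[i - 1], [], []
        for u in candidates[i]:
            j = min(range(len(prev)), key=lambda k: cost[k] + join_cost(prev[k], u))
            new_cost.append(cost[j] + join_cost(prev[j], u) + target_cost(u, targets[i]))
            pointers.append(j)
        cost, back = new_cost, back + [pointers]
    # Trace the cheapest path backwards from the final position.
    j = min(range(len(cost)), key=cost.__getitem__)
    total, path = cost[j], [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[i][j] for i, j in enumerate(path)], total


if __name__ == "__main__":
    a = Unit("A", pitch=120.0, start_pitch=115.0, end_pitch=125.0)
    b = Unit("B", pitch=123.0, start_pitch=120.0, end_pitch=140.0)
    c = Unit("C", pitch=132.0, start_pitch=126.0, end_pitch=135.0)
    d = Unit("D", pitch=130.0, start_pitch=141.0, end_pitch=130.0)
    units, total = select_units([[a, b], [c, d]], targets=[120.0, 130.0])
    print([u.name for u in units], total)  # ['A', 'C'] 3.0
```

With these numbers, picking the best-matching unit at each position (A, then D) would cost 16 because of a harsh join, while the search finds A then C at a total cost of 3. Balancing match quality against smooth joins is what lets concatenated recordings sound like continuous speech.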
This is a great example of how the user can speak to Siri, and how Siri can respond and get to know the user through machine learning, deep neural networks, and NLP. The whole virtual assistant process is made possible by NLP, the deep neural network, speech recognition, machine learning and algorithmic implementation, speech synthesis, and many more complex features.
Literature Continued: Pros and Cons
With the socio-technical history, mechanics, and layers of Siri understood, we return to where my research question began: Siri is presented as complex and positive technology, but what negatives lie within the positives? Progress in any form of technology comes with drawbacks, since nothing is perfect. The entire process behind Siri is invisible, and much of what happens is unknown to most users. It is therefore important to lay out a framework for weighing the pros and cons of Siri. Another question to address is what else Siri can do.
Some pros of Siri technology are that “Siri can act as a personal scribe, she can write texts for you, post to your social media accounts, solve complex math equations, finding emails, and converting measurements, even Morse code” (UK Norton, 2018, pg.1). Other uses include “booking your evening out for you with certain apps, like food apps, or Yelp for food reviews. Siri can also be used for Open-Table and automatically book your reservation” (UK Norton, 2018, pg.1). As mentioned before, in newer software updates such as iOS 8 or iOS 9, the “Hey Siri feature must be turned on and you can accomplish any task with Siri” if you begin your interaction with ‘Hey Siri’ as the wake word (UK Norton, 2018, pg.1).
Some cons of Siri technology are that “Siri has listening problems, is always listening to you if turned on, or if your Wi-Fi dies, Siri dies with it” (UK Norton, 2018, pg.1). Listening problems means that if your question is too complex, Siri might not fully understand you or find the answer you need. If the pitch, tone, or acoustics of your ‘master voice’ are off, Siri may also have difficulty hearing you properly or recognizing that it is still your voice. Common replies from Siri include, “I’m sorry, I don’t understand you, I don’t know how to respond to that, or Can you repeat your question?” (UK Norton, 2018, pg.1). Network connectivity is essential for Siri to operate, because the connection gives Siri access to the partner services built into its design. Without a connection, Siri cannot reach Apple’s servers to store and collect your data or reach the networks needed to answer your question. As the connection weakens, it becomes increasingly difficult for Siri to deliver a fast, accurate answer.
Throughout the semester with Professor Irvine, I have gathered some quick pros and cons of this technology. Pros: quick virtual help, hands-free, audio enabled, customizable, personable, free, accurate, useful when needed, and proficient. Cons: risk to your privacy; data is owned and accessible by Apple; the microphone is always listening to some degree in order to respond to the wake word, ‘Hey Siri’; and, for everyone but especially for very private people, this poses a threat to the ethics of data usage, storage, and collection. All of this leads to the next big question: is Siri’s functionality ethical or unethical, and does it put our privacy at risk?
To answer this further, Stucke and Ezrachi (2017) argue, “The digital assistant with the users’ trust and consent will likely become the key gateway to the internet. Because of personalization and customization, consumers will likely relinquish other less personal and useful interfaces and increasingly rely on digital assistants to anticipate and fulfill their needs. They transform and improve the lives of consumers yet come at a cost” (Stucke and Ezrachi, 2017, pg. 1243). They found that these assistants, especially Siri, follow a learning-by-doing model: voice recognition and NLP results are stored in each user’s profile, so the more an assistant is used, the more it learns about you (Stucke and Ezrachi, 2017, pg. 1249). They add that the more someone uses Siri, the better it can predict which apps it needs to answer you, and the more it can personalize your data and introduce search bias (Stucke and Ezrachi, 2017, pg. 1242). Their concluding argument suggests that it is nearly impossible to build a learning algorithm that delivers a highly personalized experience without the user’s data being stored and owned by the company (Stucke and Ezrachi, 2017, pg. 1296).
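Stucke and Ezrachi’s ‘learning-by-doing’ point can be made concrete with a deliberately simple model. The sketch below assumes that a personalization system just counts which app a user chooses in a given context and predicts the most frequent one; Siri’s real models are far more sophisticated and undisclosed, and the class name, context labels, and app names here are hypothetical. Even this toy shows the trade-off the authors describe: the predictions improve only because the history of the user’s choices is kept.

```python
from collections import Counter, defaultdict
from typing import Optional


class UsageModel:
    """Hypothetical learning-by-doing model: more use means more stored history."""

    def __init__(self) -> None:
        # context (e.g. "morning") -> how often each app was chosen in that context
        self.history = defaultdict(Counter)

    def record(self, context: str, app: str) -> None:
        """Store one observed choice; this stored history is the personal data."""
        self.history[context][app] += 1

    def predict(self, context: str) -> Optional[str]:
        """Guess the app most often used in this context, or None if never seen."""
        counts = self.history.get(context)  # .get does not create an empty entry
        if not counts:
            return None
        return counts.most_common(1)[0][0]

    def observations(self) -> int:
        """Total number of stored choices across all contexts."""
        return sum(sum(c.values()) for c in self.history.values())


if __name__ == "__main__":
    model = UsageModel()
    for app in ["Weather", "News", "Weather"]:
        model.record("morning", app)
    print(model.predict("morning"), model.observations())  # Weather 3
    print(model.predict("evening"))  # None
```

The only way to make `predict` better is to call `record` more often, which is exactly why personalization and data retention are hard to separate.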
In a similar vein, Hoy (2018) explains what virtual assistants are, how they can threaten privacy, and why they would need substantial regulation before being used for more than everyday phone tasks. He argues this because they already have so much access to, and ownership of, users’ data that it is daunting to imagine the complexity and reach of Siri in a large, real-world, internet-hosted setting. Hoy says, “Currently available voice assistant products from Apple, Amazon, Google, and Microsoft allow users to ask questions and issue commands to computers in natural language. There are many possible future uses of this technology, from home automation to translation to companionship and support for the elderly. However, there are also several problems with the currently available voice assistant products. Privacy and security controls will need to be improved before voice assistants should be used for anything that requires confidentiality” (Hoy, 2018, pg. 1).
In light of these studies, consumers want to know: does Siri actually always listen to you, and what can be done about it? Impressive as the hands-free ‘Hey Siri’ feature introduced in iOS 8 is, what users may not know is that Siri is always listening to some degree, and listens fully once woken.
As the product has improved, Apple CEO Tim Cook and Apple’s legal team have faced questions from the U.S. House of Representatives. Lawmakers wanted to know more about what is really going on in Siri updates, how user location data is pinpointed, what Siri collects through its listening feature, and whether this is potentially harmful or against Apple’s own policy (Sophos, 2018, pg.1). Tim Cook responded, “We are not Google or Facebook. The customer is not our product, and our business model does not depend on collecting vast amounts of personally identifiable information to enrich targeted profiles marketed to advertising” (Sophos, 2018, pg.1). To reinforce this, Apple’s director of Federal Government Affairs wrote a formal letter stating, “We believe privacy is a fundamental human right and purposely design our products and services to minimize our collection of consumer data. When we do collect data, we’re transparent about it and work to disassociate it from the user” (Apple Response Letter, 2019, pg.1).
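Apple’s letter does not explain how data is ‘disassociated’ from the user. One common general technique is pseudonymization: requests are logged under a random identifier rather than the user’s account, and the identifier can be reset. The sketch below illustrates only that general idea, assuming a hypothetical in-memory log; the class and method names are invented for illustration, and it should not be read as a description of Apple’s actual systems.

```python
import uuid


class PseudonymousLog:
    """Hypothetical request log keyed by a random ID instead of the user's account."""

    def __init__(self) -> None:
        self.current_id = uuid.uuid4().hex  # random; not derived from any account data
        self.requests: dict[str, list[str]] = {}

    def log(self, text: str) -> str:
        """Store a request under the current random ID and return that ID."""
        self.requests.setdefault(self.current_id, []).append(text)
        return self.current_id

    def reset(self) -> None:
        """Rotate the ID so new requests are not linked to old ones by identifier."""
        self.current_id = uuid.uuid4().hex


if __name__ == "__main__":
    log = PseudonymousLog()
    first = log.log("What is the weather in New York?")
    log.reset()
    second = log.log("Set a timer for ten minutes")
    print(first != second, len(log.requests))  # True 2
```

Note what the sketch does not do: the text of each request is still stored, so a request such as ‘Call Mom’ can still reveal who the user is. A random identifier alone therefore does not settle the privacy concerns raised in this section.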
Even though Apple claims that Siri does not always listen to you, the question is still up for debate, and other news outlets argue the opposite. A recent USA Today article states, “With iOS 8, Apple introduced the ‘Hey Siri’ wake phrase, so you can summon Siri without even touching your iPhone. If you turn this feature on, this means your iPhone’s mic is always listening, waiting for the phrase, ‘Hey Siri’ to occur” (USA Today, 2017, pg.1).
In response to reports like this one, Apple maintains that Siri does not start listening to you until the wake word is used. However, it takes little reasoning to see that for this to work, the device must be listening to some degree in order to detect the wake word, fully wake, and then process your request and provide an answer. Apple’s own description of the ‘Hey Siri’ detector as a small speech recognizer that “runs all the time and listens for just those two words” supports this reading (Apple Hey Siri, 2019, pg.1). Whether or not the company owns up to this feature, it poses a threat to privacy and to one’s personal data.
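The reasoning in this paragraph can be illustrated with a small sketch of how an always-on wake-word loop could be structured. It assumes the device keeps only a short rolling buffer of recent audio, checks that buffer with a small detector, and passes audio on only after the wake phrase is detected. Audio chunks are represented as words for readability, and the class, buffer size, and trivial detector are hypothetical stand-ins; the sources cited here do not describe Apple’s buffering details.

```python
from collections import deque


class WakeWordListener:
    """Hypothetical always-on loop that retains only a short window of audio."""

    def __init__(self, window: int = 4) -> None:
        self.buffer: deque[str] = deque(maxlen=window)  # oldest chunks drop off automatically
        self.awake = False
        self.forwarded: list[str] = []  # audio handed to the full recognizer after waking

    def detector(self) -> bool:
        """Stand-in for the small on-device model: fires on the last two chunks."""
        return list(self.buffer)[-2:] == ["hey", "siri"]

    def on_audio(self, chunk: str) -> None:
        """Called for every incoming chunk, whether or not Siri is awake."""
        if self.awake:
            self.forwarded.append(chunk)
            return
        self.buffer.append(chunk)
        if self.detector():
            self.awake = True


if __name__ == "__main__":
    listener = WakeWordListener()
    for chunk in "what a nice day hey siri set a timer".split():
        listener.on_audio(chunk)
    print(list(listener.buffer))  # ['nice', 'day', 'hey', 'siri']
    print(listener.forwarded)     # ['set', 'a', 'timer']
```

In this run every chunk, including ‘what’ and ‘a’, passed through the detector, but only the last four were held in memory and only the words after the wake phrase were forwarded. Apple’s ‘not listening until the wake word’ and the critics’ ‘always listening’ each describe part of this same loop, which is the ambiguity discussed in this section.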
Looking ahead, one might ask: where is Siri going, and what measures, if any, are needed for the future? This is the most difficult part of my research question, because Siri can already do so much; what more does it need to do? Thinking ahead to the next decade and beyond is mind-blowing. For the privacy controversy to disappear, there must be stronger regulations covering users’ rights over listening, clear protocols, and full disclosure of how Siri’s listening works, presented so that every user who turns it on can understand it.
Another principle to reinforce for the future is that we cannot let virtual assistants control too much of our lives. I suggest this strongly, yet based on the current projection of Siri on Apple’s Siri website, it is evident that Siri will be running much more than just our phones. The site says, “Now you can control your smart appliances in your home, check their status, or even do many things at once” using just your voice. “In the Home App, you can create a page ‘I’m Home’ that opens the garage, unlocks the front door, and turns on the lights” (Apple Siri, 2019, pg.1). Common requests inside the home include ‘Did I close the garage,’ ‘Show me the driveway camera,’ or telling Siri to have your smart TV record a show while you are not home (Apple Siri, 2019, pg.1).
As if at-home assistance like this is not overwhelming enough, Siri is now accessible in many newer car models through Apple CarPlay. You can ask car-related questions such as ‘Did I close my door,’ ‘Where did I park,’ ‘What song is this,’ ‘Play 92.5 FM Radio,’ or ‘Answer phone call,’ all by voice and hands-free inside your moving vehicle (Apple Siri, 2019, pg.1). This feature is gaining momentum and is now easily used to enhance not just the phone experience but at-home, on-the-go, music, and car-related experiences.
According to Vox, the future of voice assistants looks extremely bright in terms of how we use technology professionally, socially, economically, industrially, and personally. The article cites statistics showing that 90.1 million people use voice assistants on smartphones, 77.1 million use them inside cars, and 45.7 million use them on smart speakers (Vox Media, 2019, pg.1). These statistics suggest that virtual assistants, especially Siri, will become the new face of voice-automated technology, including in the other settings Apple promotes, such as the home and the car.
As a frequent Siri user and Apple consumer, I found that this topic sparked many questions about my own use of Apple products, and I wanted to learn more about Siri and how it actually works. The most controversial part of this entire project was the listening section, where Apple’s privacy statements are up for debate in recent news. From my research, it remains hard to argue that Apple is not always listening to you, because many people would argue the opposite despite what Apple continues to market and disclose in its legal statements.
It is difficult to know the truth, but based on what I have learned in this AI course and the research done to answer my initial question, I would argue that my hypothesis is provisionally supported, because the technology must listen to the user to some degree in order to detect the wake word. Understandably, given legal concerns, Apple may never fully disclose how this works. Still, from understanding the NLP architecture and the algorithmic cloud computing features, I confidently stand on the always-listening side of this argument, because Apple’s privacy statements never clearly rule out that possibility.
Virtual assistants are technology designed to enhance our lives. Siri in particular is a highly skilled AI virtual assistant that acts as a key intermediary for our inquiries, questions, and requests. In principle, Siri’s main goal is ingenious and extremely convenient. However, as it progresses, we can see the leverage it is slowly gaining over our lives, not just our phones. The purpose of this progression is to keep users relying on this technology and on Apple products. This reinforces the idea that Siri has the answers we deeply desire, but AI, and Siri in particular, is taking a route that could go too far in replacing human actions in human life.
As exciting as the future looks, all of this overlapping control and capability across our phones, homes, cars, business models, software, and more is admittedly innovative, but also extremely uncertain and terrifying at the same time. These technologies are a privilege, and we must use them to a sensible degree when necessary without letting them overpower the meaning of life. No artificial technology is better than real, authentic life choices and actions.
References
Apple Introduces iCloud. (2019, April 05). Retrieved May 5, 2019.
Apple Response to July 9 Letter. (2019). Retrieved May 5, 2019, from Scribd.
Deep Learning for Siri’s Voice: On-device Deep Mixture Density Networks for Hybrid Unit Selection Synthesis. (n.d.). Apple. Retrieved May 5, 2019, from Apple Siri Voices.
Hey Siri: An On-device DNN-powered Voice Trigger for Apple’s Personal Assistant. (n.d.). Apple. Retrieved May 5, 2019, from Hey Siri.
Hey Siri: The Pros and Cons of Voice Commands. (2018). Retrieved May 5, 2019, from UK Norton Blog.
Hoy, M. B. (2018). Alexa, Siri, Cortana, and More: An Introduction to Voice Assistants. Medical Reference Services Quarterly, 37(1), 81-88. DOI: 10.1080/02763869.2018.1404391
Jurafsky, D., & Martin, J. H. (2008). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (2nd ed.). Upper Saddle River, NJ: Prentice Hall.
Komando, K. (2017, September 29). How to stop your devices from listening to (and saving) what you say. Retrieved May 5, 2019, from USA TODAY.
Molla, R. (2019, January 15). The future of voice assistants like Alexa and Siri isn’t just in homes – it’s in cars. Retrieved May 5, 2019, from Vox Media.
Siri. (n.d.). Retrieved May 5, 2019, from Apple Siri.
Stucke, M. E., & Ezrachi, A. (2017). How digital assistants can harm our economy, privacy, and democracy. Berkeley Technology Law Journal, 32(3), 1239-1300.
The Real Reason Voice Assistants Are Female (and Why it Matters). (2018, January 29). Retrieved May 5, 2019.
Today in Apple history: Siri debuts on iPhone 4s. (2018, October 04). Cult of Mac. Retrieved May 5, 2019.