Author Archives: Heba Khashogji

About Heba Khashogji

As a true believer in the seeds of obedience that blossom in our lives, I found happiness in honoring my parents. This led me to the passion I have been pursuing: to be an agent of change in both the corporate and societal environment. I advocate for social services that create and promote equity, opportunity, and improvement for people and the community.

I offer more than a decade of experience and accomplishment in human resources, driving implementation in employee development, quality management systems, salary standardization, compensation and benefits management, personnel services management, and company reorganization and realignment. One of my achievements is the creation of quality management procedures and policies as a strategic and tactical effort that drove our company, Khashoggi Holding Company, to international recognition as a Quality Crown Gold Awardee in 2014. Earlier, when I started working as a volunteer accountant/administrator to set up Dar AlHekma College, the first private college for women in Saudi Arabia, and in my first official position at King Fahad Armed Forces Hospital, I developed an interest in human relations that grew into participation in the implementation of quality management and the standardization of policy management systems in these organizations. Demonstrating initiative from the start, I applied and implemented integration programs in the Personnel Section, leading to employee satisfaction by delivering fair and reasonable benefits to all. Throughout my career, I have had the opportunity to establish a strong network of contacts in and out of the country through active participation in several seminars and workshops. The scope of my experience spans practically all aspects of HR as well as leadership.

Another passion of mine is helping to raise a young Saudi generation with better traits and character. I created children's books, converted into animated videos shown on local TV channels, to help reinforce behavioral change in the Arab region, bringing children up to be well-mannered individuals who are diplomatic with one another as well as with their foreign friends, exercising tact and courtesy in every encounter.

Recently, two more items on my wish list were achieved: to skydive and to take a master's course. Skydiving made me challenge myself and conquer fears, which can help me overcome obstacles in the future. I am not going to stop dreaming, and I am not going to stop learning. I still see myself in a class, physical or virtual, 23 years from now. I thirst for knowledge and always crave new ideas, even in a time of pandemic.

Deblackboxing “Translation” Paper on Amazon Echo Plus

Deblackboxing “Translation” Paper on Amazon Echo Plus (with cover) (.docx)

Deblackboxing “Translation” Paper on Amazon Echo Plus

Heba Khashogji

Abstract

Smart speakers have gained wide popularity among many users because they offer convenience. Still, some users feel that this type of device violates their privacy and see no point in using it. In this paper, we discuss the Amazon Echo Plus, its main components, and how it works step by step, following the “deblackboxing” method.

Introduction

At present, smart speakers are widely used, and according to Jakob and Wilhelm (2020), Amazon dominates the smart speaker market along with Google, Xiaomi, and others. Among these speakers is the Amazon Echo family.

In this paper, we talk specifically about the Amazon Echo Plus, a “smart speaker” powered by Amazon’s cloud-based voice service known as Alexa. Smart speakers have many uses, including healthcare for the elderly: Ries and Sugihara (2018) and Robinson et al. (2014) claimed that the technology has proven able to provide healthcare thanks to its current functions, with the Amazon Echo Plus used as an alternative to human caregivers in the early stages of dementia. These devices also use the Internet of Things (IoT), which helps control home appliances by voice recognition. Through such devices, you can also listen to music on demand by any artist or genre from platforms such as Spotify, Amazon Music, and Apple Music.

  1. Systems Thinking

Amazon Echo Plus starts working when it hears the word “Alexa,” which refers to Amazon’s virtual assistant. The wake word “Alexa” can later be changed to “Echo,” “Amazon,” or “Computer.”

When the virtual assistant hears the wake word, it starts working, and the ring at the top lights up blue. Echo Plus can then be asked any question, for example, about the weather, and it answers with a summary of what the weather will be like during the day.

Echo Plus has a built-in hub that supports and controls ZigBee smart devices, such as light bulbs and door locks, which can be bound to the home assistant by asking Alexa to “discover the devices.” Similarly, when the user asks Amazon Echo Plus a question by voice command, Echo Plus records the audio and sends it to the Amazon cloud servers. These servers convert the recorded voice into text, which is analyzed so that Alexa can find the best way to answer it. The answer is converted back to audio, and this information is sent to the Echo Plus smart speaker to play the audio response (Rak et al., 2020).

Amazon Echo Plus features local voice control, which allows us to control our home devices without any internet connection. However, if one needs to listen to music from Spotify or Amazon, an internet connection is required.

  2. Design Thinking and Semiotic Thinking

Below is a simple example that shows how the Amazon Echo Plus works. We will assume in this example that the user says “Hello world” for the purpose of illustration:

First, the user wakes the device: when the device hears the wake word “Alexa,” it starts to listen. Second, the Amazon Echo Plus sends the speech to the Alexa service in the cloud, which recognizes the speech, converts it into text, and performs natural language processing to identify the purpose of the request. Third, Alexa sends a JSON file containing the request to a Lambda function to handle it. Lambda is one of the Amazon Web Services offerings; it runs the user’s code only when needed, so there is no need to run servers continuously. In our example, the Lambda function returns “Welcome to the Hello World” and sends it to the Alexa service. Fourth, Alexa receives the JSON response and converts the resulting text into an audio file. Finally, the Amazon Echo Plus receives and plays the audio for the user. Figure 1 below shows how the user interacts with the Amazon Echo Plus device (Amazon Alexa, n.d.).

Figure 1: User Interaction with Amazon Echo Plus (Alexa Developer, n.d.)
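To make the round trip concrete, here is a minimal sketch of step three: a raw AWS Lambda handler in Python that answers a hypothetical HelloWorldIntent. The intent name and welcome text are illustrative assumptions, not Amazon’s code; only the general response shape follows the documented Alexa response JSON format.

```python
def lambda_handler(event, context):
    """Minimal Alexa skill backend: maps an intent to a spoken response."""
    request = event["request"]

    # Alexa sends a LaunchRequest when the skill is opened without an intent.
    if request["type"] == "LaunchRequest":
        text = "Welcome to the Hello World skill."
    elif (request["type"] == "IntentRequest"
          and request["intent"]["name"] == "HelloWorldIntent"):  # hypothetical intent
        text = "Welcome to the Hello World"
    else:
        text = "Sorry, I did not understand that."

    # The Alexa service converts this JSON response back into speech (step four).
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }
```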

 

  3. JSON (Intent/Response)

“JavaScript Object Notation” (JSON) is a data format that structures data, used chiefly by web applications for communication. JSON syntax is based on JavaScript object notation syntax (Wazeed, 2018):

  • Data is in name/value pairs. Example: {“fruit”: “Banana”}.
  • Data is separated by commas. Example: {“fruit”: “Banana”, “color”: “yellow”}.
  • Curly braces hold objects.

Figure 2 shows an example of JSON code; inside the intents array there is a HelloWorldIntent and one of the built-in intents, AMAZON.HelpIntent. AMAZON.HelpIntent responds to sentences that contain words or phrases indicating that the user needs help, such as “help.” Alexa creates an intent JSON file after it converts speech to text.

Figure 2: An Example of JSON Code (Ralevic, 2018)
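As an illustration of the name/value structure, the sketch below builds a simplified intent request of the kind Alexa might send to the Lambda function and serializes it with Python’s standard json module. The field values are made up for the example; only the general shape (type, intent name, slots) mirrors the request format described above.

```python
import json

# A simplified intent request, shaped like the JSON Alexa produces
# after speech-to-text (values here are illustrative).
intent_request = {
    "request": {
        "type": "IntentRequest",
        "intent": {
            "name": "HelloWorldIntent",
            "slots": {},          # no input values needed for this intent
        },
    }
}

print(json.dumps(intent_request, indent=2))         # serialize to JSON text
name = intent_request["request"]["intent"]["name"]  # read a name/value pair
```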

  4. Text to Speech System

Text-to-speech is done in several stages. The input to a Text to Speech (TTS) system is text, which is analyzed and converted into an audio description, after which a tone is generated. The main units of the text-to-speech architecture are as follows (Isewon et al., 2014); Figure 3 shows the system:

Figure 3: Text to Speech System (Isewon et al., 2014)

  • Natural Language Processing Unit (NLP): It produces a phonetic transcription of the input text. The primary operations of the NLP unit are as follows:
    • Text analysis: First, the text is decomposed into tokens. Token-to-word conversion then creates the orthographic form of each token; for example, the token “Mr” is expanded to “Mister.”
    • Application of the pronunciation rules: After the first stage is complete, the pronunciation rules are applied. In some cases a letter corresponds to no sound (for example, the “g” in “sign”), or multiple characters correspond to a single phoneme (such as the “ch” in “teacher”). There are two approaches to determining pronunciation (sketched in code after this list):
      • Dictionary-based with morphological components: As many words as possible are stored in a dictionary. Pronunciation rules determine the pronunciation of words that are not found in the dictionary.
      • Rule-based: Pronunciations are created from phonological knowledge; only words whose pronunciation is an exception are included in the dictionary.

If the dictionary-based method has a large enough phonetic dictionary, it will be more exact than the rule-based method.

  • Prosody Generation: After the pronunciation is specified, the prosody is created. Prosody is essential for conveying an affective state: if a person says, “It is a delicious pizza,” the intonation can reflect whether that person likes the pizza or not. A TTS system models many factors, such as intonation (phrasing and accentuation), amplitude, and length (including sound length and pauses, which determine syllable length and speech tempo) (Isewon et al., 2014).
  • Digital Signal Processing Unit (DSP): It converts the symbolic information received from the NLP unit into intelligible speech.
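The following toy sketch illustrates the dictionary-plus-rules idea from the list above: tokens are first expanded to orthographic words, then looked up in a small phonetic dictionary, with a crude letter-by-letter rule as the fallback. The dictionary entries and the fallback rule are invented for illustration and are far simpler than a real TTS front end.

```python
ABBREVIATIONS = {"Mr": "Mister", "Dr": "Doctor"}   # token-to-word expansion
PHONETIC_DICT = {"mister": "M IH S T ER",          # toy pronunciations
                 "sign": "S AY N"}                 # note: the "g" is silent

def pronounce(token: str) -> str:
    word = ABBREVIATIONS.get(token, token).lower()  # text analysis stage
    if word in PHONETIC_DICT:                       # dictionary-based lookup
        return PHONETIC_DICT[word]
    # Rule-based fallback: naive one-letter-one-phoneme spelling-out.
    return " ".join(word.upper())

print(pronounce("Mr"))    # -> "M IH S T ER"
print(pronounce("sign"))  # -> "S AY N" (the dictionary handles the silent g)
```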
  5. Convert Text to Tokens

Alexa divides speech into tokens as follows (Gonfalonieri, 2018; Trivedi et al., 2018):

  1. Wake word: The wake word “Alexa” tells the Amazon Echo Plus to start listening for the user’s commands.
  2. Launch word: A launch word is a transitional action word indicating to Alexa that a skill invocation will likely follow. Typical launch words include “tell,” “ask,” and “open.”
  3. Invocation name: To initiate an interaction with a skill, the user says the skill’s invocation name. For example, to use the weather skill, a user could say, “Alexa, what’s the weather today?”
  4. Utterance: Simply put, an utterance is a user’s spoken request. Utterances can invoke a skill and provide input to it.
  5. Prompt: A string of text spoken to the user to request information. You include prompt text in your response to a user request.
  6. Intent: An action that fulfills the user’s spoken request. Intents can optionally contain arguments called slots.
  7. Slot value: Slots are input values provided in the user’s spoken request. These values help Alexa determine the user’s intent.

Figure 4 shows the user giving input information, a travel date of Friday. This value fills an intent slot, which Alexa passes to Lambda so that the skill code can process it (a toy sketch follows Figure 4).

Figure 4: Dividing Words Into Tokens (Amazon Alexa, n.d.)
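The sketch below labels the parts of a sample utterance using simple string matching. The invocation name “plan my trip” and the slot pattern are hypothetical, and a real system does this with trained models rather than string rules.

```python
import re

LAUNCH_WORDS = {"tell", "ask", "open"}

def tokenize(utterance: str) -> dict:
    """Label wake word, launch word, invocation name, and slot in an utterance."""
    words = re.findall(r"[a-z]+", utterance.lower())
    parts = {"wake_word": words[0]}                  # e.g. "alexa"
    if len(words) > 1 and words[1] in LAUNCH_WORDS:
        parts["launch_word"] = words[1]              # e.g. "ask"
    # Hypothetical skill whose invocation name is "plan my trip".
    if " ".join(words[2:5]) == "plan my trip":
        parts["invocation_name"] = "plan my trip"
    match = re.search(r"starting (\w+)", utterance.lower())
    if match:
        parts["slot:travelDate"] = match.group(1)    # e.g. "friday"
    return parts

print(tokenize("Alexa, ask Plan My Trip to plan a trip starting Friday"))
```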

  6. Speech Recognition

Speech recognition is the machine’s ability to identify words and phrases in spoken language and convert them into text that the machine can handle (Trivedi et al., 2018). There are three ways a computer can match speech with stored phonetics:

  • Acoustic phonetic approach: The Hidden Markov Model (HMM) is used in this approach. An HMM is a non-deterministic probability model for speech recognition consisting of two kinds of variables: the hidden states, which are the phonemes stored in computer memory, and the visible frequency segments of the digital signal. Each phoneme has a probability, and each segment is matched with a phoneme according to that probability. The matched phonemes are then assembled into words according to the language’s previously stored grammar rules (a toy decoder follows Figure 5).
  • Pattern recognition approach: Speech recognition is one of the areas of pattern recognition and falls under supervised learning. In a supervised learning system, we have a dataset where both the input (audio signal) and the output (text corresponding to the audio signal) are known. The dataset is divided into a training set and a testing set, and learning proceeds in two phases. In the training phase, the training set is fed into a specified model and trained for a certain number of iterations to produce the trained model. The trained model is then evaluated on the test set to ensure that it operates properly. At recognition time, the user’s voice is matched against the previously trained patterns until the recognized sentence is produced as text (Trivedi et al., 2018).
  • Artificial intelligence approach: It is based on the use of main knowledge sources such as acoustics, spoken knowledge based on spectral measurements, semantic knowledge, and syntactic knowledge of words.

Figure 5 shows a typical speech recognition system.

Figure 5: Typical Speech Recognition System (Samudravijaya, 2002)
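To make the acoustic phonetic approach concrete, here is a minimal Viterbi decoder over a toy HMM. The two phoneme states, the observation symbols, and all probabilities are invented for illustration; a real recognizer has thousands of states and continuous acoustic features.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden phoneme sequence for a sequence of observed segments."""
    prob = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    path = {s: [s] for s in states}
    for obs in observations[1:]:
        new_prob, new_path = {}, {}
        for s in states:
            # Best predecessor state for s, weighted by transition probability.
            best_prev = max(states, key=lambda p: prob[p] * trans_p[p][s])
            new_prob[s] = prob[best_prev] * trans_p[best_prev][s] * emit_p[s][obs]
            new_path[s] = path[best_prev] + [s]
        prob, path = new_prob, new_path
    return path[max(states, key=lambda s: prob[s])]

states = ["AH", "B"]                                # toy phoneme states
start_p = {"AH": 0.6, "B": 0.4}
trans_p = {"AH": {"AH": 0.7, "B": 0.3}, "B": {"AH": 0.4, "B": 0.6}}
emit_p = {"AH": {"low": 0.8, "high": 0.2}, "B": {"low": 0.3, "high": 0.7}}
print(viterbi(["low", "high", "high"], states, start_p, trans_p, emit_p))
# -> ['AH', 'B', 'B']: the most probable phoneme sequence for these segments
```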

  7. User-Speaker Interaction

Amazon Echo Plus has powerful microphones. The device does not need to be activated manually; the microphone is always on, waiting for the wake word “Alexa” (Jakob, 2020). Figure 6 shows the voice processing system. The microphones in the Echo Plus convert the voice signal, which is a continuous signal, to a digital signal. The process of converting an analog signal to a digital signal has three stages:

  • Sampling: Samples are taken at equal time intervals at a rate called the sampling frequency. By Nyquist’s theorem, the sampling frequency must be at least twice the maximum frequency of the input signal.
  • Quantization: The second step assigns a numerical level to the voltage. This process searches for the value closest to the signal amplitude out of a specific number of possible values covering the whole amplitude range. The number of quantizer levels is a power of 2 (such as 128 or 256).
  • Coding: After the closest discrete value is identified, a binary numerical value is assigned to it. The quantization and encoding process cannot be entirely exact and can only provide an approximation of the real values; the higher the quantizer resolution, the closer this approximation will be to the real value of the signal (Pandey, 2019). A short sketch of all three stages follows Figure 6.

Figure 6: Voice Processing System (Abdullah et al., 2019)
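Here is a minimal numpy sketch of the three stages, using an invented 440 Hz tone as a stand-in for the analog voice signal; the 16 kHz rate and 8-bit resolution are illustrative choices, not the Echo’s actual parameters.

```python
import numpy as np

fs = 16_000                              # sampling rate; >= 2x the signal's top frequency
t = np.arange(0, 0.01, 1 / fs)           # sampling: instants spaced at equal intervals
analog = np.sin(2 * np.pi * 440 * t)     # stand-in "analog" signal: a 440 Hz tone

levels = 2 ** 8                          # quantization: 256 levels, a power of 2
quantized = np.round((analog + 1) / 2 * (levels - 1)).astype(np.uint8)

encoded = [format(v, "08b") for v in quantized]   # coding: each level as 8 binary digits
print(quantized[:4], encoded[:4])
```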

According to Abdullah et al. (2019), the audio is preprocessed to remove noise and then passed to the signal processing phase. Preprocessing involves applying a low pass filter to remove noise from the voice background. A low pass filter is a frequency filter that passes signals with frequencies lower than a cutoff frequency and blocks frequencies higher than the cutoff, as shown in Figure 7.

Signal processing, on the other hand, is a major component of voice processing; it captures the most important part of the input signal. Its central tool is the Fourier transform, which converts a signal from the time domain to the frequency domain; the Fast Fourier Transform (FFT) is an algorithm that computes this transform much faster than evaluating it directly (Maklin, 2019). Taking the FFT and its magnitude generates a frequency domain representation of the audio called a magnitude spectrum.

Figure 7: Ideal Low Pass Filter (Obeid et al., 2017)
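The magnitude-spectrum step can be sketched in a few lines of numpy. The two-tone test signal is invented for the example; the point is only that peaks appear at the component frequencies.

```python
import numpy as np

fs = 16_000
t = np.arange(0, 0.05, 1 / fs)
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

spectrum = np.abs(np.fft.rfft(x))        # FFT magnitude: the magnitude spectrum
freqs = np.fft.rfftfreq(len(x), 1 / fs)  # frequency axis in Hz

peaks = freqs[np.argsort(spectrum)[-2:]] # the two strongest components
print(sorted(peaks))                     # -> approximately [440.0, 1000.0]
```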

  8. Ethics and Policy

Intelligent systems, including Internet of Things (IoT) systems, manage very large amounts of personal data, which many users with limited experience are unaware of. These devices also control most home appliances, such as air conditioning, lighting, and washing machines, which makes this type of system questionable in terms of security and privacy (Rak et al., 2020). One of the main reasons preventing wider use of IoT systems is that they collect, process, and share personal user data with other parties; many IoT systems collect user data without their knowledge or consent (Thorburn et al., 2019).

A woman from Oregon discovered that her smart assistant had recorded a voice call between her and her husband and sent the recording to one of her phone contacts. Violations like these have led to the adoption of privacy regulations such as the European General Data Protection Regulation (GDPR). The GDPR is part of European Union law; its main aim is to give people control over their personal data and prevent it from being shared without their consent. It consists of provisions and requirements related to processing the personal data of people located in the EU (Thorburn et al., 2019).

To this end, Echo Plus always listens for its wake word “Alexa” and starts to work when it thinks it has heard this word; it then begins to record the voice and receive commands, which can be seen through the blue light of the ring at the top of the device. It records nothing else while waiting for the wake word. Amazon uses encryption to protect the audio recordings that Alexa uploads, and these audio files can be deleted at any time the user wants. Amazon Echo Plus also allows the user to stop recording by pressing the “mute button,” preventing the device from hearing anything, even the wake word; the ring then turns red (Crist and Gebhart, 2018).

Conclusion:

This paper discussed the most popular smart speaker device, the Amazon Echo Plus. It explained how the device works and its main components, tackling the main discussion points and concepts, including natural language processing, converting speech to text, and converting text to speech. In the end, the paper elaborated on ethics and how the device tries to provide more privacy for users.

Bibliography

  1. Abdullah, H., Garcia, W., Peeters, C., Traynor, P., Butler, K. R., & Wilson, J. (2019). Practical Hidden Voice Attacks against Speech and Speaker Recognition Systems. arXiv:1904.05734v1.
  2. Alexa Developer (n.d.). Build an Engaging Alexa Skill Tutorial. Retrieved from https://developer.amazon.com/en-US/alexa/alexa-skills-kit/get-deeper/tutorials-code-samples/build-an-engaging-alexa-skill/module-1.
  3. Amazon Alexa (n.d.). Build an Engaging Alexa Skill Tutorial. Retrieved from https://developer.amazon.com/en-US/alexa/alexa-skills-kit/get-deeper/tutorials-code-samples/build-an-engaging-alexa-skill/module-2.
  4. Crist, R., & Gebhart, A. (2018, September 21). Amazon Echo and Alexa: Everything you need to know. Retrieved from https://www.cnet.com/home/smart-home/amazon-echo-alexa-everything-you-need-to-know/.
  5. Gonfalonieri, A. (2018, November 21). How Amazon Alexa Works? Your Guide to Natural Language Processing (AI). Retrieved from Towards Data Science: https://towardsdatascience.com/how-amazon-alexa-works-your-guide-to-natural-language-processing-ai-7506004709d3.
  6. Isewon, I., Oyelade, J., & Oladipupo, O. (2014). Design and Implementation of Text To Speech Conversion for Visually Impaired People. International Journal of Applied Information Systems (IJAIS).
  7. Jakob, D., & Wilhelm, S. (2020). Amazon Echo: A Benchmarking Model Review. Retrieved from https://www.researchgate.net/profile/Sebastian-Wilhelm/publication/343280283_Amazon_Echo_A_Benchmarking_Model_Review/links/5f21125ba6fdcc9626bc9691/Amazon-Echo-A-Benchmarking-Model-Review.pdf.
  8. Maklin, C. (2019, December 19). Fast Fourier Transform. Retrieved from https://towardsdatascience.com/fast-fourier-transform-937926e591cb.
  9. Obeid, H., Khettab, H., Marais, L., & Hallab, M. (2017). Evaluation of Arterial Stiffness by Finger-Toe Pulse Wave Velocity: Optimization of Signal Processing and Clinical Validation. Journal of Hypertension. DOI:10.1097/HJH.0000000000001371.
  10. Pandey, H. (2019, November 25). Analog to Digital Conversion. Retrieved from https://www.geeksforgeeks.org/analog-to-digital-conversion/.
  11. Rak, M., Salzillo, G., & Romeo, C. (2020). Systematic IoT Penetration Testing: Alexa Case Study. Italian Conference on Cyber Security (pp. 190-200). Ancona.
  12. Ralevic, U. (2018, July 24). How To Build A Custom Amazon Alexa Skill, Step-By-Step: My Favorite Chess Player. Retrieved from https://medium.com/crowdbotics/how-to-build-a-custom-amazon-alexa-skill-step-by-step-my-favorite-chess-player-dcc0edae53fb.
  13. Ries, N., & Sugihara. (2018, December 10). Robot revolution: Why technology for older people must be designed with care and respect. Retrieved from https://theconversation.com/robot-revolution-why-technology-for-older-people-must-be-designed-with-care-and-respect-71082.
  14. Robinson, H., MacDonald, B., & Broadbent, E. (2014). The role of healthcare robots for older people at home: A review. International Journal of Social Robotics, 6(4), 575-591.
  15. Samudravijaya, K. (2002). Automatic Speech Recognition. Tata Institute of Fundamental Research. Retrieved from http://www.iitg.ac.in/samudravijaya/tutorials/asrTutorial.pdf.
  16. Thorburn, R., Margheri, A., & Paci, F. (2019). Towards an integrated privacy protection framework for IoT: Contextualising regulatory requirements with industry best practices. DOI:10.1049/cp.2019.0170.
  17. Trivedi, A., Pant, N., Pinal, P., Sonik, S., & Agrawal, S. (2018). Speech to text and text to speech recognition systems: A review. IOSR Journal of Computer Engineering (IOSR-JCE), 36-43.
  18. Wazeed. (2018, June 6). JavaScript JSON. Retrieved from https://www.geeksforgeeks.org/javascript-json/.

Artificial intelligence (general concepts overview, application, ethics and future concerns)

The continuous interest in the field of Artificial Intelligence (AI) has been the main reason for its constant progress: the field has moved in qualitative steps from simple learning that requires a lot of effort and time to deep learning and self-learning models, allowing its full capabilities to be used for social good. Many factors are needed for socially good AI, such as falsifiability, data protection, situational fairness, human-friendly semanticisation, etc. (Floridi, 2020).

All the capabilities and developments of AI must be at the service of humanity first, in a way that is safe, not harmful to the environment, and beneficial in the long term, with careful consideration of the living organisms this field is meant to serve (Shaping Europe’s digital future, 2019).

In order to exploit artificial intelligence safely, it was necessary to search for the best methods of doing so, especially since human trials must be subject to close scrutiny. It was therefore essential to survey and quantitatively analyze all participants in any experimental step, recording detailed notes, so as to arrive at a trustworthy experimental model for the practical application of these techniques, with guidelines, ethical controls, and self-assessment processes (Vincent, 2019).

Best AI Applications

Distinctive AI applications are now available in all fields, such as machine translation (Google Translate), big data analysis (deep learning for manipulating large image datasets like Flickr and Google), decision support systems, especially in the medical field, virtual assistants (like Siri and Alexa, which can be used for multiple purposes such as setting alarms, suggesting a film watch list, reminding about appointments, querying the weather, and suggesting the best restaurants), educational AI applications, scene understanding and image captioning algorithms used by platforms like Facebook and Twitter, face and speech recognition applications, etc. (Useche, 2019).

Through a set of smart algorithms based on humans’ thinking processes, AI can reach a result similar to what a human would think when given the same information. All this falls within the framework of supporting neural networks to provide more sophisticated virtual services.

Ethical controls for using artificial intelligence (Gelman, 2019):

These ethical controls address a serious problem in applying artificial intelligence techniques to the lives of individuals. AI continues to improve the human condition as a whole, and to achieve this, AI must have barriers that prevent it from restricting human freedom while still achieving accuracy, robustness, and security, especially since what is at stake is individuals’ lives, safety, and personal information, which should never be subject to theft or breach of privacy. All of this must fall under the concepts of transparency and ease in taking advantage of the important services AI provides (Marcus, 2017). The deep fake, for example, is one of the bad uses of AI: it can create fake images and videos of somebody or even create an entirely virtual fake human.

Preventing the technological exploitation of artificial intelligence:

The human field and its advancement is the first thing that any profit-oriented company thinks about. If AI is to have a significant impact on future technological progress, all of this must be controlled so that it is not purely profit-driven, and it must be subject to control and accountability standards that protect it from anyone seeking to politicize its work.

All of these controls must consider digital privacy and the freedom to benefit from artificial intelligence for purposes that serve humanity. Still, these control methods must be effective in a way that does not always depend on censorship (State for Digital, Culture, Media & Sport and the Secretary of State for the Home Department, 2019). The primary reliance should be on immunizing artificial intelligence and its uses so that they remain purposeful, protected, and accessible to everyone without harm (Ballarchive, 2019).

Questions to be analyzed and focus on

Many important questions should be raised about AI: Who is creating AI systems, and why are they created? Who can control these AI applications? Can we create useful AI applications whose bad use is still possible? Can deep-learning algorithms produce deep-learning students? How can we get useful results from data? Are virtual assistants safe enough that my data cannot be accessed by anyone else? Is my cloud data secure against violation by others in the cloud? (Useche, 2019).

References:

Ballarchive, J. (2019, 4 8). The UK’s online laws could be the future of the internet—and that’s got people worried. Retrieved from technology review: https://www.technologyreview.com/2019/04/08/136157/the-uks-online-laws-could-be-the-future-of-the-internetand-thats-got-people-worried/

Floridi, L. (2020). How to Design AI for Social Good: Seven Essential Factors. Science and Engineering Ethics, pp. 1771–1796.

Gelman, A. (2019, 4 3). From Overconfidence in Research to Over Certainty in Policy Analysis: Can We Escape the Cycle of Hype and Disappointment? Retrieved from Shorenstein Center: https://ai.shorensteincenter.org/ideas/2019/4/3/from-overconfidence-in-research-to-over-certainty-in-policy-analysis-can-we-escape-the-cycle-of-hype-and-disappointment

Marcus, G. (2017). Deep Learning: A Critical Appraisal. New York: New York University.

Shaping Europe’s digital future. (2019, 4 8). Ethics guidelines for trustworthy AI. Retrieved from Europa: https://ec.europa.eu/digital-single-market/en/news/ethics-guidelines-trustworthy-ai

Solon, O. (2019). Facial recognition’s dirty little secret. Retrieved from https://www.nbcnews.com/tech/internet/facial-recognition-s-dirty-little-secret-millions-online-photos-scraped-n981921

State for Digital, Culture, Media & Sport and the Secretary of State for the Home Department. (2019). Online Harms White Paper. UK: APS Group.

Useche, D. O. (2019). CCTP-607: “Big Ideas”: AI to the Cloud. Retrieved from Georgetown: https://blogs.commons.georgetown.edu/cctp-607-spring2019/category/week-12/

Vincent, J. (2019). AI systems should be accountable, explainable, and unbiased, says EU. Retrieved from Theverge.com: https://www.theverge.com/2019/4/8/18300149/eu-artificial-intelligence-ai-ethical-guidelines-recommendations

Big Data (Analysis, Application, Challenges, and Ethics)

1.   Introduction

Big data is a field that introduces ways to analyze, systematically extract information from, or otherwise manipulate data sets that are too large or complex to be dealt with by traditional data-processing application software. Big data includes datasets whose huge sizes exceed traditional programs’ capacity to handle them with acceptable time and value (Wikipedia, 2020). The characteristics of big data are (Kitchin, 2014):

  1. Enormous in volume, consisting of terabytes or petabytes of data.
  2. High in velocity, being created in or near real time.
  3. Diverse in variety, being structured and unstructured in nature.
  4. Exhaustive in scope, striving to capture entire populations or systems.
  5. Fine-grained in resolution and uniquely indexical in identification.
  6. Relational in nature, containing common fields that enable the conjoining of different datasets.
  7. Flexible, holding the traits of extensionality and scalability.

2.   Applications of Big Data

Big data is a sign that everything is changing. Every portfolio is affected: finance, transport, housing, food, environment, industry, health, welfare, defence, education, science, and more (Johnson, 2017). Here are some of the applications of big data (Wikipedia, 2020):

  1. Government

The use and modification of big data in governmental applications brings benefits, especially in terms of cost, productivity, and innovation.

  2. Healthcare

Providing personalized medicine, clinical risk intervention, and medical prediction systems using big data analysis has greatly improved healthcare.

  3. Media

The industry is moving away from the traditional approach of using specific media such as newspapers, magazines, or television shows. Instead, it taps into consumers with technologies that reach targeted people at optimal times in optimal locations.

  4. Insurance

Health insurance providers gather data on social “determinants of health,” such as food and TV consumption, clothing size, marital status, and purchase habits. This information can be used to make predictions about health costs and to spot health issues in their clients.

  5. Internet of Things (IoT)

IoT devices provide information that is used to map device interconnectivity. The media industry, companies, and governments have been using these mappings to reach their audiences more effectively and increase the efficiency of their media.

 

  6. Information Technology (IT)

Big data has been used as a helpful tool for employees in their work, making it significant within business operations. Its application in IT has made the collection and distribution of information technology more efficient. Applying big data processes together with machine learning and deep learning makes IT departments more powerful in predicting potential issues and providing solutions before problems even happen.

3.   Benefits of Big Data

The impact of big data, open data, and data infrastructures can be seen clearly in science, business, government, and civil society (Huberman, 2017). Here are some of the benefits of big data:

  • Businesses can analyze customer traffic to calculate precisely how many employees they will need each hour of the day. The goal is to spend as little money as possible (Arslan, 2016).
  • Geographical coverage: global sources delivered sizable and comparable data for all countries, no matter their size (Wikipedia, 2020).
  • Level of detail: providing fine-grained data with many interrelated variables and new concepts, such as network connections (Wikipedia, 2020).
  • Timeliness and time series: graphs can be produced within days of being collected (Wikipedia, 2020).

4.   Big Data Challenges

The big data challenge is not only the technical problem of transferring the maximum number of bits in the minimum amount of time, but also the scientific challenge of formulating approaches to handle the complex and intertwined systems that must be designed and managed to run the modern world (Johnson, 2017).

Another challenge is coping with its abundance and exhaustivity (including sizeable amounts of data with low utility and value), its timeliness and dynamism, its messiness and uncertainty, its semi-structured or unstructured nature, and the fact that much big data is generated with no specific question in mind or as a by-product of another activity. The tools for linking diverse datasets together and analyzing big data were poorly developed until recently because they were too computationally challenging to implement (Huberman, 2017), (Useche, 2019).

5.   Ethics of AI, Data science and big data

Millions of people use the web for their social, informational, and consumer needs. During that, they publish their information through all social networks (Huberman, 2017).

The problem is that real-world data often contains information you did not intend it to contain, captured because of bias in the data collection process. Human beings can have very diverse motives for why they make something. As with any technology, we need to put checks and controls in place so that it is utilized to benefit us (Askell, 2020).

Big commercial companies gather troves of private data while claiming no interest in personal details, when in reality they sell, exchange, or misuse such data (Johnson, 2018).

 

References

Arslan, F. (2016). Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, by Cathy O’Neil. Journal of Information Privacy and Security, pp. 157-159.

Askell, A. (2020, 12, 1). Ethics & AI: Privacy & the Future of Work. Retrieved from youtube: https://www.youtube.com/watch?v=zNxw5gJtHLc&list=PLzdnOPI1iJNeehd1RXhnVMBFi1WhWLx_Y&index=7

Huberman, B. (2017, 12). Big Data and the Attention Economy. ACM Digital Library, pp. 1-7.

Johnson, J. (2017, 12). Big data: big data, digitization, and social change. Ubiquity, an ACM publication.

Johnson, J. (2018, 8). Big Data: Big Data or Big Brother. Ubiquity, pp. 2-10.

Kitchin, R. (2014). Big Data, new epistemologies and paradigm shifts. Big Data & Society.

Useche, D. O. (2019, 4). Challenges of Interpreting Big Data. Retrieved from “Big Ideas”: AI to the Cloud: https://blogs.commons.georgetown.edu/cctp-607-spring2019/category/week-11/

Wikipedia. (2020). Big data. Retrieved from Wikipedia: https://en.wikipedia.org/wiki/Big_data

Convergence in the Design and Use of AI/ML and Data Systems of the Cloud Computing Architecture

Cloud computing is the delivery of computing resources like data (cloud) storage and computing power on demand, without direct management or deep knowledge on the user’s part, and on a pay-as-you-go basis (Wikipedia, 2021). The cloud removes the need for owning and maintaining physical data centres (Amazon Web Services, 2019).

Several companies are using the cloud for many use cases, like backup, disaster recovery, email, virtual desktops, big data analysis, and software development (Amazon Web Services, 2019).

Cloud Architecture based on the convergence with AI/ML

Convergence between AI and data is a reality, not just at the macro level but also within specific industries and technologies. A simple cloud computing architecture consists of four basic parts: the cloud service, the cloud infrastructure, the cloud platform, and the cloud storage (databases). AI/ML systems manipulate large amounts of data, so they need infrastructures that can scale with computation needs. This is where cloud compute infrastructure that can be scaled horizontally and on demand becomes important (Roe, 2019). Building AI/ML systems requires large volumes of training, validation, and test data, so big data analysis tools (Hadoop, Spark, etc.) are required to handle volumes of this size.

Another convergence is with the Internet of Things (IoT), in which equipment generates data through hundreds of connected sensors and the enterprise decides on maintenance. We need to store, transport, and process the data (using AI/ML) and decide whether a machine needs maintenance. These processes require merging different technologies, such as cloud, AI/ML, and big data processing, to work together to deliver the final result (Jarrahi, 2018).

Virtual assistants, for example, need a cloud system so that they can connect multiple systems and data sources through the cloud wherever needed. Cloud services created by providers such as Amazon, Microsoft, Google, and Rackspace (Dong, Xiong, Castañé, & Morrison, 2018) are divided into three layers: the infrastructure layer, the cloud management layer, and the service delivery layer.

Cloud infrastructure design balances requirements, ensuring data centre scalability, maintaining server fault tolerance, and minimizing costs. Traditional data centre infrastructure is built on a hierarchical three-tier design comprising the Access Layer, the Aggregation Layer, and the Core Layer. The access layer connects servers residing in the same rack, while the aggregation layer is a multi-purpose system connecting the access and core layers that keeps the various communication domains separated. The last infrastructure layer is the core layer, which provides high-speed, scalable, and reliable communication across the entire data centre (Dong, Xiong, Castañé, & Morrison, 2018).

Cloud management platforms follow the same design, but their implementations differ significantly. Cloud management consists of three main components: the security part (privacy, authentication, and data protection), the management services (monitoring, computational management, networking resource management, and storage resource management), and the user interface part.

The last layer in the cloud computing system is the service delivery layer. There are three basic service delivery models (Dong, Xiong, Castañé, & Morrison, 2018) (Ecourse Review, 2017), known as Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). IaaS gives end-users access to tangible infrastructure (physical servers, storage databases, network equipment). It provides powerful flexibility, so end-users can access their virtual machines directly; IaaS targets end-users interested in building information technology infrastructure. PaaS, on the other hand, reduces configuration complexity and operational costs by providing pre-configured, ready-to-use platforms to end-users, such as operating systems (Linux, Windows), workflow engines (Apache Director Engine), messaging frameworks (RabbitMQ, ZeroMQ), programming-language execution environments, and web application servers (e.g., Apache Tomcat, Oracle GlassFish, Red Hat JBoss). SaaS gives users access to application software and databases; it is known as “on-demand software” and is priced on a pay-per-use basis. SaaS providers install and operate software in the cloud so that users access it from cloud clients (Wikipedia, 2021) (Ecourse Review, 2017) (Rountree, Castrillo, & Jiang, 2014). Figure 1 includes the cloud design layers.

Fig.1 Cloud system layers
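As a small illustration of consuming cloud resources programmatically rather than owning hardware, the sketch below uses the boto3 client for Amazon S3 object storage. The bucket and file names are hypothetical, and valid AWS credentials are assumed to be configured.

```python
import boto3

# Object storage consumed as a service: no physical servers to manage.
s3 = boto3.client("s3")

# Hypothetical bucket and object names, for illustration only.
s3.upload_file("training_data.csv", "my-ml-bucket", "datasets/training_data.csv")

# List what the bucket holds; capacity scales on demand, billed per use.
for obj in s3.list_objects_v2(Bucket="my-ml-bucket").get("Contents", []):
    print(obj["Key"], obj["Size"])
```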

Another service, called Mobile Backend as a Service (MBaaS), provides web and mobile application developers with a flexible approach to linking their applications to cloud computing servers and storage. This service includes user management, integration with social network services, and push notifications (Roe, 2019).

References

Amazon Web Services. (2019, 1 21). What is Cloud Computing? Retrieved from YouTube: https://www.youtube.com/watch?v=dH0yz-Osy54

Dong, D., Xiong, H., Castañé, G. G., & Morrison, J. P. (2018). Cloud Architectures and Management Approaches. Palgrave Studies in Digital Business & Enabling Technologies (pp. 31-61).

Ecourse Review. (2017, 6 4). Cloud Computing Services Models – IaaS PaaS SaaS Explained. Retrieved from YouTube: https://www.youtube.com/watch?v=36zducUX16w

Jarrahi, M. H. (2018). Artificial Intelligence and the Future of Work. Business Horizons, 4.

Roe, D. (2019, 8 8). Why Big Data, IoT, AI and Cloud Are Converging in the Enterprise. Retrieved from CMS Wire: https://www.cmswire.com/digital-workplace/why-big-data-iot-ai-and-cloud-are-converging-in-the-enterprise/

Rountree, D., Castrillo, I., & Jiang, H. (2014). The Basics of Cloud Computing. Waltham, USA: Elsevier.

Wikipedia. (2021). Cloud computing. Retrieved from https://en.wikipedia.org/wiki/Cloud_computing

The ethical, political, and ideological issues surrounding AI/ML applications (are they real or exaggerated?)

Although Artificial Intelligence (AI) and its subfields, such as Machine Learning (ML) and Deep Learning (DL), have a wide variety of good uses, they have also been used in bad ways to surveil and target communities of color and minorities (Solon, 2019).

AI facial recognition systems have been used to track potentially violent people. Unfortunately, bad training datasets made some of those systems flag dark-skinned people as a potential threat (Hao, 2019) (Hao, 2018).

Some experts have expressed concerns about the future effects of AI technologies. Two important ethical, political, and ideological issues surrounding AI/ML applications are data abuse and the deep fake (Anderson & Rainie, 2018).

Data Abuse

Facebook and Twitter gather massive amounts of information from users and use it to make recommendations based on users’ interests. Facebook ad preferences, for example, use data such as political leaning and racial/ethnic affinity to generate materials. Sixty percent of users assigned a multicultural affinity class said they have a very strong affinity for their assigned group. Most social media users believe that these platforms can detect their main features, like race, political opinion, religious beliefs, etc. However, there are variations between what platforms say about users’ political ideologies and what users actually are (Hitlin & Rainie, 2019).

Some advertisers use multicultural affinity AI tools to exclude certain racial groups from work interviews. Some studies say that AI is responsible for these problems; we would say it is not AI specifically, but the wrong use of AI applications, that causes them.

Variations in the training database affect the performance of AI systems. In a study of gender identification based on deep learning (Wojcik & Remy, 2019), DL algorithms failed to detect dark-skinned people. In fact, the size of their database was very small, and the study did not take into account all possible races and ages; therefore, its results cannot be considered accurate.

Images from the Flickr website were used by IBM to train its face recognition. The problem is that you do not know whether your images were used by IBM or not, but the fact is that IBM can use your photos because you used a Creative Commons license, allowing nonprofits to use your photos for free. Some people were annoyed about the use of their photos, while others said they could help enhance face recognition systems. In some countries, if IBM does not respond to your request to remove photos, you can complain to your data protection authority (Solon, 2019).

Deep Fake

The “thisPersonDoesntExist.com” website was developed by Philip Wang to generate an infinite number of fake images. His technique is based on AI and uses a very large dataset of real images. The StyleGAN networks used on this website can accept not only human faces but also any source, helping graphics and animation designers develop their applications (games, film tricks, etc.). However, this technique can also create fake videos by pasting people’s faces onto target videos (Vincent, 2019). Trump appeared in a video offering advice to the people of Belgium about climate change, which was a fake film constructed by these deep fake networks. Such bad uses of AI can cause political criticism and even mayhem (Schwartz, 2018). Traditional Photoshopped fake images might have had similar bad effects before this AI technology.

Optimistic Future

Despite all the previous bad uses of AI, new detection methods have arisen. Fortunately, large groups of AI researchers are aware of AI ethics and have taken many approaches to solving this problem, like developing algorithms to reduce hidden biases within training datasets. They also focus on applying processes that hold AI companies responsible for fairer results (Hao, 2019). Facebook has committed to developing an ML algorithm for detecting deep fakes (Schwartz, 2018). Other AI researchers have developed approaches to detect and reduce hidden biases within datasets (Hao, 2019). AI companies are protecting user privacy, combating deep fakes, and taking wider datasets into account.

AI is the digital future of the world. Its benefits are obvious in all fields (medical diagnosis, data mining, robotics, big data analysis, image recognition, military applications, security applications, etc.). Deep fakes also have positive uses, like creating digital voices for people who lose theirs to disease (Baker & Capestany, 2018).

Many high-profile, highly reputable initiatives have been established in the interest of socially beneficial AI. Principles like those of Montreal and the IEEE state that the development of AI should ultimately promote the well-being of all humans; other principles focus on the common good or the benefit of AI applications to humanity (Floridi & Cowls, 2019).

References:

Baker, H., & Capestany, C. (2018). It’s Getting Harder to Spot a Deep Fake Video. Retrieved from: https://www.youtube.com/watch?v=gLoI9hAX9dw

Vincent, J. (2019). ThisPersonDoesNotExist.com uses AI to generate endless fake faces. Retrieved from: https://www.theverge.com/tldr/2019/2/15/18226005/ai-generated-fake-people-portraits-thispersondoesnotexist-stylegan

Anderson, J., & Rainie, L. (2018). Artificial intelligence and the future of humans. Retrieved from: https://www.pewresearch.org/internet/2018/12/10/artificial-intelligence-and-the-future-of-humans/

Hao, K. (2018). Establishing an AI code of ethics will be harder than people think. Retrieved from: https://www.technologyreview.com/2018/10/21/139647/establishing-an-ai-code-of-ethics-will-be-harder-than-people-think/

Hao, K. (2019). Stop AI ethics-washing and actually do something. Retrieved from: https://www.technologyreview.com/2019/12/27/57/ai-ethics-washing-time-to-act/

Hao, K. (2019). This is how AI bias really happens. Retrieved from: https://www.technologyreview.com/2019/02/04/137602/this-is-how-ai-bias-really-happensand-why-its-so-hard-to-fix/

Floridi, L., & Cowls, J. (2019). A Unified Framework of Five Principles for AI in Society. Harvard Data Science Review.

Solon, O. (2019). Facial recognition’s dirty little secret. Retrieved from: https://www.nbcnews.com/tech/internet/facial-recognition-s-dirty-little-secret-millions-online-photos-scraped-n981921

Schwartz, O. (2018). Deep fakes, fake news and the truth. Retrieved from: https://www.theguardian.com/technology/2018/nov/12/deep-fakes-fake-news-truth

Hitlin, P., & Rainie, L. (2019). Facebook algorithms and personal data. Retrieved from: https://www.pewresearch.org/internet/2019/01/16/facebook-algorithms-aknowledgments/

Wojcik, S., & Remy, R. (2019). The challenges of using machine learning to identify gender in images. Retrieved from: https://www.pewresearch.org/internet/2019/09/05/the-challenges-of-using-machine-learning-to-identify-gender-in-images/

De-black boxing of virtual assistant (Alexa)

Heba

Week 8

Alexa is a well-known virtual assistant developed by Amazon using AI in 2014 (Wikipedia, 2021). Alexa can play music, interact with our voices, make to-do lists, set alarms, provide weather information, etc. (Wikipedia, 2021). We can use Alexa as a home automation system controlling our smart devices (Wikipedia, 2021) (Amazon, 2021). Besides that, we can install extension functionality called skills and add it to Alexa. Device manufacturers can integrate Alexa voice capabilities into their products using the Alexa Voice Service; any product built with this cloud-based service has access to a list of automatic speech recognition and natural language processing capabilities. Amazon uses long short-term memory (LSTM) networks for generating voices (Amazon, 2021). In 2016, Amazon released Lex, making the speech recognition and natural language processing (NLP) technology available for developers to create their own chat-bots (Barr, 2016). Less than a year later, Lex became generally available (Barr, 2017). Now, web and mobile chat is available using Amazon Connect (Hunt, 2019).

A virtual assistant’s main components include a light ring, a volume ring to control voice level, a microphone array used to detect, record, and listen to our voices, a power port to charge the device, and an audio output. The virtual assistant then recognizes the voice and stores the conversation in the cloud.

De-black boxing of virtual assistant (United States Patent No. US2012/0016678 A1, 2012)

Level 0:

Here, the virtual assistant is just a black box whose input is a voice command from the user and whose output is the voice response. Fig. 1 shows the black box of the virtual assistant (Alexa, for example).

Fig1. Black box of Virtual Assistant

Level 1:

For level-1 de-black boxing, we can see the following components:

  • ASR (Automatic Speech Recognition): Returns the speech as text.
  • NLU (Natural Language Understanding): Interprets the text as a list of possible intents (commands).
  • Dialog manager: Looks at the intent and determines whether it can handle it; specified rules define which speechlet processes it.
  • Data store: Holds the text of the response before it is voiced.
  • Text to speech: Translates skill outputs into an audible voice.
  • Third-party skill: Written by the third party, which is responsible for the skill’s actions and operations. Fig. 2 shows the level-1 de-black boxing of a virtual assistant (Alexa).

Fig. 2. Level-1 de-black boxing of the Alexa system

Level 2:

De-black box the ASR

The acoustic front end converts the speech signal into corresponding features (speech parameters) via a process called feature extraction. The parameters of the word/phone models are estimated from the acoustic vectors of the training data. The decoder then searches over all possible word sequences to find the sequence of words most likely to have generated the observed features. In the training phase, the operator reads all the vocabulary words, and the word patterns are stored. Later, in the recognition step, the incoming word pattern is compared to the stored patterns, and the word that gives the best match is selected. Fig. 3 illustrates the de-black box of ASR.

Fig. 3. Level-2 de-black boxing of the ASR
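Here is a minimal sketch of the acoustic front end, using the open-source librosa library to extract MFCC features, a common choice of speech parameters; the file name is hypothetical, and MFCCs are assumed for illustration since the post does not name the exact features the Echo uses.

```python
import librosa

# Load a speech recording (hypothetical file) as a mono waveform.
waveform, sample_rate = librosa.load("utterance.wav", sr=16_000)

# Feature extraction: 13 MFCCs per short frame of audio.
features = librosa.feature.mfcc(y=waveform, sr=sample_rate, n_mfcc=13)

print(features.shape)  # (13, number_of_frames): acoustic vectors for the decoder
```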

De-black box of NLU (Natural Language Understanding)

Intent Classification (IC) and Named Entity Recognition (NER) use machine learning to recognize natural language variation. To identify and categorize key information (entities) in text, the NLU needs NER. NER is a form of NLP that includes two steps: detecting the named entity and categorizing it. In step 1, NER detects a word or a thread of words that forms a whole entity; each word is a token (“The Great Lakes” is a thread of three tokens representing one entity). The second step requires the creation of entity categories, like person, organization, location, etc. IC labels the utterances of an NLP system with one of a predetermined set of intents. Domain classification is a text classification model that determines the target domain for a given query; it is trained using many labelled queries across all domains in an application. Entity resolution is the last part of NLU, disambiguating records that correspond to real-world entities across and within datasets. So, for “play Creedence Clearwater Revival,” the NER output is “CCR (ArtistName),” the domain classifier says “music,” the IC says “PlayMusicIntent,” and the entity resolution is “Creedence Clearwater Revival.” Fig. 4 includes the de-black box of the NLU.

Fig. 4. Level-2 de-black boxing of the NLU
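The sketch below fakes this NLU pipeline with keyword rules so the four outputs (domain, intent, entity, resolved entity) are easy to see. The artist list and the rule logic are invented for illustration; real NLU uses trained classifiers, not lookup tables.

```python
# Toy artist catalog standing in for a trained NER + entity-resolution model.
KNOWN_ARTISTS = {"ccr": "Creedence Clearwater Revival",
                 "creedence clearwater revival": "Creedence Clearwater Revival"}

def understand(utterance: str) -> dict:
    """Rule-based stand-in for domain classification, IC, NER, and resolution."""
    text = utterance.lower()
    result = {}
    if text.startswith("play "):
        result["domain"] = "music"                # domain classification
        result["intent"] = "PlayMusicIntent"      # intent classification
        mention = text[len("play "):]
        result["entity"] = f"{mention} (ArtistName)"              # NER
        result["resolved"] = KNOWN_ARTISTS.get(mention, mention)  # entity resolution
    return result

print(understand("Play Creedence Clearwater Revival"))
```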

Dialog Manager (DM)

The DM selects what to report or say back to the user, decides whether to take any action, and manages the conversation. The DM includes two parts: dialog state tracking, which estimates the user’s goals by tracking the dialog context as input, and the dialog policy, which generates the next system action. Dialog state tracking can be done using an RNN or a neural belief tracker (NBT), while the dialog policy can be learned using reinforcement learning (RL). Fig. 5 shows level 2 of the de-black box of the DM.

Fig. 5. Level-2 de-black boxing of the DM
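A rule-based stand-in for the two DM parts described above (a real tracker and policy would be learned models, as noted; the intent name, slot names, and rules here are invented):

```python
def track_state(state: dict, nlu_result: dict) -> dict:
    """Dialog state tracking: fold the newest NLU output into the dialog state."""
    state = dict(state)
    state["intent"] = nlu_result.get("intent", state.get("intent"))
    state.update(nlu_result.get("slots", {}))
    return state

def policy(state: dict) -> str:
    """Dialog policy: choose the next system action from the tracked state."""
    if state.get("intent") == "PlanMyTripIntent" and "travel_date" not in state:
        return "prompt: When do you want to travel?"  # ask for the missing slot
    return "fulfill: hand the completed request to the skill code"

state = track_state({}, {"intent": "PlanMyTripIntent", "slots": {}})
print(policy(state))                                  # -> prompt for travel date
state = track_state(state, {"slots": {"travel_date": "friday"}})
print(policy(state))                                  # -> fulfill the request
```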

De-black box of Text-To-speech TTS

The last part of the virtual assistant allows computers to read text aloud. The linguistic front end converts the input text into a sequence of features such as phonemes and sentence type. The prosody model predicts pattern and melody to form the expressive qualities of natural speech. The acoustic model transforms the linguistic and prosodic information into frame-rate spectral features. Those features are fed into the neural vocoder and are also used to train a lighter, smaller vocoder. The neural vocoder generates a 24 kHz speech waveform; it consists of a convolutional neural network that expands the input feature vectors from frame rate to sample rate and a recurrent neural network that synthesizes audio samples auto-regressively at 24,000 samples per second. Fig. 6 includes the details of TTS.

Fig. 6. Level-2 de-black boxing of the TTS

Fig. 7 shows the full de-black boxing of the Alexa Echo system.

Fig. 7. Level-2 de-black boxing of the Alexa system (de-black boxing of the Echo system)

References:

Amazon. (2021). Amazon Lex. Retrieved from: https://aws.amazon.com/lex/

Barr, J. (2016). Amazon Lex – Build Conversational Voice & Text Interfaces. Retrieved from AWS News Blog: https://aws.amazon.com/ar/blogs/aws/amazon-lex-build-conversational-voice-text-interfaces/

Barr, J. (2017). Amazon Lex – Now Generally Available. Retrieved from AWS News Blog: https://aws.amazon.com/blogs/aws/amazon-lex-now-generally-available/

Gruber et al. (2012). United States Patent No. US2012/0016678 A1.

Hunt, R. (2019). Reaching More Customers with Web and Mobile Chat on Amazon Connect. Retrieved from AWS Contact Center: https://aws.amazon.com/ar/blogs/contact-center/reaching-more-customers-with-web-and-mobile-chat-on-amazon-connect/

Wikipedia. (2021). Amazon Alexa. Retrieved from: https://en.wikipedia.org/wiki/Amazon_Alexa

Wikipedia. (2021). Virtual assistant. Retrieved from: https://en.wikipedia.org/wiki/Virtual_assistant

Machine Translation (Example: Google Translator)

Giving computers the ability to understand and speak a language is called Natural Language Processing (NLP) (NLP:CrashCourseComputerScience#36, 2017) (Daniel Jurafsky, 2000). NLP is an interdisciplinary field that fuses computer science and linguistics. NLP explores two ideas: natural language understanding (NLU) and natural language generation (NLG). While NLU deals with how to get meaning from combinations of letters (AI that filters spam, Amazon search, etc.), NLG generates language from knowledge (AI that performs translation, summarizes documents, chatbots, etc.) (NLP-CrashCourseAI#7, 2019). There is a practically infinite number of ways to arrange words in a single sentence, which cannot be given to a computer as a dictionary. In addition, many words have multiple meanings, like "leaves", causing ambiguity, so computers need to learn grammar (NLP:CrashCourseComputerScience#36, 2017). To take grammar into account while building any language translator, we should first perform syntax analysis. Second, semantic analysis must be applied to ensure that sentences make sense (Daniel Jurafsky, 2000), (How-Google-Translate-Works, 2019).

Language translation

Machine translation (MT) is a "sub-field of computational linguistics that uses computer software to translate text or speech from one language to another" (Wikipedia, 2021). Language translation (like Google Translator) is one of the most important NLP applications, and it currently depends on neural networks. It takes text in one language as input and produces the result in another language.

The first NLP method of language translation is based on phrase structure rules, which are designed to encapsulate the grammar of a language, producing many rules that together constitute the language's entire grammar. Using these rules, a parse tree can be constructed that tags words with a likely part of speech and reveals how the sentence is built (Daniel Jurafsky, 2000), (Wikipedia, 2021). Treating language like Lego makes computers adept at NLP tasks (the question "where's the nearest pizza" can be decomposed into "where", "nearest", and "pizza"). Using phrase structure, computers can answer questions like "what's the weather today?" or execute commands like "set the alarm at 2 pm". Computers can also use phrase structure to generate natural language text, especially when data is stored in a web of semantic information (NLP:CrashCourseComputerScience#36, 2017). The knowledge graph is Google's version of phrase structure processing; it contains 70 billion facts about, and relationships between, various entities. This methodology was used to create early chatbots, which were primarily rule-based. The approach's main problem is the need to define every possible variation and erroneous input as rules, making the translation model complex and slow. Fortunately, the Google Neural Machine Translation (GNMT) system arose and has replaced the rule-based approach since 2016.
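A toy phrase-structure grammar can be tried directly with the NLTK library. The grammar below is a minimal sketch covering only the pizza question, not a real grammar of English:

```python
import nltk

# A tiny phrase-structure grammar for "where is the nearest pizza".
grammar = nltk.CFG.fromstring("""
  S   -> WH V NP
  NP  -> Det Adj N
  WH  -> 'where'
  V   -> 'is'
  Det -> 'the'
  Adj -> 'nearest'
  N   -> 'pizza'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("where is the nearest pizza".split()):
    tree.pretty_print()  # shows how the sentence is built from its parts
```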

Deep Neural Network (DNN) Architecture for language translation

Translation requires a profound understanding of the text to be translated (Poibeau, 2017), which can be achieved using DNNs. A deep learning language model (Google Translator, for example) consists of the following parts (How-Google-Translate-Works, 2019), (NLP-CrashCourseAI#7, 2019):

  • Sentence-to-Vector Mapper (Encoder): converts words into vectors of numbers that represent them. For this part, we can use Recurrent Neural Networks (RNNs), as in Google Translator, to encode words and transform them into representations (vectors) that computers can process.
  • A combiner that merges the representations into a shared vector for the complete training sentence.
  • Vector-to-Sentence Mapper (Decoder): another RNN, used to convert the representation back into words.

Both RNNs are Long Short-Term Memory (LSTM) networks, which can deal with long sentences. This architecture works well for medium-length sentences (15-20 words) but fails as the grammar becomes more complex. A word in a sentence depends on the word before it and the word that comes after it; replacing the RNNs with bidirectional ones solved this problem.

Another problem is deciding which word in a long sentence deserves more focus. Translation now uses an alignment process (Poibeau, 2017), in which inputs and outputs are aligned together. These alignments are learned by an extra unit located between the encoder and decoder, called the attention mechanism (How-Google-Translate-Works, 2019). The decoder then produces the translation one word at a time, focusing on the word selected by the attention mechanism. Google Translator (for example) uses eight bidirectional LSTM units supported by the attention mechanism.
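A minimal sketch of this encoder-attention-decoder arrangement, assuming PyTorch and toy dimensions; a production system like GNMT stacks many such layers and adds a full decoder, so this is an illustration of the mechanism only:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Bidirectional LSTM mapping a source sentence to vector representations."""
    def __init__(self, vocab_size, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim, batch_first=True, bidirectional=True)

    def forward(self, src):                      # src: (batch, src_len)
        outputs, _ = self.rnn(self.embed(src))   # (batch, src_len, 2*hid_dim)
        return outputs

class Attention(nn.Module):
    """Scores each source position against the decoder state: which word to
    focus on next. Returns a weighted context vector and alignment weights."""
    def __init__(self, enc_dim, dec_dim):
        super().__init__()
        self.score = nn.Linear(enc_dim + dec_dim, 1)

    def forward(self, dec_state, enc_outputs):
        src_len = enc_outputs.size(1)
        dec = dec_state.unsqueeze(1).expand(-1, src_len, -1)
        scores = self.score(torch.cat([dec, enc_outputs], dim=2)).squeeze(2)
        weights = torch.softmax(scores, dim=1)   # alignment weights
        context = torch.bmm(weights.unsqueeze(1), enc_outputs).squeeze(1)
        return context, weights

enc = Encoder(vocab_size=1000)
attn = Attention(enc_dim=256, dec_dim=256)
src = torch.randint(0, 1000, (1, 12))            # a 12-token source sentence
enc_out = enc(src)
dec_state = torch.zeros(1, 256)                  # initial decoder state
context, weights = attn(dec_state, enc_out)
print(weights.shape)                             # torch.Size([1, 12])
```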

However, machine translation models based on deep learning have so far performed well on simple sentences, but the more complex the sentence, the less accurate the translation (Poibeau, 2017).

References:

  1. How-Google-Translate-Works. (2019). Machine Learning & Artificial Intelligence. Retrieved from YouTube: https://www.youtube.com/watch?v=AIpXjFwVdIE&ab_channel=CSDojoCommunity
  2. Jurafsky, D., & Martin, J. H. (2000). Speech and Language Processing. New Jersey: Prentice Hall.
  3. NLP:CrashCourseComputerScience#36. (2017). Retrieved from YouTube: https://www.youtube.com/watch?v=fOvTtapxa9c
  4. NLP-CrashCourseAI#7. (2019). Retrieved from YouTube: https://www.youtube.com/watch?v=oi0JXuL19TA&ab_channel=CrashCourse
  5. Poibeau, T. (2017). Machine Translation. Cambridge, Massachusetts: The MIT Press.
  6. Wikipedia. (2021). Machine translation. Retrieved from https://en.wikipedia.org/wiki/Machine_translation

Analysis of Karpathy’s Article Key Points – Heba Khashogji

Machine Learning (ML) and Deep Learning (DL) can be used to analyze a tremendous number of images, extract useful information, and make decisions about them (Machine_Learning&Artificial_Intelligence, 2017), such as classifying e-mails, recommending videos, predicting diseases, recognizing handwriting ((LAB):CrashCourseAi#5, 2019), etc. ML gives computers the ability to extract high-level understanding from digital images (CrashCourseComputerScience#35, 2017).

The first appearance of such a model dates back to 1993, but the first large-scale use came in 2012, thanks to the development of GPUs and the massive increase in data set sizes (ImageNet, for example) (Karpathy, 2015).

A ConvNet takes a 256x256x3 image as input and produces a probability for each output class; the class with the highest probability is chosen. At each layer, the ConvNet performs convolution using filters, extracting information like edges, colors, etc. (CrashCourseComputerScience#35, 2017). More complex features are extracted as we go deeper into the network. During training, the filters are initialized randomly and trained until the network learns to match each image with its correct class (Karpathy, 2015). Training a deep network is complicated and takes much more time than training traditional models, but the accuracy is much better, thanks to deep networks' ability to handle massive data (ALPAYDIN, 2016).
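For illustration, a much smaller ConvNet with the same layer pattern (convolution, nonlinearity, pooling, then a classifier head) can be sketched in PyTorch. This is a toy stand-in for VGGNet, not Karpathy's actual network:

```python
import torch
import torch.nn as nn

# Minimal ConvNet over 256x256x3 images with two output classes.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # filters detect edges/colors
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 256 -> 128
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # deeper: more complex features
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 128 -> 64
    nn.Flatten(),
    nn.Linear(32 * 64 * 64, 2),                   # two classes: good / bad selfie
)

image = torch.randn(1, 3, 256, 256)               # one RGB image, PyTorch layout
probs = torch.softmax(model(image), dim=1)        # probability per class
print(probs.argmax(dim=1))                        # class with highest probability
```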

Karpathy's ConvNet to Classify Selfie Images

Karpathy applied the following key steps to classify selfie images as good or bad:

  1. Gathering images tagged with the #selfie hashtag (5 million images).
  2. Organizing the dataset: Karpathy divided the dataset into 1 million good and 1 million bad selfies based on factors like the number of people who saw the selfie, the number of likes, the number of followers, and the number of tags. Images were ranked in groups of 100; the top-ranked selfies in each group were stored as good and the rest as bad.
  3. Training: Karpathy selected the pre-trained VGGNet model and used Caffe to train it on the collected selfie dataset. The ConvNet tuned its filters in the way that best separates good from bad selfies, a well-known method called supervised learning (Dougherty, 2013).
  4. Results: The author selected the best 100 selfies out of the 50,000 ranked by the ConvNet. Based on the ConvNet's results, he offered advice for taking a good selfie: be female, have the face occupy about a third of the image, cut off the forehead, show long hair, etc. He concluded that the style of the image was the key feature of a good selfie.
  5. Extensions: The author also performed three additional tasks. The first was classifying celebrities' selfies; although there were specific factors for selecting the best selfies, counter-examples, such as selfies including men or with illumination problems, appeared among the best. The second task applied the t-SNE algorithm to the images, clustering them into categories based on similarity measures like the L2 norm (see the sketch after this list); the resulting clusters included sunglasses, full-body shots, and mirror selfies. The third task was discovering the best crop of a selfie: Karpathy randomly cropped images and presented the fragments to the ConvNet, which chose the best crop. He found that the ConvNet prefers selfies where the head takes up about 30% of the image and the forehead is chopped off.
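A sketch of the t-SNE clustering idea from the second task, using scikit-learn; random vectors stand in here for real ConvNet features:

```python
import numpy as np
from sklearn.manifold import TSNE

# Embed image feature vectors into 2-D with t-SNE so that images close in
# feature space (e.g., by L2 distance) land near each other as clusters.
features = np.random.rand(200, 512)          # 200 images, 512-d features
coords = TSNE(n_components=2, perplexity=30, init="random").fit_transform(features)
print(coords.shape)                          # (200, 2): one 2-D point per image
```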

In some cases, the ConvNet selected rude crops. Karpathy inserted a spatial transformer layer before the ConvNet and backpropagated into six parameters defining an arbitrary crop. This extension did not work well: the optimization sometimes got stuck. He also tried to constrain the transform, but that was not helpful either. The good news is that if the transform has only three bounded parameters, a global search over them becomes affordable (Karpathy, 2015).

  6. Availability: Anyone on Twitter can use the @deepselfie bot designed by Karpathy to analyze a selfie and get a score of how good it is.

References:

(LAB):CrashCourseAi#5. (2019). Retrieved from YouTube: https://www.youtube.com/watch?list=PL8dPuuaLjXtO65LeD2p4_Sb5XQ51par_b&t=67&v=6nGCGYWMObE&feature=youtu.be

Alpaydin, E. (2016). Machine Learning: The New AI. Cambridge: Massachusetts Institute of Technology.

CrashCourseComputerScience#35. (2017). Retrieved from YouTube: https://www.youtube.com/watch?v=-4E2-0sxVUM

Dougherty, G. (2013). Pattern Recognition and Classification. New York: Springer Science+Business Media.

Karpathy, A. (2015). What a Deep Neural Network Thinks About Your #selfie. Retrieved 2020 from https://karpathy.github.io/2015/10/25/selfie/

Machine_Learning&Artificial_Intelligence. (2017). Machine Learning & Artificial Intelligence. Retrieved from YouTube: https://www.youtube.com/watch?v=z-EtmaFJieY&t=2s

Technical Document for Data Science, Coding, and Use in Database, Computing and AI – Heba Khashogji

The many contexts and uses of the terms "information" and "data" make these terms perplexing and confusing outside an understood context. Using the method of thinking in levels and our contexts for defining data concepts, outline for yourself the concept of "data" and its meaning in two of the data systems we review this week. One "system" is the encoding of text data in Unicode for all applications in which text "data" is used; the other is database management systems.

What is Data Science?

Data science incorporates a set of principles, problem identification, algorithms, and processes for extracting unapparent and helpful patterns from large data sets. Many elements of data science have been developed in related fields, such as machine learning and data mining. In fact, the terms data science, machine learning, and data mining are often used interchangeably. The commonality across these disciplines is a focus on improving decision making through the analysis of data. However, although data science borrows from these other fields, it is broader in scope. Machine learning (ML) focuses on the design and evaluation of algorithms for extracting patterns from data. Data mining typically deals with the analysis of structured data and often implies a focus on commercial applications.

A Brief History of Data Science

The term data science can be traced back to the 1990s. Nevertheless, the fields that it draws upon have a much longer history. One thread in this longer history is the history of data collection; another is the history of data analysis. In this section, we review the main developments in these threads and describe how and why they converged into the field of data science. Of necessity, this review introduces new terminology as we define and name the important technical innovations as they arose. For each new term, we provide a brief explanation of its meaning; we return to many of these terms later in the book and give a more detailed description of them. We begin with a history of data collection, then provide a history of data analysis, and, finally, cover the development of data science (Kelleher and Tierney, 2018).

Document and Evidence

The word information commonly refers to bits, bytes, books, and other signifying objects, and it is convenient to refer to this class of objects as documents, using a broad sense of that word. Documents are essential because they are considered evidence. 

The Rise of Data Sets

Academic research projects typically generate data sets, but in practice it is generally impractical for anyone else to make further use of these data, even though major research funders now mandate that researchers have a data management plan so that generated data sets are preserved and made accessible.

Naming

Finding operations depend heavily on the names assigned in document descriptions and on the named categories to which documents are assigned. Naming is a language activity and so inherently a cultural activity. We therefore introduce a brief overview of the issues, tensions, and compromises involved in describing collected documents. The notation can be codes or ordinary words. Linguistic expressions are necessarily culturally grounded and so unstable; for that reason, they conflict with the need for stable, unambiguous marks if systems are to perform efficiently.

The First Purpose of Metadata: Description

The primary and original use of metadata is to describe documents. There are various types of descriptive metadata: technical (describing the format, encoding standards, etc.) and administrative. These descriptions help in understanding a document's character and in deciding whether to make use of it. Description can be instrumental even if nonstandard terminology is used.

The Second Use of Metadata: Search

Thinking of metadata as describing individual documents reflects only one of its two roles. The second use of metadata is different: it emerges when you start with a query or a description rather than the document, with the metadata rather than the data, when searching in an index. This second use of metadata is for finding: search and discovery (Buckland, 2017).

Both "information" and "data" are used in general and undifferentiated ways in ordinary and popular discourse. Still, to advance in our learning of AI and all the data science topics we will study, we need to be clear about the specific meanings of these terms and concepts. The term "data" in ordinary language is vague and ambiguous. We must also untangle and differentiate the uses and contexts of "data," a key term in everything computational, AI, and ML.

No Data without Representation

In whatever context and application, "data" is inseparable from the concept of representation. A good slogan would be "no data without representation" (which can be said of computation in general). By "representation", we mean a computable structure, usually of "tokens" (instances of something representable) corresponding to "types" (categories or classes of representation, roughly corresponding to symbolic classes like text character, text string, number type, matrix/array of number values, etc.) (Irvine, 2021).
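A small Python illustration of types and tokens in text data: each character token below is an instance of a Unicode character type and is ultimately represented as numbers (code points, then bytes):

```python
# Types vs. tokens: characters as instances of Unicode character types.
text = "data"
code_points = [ord(ch) for ch in text]   # tokens mapped to Unicode code points
utf8_bytes = text.encode("utf-8")        # the same tokens as byte data

print(code_points)        # [100, 97, 116, 97]
print(list(utf8_bytes))   # [100, 97, 116, 97] (ASCII range: one byte each)
```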

Knowledge of database technology increases in importance every day. Databases are used everywhere: they are fundamental components of e-commerce and other Web-based applications. They lie at the core of an organization's operational and decision-support applications. Databases are also used by thousands of workgroups and millions of individuals. It is estimated that there are more than 10 million active databases in the world today.

This book aims to teach the essential relational database concepts, technology, and techniques that you need to start a career as a database developer. The book does not teach everything that matters in relational database technology, but it will give you sufficient background to create your own databases and to participate as a team member in the development of larger, more complex databases (Kroenke et al., 2017).

An attribute's data type (numeric, ordinal, nominal) affects the methods we can use to analyze and understand the data: both the basic statistics we use to describe the distribution of values the attribute takes and the more complex algorithms we use to identify patterns of relationships between attributes. At the most basic level of analysis, numeric attributes allow arithmetic operations. The typical statistical analysis applied to numeric attributes is to measure the central tendency (using the mean value of the attribute) and the dispersion of the attribute's values (using the variance or standard deviation statistics).
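For example, using Python's standard statistics module on a hypothetical numeric attribute:

```python
import statistics

# Central tendency and dispersion for a numeric attribute (invented values).
ages = [23, 25, 31, 35, 44, 52]

print(statistics.mean(ages))      # central tendency: 35
print(statistics.stdev(ages))     # dispersion: sample standard deviation
print(statistics.variance(ages))  # dispersion: sample variance
```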

Machine Learning 101

The primary tasks for a data scientist are defining the problem, designing the data set, preparing the data, deciding on the type of data analysis to apply, and evaluating and interpreting the results of the analysis. What the computer brings to this partnership is the ability to process data and search for patterns in it. Machine learning is the field of study that develops the algorithms computers follow to identify and extract patterns from data. ML algorithms and techniques are applied primarily during the modelling stage of CRISP-DM. ML involves a two-step process.

First, an ML algorithm is applied to a data set to identify useful patterns in the data. Second, once a model has been created, it is used for analysis (Kelleher and Tierney, 2018).
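A minimal sketch of the two steps using scikit-learn and invented data: first an algorithm extracts a pattern from the data set (the model), then the model is used to analyze a new record:

```python
from sklearn.tree import DecisionTreeClassifier

# Step 1: apply an ML algorithm to a data set to learn a pattern (a model).
X_train = [[25, 0], [45, 1], [35, 1], [22, 0]]   # hypothetical [age, owns_home]
y_train = [0, 1, 1, 0]                           # hypothetical outcome labels
model = DecisionTreeClassifier().fit(X_train, y_train)

# Step 2: once the model is created, use it for analysis of new records.
print(model.predict([[40, 1]]))                  # predicted label for a new case
```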

References:

  1. Kelleher, J., & Tierney, B. (2018). Data Science. London: The MIT Press.
  2. Buckland, M. (2017). Information and Society. London: The MIT Press.
  3. Irvine, M. (2021). Universes of Data: Distinguishing Kinds and Uses of "Data" in Computing and AI Applications.
  4. Kroenke, D., Auer, D., Vandenberg, S. L., & Yoder, R. C. (2017). Database Concepts. NY: Pearson.

The signal transmission theory of information – Heba Khashogji

Computer and communication engineers specialize in systems that transmit information encoded as electromagnetic signals. For example, a microphone generates an electric signal as someone speaks, a magnetic disk records a copy of the signal, and a speaker generates a sound wave from that signal. A radio transmitter superimposes an audio signal on a radio frequency (RF) signal so that the RF amplitude tracks the audio signal, and a receiver subtracts out the RF signal to extract the audio. Engineers must be very precise and unambiguous about how they encode representations and their intended meanings; otherwise, the physical systems they build will not work. Computer and communication engineers settled on the bit (short for binary digit) as their primary unit of information. Claude Shannon introduced the term "bit" (Martell & Denning, 2015).
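A quick illustration of text encoded as bits, the engineer's primary unit of information:

```python
# Encode a short message as bits: one byte (8 bits) per ASCII character.
message = "Hi"
bits = " ".join(format(byte, "08b") for byte in message.encode("ascii"))
print(bits)   # 01001000 01101001
```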

All we care about is what all that engineering adds up to when it succeeds in transmitting signals so that they become the physical basis for constituting the perceptible, meaningful patterns of our sign and symbol systems.

So we have an essential, foundational principle: the engineering principles for E-information (signal transmission and reception) form an intentionally designed subsystem for our primary meaning systems, that is, our sign and symbol systems (language and writing, mathematics, graphics, images, sound, film/video), which can be represented as data in the computing and E-information context (more on data later). In our contemporary electronics environment, we need the knowledge provided both by semiotics (the study of human symbolic cognition and sign systems) and by the engineering theory of E-information (mathematics + physics) for all information systems engineered to use sections of the electromagnetic energy spectrum (electricity, radio waves, light waves) (Irvine, 2021).

In the world of information science, meaning is often tied to the notion of representation. A principle that underlies the whole concept of computation is that one state can represent another state. The states need not be in the same system. To give two examples: the words you type on a keyboard can be represented by voltages and current flows inside an electronic computer; music performed by a human artist can be represented by a pattern of silver and black dots on a DVD. When a particular representation affects you, it has meaning for you (Mayfield, 2013).

We need to understand the core concepts and design principles for E-information as a subsystem, and then go on to explain how the E-information subsystem is designed to serve our larger symbolic systems. We complete the whole picture with the knowledge provided by other fields (linguistics, semantics, pragmatics, semiotics, and other communication approaches), without modelling those fields on the E-information transmission model.

As we've just reviewed, the signal-code-transmission model of information theory was initially developed as a set of models for transmitting error-free electronic signals in telecommunication systems, where the physical limits and capacity of networks and radio frequencies could be precisely defined and engineered. This model provides an essential abstraction layer in the designs of all electronic and digital systems. It does not provide an extensible model for the larger sense of communication and meaning systems that these symbolic cognitive technologies allow us to implement. The meanings and social uses of communication are left out of the signal transmission model because they are assumed or presupposed as what motivates using signals and E-information at all. This is why we need to understand that the designs and engineering techniques for E-information are used to create a data or semiotic subsystem using binary electronics.

"Information" in this context is thus primarily unobservable: we cannot observe energy fields, electronic pulses, or signals used as binary representations (Irvine, 2021).

Why is the information theory model essential for everything electronic and digital, but insufficient for extending to models for meanings, uses, and purposes of our sign and symbol systems?

The transmission model of E-information is essential to understand. Still, it cannot be extrapolated to a model for communication and meaning more generally (though some schools of thought have tried, unsuccessfully, to use the model this way). The signal transmission theory is constrained by a signal-unit, point-to-point model, with its "conduit" and "container" metaphors.

The larger context surrounding E-information also includes what cognitive science research calls "meta-symbolic" knowledge: the understanding of frameworks for meanings and the essential meta-information (information about the information, in the generic sense) known to all communicators using a symbolic medium. This includes cultural knowledge of various kinds/genres of messages, social conventions, categories of meanings or cultural codes, and assumed background knowledge, which, of course, as meta-information is not, and cannot be, represented in the signal information, the E-information, itself (Barwise, 1986, in Gleick, 2011).

References :

  1. Gleick, J. (2011). The Information: A History, a Theory, a Flood. NY: Pantheon Books.
  2. Irvine, M. (2021). Introducing Information Theory: The Context of Electrical Signals Engineering and Digital Encoding.
  3. Martell, C., & Denning, P. (2015). Great Principles of Computing. London: The MIT Press.
  4. Mayfield, J. E. (2013). The Engine of Complexity: Evolution as Computation. NY: Columbia University Press.

Computing Design Principles – Heba Khashogji

Deblackboxing the logic of how and why computers embody specific kinds of system designs leads us to the main concept of the computing process. Today, we understand that "modern computing is about designs for implementing human symbolic thought delegated to physical (electronic) structures for automating symbolic processes that we can represent in digital form".

According to Prof. Irvine[1], the logical design, implemented physically, for the automated, controlled sequencing of input encoded symbols to output encoded symbols is what makes a computer a computer.

On the other hand, using the ideas raised by Alpaydin and Kelleher, a recent paper by Brian Haney[2] illustrates a way to understand the "bottom-up" system design approach. The paper explains how "scholars, lawyers, and commentators are predicting the end of the legal profession, citing specific examples of artificial intelligence (AI) systems out-performing lawyers in certain legal tasks." The article shows that "technology's role in the practice of law is nothing new. The Internet, email, and databases like Westlaw and Lexis have been altering legal practice for decades." The increasing demand for automated services in the legal profession was the main reason behind more and more bottom-up designs. Similarly, we can find many other examples in professions like accounting and statistics, which will face the same destiny.

Until now, working through the main principles and learning precise definitions of terms has helped us "deblackbox" what seems closed and inaccessible in understanding the sophisticated concepts of computer design and the computing process. However, I still wonder why scientists, producers, and engineers did not find a more comprehensive term, since "computing" came from a purely mathematical and accounting background, even though the computer is a device used for far more than computation. Is it difficult to adopt a broader term for the real nature of this technology because the current one has become commonly used and hard to change, or is it because of the mathematical basis of encoding symbols and logical designs?

 

[1] Prof. Irvine, (Video) “Introduction to Computer System Design”.

 

[2] Brian S. Haney, Applied Natural Language Processing for Law Practice, 2020 B.C. Intell. Prop. & Tech. F. (2020).