Author Archives: Fudong Chen

Tracking and Sharing: a method to improve recommendation accuracy?

Fudong Chen

Abstract

This article attempts to answer the following question: whether, and how, a recommendation system can recommend topic-related content that has never appeared in the system. To address this question, the article gives a brief description of recommendation systems and concludes that without relevant data, a system cannot recommend relevant content. The article then focuses on the external data available to an app and deblackboxes the digital fingerprint to show that it is possible to improve the recommendation system by tracking users and sharing data. Finally, the article discusses data privacy and expresses some concerns.

Introduction

The recommendation systems of various apps, built on machine learning and algorithms, bring us a lot of convenience. Shopping apps recommend products we need, video apps recommend videos that attract us, and search engines guess what we want to search for. The prediction, or inference, of our needs through machine learning and various algorithms is actually easy to understand, since it is based on behavioral data that we create ourselves. For example, if I buy a science fiction novel, the shopping app recommends other science fiction novels; likewise, if I click on a cat-related video, the app recommends more cat videos. Admittedly, different recommendation systems use different recommendation strategies, but most of these strategies are explainable and understandable from a human perspective. Yet in daily life we may encounter the following situation: we discuss a topic with friends (perhaps on another app, or even in person), and the topic has never been discussed or searched for on a given app; after a while, however, advertisements or videos related to the topic are recommended on that app. This coincidence naturally makes us wonder: can the recommendation system recommend topic-related content that has never appeared in the system, or are our mobile apps monitoring us all the time and extracting keywords for recommendation? This article will try to answer this question. First, it explains the general composition and data sources of a recommendation system. Then, starting from the data sources, it explains how large Internet companies build user portraits along multiple dimensions and deblackboxes the methods used to track users. Finally, it discusses the impact of mobile phone fingerprints (digital fingerprints) on data privacy.

How does the recommendation system work?

Before discussing whether the recommendation system can make recommendations as accurate as monitoring, we need to briefly describe how it works. Simply put, a recommendation system has three aspects: data, algorithms, and architecture. The data provides information and is the input of the system. It contains user and content attribute information as well as user behavior and preference information, such as clicking on a certain type of video or purchasing a certain type of goods. The algorithm provides the logic for processing the data, that is, how to process the input to get the desired output. Take the most commonly used algorithm in recommendation systems, the Collaborative Filtering algorithm, as an example. Collaborative Filtering is based on an assumption: if A and B have similar historical annotation patterns or behavior habits for some content, they will have similar interests in content. It generally uses a nearest-neighbor algorithm to calculate the distance between users from their historical preference information, then uses the weighted product ratings of the users who are the nearest neighbors to predict the target user's preference for a specific product. The system recommends products or content to the target user based on this result. The architecture specifies how data flows and is processed: how data travels from the client to the storage unit (database) and then back to the client.
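The nearest-neighbor logic just described can be sketched as a toy example. The user names, ratings, and the choice of cosine similarity below are all illustrative assumptions, not any real system's data or algorithm:

```javascript
// Toy user-based Collaborative Filtering. All ratings are made up.
// ratings: userId -> { itemId: rating on a 1-5 scale }
const ratings = {
  alice: { scifiNovel: 5, catVideo: 1 },
  bob:   { scifiNovel: 4, catVideo: 2, spaceDocumentary: 5 },
  carol: { scifiNovel: 1, catVideo: 5, spaceDocumentary: 1 },
};

// Cosine similarity over the items both users have rated:
// a stand-in for the "distance between users" step.
function similarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (const item of Object.keys(a)) {
    if (item in b) {
      dot += a[item] * b[item];
      na += a[item] ** 2;
      nb += b[item] ** 2;
    }
  }
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

// Predict the target user's rating for an item as the
// similarity-weighted average of the neighbors' ratings.
function predict(target, item) {
  let weighted = 0, totalSim = 0;
  for (const [user, r] of Object.entries(ratings)) {
    if (user === target || !(item in r)) continue;
    const sim = similarity(ratings[target], r);
    weighted += sim * r[item];
    totalSim += sim;
  }
  return totalSim ? weighted / totalSim : 0;
}
```

Here alice's tastes resemble bob's far more than carol's, so the predicted rating for the documentary alice has never seen is pulled toward bob's high rating: the system recommends content based on what similar users liked.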

In other words, the recommendation system categorizes raw data to form user portraits, attaches model tags or labels (i.e., patterns) to each user, and then recommends content based on various algorithms, such as the Collaborative Filtering just mentioned.

Fig1, data processing

As Fig1 above shows, the original data covers four aspects:

User data refers to the user’s personal information, such as gender, age, registration time, mobile phone model, etc.

Content data refers to the content provided by the app. For example, the content data of shopping apps such as Taobao and Amazon concerns products and product reviews; the content data of video apps such as TikTok and Netflix concerns videos and video reviews.

User behavior logs refer to what the user did on the app, such as what videos they searched for, what videos they shared, or what products they purchased.

External data is data provided by other apps. A single app can only collect data about one aspect of a user's preferences. For example, a video app can only describe what type of content a user prefers in the video field. But if we integrate data from other types of apps, the dimensions of the user's data will be greatly enriched.

Fact labels are cleaned from the original data and include dynamic and static portraits:

A static portrait refers to attributes of the user that are independent of the product scene, such as age and gender. Such information is relatively stable.

A dynamic portrait refers to the user's behavior data on the app. Explicit behavior (preferences the user clearly expresses) includes likes, shares, etc. It is worth mentioning that for comments, NLP is needed to determine whether the user's attitude is positive, negative, or neutral. Implicit behavior (where the user does not clearly express a preference) includes how long the user watches a video, clicks, etc.

Model labels are obtained from fact labels through weighted calculation and cluster analysis: each dimension is assigned a weight, a score is calculated, and users are then classified (cluster analysis) based on the result.
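A minimal sketch of this weighting step follows. All feature names, values, weights, and thresholds here are invented for illustration, and the simple score threshold stands in for a real clustering step:

```javascript
// Toy sketch: turning fact labels into a model label via weighted scoring.
// Fact labels: normalized 0-1 signals derived from behavior data (invented).
const factLabels = { likesCatVideos: 0.9, sharesCatVideos: 0.4, catWatchTime: 0.7 };
// Per-dimension weights a designer might assign (also invented).
const weights    = { likesCatVideos: 0.5, sharesCatVideos: 0.2, catWatchTime: 0.3 };

// Weighted sum of the fact-label dimensions.
function modelScore(labels) {
  return Object.keys(labels).reduce((s, k) => s + labels[k] * (weights[k] || 0), 0);
}

// A crude stand-in for cluster analysis: bucket the score into a segment.
function segment(score) {
  return score >= 0.6 ? 'cat-lover' : score >= 0.3 ? 'casual' : 'uninterested';
}
```

With these made-up numbers the score is 0.45 + 0.08 + 0.21 = 0.74, so this user is labeled a "cat-lover", and the recommendation layer can then target that segment.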

In short, the recommendation system processes the data layer by layer using various models and algorithms, then returns the corresponding recommendation results. But in any case, the recommendation system cannot produce recommendations out of nothing. It needs various input data, processes that data according to algorithms designed by humans, and returns results according to certain logic. Therefore, for a single app, if we have not discussed a topic on the app (that is, the recommendation system has no corresponding data), it is reasonable that the app will not return recommendations related to that topic.

However, as Fig1 shows, the data source is not limited to the app itself. If corresponding external data exists, the recommendation system has the ability to recommend content based on that external data. In fact, technically speaking, large Internet companies such as Google, Alibaba, and ByteDance usually have multiple apps in different fields, which can share user data and expand the dimensions of user portraits through account information and digital fingerprints. Take Alibaba as an example. Ali's apps include map, health, and payment apps, a video platform, and even Weibo, a social platform, so Ali's portrait of Chinese users can cover many dimensions. It is worth mentioning that for different apps sharing a common account, it is straightforward to match the account against the database. However, some Ali-owned apps, such as AutoNavi Maps, do not require users to log in. Does Ali have a way to track such users? The answer is yes. For users who use an app without logging in to a personal account, the app can identify or track them by the fingerprint of the smartphone.

How to track users?

Existing tracking mechanisms are usually based on either tagging or fingerprinting (Klein & Pinkas, n.d.). Tracking here is similar to the words recognize or identify used above.

The typical tagging method is cookies. Cookies are data stored on the user's local terminal: a small piece of text information sent by the server to the client browser and stored locally, which serves as the basis for the server to identify the user's identity and status. Their main use is to remember helpful things like your account login info, or what items were in your online shopping cart (Cover Your Tracks, n.d.). But nowadays, whether on a PC browser or a mobile phone, many users choose to delete or block cookies, which makes cookies a poor tool for identifying users.
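At the string level, cookie-based tagging looks roughly like the following sketch. The "sessionId" name and the parsing helper are hypothetical; real servers and browsers exchange these values through the Set-Cookie and Cookie HTTP headers, simplified here:

```javascript
// Minimal sketch of cookie tagging: the server hands the browser a small
// text tag once, then reads it back on later requests to recognize the user.

// What a server might send once (a hypothetical session tag):
const setCookie = 'sessionId=u-42; Max-Age=31536000; HttpOnly';

// Parse a Cookie request header ("name=value; name2=value2") into a map,
// as a server would do to recover the tag on a later visit.
function parseCookieHeader(header) {
  const jar = {};
  for (const pair of header.split(';')) {
    const i = pair.indexOf('=');
    if (i > 0) jar[pair.slice(0, i).trim()] = pair.slice(i + 1).trim();
  }
  return jar;
}

// On a later request the browser echoes the tag back:
const jar = parseCookieHeader('sessionId=u-42; lang=en');
```

The key point for the privacy discussion later is that this tag lives on the user's own device: deleting it (as many users now do) breaks the link, which is exactly why fingerprinting became attractive.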

The typical fingerprinting technology is browser fingerprinting, a concept proposed by Eckersley in 2010 (Eckersley, n.d.). When a user accesses a server with a browser, the server obtains browser feature identifiers, canvas feature values, and some hardware and system information, and generates a unique string for that browser through a specific fingerprint-generation algorithm. The accuracy of user identification based on browser fingerprints depends on the identification ability of the fingerprint, and that ability depends on its degree of uncertainty: the higher the uncertainty, the higher the uniqueness, and the stronger the identification ability. For example, whether a browser shares cookies is one measurement of a fingerprint. Some people are willing to share cookies and others are not, so knowing this measurement lets us narrow down which group a user belongs to; with more measurements, users become easier to identify. By source of acquisition, measurements are divided into HTTP headers and JavaScript. With HTTP headers, when connecting to a server, browsers send the user agent, the desired language for a webpage, and the type of encoding supported by the browser, among other headers. JavaScript is a programming language used to develop web pages, and the server can obtain device information through JavaScript commands: for example, obtaining the user agent through navigator.userAgent, or the time zone through Intl.DateTimeFormat().resolvedOptions().timeZone. The following figure shows my fingerprint information on the website AmIUnique:

Figure 2, some measurements of fingerprints, source: https://amiunique.org/fp

All the measurements in Figure 2 serve to establish the user's uniqueness. The Canvas and WebGL measurements are worth mentioning: when the same 2D or 3D picture is drawn on different operating systems and different browsers, on PC or mobile, the generated image content is not exactly identical, even if it looks the same to our eyes. So by extracting the Canvas and WebGL picture information, we can uniquely identify and track the user.
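Putting a few of these measurements together, a collection step might look like the following sketch. In a real browser the values would come from navigator.userAgent, the Screen object, and Intl.DateTimeFormat(); here a mock object (with invented values) stands in so the logic can run anywhere:

```javascript
// Sketch of a fingerprint-collection step: gather several measurements
// into one feature string. In a browser, env would be built from
// navigator.userAgent, navigator.language, screen.width/height, and
// Intl.DateTimeFormat().resolvedOptions().timeZone.
function collectFeatures(env) {
  return [
    env.userAgent,
    env.language,
    `${env.screenWidth}x${env.screenHeight}`,
    env.timeZone,
  ].join('||'); // '||' is an arbitrary separator for splicing features
}

// Mock environment with invented values, in place of real browser APIs.
const mockEnv = {
  userAgent: 'Mozilla/5.0 (X11; Linux x86_64)',
  language: 'en-US',
  screenWidth: 1920,
  screenHeight: 1080,
  timeZone: 'America/New_York',
};

const featureString = collectFeatures(mockEnv);
```

Each additional measurement multiplies the number of possible feature strings, which is exactly the "more measurements, easier to identify" point above.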

Deblackbox the digital fingerprints

Above, we discussed the measurements used in browser fingerprints. In fact, digital fingerprints of mobile phones and browser fingerprints have many similarities, especially those related to JavaScript. Although different algorithms use different measurements to track mobile phones, these digital fingerprint models all follow the generic methodology shown below:

Figure 3, Generic methodology of digital fingerprints, source: (Baldini & Steri, 2017)

Meanwhile, we can also deblackbox the digital fingerprint by following the fingerprint recognition process of the browser.

Figure 4, Browser fingerprint recognition process

Looking at the two pictures together, digital fingerprint recognition involves three entities: the mobile phone on the client side (corresponding to the browser), the app on the server side (corresponding to the website), and the database (SQL). For mobile phone fingerprints, in addition to the measurements similar to browser fingerprints mentioned above, such as device information and user configuration, there are also many measurements of the phone's components (hardware). But all the data needs to be digitized before proceeding to the next step, so apps usually use digital information that can be obtained directly for identification.

When the user opens the app, the identification process of the digital fingerprint begins. After the user connects, the app sends files such as HTML, CSS, and JavaScript to the client, and the fingerprint-collection script is usually sent along with them. The fingerprint-collection script is defined by the app developer. Simple features can be obtained directly through APIs: for example, the user agent can be read from the userAgent property of the navigator object, and the screen resolution from the width and height properties of the Screen object. The client (here, the phone) sends the fingerprint's digital information to the app according to the script's commands. Note that because JavaScript and HTML do not require permission to run, users cannot perceive this process. The digital information is then sent to the database, where it is matched by instance-based algorithms and machine learning algorithms. Instance-based matching is often used for static fingerprints: the collected fingerprint feature values are converted to strings and spliced together, and the spliced string is transformed into a fixed-length number through a hash algorithm. If the number matches an instance in the database, the user is identified. However, because feature values change frequently, static fingerprints can usually track a user only for a short time. Most of the time, a company will use a dynamic fingerprint and match it with deep learning. In simple terms, a dynamic fingerprint compares each feature value of the fingerprint and sets a threshold: when the similarity between the fingerprint to be matched and a fingerprint in the database reaches the threshold, the two are confirmed to match; otherwise, the fingerprint to be matched is inserted into the database as a new one. There are many methods to generate the threshold, such as statistical analysis, distance algorithms, random forest algorithms, LSTM algorithms, and so on.

Returning to the original question: when user portraits are enriched, they include not only behavioral data but also interpersonal-relationship data and data about the relationships between your devices (PC, phone, and so on) and accounts. For example, if you shared a shopping link with a friend a long time ago, your user portrait and your friend's will be considered related. So when you discuss a topic with your friend, your friend may have left data about that topic online. Based on the relationship between you and your friend, as well as other data such as location or coexistence on the same local network, it is reasonable that after the discussion the recommendation system recommends the relevant content to your friend and, at the same time, also to you.

Discussion of data privacy and sharing personal data

Whether under the opt-out privacy policies of the United States or the principle of informed consent represented by Europe, I think the key to data privacy lies in being informed and having a choice. Take cookies, which record user data: sharing cookies lets users enjoy convenience on a website and get a better experience, while not sharing cookies does not prevent users from using the website's main services. More importantly, the user has the right to choose whether to share cookies. Even if a website does not offer an option to refuse cookies, users can manually disable them through browser settings. But the emergence of digital fingerprints breaks this principle of informed choice. Now, whether on a website or an app, on a PC or a mobile phone, companies can collect digital fingerprint information to identify users without the users' awareness. Second, regarding data sharing, an app usually presents a privacy policy statement before use. Whatever the type of app, the statement will mention data sharing: consenting to it means allowing the company to share the data within the company and its affiliates. If the user rejects the privacy policy, he will not be able to use the services of the app at all, which effectively removes the user's choice. Additionally, some companies let users turn off ad personalization themselves; apps from Google and Ali all have this option. But this option does not guarantee that these companies will not collect your data. For example, the Taobao app clearly states that service logs, device-related information, and device location information collected while you use the app will all be used for personalized recommendations, and that you can make independent decisions about recommended content by turning off personalized recommendations (which, in my view, turns off the recommendations rather than refusing the collection of information).
To take another example, Google's privacy policy update of June 2015 indicates that they use "technologies to identify your browser or device" (Privacy Policy – Privacy & Terms – Google, n.d.). In fact, according to an interview with ByteDance employees, the above-mentioned information is classified as level-2 information, meaning that a specific person cannot be found in reality from this kind of information alone. But it contains information such as consumer behavior, geographic location, and browsing history, and it can point to a specific account, though not directly to the owner of the account. After special approval, this kind of information can be shared with related companies or with different departments in the same company. In other words, the data we generate in one app, and the user portraits generated from it, may be used and analyzed by other apps of the same company. In addition, the combination of data sharing and user-tracking techniques also renders an app's permission policy useless. For example, even if I forbid a shopping app from obtaining my current location, it can still get the desired data through a map app. Likewise, the content I post on social media can be learned and analyzed by other apps, even if I do not log in to those apps with a social media account. In fact, when a user logs in to TikTok for the first time (the so-called cold start), the user may be recommended the accounts of classmates or friends he knows in reality before he generates his first bit of behavioral data in TikTok. This is brought about by tracking techniques and data sharing. Moreover, although apps' permission acquisition is transparent, and a sensitive permission must be confirmed by the user every time it is used (for example, acquiring the microphone permission requires user consent, and a second confirmation is required when the microphone is actually used; this is also why I think it is, for now, impossible to use the mobile phone to monitor keywords for advertising recommendations), some mobile phone components that are considered non-sensitive and do not require permission may also be used to violate privacy. According to Zheng, there is technology to eavesdrop on part of the voice information from the mobile phone's speaker through the accelerometer, a motion sensor of the phone (Zheng et al., n.d.).

Conclusion

In this article, we ask a question based on a daily phenomenon: does a mobile app have the ability to make recommendations as accurate as monitoring? First, we introduce the basic composition and operation of the recommendation system and conclude that it cannot produce recommendations out of nothing: it needs various input data and processes that data according to algorithms designed by humans, so the results must be related to the input data. From the perspective of data sources, we deblackbox the process of digital fingerprinting and argue that data sharing among apps in different fields, together with user-tracking techniques, can enrich user portraits and enable accurate recommendations. Finally, the article expresses concerns about the impact of digital fingerprints on data privacy and argues that data privacy in the mobile phone field needs more research and corresponding restrictive measures.

References

Baldini, G., & Steri, G. (2017). A Survey of Techniques for the Identification of Mobile Phones Using the Physical Fingerprints of the Built-In Components. 19(3), 29.

Eckersley, P. (n.d.). How Unique Is Your Web Browser? 19.

Klein, A., & Pinkas, B. (n.d.). DNS Cache-Based User Tracking. 15.

Laperdrix, P., Rudametkin, W., & Baudry, B. (n.d.). Beauty and the Beast: Diverting modern web browsers to build unique browser fingerprints. 18.

Privacy Policy – Privacy & Terms – Google. (n.d.). Retrieved May 13, 2021, from https://www.google.com/policies/privacy/archive/20150501-20150605/

Zheng, T., Zhang, X., Qin, Z., Li, B., Liu, X., & Ren, K. (n.d.). Learning-based Practical Smartphone Eavesdropping with Built-in Accelerometer. 18.

Cover Your Tracks. (n.d.). Retrieved May 13, 2021, from https://coveryourtracks.eff.org/learn

Anand, S. A., & Saxena, N. (n.d.). Speechless: Analyzing the Threat to Speech Privacy from Smartphone Motion Sensors. 18.

FP-STALKER: Tracking Browser Fingerprint Evolutions. (n.d.). 14.

Das, A., Borisov, N., & Chou, E. (n.d.). Every Move You Make: Exploring Practical Issues in Smartphone Motion Sensor Fingerprinting and Countermeasures. 21.

Hauk, C. (2021, January 14). Browser Fingerprinting: What Is It and What Should You Do About It? Pixel Privacy. https://pixelprivacy.com/resources/browser-fingerprinting/

Big data in the media view and the science view

In our daily life, the media often advertise big data as collecting and using a lot of data to get objective and correct answers. In other words, in the media's framing, big data technology is described as an ideal input-output black box. Big data here is huge in volume, includes the long tail (meaning it covers everything, including minorities), and is objective (unconscious), so it seems reasonable that the results obtained from big data are objective and correct.

In some ways, this makes sense. Take the recommendation system of a music app as an example. First, a machine recognizes the patterns (regularities) in all the music data and classifies the music into different types based on those patterns. Then users' behavior is treated as data to extract behavior patterns. The system maps user patterns onto music patterns and constantly adjusts the output (the music recommended) in real time to get better feedback (the user clicks the like or favorite button or downloads the music). This is a use of big data and machine learning, and the big data here fits Kitchin's definition: "huge in volume", since all the music and user behaviors are the data; "high in velocity, being created in or near real-time", since every time the user clicks or does not click, data is generated in real time and returned as new data; "fine-grained in resolution and uniquely indexical in identification", since the system recommends music and adjusts results based on each user's behavior, so each user's recommendations are unique; "relational in nature, containing common fields that enable the conjoining of different data sets", since data on user behaviors, music, etc. are gathered together for real-time analysis; and "flexible, holding the traits of extensionality (can add new fields easily) and scalability (can expand in size rapidly)", since the data used for analysis allows adding a new user, a new song, or a new variable of user behavior.

Big data in the media perspective is actually an empiricist epistemology of big data. But in reality, the results of big data are not as objective and correct as the media describe, since the process of big data collection and analysis involves human participation, such as the choice of algorithms and models. This is what Kitchin means when he says "data are created within a complex assemblage that actively shapes its constitution". In fact, the recommendation systems of different music apps differ, and the same user will get different recommendations from different music apps, which shows that big data technology is not so objective and correct (if it were, the results should be the same). What is more, big data only gives correlations and insights in the data but cannot explain why. For business purposes, though, that is useful enough: operators do not need to know why a user who likes song A will also like song B; all they need to know is that there is a positive relationship between songs A and B. Therefore, it is reasonable for the media to use the empiricist view of big data, since it simplifies the epistemology, makes it easy for ordinary customers to accept, and helps persuade them to buy or use the service.

But the situation is different when it comes to science. The simple input-output model and empiricist epistemology cannot meet the needs of scientific research. Take machine learning on good selfies as an example. Big data and machine learning can return a dataset of good selfies, but they do not explain why the selfies are good. The output only shows the phenomenon, or a surface correlation between a good selfie and selfie patterns. It is a result of abduction, meaning the machine gives the best result in a specific scenario. But for science, especially the humanities, it is not enough to get a snapshot of a pattern. The important thing is how to explain the correlation, or why the machine returned this result. From this point of view, as Kitchin said, "the pattern is not the end-point but rather a starting point for additional analysis" (Kitchin, n.d.). Big data and machine learning give us a new method to find phenomena, and scientific research then does the additional deductive or inductive work to explain them.

 

References

Johnson, J., Denning, P., Delic, K. A., & Sousa-Rodrigues, D. (2018). Big Data or Big Brother? That is the question now. Big Data, 10.

Johnson, J., Denning, P., Sousa-Rodrigues, D., & Delic, K. A. (2017). Big Data, Digitization, and Social Change. Big Data, 8.

Kitchin, R. (n.d.). Big Data, new epistemologies and paradigm shifts. Big Data, 12.

Amazon Translate service

In our daily life, cloud services are everywhere but not obvious. In many situations, we use products or services that rely on cloud computing but simply attribute them to the Internet. Email services provided by different companies, translation services, and virtual assistants are all examples. We have the network, so the products work well. (In fact, networking is only the basis or tool of one of the five characteristics of the cloud, ubiquitous access. But from outside the black box, it looks as if the network does everything.) On the other hand, in terms of business, many companies claim that they use cloud computing to support their work, to show that they are high-tech companies. This separation between daily experience and business claims about cloud computing may label the cloud as a high-tech matter irrelevant to daily life and exacerbate the black-boxing of cloud services and cloud computing. So what is the cloud, or cloud computing? According to Rountree & Castrillo, the "cloud is actually a service or group of services". There are three basic cloud service models, SaaS, PaaS, and IaaS, which "can be viewed in terms of increased levels of abstraction" (Ruparelia, 2016). Take Amazon Translate as an example.

Amazon Translate is one of the services of Amazon Web Services (AWS). AWS provides different service models for users based on their needs. For example, if an individual wants to translate a text from English to Chinese, he can just use the console: select the languages, input the text, and a real-time translation result is output. This is a SaaS example. But when it comes to business, things are different. For example, Hotels.com needs to translate customer reviews in 41 languages so that users can understand the reviews and have more information about the hotels (Amazon Translate Customers, n.d.). In this case, Hotels.com must not only input the review data to the cloud but also manage its website, how it collects its data, and how it shows the results to users: for example, writing the CSS and HTML to create the web page, writing the code that calls Amazon Translate, and so on. This is a PaaS example. In fact, Amazon Translate can be used from a limited set of programming languages, such as Java and Python (translating a web page requires the AWS SDK for Java), and it specifies how to input and upload data. So for the translate function, AWS only provides PaaS or a higher level of abstraction, but AWS itself also provides IaaS. As IaaS, it provides only the computing infrastructure and some basic functions from the vendor, such as data storage, virtualization, and networking, while users handle the data, middleware, and the system.

As mentioned before, the cloud is a group of services. For the translation itself, the input data simply invokes the Amazon Translate model: the encoder of the model reads the words of the text one by one and constructs a semantic representation, and the decoder then uses that representation to generate the translation, also one word at a time (What Is Amazon Translate? – Amazon Translate, n.d.). As cloud users, though, we usually use many services of the cloud instead of one. AWS encapsulates many cells (services), and Translate is one of them. Most of the time, we use different cells together to get the final output, for example using Amazon Polly to read the Amazon Translate results aloud. In this case, before we get the final result, inside the cloud the translated data is also input to the Amazon Polly model, which returns its results to the user.

In my opinion, cloud services are an AI version of "professionals doing professional things". As the video How AWS Is Changing Businesses Using Artificial Intelligence says, "AI and machine learning is hard to implement alone and can be a complex undertaking". Take Hotels.com as an example: it would be difficult for the company to train an AI on its own to translate reviews across languages, let alone the 41 languages it needs. But with AWS, all Hotels.com needs to do is package the data and input it through the APIs provided by AWS.

In short, with cloud services, companies do not have to store, manage, and process all their data on local servers; they can outsource all of this to the cloud platform, thereby saving costs and improving efficiency. This transformation is like switching from building a house from scratch to assembling building blocks.

 

References

Amazon Translate Customers. (n.d.). Amazon Web Services, Inc. Retrieved April 5, 2021, from https://aws.amazon.com/translate/customers/

edureka! (2018, July 13). Cloud Computing Service Models |  IaaS PaaS SaaS Explained | Cloud Masters Program | Edureka. https://www.youtube.com/watch?v=n7B4icXvs74

Rountree, D., & Castrillo, I. (2014). The basics of cloud computing: Understanding the fundamentals of cloud computing in theory and practice. Elsevier/Syngress.

Ruparelia, N. (2016). Cloud computing. The MIT Press.

What Is Amazon Translate? – Amazon Translate. (n.d.). Retrieved April 3, 2021, from https://docs.aws.amazon.com/translate/latest/dg/what-is.html

 

Question

What is the difference between cloud services and cloud computing? Taking AWS as an example, are the products provided by AWS cloud services, and are those services based on cloud computing?

 

Ethics of deep fake

I want to talk about deep fakes this week. Deep fakes are close to our daily life today. For example, we can upload a friend's photo to combine it with a dynamic meme. Actually, the top comment below the video It's Getting Harder to Spot a Deep Fake Video (Bloomberg Quicktake, 2018) is "2018: Deep fake is dangerous 2020: DAME DA NE", which means that from 2018 to 2020 the impression of deep fakes changed from dangerous to meme. In addition, the My Heritage platform allows people to upload old photos and bring them to life. In my view, the ethical and social problems of deep fake technology are about data collection and about how the technology is used as a tool.

Deep fakes, based on GANs (Generative Adversarial Networks), refer to algorithms that take pictures and sounds as input and perform face manipulation: they put one person's facial contours and expressions on another specific person's face and, at the same time, process the sound realistically to create a synthetic but seemingly real video.
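To make the GAN idea concrete, here is a deliberately tiny adversarial training loop on 1-D numbers (my own toy example, nothing like a production deep fake model): a linear generator tries to produce samples that a logistic discriminator cannot tell apart from "real" data.

```python
import numpy as np

# Toy 1-D GAN sketch. Real data: numbers around 4.0.
# Generator: g(z) = w_g * z + b_g.  Discriminator: d(x) = sigmoid(w_d*x + b_d).
rng = np.random.default_rng(0)
w_g, b_g = rng.normal(), rng.normal()
w_d, b_d = rng.normal(), rng.normal()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr = 0.01
for step in range(2000):
    z = rng.normal()
    real = 4.0 + 0.1 * rng.normal()
    fake = w_g * z + b_g

    # Discriminator update: push d(real) toward 1 and d(fake) toward 0.
    for x, label in ((real, 1.0), (fake, 0.0)):
        p = sigmoid(w_d * x + b_d)
        grad = p - label            # d(cross-entropy)/d(logit)
        w_d -= lr * grad * x
        b_d -= lr * grad

    # Generator update: push d(fake) toward 1 by adjusting w_g, b_g.
    p = sigmoid(w_d * fake + b_d)
    grad = (p - 1.0) * w_d          # chain rule through the discriminator
    w_g -= lr * grad * z
    b_g -= lr * grad

print(w_g, b_g)  # learned generator parameters
```

The adversarial structure is the point: neither network is given the "right answer" directly; each improves only by competing with the other, and face-swapping models scale this same game up to images and audio.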

The first ethical problem is about data collection. Deep fakes may not have a data bias problem in my view, since the goal is to replace one person's face with another's. The data might contain some "dangerous" patterns of race or gender, but we cannot find them out, and they would not lead to biased output, at least in my opinion. But what about the fact that training a deep fake may use many photos without consent? I think that, for deep fakes, the data collection does not infringe personal information and has no effect on any single individual: "The risk of any harm does not increase by more than a little bit as the result of the use of any individual's data" (Kearns & Roth, 2020). But whether the benefit outweighs the sum of the costs to all individuals, and whether the distribution of the benefit is fair, depend on how the deep fake is used.

When deep fakes are used in journalism, it seems that Pandora's box has been opened. From a computer science perspective, we still have methods to tell whether a video uses deep fake technology to generate "fake" faces, since a deep fake does not create a video from nothing but needs a large amount of audio and video data of a specific person to extract features and patterns. But when it comes to communication and journalism, the point is not how well deep fakes can do this but that they can do it at all. Visual texts were originally the most powerful evidence for constructing truth. But deep fakes give visual texts different or even opposite content and meanings, resulting in the self-subversion of visual texts. In other words, deep fakes overturn the notion that seeing is believing. I am concerned and scared that, because of this overturn, people might only be willing to believe what they want to believe and dismiss videos that contradict their own point of view as the output of deep fakes. As Danielle Citron said, "When nothing is true then the dishonest person will thrive by saying what's true is fake" (You Thought Fake News Was Bad?, 2018).

 

Reference

Atlantic Re:think. (2018, June 30). HEWLETT PACKARD ENTERPRISE – Moral Code: The Ethics of AI. https://www.youtube.com/watch?v=GboOXAjGevA&t=104s

Bloomberg Quicktake. (2018, September 27). It’s Getting Harder to Spot a Deep Fake Video. https://www.youtube.com/watch?v=gLoI9hAX9dw

Floridi, L., & Cowls, J. (2019). A Unified Framework of Five Principles for AI in Society. Harvard Data Science Review. https://doi.org/10.1162/99608f92.8cd550d1

Kearns, M., & Roth, A. (2020). The ethical algorithm: The science of socially aware algorithm design. Oxford University Press.

This is how AI bias really happens—And why it’s so hard to fix. (n.d.). MIT Technology Review. Retrieved March 20, 2021, from https://www.technologyreview.com/2019/02/04/137602/this-is-how-ai-bias-really-happensand-why-its-so-hard-to-fix/

You thought fake news was bad? Deep fakes are where truth goes to die. (2018, November 12). The Guardian. http://www.theguardian.com/technology/2018/nov/12/deep-fakes-fake-news-truth

It is Still for the Specific Tasks

Google Assistant and other virtual assistants are "like shortcuts to parts of an app" (App Actions Overview | Google Developers, n.d.). I can activate Google Assistant by saying "Hey Google" and ask it to play a movie on my phone. We can also ask it to book a restaurant or add a memo. Outside the black box of Google Assistant, we can see that the user activates the assistant and gives it an unstructured command. Google Assistant then analyzes the words and sends orders to specific apps to get the right answers or actions.

Fig1. Data flow outside the black box. Source: App Actions Overview | Google Developers, n.d.

What's in the black box? First, the questions or commands spoken by users are transformed into text (human representations). This process is called Automatic Speech Recognition (ASR). The user's sound is first stored in FLAC or WAV files and transmitted to Google's server system. There, the data undergoes signal processing and feature extraction by the ASR and is encoded into vectors. The ASR then uses the trained acoustic model and language model to obtain scores respectively, combines these two scores to perform a candidate search, and finally obtains the recognized result. After decoding the result, we get the text corresponding to the voice.
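The candidate search can be illustrated with made-up numbers: each transcription hypothesis carries an acoustic-model score and a language-model score (log-probabilities here), and the recognizer keeps the best combination.

```python
# Toy ASR rescoring sketch (hypotheses and scores are my own invention).
# Each candidate transcription has an acoustic score (how well it matches
# the audio) and a language-model score (how plausible the word sequence is).
candidates = {
    "recognize speech":   {"acoustic": -12.0, "lm": -3.0},
    "wreck a nice beach": {"acoustic": -11.5, "lm": -9.0},
}

def total_score(scores, lm_weight=1.0):
    # Combine the two log-probability scores with a weighted sum,
    # then rank the hypotheses by the combined value.
    return scores["acoustic"] + lm_weight * scores["lm"]

best = max(candidates, key=lambda c: total_score(candidates[c]))
print(best)  # 'recognize speech': acoustically close, far more likely language
```

The second hypothesis actually matches the audio slightly better, but the language model vetoes it, which is exactly why the two scores are combined rather than used alone.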

Second, since the users' queries might be unstructured, the text has to be changed into a structured query and routed to the right model. By the way, unstructured means people have many different ways to ask for the same thing. For example, "how's the weather today" and "what is today's weather forecast" both ask for the same information, but for many reasons the phrasing differs. For this part, the NLP component uses language pattern recognizers to map the text against vocabulary databases, obtains semantic matches, and ranks all the candidates to find the most likely match. After that, Google Assistant can route the result to a specific task model, such as domain models, task flow models, or dialog flow models.
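A crude sketch of this pattern matching, with intent names and regular expressions of my own invention (real assistants use trained models, not hand-written regexes):

```python
import re

# Map differently worded, unstructured queries to one structured intent.
INTENT_PATTERNS = {
    "get_weather": [
        r"how('s| is) the weather( today)?",
        r"what('s| is) today's weather( forecast)?",
    ],
    "play_movie": [
        r"play (?P<title>.+) on my phone",
    ],
}

def parse(query):
    for intent, patterns in INTENT_PATTERNS.items():
        for pattern in patterns:
            m = re.fullmatch(pattern, query.lower())
            if m:
                # Extracted parameters become the structured query's slots.
                return {"intent": intent, "params": m.groupdict()}
    return {"intent": "unknown", "params": {}}

print(parse("How's the weather today"))
print(parse("What is today's weather forecast"))  # same intent, other wording
print(parse("Play Inception on my phone"))        # params: {'title': 'inception'}
```

Both weather phrasings collapse to the same `get_weather` intent, which is the "structured query" the downstream task model receives.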

Fig2. NLP procedure. Source: Gruber et al., 2017

Third, the returned output depends on the models' results. "When a user's query matches the predefined pattern of a built-in intent, Assistant extracts query parameters into schema.org entities and generates an Android deep link URL" (App Actions Overview | Google Developers, n.d.). In other words, based on the user's commands, Google Assistant returns results that people can understand. If you want to watch an adventure movie, it might activate the Netflix app or just give you a list of adventure movies; the difference depends on whether the Netflix app integrates with the Google Assistant API. It is worth mentioning that Google Duplex can help users book a restaurant and the like by automatically talking to shop assistants over a phone call. "At the core of Duplex is a recurrent neural network (RNN) designed to cope with these challenges, built using TensorFlow Extended (TFX)" ("Google Duplex," n.d.).

In short, though Google Assistant and other virtual assistants seem human-like in some ways (you can talk to them and ask them to do things only humans could do before), they are still designed for specific tasks. They just recognize and classify people's commands and follow different models to finish the tasks.

 

Question:

What is the difference between BERT and Google Duplex? BERT is used for Google Search, but its effect seems similar to Duplex in some ways.

 

Reference

App Actions overview | Google Developers. (n.d.). Retrieved March 16, 2021, from https://developers.google.com/assistant/app/overview

Conversational Actions. (n.d.). Google Developers. Retrieved March 15, 2021, from https://developers.google.com/assistant/conversational/overview

Google Duplex: An AI System for Accomplishing Real-World Tasks Over the Phone. (n.d.). Google AI Blog. Retrieved March 16, 2021, from http://ai.googleblog.com/2018/05/duplex-ai-system-for-natural-conversation.html

Gruber, T. R., Cheyer, A. J., Kittlaus, D., Guzzoni, D. R., Brigham, C. D., Giuli, R. D., Bastea-Forte, M., & Saddler, H. J. (2017). Intelligent automated assistant (United States Patent No. US9548050B2). https://patents.google.com/patent/US9548050B2/en

Speech-to-Text basics | Cloud Speech-to-Text Documentation. (n.d.). Retrieved March 16, 2021, from https://cloud.google.com/speech-to-text/docs/basics

Think of things backwards: How does Google Translate work?

Machine translation can be considered as a simple model: we input words or sentences, the machine analyzes the input, transforms it, and generates the output, words or sentences in another language. Before neural networks were applied to machine translation, three architectures were commonly used, direct, transfer, and interlingua, none of which involves probability and statistics. In fact, "GNMT did not create its own universal interlingua but rather aimed at finding the commonality between many languages using insights from psychology and linguistics" (McDonald, 2017). Google Translate belongs to statistical MT. When it comes to statistical MT, "All statistical translation models are based on the idea of a word alignment. A word alignment is a mapping between the source words and the target words in a set of parallel sentences," according to Speech and Language Processing. Google Translate uses a bidirectional RNN to align the input and output. First, it encodes the input sentence into vectors with an RNN for the input language. Then the model tries to map those vectors to vectors representing words in the output language (actually, the words here are still vectors) to find the best alignment. It is just like what Speech and Language Processing calls thinking of things backwards: the task is to find the hidden output vectors that could generate the input vectors. From this step we can also see that this is supervised machine learning. After the mapping, or alignment, the matched vectors are decoded into words in the output language.
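The word-alignment idea can be illustrated with a crude co-occurrence count (my own toy example, far simpler than the IBM models or GNMT): words that keep appearing together in parallel sentence pairs are treated as likely translations.

```python
from collections import Counter

# Tiny parallel corpus (my own example sentences, English-Spanish).
parallel = [
    ("the house", "la casa"),
    ("the book", "el libro"),
    ("the green house", "la casa verde"),
    ("a house", "una casa"),
]

# Count how often each source word co-occurs with each target word
# in the same sentence pair.
counts = Counter()
for src, tgt in parallel:
    for s in src.split():
        for t in tgt.split():
            counts[(s, t)] += 1

def best_alignment(src_word):
    # The target word seen most often alongside src_word is taken
    # as its most likely translation.
    pairs = [(t, n) for (s, t), n in counts.items() if s == src_word]
    return max(pairs, key=lambda p: p[1])[0]

print(best_alignment("house"))  # 'casa': co-occurs in all three 'house' pairs
```

Real alignment models refine these raw counts iteratively (and neural MT replaces them with learned attention), but the underlying intuition of "thinking backwards" from parallel data is the same.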

Question:

Does Google Translate use English as a bridge to link other languages, like Chinese and Japanese?

How does statistical machine translation deal with syntax and semantics? Does it use probability and statistics to skip these kinds of problems?

References

A Neural Network for Machine Translation, at Production Scale. (n.d.). Google AI Blog. Retrieved March 9, 2021, from http://ai.googleblog.com/2016/09/a-neural-network-for-machine.html

CS Dojo Community. (2019, February 14). How Google Translate Works—The Machine Learning Algorithm Explained! https://www.youtube.com/watch?v=AIpXjFwVdIE

Jurafsky, D., & Martin, J. H. (2000). Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Prentice Hall.

McDonald, C. (2017, January 7). Ok slow down. Medium. https://medium.com/@chrismcdonald_94568/ok-slow-down-516f93f83ac8

Simplify the problem through the layers

The article sets out to solve the problem of what makes a good selfie. The features that tell whether a selfie is good or not are abstract representations, which means they are hard to describe. It is easy for us to say a selfie is good by feeling, but difficult to conclude what makes it good, just as we can recognize an 'A' in many kinds of handwriting but cannot explain well why it is an A. Luckily, deep neural networks are good at exactly this: finding the rules and features hidden in a class and applying them to new instances. Actually, Karpathy framed the question of what makes a good selfie as a two-class classification problem, dividing selfies into a good class and a bad class, which is easier for a deep neural network to solve.

To deal with the problem, the first step is to acquire data suitable for the ConvNet. In the video How to Make an AI Read Your Handwriting (LAB): Crash Course AI #5, to let the machine understand handwriting correctly, the author first has to digitize the data and then make it similar to the letters in EMNIST, a training set. Though the selfie data does not need such processing, since it is already digital, it still has some restrictions, such as that the images should contain at least one face.

The next step is feature extraction. Just as nuts and bolts can be discriminated by area, and apples and bananas by circularity, the classification of selfies also depends on different features. From a human perspective there may be a big difference between distinguishing fruits and distinguishing good selfies from bad ones, but for the machine there is no essential difference. It is just that the complexity of the distinction may require many filters, not just circularity or size. As for how a dimension or filter is chosen, it is worked out by the machine through training: as the article says, the filters start random and are adjusted depending on the results by a mathematical process that changes all the filters. The data goes through the filters and finally becomes values for the output.

On the other hand, the selfies as input are treated as pixels and pass through the network: input layer → hidden layers → output layer. In my understanding, the hidden layers are a set of layers, each containing many filters for particular tasks; or we can say each filter is only excited by a specific, corresponding feature. The image itself is segmented to isolate different objects from each other and from the background, and the different features are labeled. The filters then detect whether the data has the corresponding features. It is as if the image were divided into many pieces, and each filter answered yes or no (binary) to decide whether to output some value to the next layers or filters according to the image's features. (But to be honest, I am a little confused by the concepts of layer, column, filter, object, and feature.)
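What a single filter does to pixels can be sketched directly (toy numbers of my own): slide a small weight grid over the image and record where it responds.

```python
import numpy as np

# A tiny "image": dark on the left, bright on the right,
# so it contains one vertical edge down the middle.
image = np.array([
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
], dtype=float)

# A 2x2 filter that responds to dark-to-bright vertical transitions.
edge_filter = np.array([
    [-1, 1],
    [-1, 1],
], dtype=float)

h, w = image.shape
fh, fw = edge_filter.shape
response = np.zeros((h - fh + 1, w - fw + 1))
for i in range(h - fh + 1):
    for j in range(w - fw + 1):
        patch = image[i:i + fh, j:j + fw]
        # Elementwise multiply and sum: the convolution at this position.
        response[i, j] = (patch * edge_filter).sum()

print(response)  # large values only where the vertical edge sits
```

Flat regions give 0 (the filter "answers no"), while the edge column lights up; a trained ConvNet stacks thousands of such filters, with the weights learned rather than hand-set.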

Questions:

1 What’s the invariance of selfie image? Can you give some examples?

2 It seems that the features of good selfies in the article were concluded from the results by the author rather than by the machine. Is it possible that the machine's standard for a good selfie differs from the human conclusion? The machine just deals with pixels and may not understand rules like "the face should occupy about 1/3 of the image." For example, in the article there is one selfie where the machine simply cropped out the "self" part of the selfie to make a machine-version good selfie.

3 The article says t-SNE is a wonderful algorithm that takes some number of things and lays them out in such a way that nearby things are similar. I am curious how t-SNE would deal with movies. Compared with selfies, movies are dynamic and involve a much larger amount of data.

References

Alpaydin, E. (2016). Machine learning: The new AI. The MIT Press.

Dougherty, G. (2013). Pattern Recognition and Classification. Springer New York. https://doi.org/10.1007/978-1-4614-5323-9

Unicode and DBMS in levels — Fudong Chen

The Unicode system is a coding table that links all the written characters of any language to binary code points, one by one. But these codes cannot be stored and displayed as fonts or characters if there is no encoding method to bridge the computer and Unicode. The problem of black squares and gibberish in text files is very common with Chinese or Japanese words. For example, when I download a Japanese video game, it usually comes with an introduction and description written in a txt file. But in most cases, the txt file, which should be in Japanese, shows black squares and meaningless gibberish. That is because I do not have the specific support for Japanese, or my default encoding method does not fit the txt file.

The Unicode system can be managed in levels:

First, the application level. The data at this level consists of character shapes rendered as pixel patterns on digital screens. The data we input with peripherals like keyboards is translated by the Unicode system and finally shown on the screen as a string.

Second, the logic and language level. The data is the characters and their corresponding code points. Unicode is a bridge linking characters and binary codes: every character has its own code point, one by one. Text input by computer programs can be translated into its own codes and then translated again into the computer's representations.

Third, the physical level. The data is bit units stored on disk or in RAM. Unicode code points are translated into specific byte sequences by an encoding method such as UTF-8 so that they can be stored in the computer. Actually, characters and bit units are not in one-to-one correspondence: the same character can be stored as different byte sequences under different encoding methods.

In short: characters (users) ⇌ Unicode ⇌ encoding method (UTF-8) ⇌ bytes ⇌ disk, network, RAM
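This chain can be demonstrated in a few lines of Python; the wrong-codec decode at the end reproduces the gibberish problem described above.

```python
# Characters ⇌ Unicode code points ⇌ UTF-8 ⇌ bytes, round-tripped.
text = "日本語"                                 # Japanese characters
code_points = [hex(ord(c)) for c in text]      # the Unicode level
encoded = text.encode("utf-8")                 # the physical (bytes) level

print(code_points)               # ['0x65e5', '0x672c', '0x8a9e']
print(encoded)                   # b'\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e'
print(encoded.decode("utf-8"))   # decodes back to 日本語

# Decoding the same bytes with the wrong codec produces mojibake,
# the "meaningless gibberish" seen in the game's txt file.
print(encoded.decode("latin-1"))
```

The bytes on disk never change; only the decoding step decides whether the screen shows Japanese or gibberish.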

 

As for the DBMS, I have no experience with it, but I can try to explain the system in levels:

First, the application level. The data is what we see and input through applications, which are designed for users and ask them to enter data in a specific format through a data entry form. This data is transmitted to the DBMS and translated by the DBMS into data packets for the database. The packets going from the database back to the DBMS are likewise translated into information people can understand directly and shown in the application as a result. The result might be a data form or a specific error sign such as "wrong input."

Second, the system level. The data is SQL statements and the computer's code structures. The parser and grant checking examine the SQL statements from the application. The data is then passed to semantic analysis and query processing for understanding and classification. Access management, concurrency control, and the recovery mechanism work according to the types of data and send instructions to the database.

Third, the physical level. The data is bit units. The system distinguishes different bit units in its representations, identifies the access types, and finds the matching storage place in the database according to the data types and the system's file organization.
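These levels can be watched in action with Python's built-in SQLite (a real, if minimal, DBMS); the table and query here are my own toy example.

```python
import sqlite3

# Application level: the program collects structured input (a name, an age).
# System level: the DBMS parses and executes the SQL statements.
# Physical level: SQLite stores the rows as bytes (":memory:" here,
# so nothing actually touches the disk).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("Alice", 30))

# The query result is translated back into values people can read.
row = conn.execute("SELECT name, age FROM users").fetchone()
print(row)  # ('Alice', 30)

# A malformed statement is rejected by the parser: this is the
# "error sign" the application level shows the user.
try:
    conn.execute("SELEC * FROM users")
except sqlite3.OperationalError as e:
    print("error:", e)
```

The application never touches the bytes directly; it only speaks SQL to the system level, which is the whole point of the layered design.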

Question:

First, metadata is considered data about data, and I am confused by this statement. What is the difference between metadata and the data describing the rules for how to use data in the system? Does the latter belong to the former?

Second, from the reading I learned how the Unicode system shows characters on the screen. But I noticed that the example in the introduction to data and databases uses six characters to constitute one character on the screen. So can I conclude that the characters of a language do not correspond one-to-one to Unicode?

Reference

Buckland, M. K. (2017). Information and society. The MIT Press.

Irvine, M. (n.d.). CCTP-607 Universes of Data: Distinguishing Kinds and Uses of “Data” in Computing and AI Applications. 9.

Kroenke, D. M., Auer, D. J., Vandenberg, S. L., & Yoder, R. C. (2017). Database concepts (Eighth edition). Pearson.

Thoughts on the signal transmission theory of information – Fudong Chen

The signal transmission model is similar to a communication system. It has five essential elements:

-The information source is a person or machine that produces the message to be sent.

-The transmitter encodes (or tokenizes) the information mentioned above into a specific signal pattern that fits the transmission medium.

-The channel is the medium itself. The information in the channel is not directly observable, but we can control the patterns of electrical current used as signals so that the signals are measurable.

-The receiver decodes the information from the signal; that is, it inverts the operation of the transmitter.

-The destination is the person or machine that receives the decoded information.

In the theory, one important matter is the design of the transmitter and the receiver, which determines the methods for encoding and decoding. In addition, noise is a factor the theory focuses on. To overcome the noise source during transmission and enable error correction, the model adds redundancy, namely extra symbols.
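Redundancy for error correction can be illustrated with a toy 3x repetition code (my own example, far simpler than the codes Shannon's theory leads to): each bit is sent three times and decoded by majority vote, so one flipped symbol per triple is corrected.

```python
# Transmitter: add redundancy by repeating every bit three times.
def encode(bits):
    return [b for b in bits for _ in range(3)]

# Receiver: invert the transmitter's operation, using majority vote
# over each triple to correct single-symbol noise.
def decode(signal):
    triples = [signal[i:i + 3] for i in range(0, len(signal), 3)]
    return [1 if sum(t) >= 2 else 0 for t in triples]

message = [1, 0, 1, 1]
sent = encode(message)   # [1,1,1, 0,0,0, 1,1,1, 1,1,1]
sent[4] = 1              # the noise source flips one symbol in the channel
print(decode(sent))      # [1, 0, 1, 1]: the original message is recovered
```

The price of the correction is that three symbols cross the channel for every one bit of information, which is exactly the trade-off between redundancy and channel capacity that the theory quantifies.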

However, the model does not include meaning. When Shannon built the model, he tried to eradicate the meaning of the message and focused on the engineering problem, leaving out meaning and semantics, which are made by humans' collective practice and shared ideas. During transmission, the information is converted into bits, and bits cannot become information, let alone meaning, until they are structured in a decodable pattern and output as human-interpretable representations. What's more, meanings, values, and interpretations are not physical properties or features of a symbolic medium (they are not electrical structures); they are inferences and correlations made by symbol-using communities. In other words, meaning cannot be in the channel or in other physical things. The model only provides an essential abstraction layer in the design of all electronic and digital systems, and meaning cannot be described within it.

Just as Newton bridged physics and formulas, Shannon made a bridge between information and uncertainty, entropy, and chaos. The bridge turned information into quantities suitable for use in mathematical formulas, finally linking human logic and symbolic values with electronic media. In other words, the theory forms a semiotic system for digital electronic data representation: in this system, a representation, or instance, is tokenized (encoded) and re-tokenized (decoded). However, as mentioned above, the process only provides an abstraction layer in the digital system; it does not include the interpretation of other symbolic structures, like language and graphs. Information theory cannot provide different kinds of abstraction layers and cannot interpret instances unrelated to digital things. Therefore, it can only be a subsystem of the whole system of signs and symbols.

Reference

Irvine, M. (2021). Introducing Information Theory: The Context of Electrical Signals Engineering and Digital Encoding.

Gleick, J. (2011). The information: A history, a theory, a flood. Knopf Doubleday Publishing Group.

Human controls the AI –Fudong Chen

In my undergraduate thesis, I used a mood recognition tool to collect and analyze the comments below videos to find the audience's emotions toward the videos' topics, so I am interested in natural language processing. The bag-of-words method is a classical machine learning technique: building a dictionary of words, transforming a text into a specific vector so that the computer can understand it, and setting some words and rules to make the result more accurate. Outside the black box, we only see the data going in and the result coming out, while inside the black box we can see the human design and ideas through the process of machine learning. This is similar to the idea of Johnson and Verdicchio's article: the autonomy of AI is limited by its designers. Although we do not know exactly how the AI deals with the data, we can limit the result through the analog inputs and the actuators.
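A minimal bag-of-words sketch (my own toy corpus): build a vocabulary, then turn each text into a count vector the computer can work with.

```python
# Build a dictionary from a small corpus, then vectorize texts against it.
def build_vocab(texts):
    vocab = sorted({word for text in texts for word in text.lower().split()})
    return {word: i for i, word in enumerate(vocab)}

def vectorize(text, vocab):
    vec = [0] * len(vocab)
    for word in text.lower().split():
        if word in vocab:          # words outside the dictionary are ignored
            vec[vocab[word]] += 1
    return vec

comments = ["good video", "bad video", "good good content"]
vocab = build_vocab(comments)      # {'bad': 0, 'content': 1, 'good': 2, 'video': 3}
print(vectorize("good video", vocab))  # [0, 0, 1, 1]
```

Every design choice here (lowercasing, ignoring unknown words, counting rather than just flagging) is a human decision baked into the "black box," which is exactly the point about designers limiting the system.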

When it comes to the relation between AI and designers, Johnson and Verdicchio's article seems to consider AI a type of tool. It points out that AI discourse neglects human actors and human behavior, and it emphasizes the role of the designer, who can limit the AI; the designer should be responsible for the AI. But I wonder whether this statement ignores the users of AI. In reality, more and more AI is open to individuals. When we pay attention to the responsibility of AI's designers, should we also consider the responsibility of AI's users? For example, mood recognition can be used to determine customers' feelings about a product, but it can also be used to monitor public opinion.

 

Reference

Alpaydin, E. (2016). Machine learning: The new AI. The MIT Press.

Johnson, D. G., & Verdicchio, M. (2017). Reframing AI Discourse. Minds and Machines, 27(4), 575–590. https://doi.org/10.1007/s11023-017-9417-6

What I am afraid of AI – Fudong Chen

After finishing the introductory readings, there is a topic that impressed me, the evolution or the change of AI’s goals and aims.

AI has two main aims. One is technological: using computers to get useful things done. The other is scientific: using AI concepts and models to help answer questions about human beings and other living things. In my view, Alan Turing's idea of Turing machines belongs to the scientific aim, though it was initially an abstract mathematical idea. So at least at the very beginning, AI was used to serve human beings' goals and purposes. But throughout the history of AI, a large number of people have dreaded AI, afraid that it could evolve to replace humans.

Actually, I am not afraid of this, since most AI today is still empirical and depends on statistics; it only has the ability to predict the near future from past data. But I am afraid of AI in another way, because I found a stereotype among the readings, such as the ideas that a user can access data elsewhere or that this is an era of democratization of digital technology.

However, in reality, algorithms and data have become the assets of technology companies. Those who master big data seem more like the AI that can predict the future. It is undeniable that business is usually expected to provide personalized services to every single customer. Data allows companies to understand individuals better than individuals understand themselves, and companies can even predict and show individuals interests and products they did not know about before. But what will the world become if this predictive ability is used elsewhere? When AI's goal changes from helping people do useful things to using data to make money, which seems to be an inevitable result of commercialization, what do we do?

In addition, as a student without a strong mathematical background, when I learn about machine learning, I find I am just learning and using a black box. Data starts to drive the operation; it is not the programmers anymore but the data itself that defines what to do next. I just input data to get results and analyze them, and sometimes I do not even need to analyze them. I do not understand what happens in the black box. I would like to know what I can get from learning AI and machine learning.

Reference

Boden, M. A. (2016). AI: Its nature and future (First edition). Oxford University Press.

Alpaydin, E. (2016). Machine learning: The new AI. The MIT Press.

Wooldridge, M. J. (2021). A brief history of artificial intelligence: What it is, where we are, and where we are going (First U.S. edition). Flatiron Books.