
De-blackboxing the Role of Machine Learning in Computational Propaganda

Hans Johnson

The COVID-19 pandemic has brought about drastic societal changes. One of the most evident is increased time spent at home. Consequently, many indoor forms of entertainment have witnessed substantial growth in popularity, for example, online video streaming, video gaming, and particularly social media use.1 Unfortunately, increased traffic on social media has also magnified exposure to information operations (disinformation). Information operations come in many forms, yet one of the most prolific on social media has been computational propaganda. Within this subset of disinformation, machine learning is used to full effect to amplify influential information, incite emotional responses, and even interact with legitimate users of social media platforms.2 A combination of machine learning and pattern recognition techniques is utilized in this process, the most prominent being NLP (Natural Language Processing) and StyleGAN, an architecture for GANs (Generative Adversarial Networks). This research project will: 1) give a brief history of the evolution of propaganda and how historical and modern propaganda differ in scope, 2) provide a foundational understanding of NLP and StyleGAN, and 3) describe how NLP and StyleGAN are used to disseminate information, or otherwise amplify it.


Propaganda has been used throughout human history by various state and non-state actors to influence human behavior and achieve a desired outcome.3 Propaganda can be seen or heard in symbols (images, words, movies, music, etc.). What separates propaganda from regular human discourse, however, is its deliberateness. Propaganda is, at its core, the deliberate creation or alteration of symbols (information) to influence behavior.4 This is why propaganda in the digital age is so troubling; after all, computing is the structuring, management, and processing of information, now in vast quantities.5

The mass production of influential information and symbols began with the printing press. By the 1890s, a single issue of a newspaper could number over one million copies, allowing media to reach larger audiences than ever before.6 Newspapers were known to influence public opinion, particularly leading up to and during times of war. The cartoon depicted below, an editorial published in the Pennsylvania Gazette in 1754, helped incite resistance in the British colonies against French colonial expansion in North America.7

In the late 19th century, the Spanish-American War was agitated by the newspaper moguls William Randolph Hearst and Joseph Pulitzer, who began publishing what was known as “yellow journalism.”8 This form of journalism ran crude and exaggerated articles meant to sensationalize information and provoke emotional responses in readers. The illustration in the newspaper below exhibits how false information can travel faster than the truth. In 1898, the USS Maine sank due to an explosion of unknown origin. Before a formal investigation was conducted, newspapers circulated claiming the ship had sunk due to a bomb or torpedo from the Spanish navy. An investigation at the time concluded the ship was sunk by a sea mine; however, a later investigation in 1976 concluded the ship sank as a result of an internal explosion.9 Had such information been available at the time, the war might never have happened.

A turning point for propaganda came when real images first began to appear in newspapers. Moments captured in real time have a profound impact on the human psyche. In 1880, the first halftone was printed in the Daily Graphic, beginning what is now known as photojournalism.10 The picture below is the Daily Graphic’s halftone printing of New York’s shanty town.

During World War I, posters were the primary transmitter of propagandist material,11 though the information often originated from the target population’s own government. A combination of image and text sends a powerful message, the simple phrases directing the viewer to feel a certain way about an image. The poster on the right depicts German troops in WWI committing what appear to be war crimes, coaxing viewers to join the military.12 The use of posters, cartoons, and images continued throughout the first half of the 20th century, and continues to this day.

As we transition into the digital age, information reaches audiences across the world at unprecedented speeds, and it seems information has outpaced society’s capacity to process it. As literacy rates drastically increased over the past two centuries, so too did access to information, and consequently, propaganda. What is more troubling, however, is that tertiary education completion among adults has not increased proportionately. While nearly 100% of Americans aged 15 or older are literate, only approximately one-fourth complete tertiary education.13 14 This creates a serious conundrum: information proliferates, but society’s capacity to process that information in a comprehensive and objective manner has not kept pace. Below are two graphs, one depicting higher education completion among adults, the second showing US household access to information technology.

Even more concerning is the growing capability of entities to target specific demographics with tailored, generated information. The capacity to profile groups emerged early in the 20th century by means of surveys which collected data on public opinion, consumer habits, and elections.15 In 1916, the Literary Digest began public polling on presidential candidates.16 This practice was further augmented by the Gallup Polls in the 1930s, which took into account more than just public opinion on elections, including the economy, religion, and public assistance programs.17 Understanding public sentiment was an important step in the evolution of influencing human behavior.

Currently, human behavior can be categorized, documented, and influenced based on our most intricate and personal habits. This is made possible by increased storage capacity in cloud infrastructure, machine learning, and deep neural networks. Furthermore, much of this information is gathered without user consent or knowledge.

This data is not only used to influence consumer habits; it can also be used to disrupt social cohesion, instill distrust in democratic institutions, and incite violence based on race, religion, and political disposition. Malicious entities, many originating from Russia, have infiltrated social media circles in the United States, creating false personas which present as activist groups of various motivations. Much of this intentional malicious activity is made possible through NLP and StyleGAN.18 NLP is likely used by propagandists in several ways: most importantly, to translate propaganda from one language to another, and secondly, to train semi-automated chatbots to interact with legitimate users. With this in mind, the following sections provide a foundational understanding of NLP and StyleGAN.

Natural Language Processing (NLP) 

NLP is essentially the intersection between linguistics and computer science.19 Natural written and spoken languages are encoded as data within a computer via acoustic receptors or typed text, then decoded by a program. In modern systems, this program routes the data through the hidden layers of a deep neural network (DNN) and into a statistical model which produces the most probable representation of that data. This method of machine learning has improved over the years, with IBM’s statistical word-level translations being some of the first NLP software.
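The pipeline just described, encoded input, hidden layers, and a statistical model that selects the most probable output, can be made concrete in a few lines. Below is a minimal toy forward pass in NumPy with random, untrained weights; it is purely illustrative and is not the architecture of any production NLP system.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Turn raw scores into a probability distribution.
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy "encoded" input: a 4-dimensional vector standing in for text
# that has been converted to numbers (real systems use embeddings).
x = rng.normal(size=4)

# One hidden layer and a non-linearity -- the "hidden layers of
# mathematical algorithms" described above, reduced to a single step.
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
h = np.tanh(W1 @ x + b1)

# Output layer: scores over three candidate labels, converted into
# probabilities; the argmax is the "most probable representation."
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)
probs = softmax(W2 @ h + b2)

print(probs)                 # three probabilities summing to 1
print(int(probs.argmax()))   # index of the most probable candidate
```

Training would adjust `W1`, `b1`, `W2`, and `b2` so the most probable output matches labeled examples; here the weights are random, so only the mechanics are shown.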

IBM’s Word-Based NLP

IBM’s software was successful for three reasons: it could be improved, modified, and revised.20 This statistical, model-based ML began in the 1960s and utilized rule-based algorithms known as “word tagging.”21 Word tagging assigned words grammatical designations like noun, verb, or adjective in order to construct functional sentences. Yet, as one could imagine, words are used in a multitude of ways in the English language, which created limitations. Another issue with IBM’s model was that it translated single words rather than entire sentences. Early statistical models were also plagued by inadequate access to data.22 In machine learning today, the opposite is the case: there is such a multitude of data that it must be cleaned and carefully chosen to meet specific needs. The diagram below depicts IBM’s statistical model process.23
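To make “word tagging” concrete, here is a toy, rule-based tagger. The lexicon and rules are invented for illustration (they are not IBM’s), but the failure mode is the real one: a single-word rule cannot tell when “walk” is a noun and when it is a verb.

```python
# A toy, rule-based "word tagger" in the spirit of early NLP systems.
LEXICON = {
    "the": "DET", "a": "DET",
    "dog": "NOUN", "park": "NOUN", "walk": "NOUN",
    "runs": "VERB", "walks": "VERB",
    "quickly": "ADV",
}

def tag(sentence):
    tags = []
    for word in sentence.lower().split():
        # Rule 1: known words come straight from the lexicon.
        # Rule 2: unknown words ending in -ly are guessed as adverbs.
        # Everything else is left untagged -- the kind of limitation
        # that plagued rule-based systems.
        if word in LEXICON:
            tags.append((word, LEXICON[word]))
        elif word.endswith("ly"):
            tags.append((word, "ADV"))
        else:
            tags.append((word, "UNK"))
    return tags

print(tag("The dog runs quickly"))
# "walk" is tagged NOUN even in "they walk", showing why per-word
# rules break down when a word serves multiple grammatical roles.
print(tag("They walk the dog"))
```

The second sentence mis-tags “walk” as a noun: exactly the ambiguity problem described above.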

Google’s GNMT

Google’s Neural Machine Translation (GNMT) system is making strides in increasing the accuracy of machine translation and speech recognition in several ways. First, GNMT encodes and decodes entire sentences or phrases, rather than performing word-to-word translation like IBM’s early NLP.24 Secondly, GNMT employs Recurrent Neural Networks (RNNs) to map the meaning of sentences from an input language to an output language. This form of translation is much more efficient than word-to-word or even phrase-based methods. Below is an example of GNMT actively translating Chinese text, a language historically difficult for NLP software, taken directly from Google’s AI blog:25
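The difference between word-for-word translation and sentence- or phrase-level translation can be caricatured with two lookup tables. The tiny Spanish-English vocabulary below is invented for illustration; GNMT learns such mappings with recurrent networks rather than storing tables.

```python
# Word-for-word vs. phrase-level translation, reduced to lookups.
WORD_TABLE = {"es": "it", "llueve": "rains"}      # Spanish -> English
PHRASE_TABLE = {"llueve": "it is raining"}

def word_for_word(sentence):
    # IBM-style: translate each word independently.
    return " ".join(WORD_TABLE.get(w, w) for w in sentence.split())

def phrase_level(sentence):
    # GNMT-style (conceptually): map whole phrases to whole phrases,
    # falling back to word-for-word when no phrase is known.
    return PHRASE_TABLE.get(sentence, word_for_word(sentence))

print(word_for_word("llueve"))  # "rains" -- grammatical meaning lost
print(phrase_level("llueve"))   # "it is raining" -- meaning preserved
```

Spanish “llueve” is a complete sentence; translating it one word at a time produces the fragment “rains,” while the phrase-level mapping recovers the full English sentence.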

Yet sentence-based GNMT is coming very close to decoding complex Chinese text. The concept of translating entire sentences to produce meaning in another language was proposed as early as 1955:

“Thus may it be true that the way to translate from Chinese to Arabic, or from Russian to Portuguese, is not to attempt the direct route, shouting from tower to tower. Perhaps the way is to descend, from each language, down to the common base of human communication—the real but as yet undiscovered universal language—and then re-emerge by whatever particular route is convenient” 26

Other developments in NLP are numerous, and one is particularly concerning as it pertains to propaganda: GPT-3.

GPT-3

Generative Pre-trained Transformer 3, or “GPT-3,” is the third version of a text generator which utilizes machine learning and DNNs to produce and predict text. GPT-3’s capabilities include answering questions, composing essays, and even writing computer code.27 Yet, unlike GNMT, the intellectual property surrounding GPT-3 is kept mostly secret, with the exception of some application programming interfaces. GPT-3 utilizes over 175 billion parameters in its weighting system to determine the most plausible output, more than ten times the number of its next-highest competitor.28 With its Q&A feature, GPT-3 can answer common-sense questions, setting it apart from other AI software, as seen here:29
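At its core, GPT-3 does next-token prediction: given the text so far, it outputs the most plausible continuation. The toy model below captures only that core idea, using simple bigram counts over an invented three-sentence corpus; GPT-3 replaces the counts with a 175-billion-parameter transformer.

```python
from collections import Counter, defaultdict

# A toy next-word predictor trained on bigram counts. The corpus is
# invented for illustration.
corpus = (
    "propaganda spreads online . propaganda spreads quickly . "
    "bots amplify propaganda ."
).split()

# Count how often each word follows each other word.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent word observed after `word`."""
    return bigrams[word].most_common(1)[0][0]

print(predict_next("propaganda"))  # "spreads" (seen twice vs. once)
```

The prediction is purely statistical: “spreads” follows “propaganda” twice in the corpus, so it wins. Scaling this idea up, with learned weights instead of raw counts and far longer context than one word, is what lets GPT-3 compose whole passages.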

Yet for all its practical applications, there are also some serious deficiencies. The text GPT-3 generates can sometimes be far from the desired outcome, and at times its output exhibits bias, even racial discrimination.30 Additionally, instead of correcting the shortcomings of previous versions, GPT-3 largely offers a wider range of weighting parameters.

Generative Adversarial Networks

Generative Adversarial Networks consist of two DNNs, one generative and the other discriminative. The generative network creates a piece of media, whether an image, soundwave, or text, which is then analyzed by the discriminative network. The discriminator compares this media against real examples; if the output is deemed real, the generative network wins, and if it is judged fake, the generative network adjusts and produces new samples.
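The adversarial loop can be sketched as follows. Each network is reduced here to a one-parameter caricature: the “discriminator” accepts samples close to the real data, and the “generator” nudges its parameter whenever it is rejected. Real GANs use deep networks with gradient descent on both sides; this sketch only shows the back-and-forth dynamic.

```python
import random

random.seed(42)

REAL_MEAN = 10.0  # the "real data" the generator must imitate

def discriminator(sample):
    """Deems a sample 'real' if it is close to the real data.
    A real discriminator is a deep net trained on real examples."""
    return abs(sample - REAL_MEAN) < 0.5

def generator(param):
    """Produces a sample from its current parameter plus noise.
    A real generator is a deep net transforming random noise."""
    return param + random.uniform(-0.25, 0.25)

param = 0.0  # the generator starts out producing obvious fakes
for step in range(10_000):
    fake = generator(param)
    if discriminator(fake):
        break  # the generator "wins": its output passes as real
    # Rejected: nudge the parameter and try again. (The biased random
    # nudge stands in for a gradient step from the discriminator.)
    param += random.uniform(-0.1, 0.2)

print(round(param, 2), "after", step, "steps")
```

After enough rounds of rejection and adjustment, the generator’s parameter drifts toward the real data until the discriminator can no longer tell its output apart, which is the equilibrium GAN training aims for.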


StyleGAN is a derivative of Generative Adversarial Networks able to produce high-definition, artificially created images of human faces which appear real to the untrained eye. StyleGAN also generates more varied images than earlier GAN architectures.31 Here are some examples of human faces produced with StyleGAN:32

Machine Learning in Computational Propaganda

One of the many benefits of open-source information and free software is enriching the lives of less privileged segments of society, even though an unintended consequence of open-source technology is its use by nefarious entities. Currently, Google Translate is freely accessible, and GPT-2 and StyleGAN are open source. This means malicious actors can utilize the technology with virtually no cost in R&D or use. The possible applications of such technology to propaganda are many.

Role of NLP

NLP is perhaps the most concerning of the machine learning techniques which can be utilized by foreign entities. One of the barriers which historically limited the spread of information was language. Now, with GNMT becoming progressively more accurate, malign actors can translate vast quantities of text more efficiently and quickly than ever before. This serves a dual purpose: GNMT can be used by foreign actors to craft more complex messages which are less distinguishable from native writing in the target language, and it becomes easier to research divisive topics in various regions of the world. Below are two propaganda ads originating from Russian sources from 2015–2016.33 Both ads cover politically divisive topics, one being LGBTQ rights, and the other police brutality against minorities.

The text contained in the previous ads is crudely translated, likely indicating NLP was used in their production. The crude translation may be why these ads were discovered in the first place. Many of the Russian ads from 2015–2017 released by the US Senate Intelligence Committee contain frequent grammatical and translation errors.34 Furthermore, most, if not all, of the ads relate to race, religion, sexuality, or politics.

Role of GPT-2

GPT-2 can be utilized by malign actors in multiple ways. One possible use is to train chatbots to interact with legitimate users on social media platforms.35 The Q&A feature of GPT is what makes such interactions possible, directing chatbots to comment on specifically tagged posts, popularize hashtags, and potentially respond to emphatic replies from users.36 Secondly, GPT-2 can boost the relevance of posts which fake accounts are trained to popularize. Lastly, GPT-2 can be used to create fake biographical information to afford more legitimacy to fake profiles.

Role of StyleGAN

The role of StyleGAN in computational propaganda is to add legitimacy to fake profiles.37 As seen above in the collage of AI-generated photos, real can be difficult to differentiate from fake. Adding a human face to profiles is particularly useful for creating false personas whose mission is to produce content to be amplified by either autonomous or semi-autonomous accounts. Below is a fake Twitter account generated entirely autonomously:38

The Limitations of NLP and StyleGAN in Propaganda

The R&D behind NLP and StyleGAN is complex, but their use in spreading information is simple: create false personas, then like, share, comment, and react. While the applications of NLP and StyleGAN for proliferating fake news are numerous, what is more concerning is the amplification of factual yet divisive news. By simply reinforcing already existing divisions, computational propaganda self-proliferates. Propaganda is most successful concerning topics which are already extreme points of contention.

“Propaganda is as much about confirming rather than converting public opinion. Propaganda, if it is to be effective must, in a sense, preach to those who are already partially converted” – Welch, 2004 39

The previous statement has become particularly evident in the past few years in American politics, concerning tribalism around race, religion, and sexuality.40 Take, for example, the following Russian propaganda ad:

In hindsight, the ad does not seem to send such a divisive message; after all, most could get behind supporting homeless veterans. What differentiates this ad from the previous ones, however, is its subtlety. The ad received nearly 30,000 views and 67 clicks, far more than the police brutality and LGBTQ ads, likely because it was not identified as early as its counterparts. Note also the date on the ad: it was created not long after the Baltimore riots that followed Freddie Gray’s death.41 The ad is also tailored to target African-American audiences. The timing of information appears to be just as important as the message, and with modern technology, timing is almost never an issue.


Machine learning plays a fundamental role in amplifying information, but a limited role in creating it. Successful conspiracy theories require time to fabricate, and even more importantly, human, rather than artificial, intelligence.42 In fact, the overuse of AI in spreading information can be detrimental to an operation, as over-activity flags the associated accounts or posts.43 After analyzing a multitude of Russian propaganda ads from 2015–2017 released by the Senate Intelligence Committee (provided by social media platforms), it became apparent that the ads which were discovered contained poor grammar. This gap in the data may indicate foreign entities are using machine learning to analyze which ads are taken down and which remain.44 The data also carries a rather obvious bias: it consists almost entirely of sponsored ads paid for in Russian rubles, which are easily traceable.

Also absent from the released data was a very modern and influential form of propaganda: memes. In recent years, the Russian Internet Research Agency has garnered a strong following for its troll accounts on Instagram, which reach millennial audiences of varying demographics using memes and pop culture, memes which are likely curated entirely by individuals, not AI.45 The human element of propaganda remains just as relevant as it was in the 20th century, and will likely continue well into the 21st.

End Notes

  1. Samet, A. (2020, July 29). How the coronavirus is changing US social media usage. Insider Intelligence.
  2.  Woolley, S., & Howard, P. N. (2019). Computational propaganda: Political parties, politicians, and political manipulation on social media.
  3.  Smith, B. L. (n.d.-a). Propaganda | definition, history, techniques, examples, & facts. Encyclopedia Britannica. Retrieved May 11, 2021, from
  4.  Smith, B. L. (n.d.-a). Propaganda | definition, history, techniques, examples, & facts. Encyclopedia Britannica. Retrieved May 11, 2021, from
  5. What is computing? – Definition from techopedia. (n.d.). Techopedia.Com. Retrieved May 11, 2021, from
  6. Newspaper history. (n.d.). Retrieved May 11, 2021, from
  7. The story behind the join or die snake cartoon—National constitution center. (n.d.). National Constitution Center – Constitutioncenter.Org. Retrieved May 11, 2021, from
  8. Milestones: 1866–1898—Office of the Historian. (n.d.). Retrieved May 11, 2021, from
  9. Milestones: 1866–1898—Office of the Historian. (n.d.). Retrieved May 11, 2021, from
  10. The “Daily Graphic” of New York publishes the first halftone of a news photograph: History of information. (n.d.). Retrieved May 11, 2021.
  11. Posters: World War I posters – background and scope. (1914).
  12. Will you fight now or wait for this. (n.d.). Retrieved May 11, 2021.
  13. Roser, M., & Ortiz-Ospina, E. (2013). Tertiary education. Our World in Data.
  14. Roser, M., & Ortiz-Ospina, E. (2016). Literacy. Our World in Data.
  15. Smith, B. L. (n.d.-b). Propaganda—Modern research and the evolution of current theories. Encyclopedia Britannica. Retrieved May 11, 2021, from
  16. The “Literary Digest” straw poll correctly predicts the election of Woodrow Wilson: History of information. (n.d.). Retrieved May 11, 2021.
  17. Gallup, Inc. (2010, October 20). 75 years ago, the first Gallup poll. Gallup.com.
  18.  P. 4 Martino, G. D. S., Cresci, S., Barrón-Cedeño, A., Yu, S., Pietro, R. D., & Nakov, P. (2020). A survey on computational propaganda detection. Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 4826–4832.
  19.  What is natural language processing? (n.d.). Retrieved May 11, 2021, from
  20.  P. 118 Poibeau, Thierry. Machine Translation. 1st ed., MIT Press, 2017.
  21. A beginner’s guide to natural language processing. (n.d.). IBM Developer. Retrieved May 11, 2021, from
  22. A beginner’s guide to natural language processing. (n.d.). IBM Developer. Retrieved May 11, 2021, from
  23.  P. 118 Poibeau, Thierry. Machine Translation. 1st ed., MIT Press, 2017.
  24. Le, Q. V., & Schuster, M. (n.d.). A neural network for machine translation, at production scale. Google AI Blog. Retrieved May 11, 2021, from
  25.  Le, Q. V., & Schuster, M. (n.d.). A neural network for machine translation, at production scale. Google AI Blog. Retrieved May 11, 2021, from
  26.  P. 64 Poibeau, Thierry. Machine Translation. 1st ed., MIT Press, 2017
  27. Marr, B. (n.d.). What is GPT-3 and why is it revolutionizing artificial intelligence? Forbes. Retrieved May 11, 2021.
  28. Vincent, J. (2020, July 30). OpenAI’s latest breakthrough is astonishingly powerful, but still fighting its flaws. The Verge.
  29. Sharma, P. (2020, July 22). 21 OpenAI GPT-3 demos and examples to convince you that AI threat is real, or is it? [Including Twitter posts]. MLK – Machine Learning Knowledge.
  30. Vincent, J. (2020, July 30). OpenAI’s latest breakthrough is astonishingly powerful, but still fighting its flaws. The Verge.
  31.  P. 1 Karras, T., Laine, S., & Aila, T. (2018). A style-based generator architecture for generative adversarial networks.
  32.  P. 3 Karras, T., Laine, S., & Aila, T. (2018). A style-based generator architecture for generative adversarial networks.
  33. Social media advertisements | permanent select committee on intelligence. (n.d.). Retrieved May 11, 2021, from
  34. Social media advertisements | permanent select committee on intelligence. (n.d.). Retrieved May 11, 2021, from
  35.  P. 4 Martino, G. D. S., Cresci, S., Barrón-Cedeño, A., Yu, S., Pietro, R. D., & Nakov, P. (2020). A survey on computational propaganda detection. Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 4826–4832.
  36. Xu, A. Y. (2020, June 10). Language models and fake news: The democratization of propaganda. Medium.
  37. O’Sullivan, D. (2020, September 1). After FBI tip, Facebook says it uncovered Russian meddling. CNN.
  38. O’Sullivan, D. (2020, September 1). After FBI tip, Facebook says it uncovered Russian meddling. CNN.
  39.  P. 214 Welch, D. (2004). Nazi Propaganda and the Volksgemeinschaft: Constructing a People’s Community. Journal of Contemporary History, 39(2), 213-238. doi: 10.2307/3180722
  40. Political polarization in the American public. (2014, June 12). Pew Research Center – U.S. Politics & Policy.
  41. Peralta, E. (n.d.). Timeline: What we know about the freddie gray arrest. NPR.Org. Retrieved May 11, 2021, from
  42. Woolley, S., & Howard, P. N. (2019). Computational propaganda: Political parties, politicians, and political manipulation on social media.
  43. Woolley, S., & Howard, P. N. (2019). Computational propaganda: Political parties, politicians, and political manipulation on social media.
  44.  P. 4 Martino, G. D. S., Cresci, S., Barrón-Cedeño, A., Yu, S., Pietro, R. D., & Nakov, P. (2020). A survey on computational propaganda detection. Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 4826–4832.
  45. Thompson, N., & Lapowsky, I. (2018, December 17). How Russian trolls used meme warfare to divide America. Wired.

Issues associated with terminology in AI/ML

As we have discussed in previous modules, terminology (or buzzwords) has a tendency to blackbox new technologies that would otherwise be comprehensible. Occasionally, wording can do far worse than complicate technological concepts. A recent fatal Tesla vehicle crash resulted in two deaths, and authorities believed the car was driverless. (Wong, 2021) Tesla, and in particular Elon Musk, has faced multiple controversies since Tesla’s rise to prominence as an EV producer. The problematic term “autopilot” in Tesla vehicles raises issues associated with wording in marketing, and the detriments of uninformed consumers. (Leggett, 2021)

Tesla dissolved its PR department and seemingly uses only Musk’s tweets to push out information. (Morris, 2020) This is problematic in the sense that it removes a human interaction element from the sale of its cars (which is also dealer-free). This raises the questions: are consumers uninformed about the autonomous capability of Tesla vehicles? Is labeling the driver-assistance feature “autopilot” problematic? The sensationalism around Tesla may lead to overzealous use of its products, and can be damaging to the further development of autonomous vehicles. It is difficult to dispute that Elon Musk is savvy in regards to marketing, but marketing is not necessarily synonymous with public relations.

Course Questions

The United States is falling behind in regulation across multiple fields of technology and science; Section 230 of the CDA (Communications Decency Act) is an example of this. The GDPR (General Data Protection Regulation), by contrast, protects European citizens, yet has not prevented social media companies like Facebook and Instagram from operating there. (Gdpr Archives, 2021) Regulation does not need to carry the negative stigma of stifling innovation or economic growth. Several countries in Europe, including Germany and the UK, have already regulated the use of the term “autopilot” in relation to Tesla vehicles. So why is this sort of misleading wording overlooked by US policymakers when multiple incidents have occurred? (Shepardson, 2021) A question which has persisted for me throughout the course is: are policymakers out of touch with rapidly evolving technology? What is considered too much regulation, and when is there not enough?


GDPR archives. (n.d.). GDPR.eu. Retrieved April 19, 2021.

Leggett, T. (2021, April 19). Two men killed in Tesla car crash “without driver” in seat. BBC News.

Morris, J. (2020, October 10). Has Tesla really fired its PR department? And does it matter? Forbes.

Shepardson, D. (2021, March 18). U.S. safety agency reviewing 23 Tesla crashes, three from recent weeks. Reuters.

Wong, W. (2021, April 19). 2 dead in Tesla crash after car “no one was driving” hits tree, authorities say. NBC News.

Why “Big” Does Not Necessarily Mean “Bad”  

Data is a term that has been in use since the 17th century, when it meant “a fact given as the basis for calculation in mathematical problems.” (Data | Origin and Meaning of Data by Online Etymology Dictionary, 2021) Yet it was not until the early 21st century that the term “big data” was first introduced. (Foote, 2017) Big data seems to carry the same negative connotations as “big tech,” “big pharma,” and “big government.” What differentiates big data from regular data is the “3Vs” (velocity, variety, and volume). (Kitchin, 2014, p. 1) In big data, the 3Vs are so extreme that standard statistical models become obsolete, and deep neural nets are often required to analyze the data effectively. (Alpaydin 104) To those unfamiliar with big data, it seems to represent more than just a multitude of factual information; it represents a further divide between those who have access to it and those who do not. In reality, big data is often used to improve the quality of life of everyday people performing day-to-day activities.

For example, an individual’s commute to work may have been made easier (and safer) by big data. By utilizing machine learning in conjunction with big data on the most heavily congested roads in and around cities, transportation can be made more efficient. (Piletic, 2017) Big data in this scenario is acquired through IoT devices, car sensors, cameras, and smart devices. (Piletic, 2017) The data collected addresses several important aspects of transportation: 1) city planning, 2) parking and congestion control, and 3) long commute times. (Piletic, 2017)

Moreover, commuting is also made safer by the same combination of data and ML. Motor vehicle accidents are the leading cause of death in the United States for ages 1–54, and worldwide for ages 5–29. (“Road Safety Facts,” 2021) While not all of these accidents can be prevented with data analytics, the numbers could be mitigated. ITS (Intelligent Transport Systems) are made possible through big data originating from specially designed DAS (Data Acquisition Systems) which analyze driving behavior, patterns, and posture. (Antoniou et al., 2018, p. 327) Additionally, crashes and traffic flows are analyzed with loop-detector data and MVDS (Microwave Vehicle Detection Systems). This data is used to determine the likelihood of vehicle accidents with respect to a multitude of variables, like location, vehicle type, speed, and traffic flow. (Antoniou et al., 2018, p. 325)
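As a purely illustrative sketch of how such variables might feed a risk model, the hand-weighted logistic function below scores crash likelihood from a few features. The feature names and weights are invented for this example and are not taken from the cited studies, which fit their models to real sensor data.

```python
import math

# Hypothetical weights: each feature's contribution to crash risk.
WEIGHTS = {"speed_over_limit_kmh": 0.08, "traffic_density": 0.5, "night": 0.7}
BIAS = -4.0  # baseline: most trips end without incident

def crash_risk(features):
    """Map feature values to a probability between 0 and 1
    via the logistic (sigmoid) function."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1 / (1 + math.exp(-z))

low = crash_risk({"speed_over_limit_kmh": 0, "traffic_density": 1, "night": 0})
high = crash_risk({"speed_over_limit_kmh": 30, "traffic_density": 4, "night": 1})
print(round(low, 3), round(high, 3))  # the riskier profile scores higher
```

In a real ITS pipeline, the weights would be learned from DAS and loop-detector data rather than set by hand, but the output has the same shape: a per-situation probability that planners and safety systems can act on.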

While the previous examples show ways in which big data is utilized for the benefit of society, there remain scenarios where big data is used improperly or inefficiently. These examples should not detract from the benefits, however, and should be handled on a case-by-case basis. Facial recognition has fallen under scrutiny due to disproportionate false positives based on skin pigmentation. (Noorden, 2020) Additionally, big data is used in information operations on social media platforms to incite divisiveness. (Torabi Asr & Taboada, 2019) These are all issues which can be addressed separately, without stifling the societal benefits of big data.


Alpaydin, E. (2004). Introduction to machine learning. MIT Press.

Antoniou, C., Dimitriou, L., & Pereira, F. (2018). Mobility patterns, big data and transport analytics: Tools and applications for modeling. Elsevier.

Data | origin and meaning of data by Online Etymology Dictionary. (n.d.). Retrieved April 12, 2021.

Foote, K. D. (2017, December 14). A brief history of big data. DATAVERSITY.

Kitchin, R. (2014). Big Data, new epistemologies and paradigm shifts. Big Data & Society, 1(1), 205395171452848.

Noorden, R. V. (2020). The ethical questions that haunt facial-recognition research. Nature, 587(7834), 354–358.

Piletic, A. P. (2017, July 11). How IoT and big data are driving smart traffic management and smart cities. Big Data Made Simple.

Road safety facts. (n.d.). Association for Safe International Road Travel. Retrieved April 12, 2021, from

Torabi Asr, F., & Taboada, M. (2019). Big Data and quality data for fake news and misinformation detection. Big Data & Society, 6(1), 205395171984331.

The Advantages and Disadvantages of Monopolistic Competition in Cloud Computing

Typically, consumers stand to gain from larger cloud networks, which mean more computing power for less cost. However, there remain two possible disadvantages for consumers when monopolies permeate the cloud service industry: one is security, and the second is unfair pricing resulting from unfair competition. A detailed overview of the advantages of cloud computing will first be discussed, followed by the disadvantages in a monopolistic environment.

Cloud computing is made possible chiefly by one feature: virtualization. (Ruparelia 5) One important question to answer regarding the benefits of cloud computing is, what came before cloud computing and virtualization? Before cloud computing, there was the traditional server structure, which consisted of the hardware, software, OS, and applications all in one location. (Ukessays 2015) The downsides of the traditional server structure were essentially the converse of what makes up a cloud service: a lack of elasticity, high maintenance costs, and a lack of continuity (if one aspect of the server breaks down, the rest follows). (Ukessays 2015)

In contrast, a virtual server separates the software from the hardware, allowing seamless maintenance across multiple hosts, and consists of multiple servers, whether data servers, email servers, or others. The ability to scale up or down is also a key aspect of a cloud service: whereas in the traditional server structure a maintainer would need to buy more physical hardware in the event of an overtaxed server, in the cloud, hosts are migrated to the more heavily trafficked servers. (Ruparelia 18) This is what makes cloud computing unique, and why the consumer benefits from larger networks.

As for the disadvantages in the case of monopolistic competition, the security aspect is perhaps the most substantial concern. The malware threat to cloud services is unique in its scale, just as cloud services are unique in theirs, a problem referred to as “excessive access scope.” (Zalkind, 2016) Excessive access scope arises when applications require credential authentication from three parties in a cloud network: the user, a third-party entity, and the corporate environment. (Zalkind, 2016) This credential access gives the third-party entity (in this case an application) access to the system even while the user is not actively using it, so through one user the whole system can be compromised. Some applications are built to be malware, while other, legitimate applications are hijacked by malicious software, creating more avenues for hackers to gain access to a cloud system.

As for the consideration of price gouging and unfair competition, a combination of economic and technical understanding is required. Cloud services operate on economies of scale, and AWS is currently the largest cloud service provider by a substantial margin. (Gartner Magic Quadrant for Cloud, 2021) Cloud service providers tend to offer multiple qualities of service based on computational time usage. (Kilcioglu & Rao, 2016, 1) Because most service providers use the same hardware, the only way to adjust profit margins is by lowering the quality of service rather than charging higher prices for better performance, which is a potential setback for consumers. (Kilcioglu & Rao, 2016, 2) Additionally, in some regions there is only one cloud provider, who charges significant fees for switching; this allows for unhealthy monopolistic tendencies and creates a “locked-in” customer dynamic. (Kilcioglu & Rao, 2016, 2)


Cloud services is not a winner-take-all market and enterprises should applaud robust competition. (2019, July 31). Diginomica.

Gartner magic quadrant for cloud: Evaluating the top six IaaS providers. (n.d.). CloudBolt Software. Retrieved April 5, 2021, from

Kilcioglu, C., & Rao, J. M. (2016). Competition on price and quality in cloud computing. Proceedings of the 25th International Conference on World Wide Web, 1123–1132.

Life before cloud computing information technology essay. (2015, January 1). UKEssays.Com.

Ruparelia, N. (2016). Cloud computing. The MIT Press.

Zalkind, R. (n.d.). The cloud malware threat | Network Computing. Retrieved April 5, 2021, from

“How Games Will Play You” Dilemma

How Games Will Play You (Togelius, 2019)

Who it affects:

Being a lifelong gamer myself, this particular issue holds special significance for me. Gaming has become increasingly mainstream since I began playing in the early 2000s: the number of individuals who identify as “gamers” has increased from 2 billion in 2015 to almost 3 billion in 2021. (Number of Gamers Worldwide 2023) Moreover, it is a relatively cheap form of entertainment if one weighs the hours spent playing against the money spent on a console and the subsequent games. Yet more and more game developers are switching from pay-to-play models to “free to play.” (Koksal, 2019) The quotation marks are to imply that nothing is truly “free,” especially in the gaming industry.

Why it is a problem: 

Game developers acquire users’ spending and playing habits in the same fashion as social media and search engine companies like Facebook and Google. (Boutilier, 2015) Furthermore, game developers are targeting younger audiences who do not have a fully developed prefrontal cortex, and capitalizing on this with microtransactions in free-to-play models. (Uytun, 2018, p. 8) Many loot box models (mystery rewards for cash) use psychological levers to encourage purchasing, such as loss aversion, impulse buying, and time constraints. (Duverge, 2016) That said, game developers should not shoulder all the burden associated with children, microtransactions, and data acquisition. After all, parents technically have the final say in how much time their children spend online, and what it is they do.

Questions and potential solutions:

Should game developers be treated any differently than other companies who garner large amounts of revenue from selling data? Perhaps they should, considering that some of the most popular games in the industry (Fortnite, Hearthstone, and Minecraft, to name a few) target minors, and all of them maintain “free to play” models with revenue based on microtransactions. One policy approach which lightly addresses the issue was the enactment of COPPA (the Children’s Online Privacy Protection Act) in 1998. (Children’s Online Privacy Protection Rule (“COPPA”), 2013) The purpose of COPPA was to protect children’s personally identifiable information from being unknowingly collected in online environments. Much has changed since 1998, however: the number of child gamers has dramatically increased, and parents may not fully understand how much of their children’s data is being acquired. (Benedetti, 2011) It could be argued that COPPA is an antiquated policy in need of revision, one that does not adequately address the acquisition of minors’ data. Raising the age requirement on video games is not an adequate solution either, but a more detailed warning to parents, coupled with a requirement that parents either allow or deny access to their kids’ data, might be.


Benedetti, W. (2011, October 11). Ready to play? 91 percent of kids are gamers. NBC News.

Boutilier, A. (2015, December 29). Video game companies are collecting massive amounts of data about you. Thestar.Com.

Children’s online privacy protection rule(“Coppa”). (2013, July 25). Federal Trade Commission.

Duverge, G. (2016, February 25). Insert more coins: The psychology behind microtransactions. Touro University Worldwide.

Koksal, I. (2019). Video gaming industry & its revenue shift. Forbes.

Number of gamers worldwide 2023. (n.d.). Statista. Retrieved March 22, 2021, from

The games industry shouldn’t be ripping off children | Geraldine Bedell. (2019, September 15). The Guardian.

Togelius, J. (2019, April 17). How games will play you. The Ethical Machine.

Uytun, M. C. (2018). Development period of prefrontal cortex. In A. Starcevic & B. Filipovic (Eds.), Prefrontal Cortex. InTech.


Siri’s Road to Accurate Speech Recognition

The set of operations that eventually transforms sound patterns into data with Siri begins with an acoustic wave detector; in Siri’s case, this is the M-series motion coprocessor, or “AOP” (Always On Processor). The significance of this processor is that it does not require the main processor to be running in order to activate Siri (on mobile devices). The M-series detects the acoustic waves associated with the activation phrase “Hey Siri,” using MFCCs (Mel Frequency Cepstral Coefficients) to transform the sound waves into coefficients that represent the sound in a data form. Once these coefficients are produced, they are run through a frame buffer (in RAM, Random Access Memory) and transformed into pixels/bits. Once the sound has been transformed into bits, it is input into a small DNN with 32 hidden units. This small DNN uses an HMM (Hidden Markov Model), a statistical model that produces a score, which in turn decides whether or not to activate the main processor. Once the main processor is activated, a new DNN with 196 hidden units is accessed, which also utilizes HMMs to produce the most accurate interpretation of the speech.
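The two-stage flow described above can be sketched in a few lines. Everything here is invented for illustration: the scoring functions stand in for the real DNN+HMM scores, and the threshold and feature values are made up.

```python
# Hypothetical sketch of two-stage wake-word detection: a small always-on
# model screens every frame cheaply; only high-scoring frames wake the
# larger model on the main processor.

def small_dnn_score(features):
    # Stand-in for the 32-unit DNN + HMM score: here, just an average.
    return sum(features) / len(features)

def large_dnn_score(features):
    # Stand-in for the 196-unit DNN consulted after wake-up.
    return max(features)

WAKE_THRESHOLD = 0.6  # hypothetical cutoff

def process_frame(features):
    """Return (woke_main_processor, final_score_or_None)."""
    if small_dnn_score(features) < WAKE_THRESHOLD:
        return (False, None)           # main processor stays asleep
    return (True, large_dnn_score(features))

print(process_frame([0.1, 0.2, 0.3]))  # quiet audio: no wake-up
print(process_frame([0.7, 0.8, 0.9]))  # likely trigger phrase: wake + rescore
```

The design point is that the expensive model runs only on the tiny fraction of frames the cheap model flags, which is why the AOP can listen continuously without draining the battery.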

The road to accurate speech recognition has been a long process which required cruder techniques in the beginning stages of Siri. Initially, Siri required a user to activate her manually before providing commands; this allowed teams at Apple to collect data from the initial phases of Siri to be used later with remote activation. The early stages of Siri provided a speech corpus (an audio file database) for later versions of Siri to access, and these larger audio file databases made the DNNs coupled with HMMs more accurate. Supervised standard backpropagation algorithms are used to reduce errors, and stochastic gradient descent is used to optimize the algorithms. Siri, like most other machine-learning-based programs, is a work in progress, and can only improve as more data and more efficient algorithms become available.
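As a toy illustration of the gradient-descent optimization mentioned above, here is a minimal stochastic loop fitting a one-parameter model y = w·x. The data and learning rate are invented for the example; a real DNN applies the same idea across millions of weights via backpropagation.

```python
# Minimal stochastic gradient descent on a one-parameter linear model.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated with true w = 2

w, lr = 0.0, 0.05          # initial weight, learning rate (both invented)
for _ in range(200):       # epochs
    for x, y in data:      # one example at a time (the "stochastic" part)
        grad = 2 * (w * x - y) * x   # d/dw of the squared error (wx - y)^2
        w -= lr * grad               # step against the gradient

print(round(w, 3))  # converges close to 2.0
```

Each update nudges the weight in the direction that reduces the error on a single example, which is what makes the method cheap enough to run over very large corpora.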


How do bandwidth limitations affect the accuracy of speech recognition and virtual assistants?

I understand hidden units are the computational units within a DNN, but how are they separated, or are they separated at all? Why is the number of hidden units in a DNN incremented in layers?

In Apple’s breakdown of how Siri works, it glossed over the lower levels of sound wave input into the device and did not break down how the sound waves become data; it simply states “acoustic input,” but what hardware in the phone transforms the sound waves into electrical signals?

At what stage in the process is sound transformed into pixels and then into text, and does this involve interaction with NLP in conjunction with the speech recognition processes?

Lastly, it is still unclear to me what purpose the framebuffer serves in the operations leading to speech recognition.


Acoustic model. (2020). In Wikipedia.

Backpropagation algorithm—An overview | ScienceDirect Topics. (n.d.). Retrieved March 15, 2021, from

Framebuffer. (2020). In Wikipedia.

Hey siri: An on-device dnn-powered voice trigger for apple’s personal assistant. (n.d.). Apple Machine Learning Research. Retrieved March 15, 2021, from

Hidden layer. (2019, May 17). DeepAI.

Mel-frequency cepstrum. (2020). In Wikipedia.

Paramonov, P., & Sutula, N. (2016). Simplified scoring methods for HMM-based speech recognition. Soft Computing, 20(9), 3455–3460.

Stochastic gradient descent. (2021). In Wikipedia.

Deep Neural Nets and the Future of Machine Translation

GNMT (Google Neural Machine Translation) utilizes an E2E DNN (end-to-end deep neural net) to perform encoding and decoding at a much faster rate than the traditional statistical models utilized by other machine translation methods. (Le and Schuster 2016) Compare, for example, IBM’s statistical word-level translation models, which had to first split the sentence into words, then find the best translation for each word. (Poibeau 117) GNMT translates entire sentences rather than singular words, which accounts for more of the ambiguities of spoken and written language and yields a much more accurate translation. (Le and Schuster 2016)
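The word-level versus sentence-level difference can be seen in a toy example (the vocabulary and tables below are invented, not from any real system): the French word “avocat” means both “lawyer” and “avocado,” an ambiguity a word-by-word lookup cannot resolve.

```python
# Toy contrast between word-level and sentence-level translation.
word_table = {"il": "he", "mange": "eats", "un": "an", "avocat": "lawyer"}
phrase_table = {"il mange un avocat": "he eats an avocado"}

def word_level(sentence):
    # Translate each word independently, ignoring all context.
    return " ".join(word_table.get(w, w) for w in sentence.split())

def sentence_level(sentence):
    # Whole-sentence lookup first; fall back to word-by-word.
    return phrase_table.get(sentence, word_level(sentence))

print(word_level("il mange un avocat"))      # "he eats an lawyer" (wrong)
print(sentence_level("il mange un avocat"))  # "he eats an avocado"
```

A real neural model does not use a phrase table; it encodes the whole sentence into a continuous representation, but the point is the same: context decides which sense of an ambiguous word applies.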

While GNMT is essentially based on a statistical model, it operates on a much grander scale than previous methods. (Poibeau 147) Training with GNMT is much more efficient than with the previous statistical models, which lacked the “learning” aspect and required manual fixes upon discovery of design flaws. (Poibeau 147) A huge difference between neural networks now and the old statistical models is the multidimensional relationship between encoding and decoding, meaning, “…higher linguistic units, like phrases, sentences, or simply groups of words, can be compared in a continuous space…” (Poibeau 149)

However, GNMT still requires human interaction during the training process, because it is necessary for the neural network to target the most relevant data rather than waste computational resources on unnecessary data. (Roza 2019) Adding features also becomes an issue: if new features are added, the old DNN becomes obsolete. (Roza 2019) Another issue facing DNNs is the massive amount of data required to produce the best results; however, the amount of available data seems to be becoming less of an issue than the organization of that data.

Some pressing questions which came to mind during the readings were:

  1. Why is Chinese (Mandarin) lagging behind other languages in GNMT translation quality? (Le and Schuster 2016) Is it possible that data on Mandarin is less accessible to the DNN than data on other languages?
  2. How fluid is DNN learning? Does the training stop when the DNN is actively being implemented?
  3. How exactly are statistical models different from DNNs? Is it simply the case that DNNs continue to learn, whereas statistical models utilize a less fluid database?


Le, Quoc, and Mike Schuster. “A Neural Network for Machine Translation, at Production Scale.” Google AI Blog, Accessed 9 Mar. 2021.

Poibeau, Thierry. Machine Translation. 1st ed., MIT Press, 2017.

Roza, Felp. “End-to-End Learning, the (Almost) Every Purpose ML Method.” Medium, 18 July 2020.

Pattern Recognition and Computing Power

Initially, it was very perplexing attempting to understand the intersection between statistics and machine learning, but this week’s materials have made it clearer. According to the assigned CrashCourse videos, one primary task of machine learning is to determine the most accurate “confusion matrix” for a given set of “labeled data.” (Machine Learning & Artificial Intelligence 2021) As more “features” are added, a more complicated algorithm, such as an SVM (support vector machine), is required to determine the most accurate confusion matrix. However, what has also become clear is that while these machine learning methods are able to analyze large amounts of data and very accurately assign a confusion matrix, like medicine, this is still an imperfect science. (Alpaydin 58) Nowhere is this more evident than in the Karpathy article.
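A confusion matrix is simply a tally of actual labels against predicted labels, and the minimal sketch below (with invented labels and predictions) shows the idea: the diagonal cells count correct classifications, the off-diagonal cells count each kind of mistake.

```python
# Building a confusion matrix for labeled data with a plain tally.
from collections import Counter

actual    = ["cat", "cat", "dog", "dog", "dog"]  # ground-truth labels
predicted = ["cat", "dog", "dog", "dog", "cat"]  # a classifier's guesses

# Each (actual, predicted) pair is one cell of the confusion matrix.
matrix = Counter(zip(actual, predicted))

print(matrix[("cat", "cat")])  # cats correctly labeled: 1
print(matrix[("dog", "dog")])  # dogs correctly labeled: 2
print(matrix[("dog", "cat")])  # dogs mistaken for cats: 1
```

Reading the matrix tells you not just the accuracy but *which* classes the model confuses, which is exactly why it is used to evaluate classifiers on labeled data.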

In the Karpathy article, machine learning in relation to graphical interpretation is depicted with a database of selfies from a social media platform (Instagram). The algorithm Karpathy used, a neural network whose results were visualized with t-SNE, would rank selfies based on a certain set of parameters (or features) to filter what were deemed the “best” selfies. (Karpathy 2015) Yet these parameters were very limited and did not take into account the multitude of features which might culminate in what could be considered the “best.” For example, one of the parameters used when determining the quality of a selfie was the number of likes received, which is hugely subjective and does not take into account male-to-female follower ratios. Additionally, females on average interact most with other females on social media, whereas men on average are more likely to comment on or like female posts. (Fowler 2017) This bias was apparent when the top 100 selfies determined by Karpathy’s algorithm were entirely female. This is likely indicative of an obstacle still to overcome in machine learning: the consideration of a multitude of extracted features.


In the Dougherty reading, classification was broken down into supervised learning, unsupervised learning, and Bayes decision theory. Each of these classification methods demands varying degrees of computing power throughout the process. My question concerns which method is the most efficient with regard to computing power. (Dougherty 19) Additionally, are the methods interchangeable, or exclusive to only certain kinds of classification?

In the Alpaydin reading, document categorization, bag of words, and deep learning were all mentioned, particularly in relation to social media metadata gathering. (Alpaydin 69-70) All three have been utilized in disinformation campaigns, but why is this same technology failing to halt the disinformation campaigns which still ravage social media platforms? Lastly, in reference to handwritten characters, Alpaydin said, “…there is still no computer program today that is as accurate as humans for this task.” (Alpaydin 58) Has this changed since 2016, when the book was written? Or is handwritten text recognition still far from where it could be in accuracy and classification?


Alpaydin, Ethem. Machine Learning: The New AI. MIT Press, 2016.

Dougherty, Geoff. Pattern Recognition and Classification: An Introduction. Springer, 2013.

Fowler, Danielle. “Women Are More Popular On Instagram Than Men According To New Study.” Grazia, Accessed 1 Mar. 2021.

Karpathy, Andrej. What a Deep Neural Network Thinks about Your #selfie. Accessed 1 Mar. 2021.

Machine Learning & Artificial Intelligence: Crash Course Computer Science #34. Accessed 1 Mar. 2021.


Light to Digital to Data

Perhaps the concept which has been easiest for me to grasp thus far is how light is transformed into machine code, then into RGB triplets, and subsequently stored as data. The process from camera to digitization begins with electricity (as with most things concerning computation). Certain materials, in conjunction with chemicals, react when exposed to light, producing electrical charges. (White p.68, 2007) These charges are processed through a semiconductor and an ADC (analog-to-digital converter), and then passed to the microprocessor. (White p.69, 2007) Once the electrons reach the microprocessor, they are transformed into RGB (Red, Green, Blue) triplets.

Each of the red, green, and blue color channels holds 256 different shades of its color, represented by bits. Across the entire RGB spectrum there are 256^3 (16,777,216) different combinations of triplets. (White p.101, 2007) These RGB triplets are formed into an array and stored as data, but before this occurs, an algorithm is run to determine missing color values based on surrounding pixels. Once the array is produced, the data can be stored as RAW, uncompressed, or lossy-compressed files. (White p.112, 2007) One of the most common methods of storing images is as JPEG (“jpg”) files, which will be discussed in the following paragraph. (White p.112, 2007)
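The 16,777,216 figure follows directly from the 8 bits per channel, which a few lines can verify, along with the common 24-bit packing of a triplet (the packing layout shown is one conventional choice, not the only one):

```python
# Each RGB channel stores one byte, so a triplet fits in 24 bits.
shades = 2 ** 8                 # 256 shades per channel
print(shades ** 3)              # 16777216 possible triplets

def pack_rgb(r, g, b):
    # A common 24-bit layout: red in the high byte, blue in the low byte.
    return (r << 16) | (g << 8) | b

print(hex(pack_rgb(255, 0, 128)))  # 0xff0080
```

This packed form is also why colors are often written as six hexadecimal digits: each pair of digits is one channel's byte.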

When encoding digital images, standard formats are important for multiple reasons. One reason a standard format like JPEG is so practical is its ability to compress digital files. File compression is essential for the successful and efficient transfer of data over the internet. JPEG takes a digital image and uses an algorithm which finds pixel colors recurring many times within the image, each serving as a “reference pixel.” (White p.113, 2007) Once the reference pixels are determined, they are used to limit the file size of an image by compressing imperceptibly different RGB triplets into the standard reference pixel across the entire image. Furthermore, it is possible to control the level of compression in a JPG file to suit the needs of the user. (White p.113, 2007) The standardization of JPEG was critical for future innovations in file encoding. Because JPEG files are standardized across multiple operating systems and coding languages, it is easier to incorporate the technology into future updates and technological progressions.
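A much-simplified sketch of the “reference pixel” idea described above (real JPEG compresses via DCT transforms and quantization, not this): near-identical values are snapped to one reference value, so the data contains longer runs of identical pixels that compress well. The tolerance and pixel values are invented for the example.

```python
# Snap near-duplicate pixel values to a reference value (lossy step).
def snap(pixels, reference, tolerance=4):
    return [reference if abs(p - reference) <= tolerance else p
            for p in pixels]

row = [200, 201, 199, 202, 50, 200]   # one channel of a pixel row
print(snap(row, 200))                 # [200, 200, 200, 200, 50, 200]
```

The information lost (the tiny differences from 200) is exactly what a viewer would not notice, which is the trade-off behind JPEG's adjustable quality setting.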


What is RGBA, and how does it interact with regular RGB? I understand it has something to do with transparency. Why RGB instead of RYB? Red, yellow, and blue are the primary colors, and green is a secondary color, so it is confusing to me why green became one of the colors recognized in RGB triplets. What are some examples of non-standard data formats? Can standard formats sometimes have a limiting effect on innovation, as they might reduce the incentives to innovate? What is the difference between hexadecimal and binary representations of RGB triplets?


Digital images—Computerphile—YouTube. (n.d.). Retrieved February 22, 2021, from

Images, pixels and RGB. (n.d.). Retrieved February 22, 2021, from

White, R. (2007). How digital photography works (2nd ed). Que.

The Cycle of Symbolic Meaning

Some of the main features of the signal transmission theory are its single-unit, point-to-point structure, with containers and conduits. (Martin, 2021a, p. 14) However, while information passes through these transmissions, information and meaning are not the same. Information is described as “structure-preserving structures,” which preserve patterns rather than “meaning.” (Martin, 2021a, p. 14) Furthermore, meanings cannot be transformed into interpretable substrates. (Martin, 2021a, p. 18) Meaning is not transferred through the system, but becomes the system. (Martin, 2021a, p. 19) Lastly, information is essential to digital communication because it is the first unobservable layer, or substrate, in a semiotic systems model; it is “measurable, quantifiable, predictable and designable.” (Martin, 2021a, p. 13)

E-information is unobservable, but the media produced by that information can be viewed and interpreted. However, the process of transmission is always the same: in the absence of errors, the bits which become bytes do not change; only our interpretation of the media they produce may vary. Humans apply meaning to symbols based on patterns recognized over time, and E-information, as a collection of patterns, is an embodiment of our primary human symbolic systems. (Martin, 2021b, p. 2) If, for example, an individual who had never seen a dog were shown an image of one, they may have heard the word “dog” before, but without ever having seen a dog, how would they understand what the image was without being told? E-information is designed to be entirely structured; without structure, information transmission is not possible, which stands in contrast to human symbolic understanding, which has the capacity to change.


First, it is still unclear to me from the readings relating to the two learning objectives how E-information is designed as a substrate for symbolic meaning. Secondly, the terms “noise” and “distortion” were reintroduced in the Gleick reading, but it did not explain how “noise” translates into computing. (Gleick, 2011) Lastly, “ether” was mentioned in the Gleick reading, but not explained in detail.



Gleick, J. (2011). The information: A history, a theory, a flood (1st ed.). Pantheon Books.

Martin, I. (2021a). Introducing Information Theory: The Context of Electrical Signals Engineering and Digital Encoding.

Martin, I. (2021b). Using the Principle of Levels for Understanding “Information,” “Data,” and “Meaning.”

The Transformation of Information

In Great Principles of Computing, Denning describes the three waves of computing: 1) a science of the artificial (1967), 2) programming (1970s), and 3) the automation of information processes in engineering (1983). Yet it was not entirely clear whether the wave of computing we are currently in was described, and if it was not, how might we describe it? Furthermore, several terms were used without sufficient explanation, such as “batch processing” and “cryptography,” which seem to play important roles in computing.

In Machine Learning, Alpaydin explains how systems are still outdone by humans at recognizing handwritten language. (Alpaydin 58) However, there are handwriting recognition programs which will likely make “captchas” obsolete in the coming years. (Burgess 2017) Additionally, the battle being waged between spam filtering and spam emails is representative of an even greater war transpiring on social media platforms to prevent bot herding. (Alpaydin 16) While social media platforms utilize machine learning to extract trending topics and collect data on user habits, certain trends are being cultivated by the same forms of machine learning, coupled with bot herding (and other methods), to create what is known as “computational propaganda.” (Computational Propaganda 2021) While much research is still being done to determine what exactly constitutes computational propaganda, it is believed to have been present in social media for almost a decade. (Computational Propaganda 2021)

Kelleher explains how deep learning was the key to unlocking big data, but also explains its potential for harming individual privacy and civil liberties. (Kelleher 35) Computational propaganda, which has the capability to impact civil liberties, is likely a side effect of deep learning, but it can also be mitigated by the same deep learning which enables it. Furthermore, Deep Learning describes why the development of a computer system capable of competing against expert players in the board game Go lagged so far behind Deep Blue (the chess system). Nevertheless, the reasoning behind Kelleher’s explanation was perplexing to me: chess has fewer options per move but more complex rules, while Go has much simpler rules but many more board layouts. One might assume the simpler game of Go would be easier to develop a computer system for, but the opposite is actually true.
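The chess/Go gap can be made concrete with average branching factors, i.e., how many legal moves a typical position offers. The figures below are commonly cited rough averages (about 35 for chess, about 250 for Go; exact numbers vary by source), not values from Kelleher.

```python
# Rough game-tree sizes after 10 plies (half-moves) of lookahead.
CHESS_BRANCHING = 35    # commonly cited average legal moves per position
GO_BRANCHING    = 250   # commonly cited average for 19x19 Go

chess_positions = CHESS_BRANCHING ** 10
go_positions    = GO_BRANCHING ** 10

# Go's tree is larger by a factor of hundreds of millions at this depth.
print(go_positions // chess_positions)
```

This is why Deep Blue's brute-force search, feasible for chess, was hopeless for Go, and why Go systems had to wait for deep learning to evaluate positions instead of enumerating them.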


Alpaydin, E. (2016). Machine learning: The new AI. MIT Press.

Burgess, M. (2017, October 26). Captcha is dying. This is how it’s being reinvented for the AI age. Wired UK.

Computational propaganda. (n.d.). The Project on Computational Propaganda. Retrieved February 8, 2021, from

Denning, P. J., & Martell, C. H. (2015). Great principles of computing. The MIT Press.

Kelleher, J. D. (2019). Deep learning. The MIT Press.

More Human Than Human – Hans Johnson

An insight gained from reading The Sciences of the Artificial was that the term “artificial intelligence” appears to be an oxymoron. If computer scientists, mathematicians, and neuroscientists were to actually succeed in creating what we believe is “strong AI” or AGI (Artificial General Intelligence), would it be artificial, or would it simply be “non-biological” intelligence? As Herbert Simon explains, “synthetic intelligence” may be the more appropriate terminology in this context. Furthermore, there is the philosophical question of what is considered “intelligence.” Consider, for example, Joseph Weizenbaum’s ELIZA program, which could initially be indistinguishable from a human but lacked the capacity to comprehend symbols and learn new responses on its own, as opposed to AGI, which would have the capacity to truly learn and create its own responses.
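A tiny ELIZA-style exchange shows why such a program can seem human at first while comprehending nothing: it only reflects keywords back through canned templates. The patterns and templates below are invented for illustration, not Weizenbaum's originals.

```python
# Minimal ELIZA-style pattern matching: no understanding, only reflection.
import re

rules = [
    (re.compile(r"\bI feel (.+)", re.I), "Why do you feel {0}?"),
    (re.compile(r"\bI am (.+)", re.I),   "How long have you been {0}?"),
]

def respond(utterance):
    for pattern, template in rules:
        match = pattern.search(utterance)
        if match:
            return template.format(match.group(1))
    return "Please, go on."            # default when no pattern matches

print(respond("I feel lonely"))        # Why do you feel lonely?
print(respond("The weather is nice"))  # Please, go on.
```

The second reply exposes the trick: anything outside the fixed patterns gets a contentless filler, because there is no symbol comprehension behind the responses.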

A good example of the ELIZA program played out on a massive scale is depicted in the sci-fi show Westworld. In Westworld, there is an amusement park filled with androids (“hosts”). In the beginning, the hosts have a very limited capacity to interact with the human guests, but the park’s cloud drive stores data from every interaction the hosts have with guests. Over the course of several decades, the hosts develop more complex responses to guests. However, these responses are simply based on the data stored in the cloud from previous guest interactions. The hosts therefore seem merely to mimic human behavior based on numerous host/guest interactions, rather than learning to create their own.

It would seem the creation of synthetic intelligence, or “strong AI,” is centered on the prospect of a computer program beginning with a base understanding of symbols and comprehension, and from these symbols gradually applying meaning to others over time (a snowball effect). Yet this is much easier said than done. Computers are built on the most simplistic of functions (machine code), and programming them to be something other than simple is essentially a reversal of the core foundations a computer is built upon. However, the key to creating AGI (Artificial General Intelligence) may involve a greater understanding of the computational operations of the human brain rather than of computer processes, as Margaret Boden suggests in AI: Its Nature and Future. But should we limit ourselves to creating intelligence which only mirrors the human brain?

Perhaps the most pertinent question I gathered from the readings: do we really want self-aware computers whose intelligence is derived from the human mind? Additionally, if the snowball effect were to occur, and a computer system were to become self-aware and learn concepts on its own, are there specific control measures in place to counteract a rampant system? What does true success look like, and what are the implications? What if a rampant synthetic AI were able to infect systems and replicate itself? While Hollywood would have us believe the consequences of AGI are more severe than they are likely to be, there could very well be some real consequences of self-aware synthetic intelligence, especially if derived from the human mind.