De-blackboxing the Role of Machine Learning in Computational Propaganda

Hans Johnson


Abstract

The COVID-19 pandemic has brought about drastic societal changes. One of the most evident is increased time spent at home. Consequently, many indoor forms of entertainment have witnessed substantial growth in popularity, for example, online video streaming, video gaming, and particularly social media use.1 Unfortunately, increased traffic on social media has magnified exposure to information operations (disinformation). Information operations come in many forms, yet one of the most prolific on social media has been computational propaganda. Within this subset of disinformation, machine learning is used to full effect to amplify influential information, incite emotional responses, and even interact with legitimate users of social media platforms.2 A combination of machine learning and pattern recognition techniques is utilized in this process, among the most eminent being Natural Language Processing (NLP) and StyleGAN, a style-based Generative Adversarial Network (GAN). This research project will: 1) give a brief history of the evolution of propaganda and how historical and modern propaganda differ in scope, 2) provide a foundational understanding of NLP and StyleGAN, and 3) describe how NLP and StyleGAN are used to disseminate information, or otherwise amplify it.

Introduction

Propaganda has been used throughout human history by various state and non-state actors to influence human behavior to achieve a desired outcome.3 Propaganda can be seen or heard in symbols (images, words, movies, music, etc.). However, what separates propaganda from regular human discourse is its deliberateness. Propaganda is, at its core, the deliberate creation or alteration of symbols (information) to influence behavior.4 This is why propaganda in the digital age is so troubling. After all, computing is the structuring, management, and processing of information, now in vast quantities.5

The mass production of influential information and symbols began with the printing press. By the 1890s, a single newspaper issue could number over one million copies, allowing media to reach larger audiences than ever before.6 Newspapers were known to influence public opinion, particularly leading up to, and during, times of war. The cartoon depicted below is an editorial published in the Pennsylvania Gazette in 1754, which helped incite resistance in the British colonies against French colonial expansion in North America.7

In the late 19th century, the Spanish-American War was agitated by the newspaper moguls William Randolph Hearst and Joseph Pulitzer, who began publishing what was known as “yellow journalism.”8 This form of journalism published crude and exaggerated articles meant to sensationalize information and provoke emotional responses in readers. The illustration in the newspaper below exhibits how false information can travel faster than the truth. In 1898, the USS Maine sank due to an explosion of unknown origins. Yet, before a formal investigation was conducted, newspapers circulated claims that the ship had sunk due to a bomb or torpedo from the Spanish navy. An investigation at the time concluded the ship was sunk by a sea mine. However, a later investigation in 1976 determined the ship sank as a result of an internal explosion.9 Had such information been available at the time, the war might never have happened.

A turning point for propaganda came when real images first began to appear in newspapers. Moments captured in real time have a profound impact on the human psyche. In 1880, the first halftone was printed in the Daily Graphic, beginning what is now known as photojournalism.10 The picture below is the Daily Graphic's halftone print of a New York shantytown.

During World War I, posters were the primary transmitter of propagandist material,11 though the information often originated from the target population's own government. A combination of image and text sends a powerful message, with simple phrases directing the viewer to feel a certain way about an image. The poster on the right depicts German troops in WWI committing what appear to be war crimes, coaxing viewers to join the military.12 The use of posters, cartoons, and images continued throughout the first half of the 20th century, and continues to this day.

As we transition into the digital age, information reaches audiences across the world at unprecedented speeds, and it seems information has outpaced society's capacity to process it. As literacy rates drastically increased over the past two centuries, so too did access to information, and consequently, propaganda. What is more troubling, however, is that tertiary education completion among adults has not increased proportionately with literacy. While nearly 100% of Americans aged 15 or older are literate, only approximately one-fourth complete tertiary education.13 14 This creates a serious conundrum: information proliferates, but society's capacity to process that information in a comprehensive and objective manner remains insufficient. Below are two graphs, one depicting higher education completion among adults, the second showing US household access to information technology.

What is even more concerning is the progressing capability of entities to target specific demographics with tailored, generated information. The capacity to profile groups began early in the 20th century, by means of surveys which collected data on public opinion, consumer habits, and elections.15 In 1916, the Literary Digest began public polling on presidential candidates.16 This practice was further augmented by the Gallup Polls in the 1930s, which took into account more than just public opinion on elections, including the economy, religion, and public assistance programs.17 Understanding public sentiment was an important step in the evolution of influencing human behavior.

Currently, human behavior can be categorized, documented, and influenced based on our most intricate and personal habits. This is made possible by increased storage capacity in cloud infrastructure, machine learning, and deep neural networks. Furthermore, most of this information is gathered without user consent or knowledge.

This data is not always used simply to influence consumer habits; it can be used to disrupt social cohesion, instill distrust in democratic institutions, and incite violence based on race, religion, and political disposition. Malicious entities, many originating from Russia, have infiltrated social media circles in the United States, creating false personas which present as activist groups of various motivations. Much of this intentional malicious activity is made possible through NLP and StyleGAN.18 NLP is likely used in several ways by propagandists: most importantly, to translate propaganda from one language to another; secondly, to train semi-automated chatbots to interact with legitimate users. With this in mind, we will now provide a foundational understanding of NLP and StyleGAN.

Natural Language Processing (NLP) 

NLP is essentially the intersection between linguistics and computer science.19 Natural written and spoken language is encoded as data within a computer via acoustic receptors or typed text, then decoded by a program which passes this data through a Deep Neural Network (DNN). The DNN routes the data through hidden layers of mathematical operations and feeds it into a statistical model which produces the most probable interpretation of said data. This method of machine learning has improved over the years, with IBM's statistical word-level translations being some of the first NLP software.
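To make that encode-then-model pipeline concrete, here is a minimal sketch in Python. The four-word vocabulary, the labels, and the hand-set weights are all illustrative assumptions standing in for a trained network; only the flow (text encoded as numbers, routed through a weighted computation, ending in a statistical model) mirrors the description above.

```python
import numpy as np

# Toy vocabulary: text must be encoded as numeric data before a model sees it.
vocab = {"good": 0, "bad": 1, "movie": 2, "film": 3}

def encode(text):
    """Bag-of-words encoding: count occurrences of each known word."""
    v = np.zeros(len(vocab))
    for word in text.lower().split():
        if word in vocab:
            v[vocab[word]] += 1
    return v

# Hand-set weights stand in for a trained hidden layer: "good" votes for the
# first label, "bad" for the second; neutral words vote for neither.
W = np.array([[ 1.0, -1.0],
              [-1.0,  1.0],
              [ 0.0,  0.0],
              [ 0.0,  0.0]])

def classify(text):
    scores = encode(text) @ W                      # hidden computation
    probs = np.exp(scores) / np.exp(scores).sum()  # statistical model (softmax)
    return ["positive", "negative"][int(np.argmax(probs))]

print(classify("good movie"))  # → positive
```

The softmax at the end is the "statistical model" step: it converts raw scores into probabilities, and the most probable interpretation wins.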

IBM's Word-Based NLP

IBM's software was successful for three reasons: it could be improved, modified, and revised.20 This statistical, model-based machine learning began in the 1960s and utilized rule-based algorithms known as “word tagging.”21 Word tagging assigned words grammatical designations like nouns, verbs, or adjectives in order to construct functional sentences. Yet, as one could imagine, words are used in a multitude of ways in the English language, which created limitations. Another issue with IBM's model was that it translated single words rather than entire sentences. Early statistical models were also plagued by inadequate access to data.22 In modern machine learning, the opposite is the case: there is such an abundance of data that it must be cleaned and carefully chosen to meet particular needs. The diagram below depicts IBM's statistical model process.23
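A hypothetical miniature of such rule-based word tagging might look like the sketch below. The hand-written lexicon and the fallback rule are illustrative assumptions; real systems used far larger dictionaries and statistics.

```python
# A tiny hand-written lexicon assigns each known word a grammatical tag.
lexicon = {
    "the": "DET", "a": "DET",
    "dog": "NOUN", "ball": "NOUN",
    "chases": "VERB", "red": "ADJ",
}

def tag(sentence):
    """Assign a grammatical designation to every word in a sentence."""
    tags = []
    for word in sentence.lower().split():
        if word in lexicon:
            tags.append((word, lexicon[word]))
        else:
            tags.append((word, "NOUN"))  # naive default, a source of errors
    return tags

print(tag("The dog chases a red ball"))
# → [('the', 'DET'), ('dog', 'NOUN'), ('chases', 'VERB'),
#    ('a', 'DET'), ('red', 'ADJ'), ('ball', 'NOUN')]
```

Ambiguous words expose the limitation noted above: a word like "run" can be a noun or a verb, and a fixed lookup has no way to resolve that from context.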

Google’s GNMT

Google's Neural Machine Translation (GNMT) system is making strides in increasing the accuracy of machine translation and speech recognition in several ways. First, GNMT encodes and decodes entire sentences or phrases, rather than performing word-to-word translation like IBM's early NLP.24 Secondly, GNMT employs Recurrent Neural Networks (RNNs) to map the meaning of sentences from an input language to an output language. This form of translation is much more efficient than word-to-word or even phrase-based methods. Below is an example of GNMT actively translating Chinese text, a language historically difficult for NLP software, taken directly from Google's AI blog:25

Yet sentence-based GNMT is coming very close to decoding complex Chinese text. The concept of translating entire sentences to produce meaning in another language was conceived as early as 1955:

“Thus may it be true that the way to translate from Chinese to Arabic, or from Russian to Portuguese, is not to attempt the direct route, shouting from tower to tower. Perhaps the way is to descend, from each language, down to the common base of human communication—the real but as yet undiscovered universal language—and then re-emerge by whatever particular route is convenient” 26
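This "descend to a common base and re-emerge" idea is essentially what an encoder-decoder RNN does: the encoder folds a whole source sentence into one intermediate state vector, and the decoder re-emerges from that state in the target language. The sketch below illustrates only the data flow; the tiny vocabularies are hypothetical and the weights are random rather than trained, so the "translation" it prints is meaningless.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabularies; real systems hold tens of thousands of words.
src_vocab = {"the": 0, "cat": 1, "sleeps": 2}
tgt_vocab = ["le", "chat", "dort"]

hidden = 4
W_in = rng.normal(size=(len(src_vocab), hidden))   # input embeddings
W_h = rng.normal(size=(hidden, hidden))            # recurrent weights
W_out = rng.normal(size=(hidden, len(tgt_vocab)))  # output projection

def encode(sentence):
    """Fold an entire sentence into a single fixed-size state vector."""
    h = np.zeros(hidden)
    for word in sentence.split():
        h = np.tanh(W_in[src_vocab[word]] + W_h @ h)  # state carries context
    return h

def decode(h, steps=3):
    """Emit one target word per step from the shared sentence state."""
    out = []
    for _ in range(steps):
        out.append(tgt_vocab[int(np.argmax(h @ W_out))])
        h = np.tanh(W_h @ h)  # advance the decoder state
    return out

print(decode(encode("the cat sleeps")))
```

The key contrast with IBM's word-level approach is that the decoder never sees individual source words, only the sentence-level state, which is the "common base" the 1955 quotation imagined.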

Other developments in NLP are numerous, and one is particularly concerning as it pertains to propaganda: GPT-3.

GPT-3

Generative Pre-trained Transformer 3, or “GPT-3,” is the third version of a text generator which utilizes machine learning and deep neural networks to produce and predict text. The capabilities of GPT-3 include answering questions, composing essays, and even writing computer code.27 Yet, unlike GNMT, the intellectual property surrounding GPT-3 is kept mostly secret, with the exception of some application programming interfaces. GPT-3 utilizes 175 billion parameters in its weighting system to determine the most plausible output, more than ten times the number of its next-largest competitor.28 With its Q & A feature, GPT-3 can answer common-sense questions, setting it apart from other AI software, as seen here:29
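The core mechanic of such a text generator, predicting a plausible next word and feeding it back in, can be shown with a microscopic stand-in: a bigram table built from a ten-word toy corpus. GPT-3 replaces these raw counts with 175 billion learned parameters, but the generation loop is the same in spirit. This sketch is purely illustrative, not OpenAI's implementation.

```python
import random

# Build next-word statistics from a toy corpus.
corpus = "the cat sat on the mat the cat ate the fish".split()
bigrams = {}
for a, b in zip(corpus, corpus[1:]):
    bigrams.setdefault(a, []).append(b)

def generate(word, n=5, seed=0):
    """Repeatedly predict a plausible next word and append it."""
    random.seed(seed)
    out = [word]
    for _ in range(n):
        options = bigrams.get(out[-1])
        if not options:
            break  # no known continuation for this word
        out.append(random.choice(options))
    return " ".join(out)

print(generate("the"))
```

Every continuation the model emits is statistically plausible given its training data, which is exactly why such systems can produce fluent text with no notion of truth.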

Yet for all its practical applications, there are also serious deficiencies. The text it generates can sometimes be far from the desired outcome. At times, bias becomes evident in its output, which can even be racially discriminatory.30 Additionally, instead of correcting the shortcomings of previous versions, GPT-3 simply offers a wider range of weighting parameters.

Generative Adversarial Networks

Generative Adversarial Networks consist of two DNNs, one generative and the other discriminative. The generative network creates a piece of media (images, sound waves, or text), which is then analyzed by the discriminative network. This adversary compares the generated media to real samples; if the media is deemed real, the generative network wins, and if it is considered fake, the generative network runs its algorithms again and produces new candidates.
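That adversarial loop can be made concrete in one dimension. In the sketch below, the "media" is just a number: real samples come from a Gaussian centered at 4, the generator shifts random noise by a learnable offset, and a logistic discriminator tries to tell the two apart. Each side nudges its parameters against the other until the generator's output distribution overlaps the real one. This is a minimal illustration under those assumptions, not a production GAN.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))

theta = 0.0      # generator parameter: g(z) = z + theta
w, b = 0.0, 0.0  # discriminator parameters: D(x) = sigmoid(w*x + b)
lr = 0.05

for _ in range(2000):
    real = rng.normal(4.0, 1.0, size=32)          # real "media"
    fake = rng.normal(0.0, 1.0, size=32) + theta  # generator's attempts

    # Discriminator step: ascend log D(real) + log(1 - D(fake)).
    d_real, d_fake = sigmoid(w * real + b), sigmoid(w * fake + b)
    w += lr * (np.mean((1 - d_real) * real) - np.mean(d_fake * fake))
    b += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # Generator step: ascend log D(fake), i.e. try to fool the adversary.
    d_fake = sigmoid(w * fake + b)
    theta += lr * np.mean((1 - d_fake) * w)

print(round(theta, 2))  # drifts toward 4, the mean of the real data
```

Once the generator's offset approaches 4, the discriminator can no longer separate real from fake, which is precisely the equilibrium the adversarial game is designed to reach.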

StyleGAN

StyleGAN is a derivative of Generative Adversarial Networks which is able to produce high-definition, artificially created images of human faces that appear real to an untrained eye. StyleGAN also generates more varied, unique images than other GAN architectures.31 Here are some examples of human faces produced with StyleGAN:32

Machine Learning in Computational Propaganda

One of the many benefits of open-source information and free software is that they enrich the lives of less privileged segments of society, even though an unintended consequence of open-source technology is its use by nefarious entities. Currently, Google Translate, GPT-2, and StyleGAN are freely available or open source. This means malicious actors can utilize the technology at virtually no cost in research, development, or use. The possible applications of such technology to propaganda are many.

Role of NLP

NLP is perhaps the most concerning of the machine learning techniques which can be utilized by foreign entities. One of the barriers which historically limited the spread of information was language. Now, with GNMT becoming progressively more accurate, malign actors can translate vast amounts of foreign-language text more efficiently and quickly than ever before. This serves a dual purpose: first, GNMT can be used by foreign actors to compose more complex messages which are less distinguishable from native writing in the target language; second, it becomes easier to research divisive topics in various regions of the world. Below are two propaganda ads originating from Russian sources from 2015-2016.33 Both ads cover politically divisive topics, one being LGBTQ rights, and the other police brutality against minorities.

The text contained in the previous ads is crudely translated, likely indicating NLP was used in their production. It is possible this crude translation is why these ads were discovered in the first place. Many of the Russian ads from 2015-2017 released by the US Senate Intelligence Committee contain frequent grammatical and translation errors.34 Furthermore, most, if not all, of the ads relate to race, religion, sexuality, or politics.

Role of GPT-2

GPT-2 can be utilized by malign actors in multiple ways. One possible use is to train chatbots to interact with legitimate users on social media platforms.35 The Q & A feature of GPT is what makes such interactions possible, directing chatbots to comment on specifically tagged posts, popularize hashtags, and potentially respond to emphatic replies from users.36 Secondly, GPT-2 can boost the relevance of posts which fake accounts are trained to popularize. Lastly, GPT-2 can be used to create fake biographical information to lend more legitimacy to fake profiles.

Role of StyleGAN

The role of StyleGAN in computational propaganda is to add legitimacy to fake profiles.37 As seen above in the collage of AI-generated photos, real faces can be difficult to differentiate from fake ones. Adding a human face to profiles is particularly useful for creating false personas whose mission is to produce content to be amplified by either autonomous or semi-autonomous accounts. Below is a fake Twitter account generated entirely autonomously:38

The Limitations of NLP and StyleGAN in Propaganda

The R&D behind NLP and StyleGAN is complex, but their use in spreading information is simple: create false personas, then like, share, comment, and react. While the applications of NLP and StyleGAN in proliferating fake news are numerous, what is more concerning is the amplification of factual, yet divisive, news. By simply reinforcing already existing divisions, computational propaganda self-proliferates. Propaganda is most successful concerning topics which are already extreme points of contention.

“Propaganda is as much about confirming rather than converting public opinion. Propaganda, if it is to be effective must, in a sense, preach to those who are already partially converted” – Welch, 2004 39

The previous statement has become particularly evident in recent years in American politics, concerning the sense of tribalism around race, religion, and sexuality.40 Take, for example, the following Russian propaganda ad:

In hindsight, the ad does not seem to send such a divisive message. After all, most could get behind supporting homeless veterans. However, what differentiates this ad from the previous ones is its subtlety. The ad received nearly 30,000 views and 67 clicks, far more than the police brutality and LGBTQ ads, likely because it was not identified as early as its counterparts. Secondly, if one takes note of the date on the ad, it was created not long after the Baltimore riots that followed Freddie Gray's death.41 The ad is also tailored to target African-American audiences. The timing of information appears to be just as important as the message, and with modern technology, timing is almost never an issue.

Conclusion

Machine learning plays a fundamental role in amplifying information, but a limited role in creating it. Successful conspiracy theories require time to fabricate and, even more importantly, human rather than artificial intelligence.42 In fact, the overuse of AI in spreading information can be detrimental to an operation, as over-activity flags the associated accounts or posts.43 After analyzing a multitude of Russian propaganda ads from 2015-2017 released by the Senate Intelligence Committee (provided by social media platforms), it became apparent that the ads which were discovered contained poor grammar. This gap in the data may suggest foreign entities are using machine learning to analyze which ads are taken down and which remain.44 Additionally, the data carries a rather obvious bias: it consists almost entirely of sponsored ads paid for in Russian rubles, which are easily traceable. Also absent from the released data was a very modern and influential form of propaganda: memes. In recent years, the Russian Internet Research Agency has garnered a strong following for its troll accounts on Instagram, which reach millennial audiences of varying demographics using memes and pop culture, memes which are likely curated entirely by individuals, not AI.45 The human element of propaganda remains just as relevant as it was in the 20th century, and will likely continue well into the 21st century.

End Notes

  1. Samet, A. (2020, July 29). How the coronavirus is changing us social media usage. Insider Intelligence. https://www.emarketer.com/content/how-coronavirus-changing-us-social-media-usage
  2.  Woolley, S., & Howard, P. N. (2019). Computational propaganda: Political parties, politicians, and political manipulation on social media. http://www.oxfordscholarship.com/view/10.1093/oso/9780190931407.001.0001/oso-9780190931407
  3.  Smith, B. L. (n.d.-a). Propaganda | definition, history, techniques, examples, & facts. Encyclopedia Britannica. Retrieved May 11, 2021, from https://www.britannica.com/topic/propaganda
  4.  Smith, B. L. (n.d.-a). Propaganda | definition, history, techniques, examples, & facts. Encyclopedia Britannica. Retrieved May 11, 2021, from https://www.britannica.com/topic/propaganda
  5. What is computing? – Definition from techopedia. (n.d.). Techopedia.Com. Retrieved May 11, 2021, from http://www.techopedia.com/definition/6597/computing
  6. Newspaper history. (n.d.). Retrieved May 11, 2021, from http://www.historicpages.com/nprhist.htm
  7. The story behind the join or die snake cartoon—National constitution center. (n.d.). National Constitution Center – Constitutioncenter.Org. Retrieved May 11, 2021, from https://constitutioncenter.org/blog/the-story-behind-the-join-or-die-snake-cartoon
  8. Milestones: 1866–1898—Office of the Historian. (n.d.). Retrieved May 11, 2021, from https://history.state.gov/milestones/1866-1898/yellow-journalism
  9. Milestones: 1866–1898—Office of the Historian. (n.d.). Retrieved May 11, 2021, from https://history.state.gov/milestones/1866-1898/yellow-journalism
  10. The “daily graphic” of new york publishes the first halftone of a news photograph: History of information. (n.d.). Retrieved May 11, 2021, from https://www.historyofinformation.com/detail.php?id=3930
  11. Posters: World war i posters – background and scope. (1914). //www.loc.gov/pictures/collection/wwipos/background.html
  12. Will you fight now or wait for this. (n.d.). Retrieved May 11, 2021, from //www.awm.gov.au/collection/ARTV00079
  13. Roser, M., & Ortiz-Ospina, E. (2013). Tertiary education. Our World in Data. https://ourworldindata.org/tertiary-education
  14. Roser, M., & Ortiz-Ospina, E. (2016). Literacy. Our World in Data. https://ourworldindata.org/literacy
  15. Smith, B. L. (n.d.-b). Propaganda—Modern research and the evolution of current theories. Encyclopedia Britannica. Retrieved May 11, 2021, from https://www.britannica.com/topic/propaganda
  16. The “literary digest” straw poll correctly predicts the election of woodrow wilson: History of information. (n.d.). Retrieved May 11, 2021, from https://www.historyofinformation.com/detail.php?id=1349
  17. Gallup, Inc. (2010, October 20). 75 years ago, the first Gallup poll. Gallup.com. https://news.gallup.com/opinion/polling-matters/169682/years-ago-first-gallup-poll.aspx
  18.  P. 4 Martino, G. D. S., Cresci, S., Barrón-Cedeño, A., Yu, S., Pietro, R. D., & Nakov, P. (2020). A survey on computational propaganda detection. Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 4826–4832. https://doi.org/10.24963/ijcai.2020/672
  19.  What is natural language processing? (n.d.). Retrieved May 11, 2021, from https://www.ibm.com/cloud/learn/natural-language-processing
  20.  P. 118 Poibeau, Thierry. Machine Translation. 1st ed., MIT Press, 2017.
  21. A beginner’s guide to natural language processing. (n.d.). IBM Developer. Retrieved May 11, 2021, from https://developer.ibm.com/technologies/artificial-intelligence/articles/a-beginners-guide-to-natural-language-processing/
  22. A beginner’s guide to natural language processing. (n.d.). IBM Developer. Retrieved May 11, 2021, from https://developer.ibm.com/technologies/artificial-intelligence/articles/a-beginners-guide-to-natural-language-processing/
  23.  P. 118 Poibeau, Thierry. Machine Translation. 1st ed., MIT Press, 2017.
  24. Le, Q. V., & Schuster, M. (n.d.). A neural network for machine translation, at production scale. Google AI Blog. Retrieved May 11, 2021, from http://ai.googleblog.com/2016/09/a-neural-network-for-machine.html
  25.  Le, Q. V., & Schuster, M. (n.d.). A neural network for machine translation, at production scale. Google AI Blog. Retrieved May 11, 2021, from http://ai.googleblog.com/2016/09/a-neural-network-for-machine.html
  26.  P. 64 Poibeau, Thierry. Machine Translation. 1st ed., MIT Press, 2017
  27. Marr, B. (n.d.). What is gpt-3 and why is it revolutionizing artificial intelligence? Forbes. Retrieved May 11, 2021, from https://www.forbes.com/sites/bernardmarr/2020/10/05/what-is-gpt-3-and-why-is-it-revolutionizing-artificial-intelligence/
  28. Vincent, J. (2020, July 30). OpenAI’s latest breakthrough is astonishingly powerful, but still fighting its flaws. The Verge. https://www.theverge.com/21346343/gpt-3-explainer-openai-examples-errors-agi-potential
  29.  Sharma, P. (2020, July 22). 21 openai gpt-3 demos and examples to convince you that ai threat is real, or is it ? [Including twitter posts]. MLK – Machine Learning Knowledge. https://machinelearningknowledge.ai/openai-gpt-3-demos-to-convince-you-that-ai-threat-is-real-or-is-it/
  30. Vincent, J. (2020, July 30). OpenAI’s latest breakthrough is astonishingly powerful, but still fighting its flaws. The Verge. https://www.theverge.com/21346343/gpt-3-explainer-openai-examples-errors-agi-potential
  31.  P. 1 Karras, T., Laine, S., & Aila, T. (2018). A style-based generator architecture for generative adversarial networks. https://arxiv.org/abs/1812.04948v3
  32.  P. 3 Karras, T., Laine, S., & Aila, T. (2018). A style-based generator architecture for generative adversarial networks. https://arxiv.org/abs/1812.04948v3
  33. Social media advertisements | permanent select committee on intelligence. (n.d.). Retrieved May 11, 2021, from https://intelligence.house.gov/social-media-content/social-media-advertisements.htm
  34. Social media advertisements | permanent select committee on intelligence. (n.d.). Retrieved May 11, 2021, from https://intelligence.house.gov/social-media-content/social-media-advertisements.htm
  35.  P. 4 Martino, G. D. S., Cresci, S., Barrón-Cedeño, A., Yu, S., Pietro, R. D., & Nakov, P. (2020). A survey on computational propaganda detection. Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 4826–4832. https://doi.org/10.24963/ijcai.2020/672
  36. Xu, A. Y. (2020, June 10). Language models and fake news: The democratization of propaganda. Medium. https://towardsdatascience.com/language-models-and-fake-news-the-democratization-of-propaganda-11b1267b3054
  37. O’Sullivan, D. (2020, September 1). After FBI tip, Facebook says it uncovered Russian meddling. CNN. https://www.cnn.com/2020/09/01/tech/russian-troll-group-facebook-campaign/index.html
  38. O’Sullivan, D. (2020, September 1). After FBI tip, Facebook says it uncovered Russian meddling. CNN. https://www.cnn.com/2020/09/01/tech/russian-troll-group-facebook-campaign/index.html
  39.  P. 214 Welch, D. (2004). Nazi Propaganda and the Volksgemeinschaft: Constructing a People’s Community. Journal of Contemporary History, 39(2), 213-238. doi: 10.2307/3180722
  40. Pew Research Center. (2014, June 12). Political polarization in the American public. Pew Research Center – U.S. Politics & Policy. https://www.pewresearch.org/politics/2014/06/12/political-polarization-in-the-american-public/
  41. Peralta, E. (n.d.). Timeline: What we know about the freddie gray arrest. NPR.Org. Retrieved May 11, 2021, from https://www.npr.org/sections/thetwo-way/2015/05/01/403629104/baltimore-protests-what-we-know-about-the-freddie-gray-arrest
  42. Woolley, S., & Howard, P. N. (2019). Computational propaganda: Political parties, politicians, and political manipulation on social media. http://www.oxfordscholarship.com/view/10.1093/oso/9780190931407.001.0001/oso-9780190931407
  43. Woolley, S., & Howard, P. N. (2019). Computational propaganda: Political parties, politicians, and political manipulation on social media. http://www.oxfordscholarship.com/view/10.1093/oso/9780190931407.001.0001/oso-9780190931407
  44.  P. 4 Martino, G. D. S., Cresci, S., Barrón-Cedeño, A., Yu, S., Pietro, R. D., & Nakov, P. (2020). A survey on computational propaganda detection. Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 4826–4832. https://doi.org/10.24963/ijcai.2020/672
  45. Thompson, N., & Lapowsky, I. (2018, December 17). How russian trolls used meme warfare to divide america. Wired. https://www.wired.com/story/russia-ira-propaganda-senate-report/