Category Archives: Week 11

Aren’t you scared?

The ethics and socio-technical implications of AI, and of the broader technology we use, have recently become a hot topic. However, although it is a “rising phenomenon,” I’d like to question where and when we even come across it. Documentaries such as Coded Bias and The Social Dilemma show who the few people are who are aware of the implications AI can have on our lives and livelihoods, as well as on our rights as we have known them thus far. This course is an example of that: people in academia; people who work in the field, such as coders and data scientists; ‘tech activists’; and politicians specializing in privacy, tech and internet issues and legislation. But are the big companies concerned? Are governments concerned? Who does this system benefit? 

A few weeks ago I was at the doctor’s, and when he asked what I study, he turned around and said, “Aren’t you scared?” In that moment my first thought was what a bizarre question to get from a doctor. Scared of what? Robots taking over? Not necessarily. Machines taking over? Ehh, not necessarily; we are still far away from that, plus, in terms of human vs. robot, humans still control what we create. What is scary, however, is the way the technologies we use today are embedded with human biases, racism, sexism, human rights issues and other societal implications. Because we don’t see what is behind our technology, people don’t see the “black box” that governs how the technologies we use every day operate, who operates them, and how. I think that, until now, it has been hard for people to understand why there are biases in algorithms or issues within the system. A lot of questions typically pop up: Don’t more women work for tech companies? Don’t people of color work in the field nowadays? Don’t we have a more diverse workforce in the tech industry? Yes, sure, is the simple answer (not as many as there should be, but let’s start with baby steps for the sake of this description). The problem is that this isn’t the root of the issue. I truly appreciated how well Coded Bias explains why and how these biases exist from the ground up. Simply starting with the Dartmouth conference in the summer of 1956, we see who the creators of this initiative, of AI, of ML, were: white men. So the whole process, the whole system, started being fed with patterns that reflect just that. Not necessarily in terms of which photos we feed the ‘machine’ for facial recognition, but in terms of concepts: the theoretical and social representations were those drawn from a small pool of data. The data you feed into the machine is what it will learn from, so if most of the photos are of red flowers, for example, the system will have an easier time recognizing typical red flowers than, let’s say, a white purple-dotted orchid. 
So the discourse of AI began, and it picked up on that narrative alone. A machine doesn’t have a soul or its own way of thinking; it learns from what humans feed into it. It does what humans tell it to do. So who is scarier? Humans or the machine? 
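To make the red-flower example concrete, here is a toy sketch in Python. Everything in it is invented for illustration: the labels, the 95/5 split, and the function names. The "classifier" is deliberately the simplest possible one, a model that only learns the most common label in its training data. It is not how facial recognition works, but it shows how a skewed training set produces skewed behavior that a headline accuracy number can hide.

```python
from collections import Counter

def train_majority_classifier(labels):
    """Learn nothing except the most frequent label in the training data."""
    most_common_label, _count = Counter(labels).most_common(1)[0]
    return lambda _example: most_common_label

def accuracy(predict, examples, labels):
    """Share of all examples the model labels correctly."""
    correct = sum(predict(x) == y for x, y in zip(examples, labels))
    return correct / len(labels)

def recall(predict, examples, labels, target):
    """Share of the `target` examples the model actually recognizes."""
    hits = [predict(x) == y for x, y in zip(examples, labels) if y == target]
    return sum(hits) / len(hits)

# Hypothetical, deliberately skewed training data: 95 red flowers, 5 orchids.
train_labels = ["red flower"] * 95 + ["white orchid"] * 5
model = train_majority_classifier(train_labels)

# A test set with the same skew; the integers stand in for images.
test_examples = list(range(100))
test_labels = ["red flower"] * 95 + ["white orchid"] * 5

print(accuracy(model, test_examples, test_labels))                # 0.95
print(recall(model, test_examples, test_labels, "white orchid"))  # 0.0
```

The model looks 95% accurate overall, yet it never recognizes a single orchid: it has only learned what it was mostly fed.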

Yes, it’s mostly based on code. Yes, there is more diversity in the tech industry and the field of AI. But who writes the code? Or better yet, who wrote all the code that today’s code is based on? Last class, we talked about people in academia, such as ourselves, who question the technology we study, use and work on, even though it may be our passion or our interest. That is what CCT and other interdisciplinary, ‘liberal arts’ courses and programs are for: studying your interests, but also questioning them. Don’t just look at something from one point of view. Nothing about technology is simple or lined up in one straight path. Maybe it’s the anthropologist in me, but it’s hard to grasp that so many people in the field are not concerned about the implications of their actions. There is code, so we hide behind it. For the most part, computer scientists, software engineers and data scientists come out of undergrad, or out of heavily software-engineering-focused master’s programs, without having discussed the big issues of the system and of the technologies we use today. Yes, it is an amazing advancement of our time that has exponentially improved our lives in more ways than we can probably think of off the top of our heads, but are its socio-political-economic repercussions and biases greater than we have acknowledged? 

In the weeks on Cloud Computing and Big Data, we saw the consequences of, and ideas behind, monopolizing an industry that regular people and companies alike rely on for the safekeeping and management of their documents, files and much more. Consider the naivety of people who assume that just because they don’t have their location settings on, or don’t use social media, they have no technological footprint and governments have no way to monitor them. Unless you are 100% detached from everything technological, which honestly is pretty hard these days (even if you as an individual are, a product or service you use might not be), data or information about you is out there one way or another. Big Data isn’t just random numbers and information collected from thousands of sources. Those numbers and figures didn’t magically appear out of nowhere. They are fed into the system by our own use of everything we do. At the end of the day, we are the machines. Our societies are the technologies. They didn’t create themselves out of nowhere. Being skeptical and able to question what is really going on behind the scenes is what will help us conceptualize and overcome the many issues and implications that exist and are constantly happening, as depicted in Coded Bias. 

 

Resources 

Film Documentary, Coded Bias (Dir. Shalini Kantayya, 2020). Available on Netflix, and free to view on the PBS Independent Lens site

“Big Data”: ACM Ubiquity Symposium (2018).

Rob Kitchin, The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences. London; Thousand Oaks, CA: SAGE Publications, 2014. Excerpts.

Cathy O’Neil, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy (New York: Crown, 2016).

Boudewijn de Bruin and Luciano Floridi, “The Ethics of Cloud Computing,” Science and Engineering Ethics 23, no. 1 (February 1, 2017): 21–39.

Geoff Dougherty, Pattern Recognition and Classification: An Introduction (New York: Springer, 2012). Excerpt: Chaps. 1-2.

Big Data has Big Problems and Even Bigger Solutions

Big data is something I know a lot about: when the term started to be popularized in Psychology, I was enamored, like many others, with the potential of doing anything with such large datasets and the promise of finding truths that would normally be out of reach. Years later, after many studies and attempts to use such data, I find myself realizing that, aside from the hype, Big Data is just like any other technique we use: it gives us lots of information but not a ton of knowledge.

There are a few things about big data that are problematic. The first is its ability to generate information and connections between two seemingly unrelated things. This ordinarily sounds amazing, until you realize that the best and most effective application of these tools to date is marketing and selling ads and products to you more effectively. Amazon and YouTube are great examples of this: they know what you want to buy before you want to buy it, or the video you want to watch before you know you want to watch it. The same kind of data also allowed Facebook to adjust its algorithms to improve or depress the mood of the people who use its site. What these companies care about is the bottom line, which leads to the next issue.
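As a rough illustration of how "connections between seemingly unrelated things" can fall out of purchase data, below is a minimal item co-occurrence sketch: count how often pairs of items appear in the same basket, then recommend the items most often bought alongside a given one. This is a generic, textbook-style technique swapped in for illustration, not Amazon's or YouTube's actual system (which the post does not describe); the baskets and product names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def co_occurrence_counts(baskets):
    """Count how often each unordered pair of items is bought together."""
    pairs = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            pairs[(a, b)] += 1
    return pairs

def recommend(item, pairs, k=2):
    """Return up to k items most often bought alongside `item`."""
    scores = Counter()
    for (a, b), count in pairs.items():
        if a == item:
            scores[b] += count
        elif b == item:
            scores[a] += count
    return [other for other, _count in scores.most_common(k)]

# Hypothetical shopping baskets.
baskets = [
    ["tent", "flashlight", "batteries"],
    ["tent", "batteries"],
    ["flashlight", "batteries"],
    ["tent", "flashlight", "marshmallows"],
]
pairs = co_occurrence_counts(baskets)
print(recommend("tent", pairs))  # ['batteries', 'flashlight']
```

Nothing here "understands" camping; the suggestion comes purely from patterns in what people did before, which is why scale matters so much to these systems.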

Those who use Big Data sometimes don’t understand the ramifications of their work. A number of years ago I saw a study that used a dataset of faces to see whether gay men could be identified from their faces. This is intriguing but also highly invasive and controversial. It isn’t much of an issue as a proof of concept, but in countries like Iran, where being gay is illegal, using a system like this to determine the likelihood that someone is gay (regardless of its true accuracy) is terrible. Data needs to come with responsibility for how it is used, or else we will end up in scenarios where we do something we can’t easily undo and harm a large group of people.

This brings me to my last point: using Big Data responsibly takes a lot of effort. A new project called the Human Screenome Project takes a screenshot of what is shown on your phone every 5 seconds. It is an amazingly large dataset that will reveal a lot about how people use their phones, but even parsing through the millions of pictures to derive information for analysis will take years and thousands of hours. Big Data is fantastic, but it is not an easy shortcut just because it’s there. Used responsibly, it requires a lot of time and effort to understand what exactly it tells you and how to interpret what you’ve found.

Human Screenome Project

 

Why “Big” Does Not Necessarily Mean “Bad”  

Data is a term that has been in use since the 17th century, when it meant “a fact given as the basis for calculation in mathematical problems” (Data | Origin and Meaning of Data by Online Etymology Dictionary, 2021). Yet it was not until the early 21st century that the term “big data” was first introduced (Foote, 2017). Big data seems to carry the same negative connotations as “big tech,” “big pharma” and “big government.” What differentiates big data from regular data is the “3Vs”: velocity, variety, and volume (Kitchin, 2014, p. 1). In big data, the 3Vs are so extreme that standard statistical models become inadequate, and deep neural nets are often required to analyze the data effectively (Alpaydin, 2004, p. 104). To those unfamiliar with big data, it seems to represent more than just a multitude of factual information; it represents a further divide between those who seemingly have access to it and those who do not. In reality, however, big data is often used to improve the quality of life of everyday people performing day-to-day activities. 

For example, an individual’s commute to work may have been made easier (and safer) by big data. By using machine learning in conjunction with big data on the most heavily congested roads in and around cities, we can make transportation more efficient (Piletic, 2017). Big data in this scenario is acquired through IoT devices, car sensors, cameras and smart devices (Piletic, 2017). The data collected is used for several important aspects of transportation: 1) city planning, 2) parking and congestion control, and 3) reducing long commute times (Piletic, 2017). 

Moreover, commuting is also made safer by the same combination of data and ML. Motor vehicle accidents are the leading cause of death in the United States for ages 1–54, and in the world for ages 5–29 (“Road Safety Facts,” 2021). While not all of these accidents can be prevented with data analytics, their numbers could be reduced. ITS (Intelligent Transport Systems) are made possible by big data originating from specially designed DAS (Data Acquisition Systems), which analyze driving behavior, patterns and posture (Antoniou et al., 2018, p. 327). Additionally, crashes and traffic flows are analyzed with loop detector data and MVDS (Microwave Vehicle Detection Systems). This data is used to estimate the likelihood of vehicle accidents while taking into account a multitude of variables, such as location, vehicle type, speed, and traffic flow (Antoniou et al., 2018, p. 325). 
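One common way to turn several variables into a single accident likelihood is a logistic model, sketched below. Antoniou et al. are not cited here for any particular model, so treat this purely as an illustration of the idea: the feature names, the coefficients and the intercept are all invented, and a real system would estimate them from loop-detector, MVDS and crash records.

```python
import math

# Hypothetical coefficients, invented for illustration only.
WEIGHTS = {
    "speed_over_limit_kmh": 0.04,
    "vehicles_per_minute": 0.02,
    "is_heavy_vehicle": 0.5,
    "is_known_hotspot": 0.8,
}
INTERCEPT = -6.0

def crash_probability(features):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum of w_i * x_i)))."""
    z = INTERCEPT + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

quiet_road = {"speed_over_limit_kmh": 0, "vehicles_per_minute": 10,
              "is_heavy_vehicle": 0, "is_known_hotspot": 0}
busy_hotspot = {"speed_over_limit_kmh": 20, "vehicles_per_minute": 60,
                "is_heavy_vehicle": 1, "is_known_hotspot": 1}

print(f"{crash_probability(quiet_road):.4f}")    # 0.0030
print(f"{crash_probability(busy_hotspot):.4f}")  # 0.0630
```

The design point is that each variable nudges the score up or down, and the logistic function squeezes the total into a probability between 0 and 1, which is what lets planners rank road segments by risk.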

While the previous examples show ways in which big data benefits society, there remain some scenarios where it is used improperly or inefficiently. However, these examples should not detract from the benefits, and should be handled on a case-by-case basis. Facial recognition has come under scrutiny due to disproportionate false positives for people with darker skin pigmentation (Noorden, 2020). Additionally, big data is used in information operations on social media platforms to incite divisiveness (Torabi Asr & Taboada, 2019). These are all issues that can be addressed separately, without stifling the societal benefits of big data.  

References

Alpaydin, E. (2004). Introduction to machine learning. MIT Press.

Antoniou, C., Dimitriou, L., & Pereira, F. (2018). Mobility patterns, big data and transport analytics: Tools and applications for modeling. Elsevier.

Data | origin and meaning of data by online etymology dictionary. (n.d.). Retrieved April 12, 2021, from https://www.etymonline.com/word/data

Foote, K. D. (2017, December 14). A brief history of big data. DATAVERSITY. https://www.dataversity.net/brief-history-big-data/

Kitchin, R. (2014). Big Data, new epistemologies and paradigm shifts. Big Data & Society, 1(1), 205395171452848. https://doi.org/10.1177/2053951714528481

Noorden, R. V. (2020). The ethical questions that haunt facial-recognition research. Nature, 587(7834), 354–358. https://doi.org/10.1038/d41586-020-03187-3

Piletic, A. P. (2017, July 11). How IoT and big data are driving smart traffic management and smart cities. Big Data Made Simple. https://bigdata-madesimple.com/iot-big-data-driving-smart-traffic-management-smart-cities/

Road safety facts. (n.d.). Association for Safe International Road Travel. Retrieved April 12, 2021, from https://www.asirt.org/safe-travel/road-safety-facts/

Torabi Asr, F., & Taboada, M. (2019). Big Data and quality data for fake news and misinformation detection. Big Data & Society, 6(1), 205395171984331. https://doi.org/10.1177/2053951719843310

Avoid the Fallacies of Empiricism

As we entered the internet era, Big Data emerged accordingly. “The impact of digital data on society is very great and increasing. Social networks and big data determine what is noticed and acted upon” (Johnson et al., 2018). How should we use it? How do we use it correctly? How do we deal with its consequences? There is more and more to consider, especially as we increasingly rely on it in AI and ML. How do we solve challenges such as data capture, data storage and data analysis when we use Big Data to train our AI/ML algorithms (Wikipedia)? On the way to de-blackboxing it, these technical issues should not be ignored, but this week I want to emphasize attitudes toward Big Data and its ethical problems.

Big data is not a simple term that we can understand literally. “Big data is a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by traditional data-processing application software” (Wikipedia). In his book, Kitchin gives readers a better understanding of what data means by building a knowledge pyramid and classification standards. He also points out that “how data are ontologically defined and delimited is not a neutral, technical process, but a normative, political, and ethical one that is often contested and has consequences for subsequent analysis, interpretation, and action” (Kitchin, 2014). This point also relates to the views of Cathy O’Neil, which are, by common sense, prescient but too negative. She argues that data is neither objective nor accountable, and that the ‘standards of success’ are full of bias. She strongly advocates passing laws, conducting independent audits, and building moral norms for practitioners (O’Neil, 2016). Her attitude toward big data is overly pessimistic, with many negative examples. Data and algorithms are tools for human beings; it is up to us to determine how to use them. We should develop the ability to access and analyze data instead of attributing problems to the data itself. Her examples, such as those concerning gender and ethics, should be attributed to human society rather than to the traces/data our activities leave behind. What she advocates can be seen in data infrastructure, which contains social and cognitive structures: “Data infrastructures host and link databases into a more complex sociotechnical structure” (Kitchin, 2014). It is complicated and hard to achieve, but I believe it is the ultimate way to solve the unfair and biased cases in O’Neil’s book. We need to overcome the fallacies of empiricism to speed up the completion of the industry’s standards. 
The same point applies to the ethical and moral side: it is people who decide what to do, not algorithms. There will certainly be many negative and harmful cases during the development process, but as time goes by, I believe we can build a reasonable system for Big Data and AI/ML, just as we have built social systems since ancient times.

Reference

Johnson, J., Denning, P., Delic, K. A., & Sousa-Rodrigues, D. (2018). Ubiquity Symposium: Big data: Big data or big brother? That is the question now. Ubiquity, 2018(August), 1–10. https://doi.org/10.1145/3158352

Kitchin, R. (2014). The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences. Sage Publications.

O’Neil, C. (2016). Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Crown Publishers.

“Big data.” In Wikipedia, April 10, 2021. https://en.wikipedia.org/wiki/Big_data.

 

Big Data: myth or reality?

This week’s readings offered an interesting outlook on what we mean by “big data,” redefining not only the term itself but also certain socio-economic theories and constructs around it. For the most part, when people talk or generalize about “big data,” they mean an unfathomable, unquantifiable amount of information that we are trying to categorize and clean up for x and y reasons, for x and y companies. Although this isn’t technically wrong, the authors and sources we looked at this week (which of course also connect to previous weeks, and follow naturally from last week’s cloud computing theme) give a different perspective on the deeper issues and concepts surrounding the accumulation, understanding, meaning and use, purposeful or accidental, of the countless data collected every second of every day. As Rob Kitchin puts it:

“Big Data creates a radical shift in how we think about research …. [It offers] a profound change at the levels of epistemology and ethics. Big Data reframes key questions about the constitution of knowledge, the processes of research, how we should engage with information, and the nature and the categorization of reality … Big Data stakes out new terrains of objects, methods of knowing, and definitions of social life. (boyd and Crawford, 2012)” (Kitchin, 2014, 1). 

From a more data-science perspective, big data is about analyzing, extracting information from, and dealing with large and complex amounts of data that cannot be handled by the “everyday” software we use on the go, software that is sufficient for the fast-paced lifestyles most of us lead. Basically, it means accumulating unstructured data that needs to be filtered and categorized in order to serve a purpose. But what we are being called on to realize here is how much more complicated “big data” really is, and that in reality “big data” affects our lives far more than we realize and plays a “prominent role in every aspect” of them (Huberman, 2017). The way we live nowadays is constantly intertwined with technology (smartphones, emails, social media and networks, credit cards, smart home devices, cars, laptops, etc.), so that, knowingly or unknowingly, we constantly feed the system so much information about ourselves, what we do, where we are, and what we buy, eat, drink and listen to, that in return we get a very personalized portfolio, if you will, that matches our preferences, hobbies and lifestyles. We get personalized ads, personalized feeds and more because of this mass accumulation of data, which is taking place on a much larger scale than even 5–10 years ago: not only has the number of devices per individual increased, but our lives on the internet have also developed along an exponentially fast trajectory. Imagine how many people around the world are constantly “feeding” the cloud or companies with data and information that then has to be analyzed, categorized and sent down its respective path, only to be processed by companies and fed back to us in more disguised and discreet forms, mainly advertisements.

Johnson and Denning (2017) also emphasize this huge “big data revolution” as a result of “the expansion of the internet into billions of computing devices, and the digitization of almost everything. […] Digitization creates digital representations for many things once thought to be beyond the reach of computing technology.” This explains how deeply “big data” affects all aspects of life: it not only makes it possible to personalize ads, but also shows yet again how globalized the world has become through the constant development of technology. For example, especially during this pandemic, we saw the importance of online education, something that would never have been imagined or accepted years ago. The fact that so many children, educators, students and others all over the world can log onto platforms from wherever they are for hours at a time, while also being able to record, participate, interact and do much more as part of their education, can be attributed to the capacity of technology not only to support such activities live but also to save them for future use. Even the cities we live in accumulate countless data from processes we would not assume produce data; it is truly hard not to be digitized nowadays without facing the difficulties and setbacks of being “disconnected,” or not a part of the world. In cities, ride-hailing apps such as Uber and Lyft collect data, and so does public transportation, such as buses, to monitor how many people use them, plan routes, and stay informed about the best possible routes, traffic accidents and more. CCTV and other security systems are constantly monitoring, recording and collecting information, often analyzing potential threats or issues on the spot. 
Of course, socio-political and economic issues are ultimately affected by the development and evolution of technology in all aspects of life. Examples include economic crises, which never affect just one entity, one country or one company but the whole system, and wars and political disputes, which do not stay within borders or zones but spread into other circumstances and cross borders as people migrate, seek refuge, change status, etc. 

 

 

Resources

Bernardo A. Huberman, “Big Data and the Attention Economy” Ubiquity 2017, (December 2017): 2:1–2:7.

Jeffrey Johnson, Peter Denning, et al., “Big Data: Big Data or Big Brother? That Is the Question Now (Concluding Statement),” Ubiquity 2018, no. August (August 2018): 2:1–2:10.

Jeffrey Johnson, Peter Denning, et al., “Big Data, Digitization, and Social Change (Opening Statement),” Ubiquity 2017, (December 2017).

Rob Kitchin, “Big Data, New Epistemologies and Paradigm Shifts,” Big Data & Society 1, no. 1 (January 1, 2014): 1-12.

Rob Kitchin, The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences. London; Thousand Oaks, CA: SAGE Publications, 2014. Excerpts.

Wikipedia, “Big Data.”

Data Overload

Before 2000, things were mainly stored on analog media, like music on tapes and movies on film. Since 2002, as poverty has declined, the demand for efficient sharing has increased dramatically. Because more people have the opportunity to own electronic devices, the digital age has begun. What differentiates big data from ordinary data are the 5 Vs: large volume, more variety, high velocity, veracity, and value. Volume is usually the main factor in determining whether data counts as big data (typically terabytes or petabytes). The term big data was popularized by John Mashey in the 1990s. Today, big data is used across a wide range of areas. Besides cloud computing, data analysis techniques include ML and natural language processing, as we discussed in previous classes. Visualization is also considered an expression of big data. The point of big data is to deal efficiently with massive amounts of data that traditional data-processing equipment is unable to handle. Storage capacity has been increasing exponentially since the 1980s, and even though only about one-third of our data is text and still images, it is still becoming very crowded. Speaking of applications, I personally think the NFT craze of 2021 is a sign of the big data explosion and may represent a new stage of big data. The claim of ownership is what’s really behind NFTs, and if the idea of “digital assets” becomes convincing to the majority, it will inevitably increase internet traffic. What’s worse is that most of the space will be occupied by “recreations” (just something that came to my mind).

Every aspect of our lives relates to applications of big data. In medicine, for example, big data helps ensure that patients receive personalized healthcare, more specifically by extracting data from a large database to build a detailed mechanistic model of an individual patient. These detailed data sources make treatment more robust and more effective. However, besides data bias, data traffic is another major challenge. A single breast tomosynthesis scan takes nearly 450 MB, comparable to high-resolution commercial photography (by comparison, a professional photographer takes about 1,000–2,000 photos per shoot, most of which sit unused in the cloud). Another example of big data is the recommendation systems we experience every day on YouTube, Facebook, online shopping sites, and so on. Take Netflix: what surprised me most is not just that it recommends categories that seem to interest you, but that it even switches the cover art to tempt viewers into changing their minds. If a viewer watches a lot of romantic movies, Netflix may use a still of a kissing couple as the cover, even for a horror or action movie. So even if you passed on a movie the first time simply because you didn’t like the content, they can still convince you to reopen it just by changing the cover, and most of the time we as viewers don’t notice. Other examples include Google Maps, dating apps, and governments using big data to keep records and monitor crime rates. Environmentally, official institutions can predict natural disasters five days ahead by analyzing existing data. Other applications occur in education, marketing, social media, etc.
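The cover-swapping trick can be sketched with a very simple rule: look at a viewer's watch history and show the artwork variant tagged with their most-watched genre. This is a hypothetical stand-in, not Netflix's actual system, which the post does not describe and which is certainly more sophisticated; the file names, genre tags and function name are all invented.

```python
from collections import Counter

# Hypothetical artwork variants for one title, each tagged with the genre it appeals to.
ARTWORK = {
    "romance": "couple_kissing.jpg",
    "horror": "shadow_in_hallway.jpg",
    "action": "car_chase.jpg",
}
DEFAULT_ARTWORK = "poster.jpg"

def pick_artwork(watch_history, artwork=ARTWORK, default=DEFAULT_ARTWORK):
    """Choose the cover matching the viewer's most-watched genre, if one exists."""
    for genre, _count in Counter(watch_history).most_common():
        if genre in artwork:
            return artwork[genre]
    return default

romance_fan = ["romance", "romance", "comedy", "romance", "horror"]
documentary_fan = ["documentary", "documentary"]
print(pick_artwork(romance_fan))      # couple_kissing.jpg
print(pick_artwork(documentary_fan))  # poster.jpg
```

The same horror title gets a romantic cover for one viewer and the default poster for another, which is exactly why viewers rarely notice the change.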

Question: What’s the usage of big data in blockchain? 

Resources:

Intro to Big Data: Crash Course Statistics #38. (2018, November 14). [Video]. YouTube. https://www.youtube.com/watch?v=vku2Bw7Vkfs

Viceconti, M., Hunter, P., & Hose, R. (2015). Big Data, Big Knowledge: Big Data for Personalized Healthcare. IEEE Journal of Biomedical and Health Informatics, 1–2. https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7047725

What Is Big Data? (2016, March 7). [Video]. YouTube. https://www.youtube.com/watch?v=eVSfJhssXUA

Wikipedia contributors. (2021, April 12). Big data. Wikipedia. https://en.wikipedia.org/wiki/Big_data

Week 11

Many of the readings this week focus on the shifting definition of big data. To begin with the more literal definition of “big,” big data is digital data that is huge in volume (terabytes or petabytes), high in velocity (created in or near real time), and diverse in type (structured or unstructured in nature, temporally or spatially referenced). In addition, big data strives to capture entire populations or systems, making it exhaustive in scope; it aims to be as detailed as possible; it is relational in nature, allowing different datasets to be conjoined; and it is scalable, able to expand in size rapidly (Big Brother). Essentially, we try to record relevant data that we can combine with other potentially relevant data, all in the hope of answering questions about populations or systems. This leads to the next definition of big data: data tasked with giving deep and new insights into human behavior. In this case, data is not “big” in its volume, velocity, or variety, but “big” in that, in theory, huge amounts of data are available to anyone in the world over the internet (in reality, some data is private) (Digitization). 

With our goals of generating relevant insights in mind, data science steps in to produce these insights. Driven by practical problems, data science is required to transform big data into useful, valuable information and involves finding relevant data, data preparation, data analysis, and data visualization (Huberman). The applications-driven nature of data science means that visualization is extremely important for understanding the output of the applications stage and communicating the results to clients and stakeholders. Given the absurdly large and complex amount of data, data scientists tackle the scientific challenge of formulating methods to represent complex and entangled systems. Data scientists utilize big data every day to generate insights. In one example, it was discovered that people tend to tell lies on Facebook while their Google searches reflect deep personal truths (Huberman).

In our everyday life, we use technology in ways that add to big data and data scientists’ work. For example, almost every time we use social media, go online shopping, or surf the web, data is being collected on us. Our locations, spending histories, interests, and sometimes even information about our body types are collected. This data is sold to many different private companies with the hopes of generating more insights about human behavior (and in most cases, generating more money). In today’s day and age, big data is used in governments, the education system, media, the healthcare industry, and a variety of other places.

 

Huberman, Bernardo A. “Big Data and the Attention Economy: Big Data (Ubiquity Symposium).” Ubiquity, December 2017, 1–7.
Johnson, Jeffrey. “Big Data: Big Data or Big Brother? That Is the Question Now.” Ubiquity, August 2018, 1–10.
Johnson, Jeffrey. “Big Data, Digitization, and Social Change: Big Data (Ubiquity Symposium).” Ubiquity, December 2017, 1–8.
Kitchin, R. The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences. SAGE Publications, 2014. 

Big Data (Analysis, Application, Challenges, and Ethics)

1.   Introduction

Big data is a field that introduces ways to analyze, systematically acquire information from, or otherwise manipulate data sets that are too large or complex to be dealt with by traditional data-processing application software. Big data includes datasets whose size exceeds the capacity of traditional programs to process them within an acceptable time and value (Wikipedia, 2020). The characteristics of Big Data are (Kitchin, 2014):

  1. Enormous in volume, consisting of terabytes or petabytes of data.
  2. High in velocity, being created in or near real time.
  3. Diverse in variety, being structured and unstructured in nature.
  4. Exhaustive in scope, striving to capture entire populations or systems.
  5. Fine-grained in resolution and uniquely indexical in identification.
  6. Relational in nature, containing common fields that enable the conjoining of different data sets.
  7. Flexible, holding the traits of extensionality and scalability.
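Characteristic 6, being relational, is easiest to see with a tiny example. The sketch below joins two hypothetical datasets, collected by different services, on a shared field (`user_id`). The records, field names and the `inner_join` helper are invented for illustration; real systems would do this with a database or a dataframe library, but the logic is the same.

```python
# Two hypothetical datasets that share a common field, user_id.
purchases = [
    {"user_id": 1, "item": "running shoes"},
    {"user_id": 2, "item": "coffee maker"},
    {"user_id": 1, "item": "water bottle"},
]
locations = [
    {"user_id": 1, "city": "Washington"},
    {"user_id": 2, "city": "Baltimore"},
    {"user_id": 3, "city": "Richmond"},
]

def inner_join(left, right, key):
    """Combine records from two datasets whenever their `key` values match."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined

for record in inner_join(purchases, locations, "user_id"):
    print(record)
```

Neither dataset alone links what someone bought to where they live; the common field is what makes the combination possible, which is both the power and the privacy risk of big data.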

2.   Applications of Big Data

Big data is a sign that everything is changing. Every portfolio is affected: finance, transport, housing, food, environment, industry, health, welfare, defence, education, science, and more (Johnson, Big data: big data, digitization, and social change, 2017). Here are some of the applications of big data (Wikipedia, 2020):

  1. Government

Using and adapting big data within government applications brings benefits, especially in terms of cost, productivity, and innovation.

  2. Healthcare

Big data analysis has improved healthcare considerably by enabling personalized medicine, clinical risk intervention, and medical prediction systems.

  3. Media

The industry is moving away from the traditional approach of using specific media outlets such as newspapers, magazines, or television shows. Instead, it reaches consumers through technologies that target people at optimal times in optimal locations.

  4. Insurance

Health insurance providers gather data on social “determinants of health” such as food and TV consumption, clothing size, marital status, and purchasing habits. This information can be used to predict health costs and to spot health issues among their clients.

  5. Internet of Things (IoT)

Data from IoT devices is used to map how devices are interconnected. The media industry, companies, and governments have used these mappings to reach their audiences more effectively and increase the efficiency of their media.
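A minimal sketch of what mapping device interconnectivity could look like, assuming a hypothetical log of observed device-to-device connections; the device names and the `build_map` helper are invented for illustration. The resulting map is just an undirected graph, and the most connected device falls out of it directly.

```python
from collections import defaultdict

# Hypothetical connection log: pairs of devices observed talking to each other.
connections = [
    ("phone", "smart_tv"),
    ("phone", "speaker"),
    ("speaker", "thermostat"),
    ("phone", "thermostat"),
    ("laptop", "smart_tv"),
]


def build_map(pairs):
    """Build an undirected adjacency map of which devices connect to which."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


device_map = build_map(connections)
# The device with the most distinct connections acts as a hub in this network.
hub = max(device_map, key=lambda device: len(device_map[device]))
print(hub, sorted(device_map[hub]))
```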

 

  6. Information technology (IT)

Big data has become a helpful tool for employees in their work, making it significant within business operations. Its applications in IT have made the collection and distribution of information technology (IT) more efficient. Combining big data processes with machine learning and deep learning helps IT departments predict potential issues and provide solutions before the problems even happen.
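The idea of catching issues before they become problems can be illustrated with a deliberately simple stand-in: flagging metric readings that deviate sharply from a baseline. This is a statistical threshold (a z-score), not the machine learning or deep learning the paragraph describes, and the response-time readings and the `flag_anomalies` helper are hypothetical.

```python
import statistics

# Hypothetical server response times in milliseconds, one reading per minute.
response_ms = [120, 118, 125, 122, 119, 121, 124, 310, 123, 120]


def flag_anomalies(readings, baseline_size=5, threshold=3.0):
    """Return indices of readings more than `threshold` standard deviations
    above the mean of the first `baseline_size` readings."""
    baseline = readings[:baseline_size]
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    return [i for i, value in enumerate(readings)
            if (value - mean) / stdev > threshold]


print(flag_anomalies(response_ms))
```

An IT team would act on the flagged reading (index 7 here) before users notice a slowdown; a learned model would replace the fixed baseline and threshold with patterns learned from historical data.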

3.   Benefits of Big Data

The impact of big data, open data, and data infrastructures can be seen clearly in science, business, government, and civil society (Huberman, 2017). Here are some of the benefits of big data:

  • Businesses can analyze customer traffic to calculate precisely how many employees they will need each hour of the day. The goal is to spend as little money as possible (Arslan, 2016).
  • Geographical coverage: global sources deliver sizable and comparable data for all countries, regardless of their size (Wikipedia, 2020).
  • Level of detail: providing fine-grained data with many interrelated variables and new concepts, such as network connections (Wikipedia, 2020).
  • Timeliness and time series: graphs can be produced within days of the data being collected (Wikipedia, 2020).
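The staffing example in the first bullet reduces to simple arithmetic once hourly traffic has been measured. The sketch below assumes hypothetical traffic counts and an assumed capacity of 15 customers per employee per hour; neither figure comes from Arslan (2016).

```python
import math

# Hypothetical average customer counts per hour of the day (hour -> customers).
hourly_traffic = {9: 12, 10: 30, 11: 45, 12: 80, 13: 64, 14: 25}

# Assumption: one employee can serve about 15 customers per hour.
CUSTOMERS_PER_EMPLOYEE = 15


def staff_needed(traffic, capacity):
    """Return the minimum whole number of employees for each hour,
    never dropping below one person on shift."""
    return {hour: max(1, math.ceil(count / capacity))
            for hour, count in traffic.items()}


schedule = staff_needed(hourly_traffic, CUSTOMERS_PER_EMPLOYEE)
print(schedule)
```

With these assumed numbers, the noon peak needs 6 employees while the 9 a.m. hour needs only 1, which is exactly the kind of hour-by-hour cost trimming the bullet describes.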

4.   Big Data Challenges

The big data challenge is not only the technical problem of transferring the maximum number of bits in the minimum amount of time, but also the scientific challenge of formulating approaches to the complex, intertwined systems that must be designed and managed to run the modern world (Johnson, Big data: big data, digitization, and social change, 2017).

Another challenge is coping with big data's abundance and exhaustivity (including sizable amounts of data with low utility and value), timeliness and dynamism, messiness and uncertainty, semi-structured or unstructured nature, and the fact that much of it is generated with no specific question in mind or as a by-product of another activity. Until recently, the tools for linking diverse data sets together and analyzing big data were poorly developed because doing so was too computationally demanding (Huberman, 2017; Useche, 2019).
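Two of these challenges, abundance and messiness, shape how big data code is typically written: records are processed as a stream rather than loaded all at once, and unusable records are filtered out along the way. The sketch below assumes hypothetical comma-separated sensor readings; the data and the `parse` and `running_means` helpers are invented for illustration.

```python
from collections.abc import Iterable, Iterator

# Hypothetical raw sensor lines; some are malformed or missing values,
# which is typical of "messy" big data.
RAW_LINES = [
    "sensor_a,21.5",
    "sensor_b,",             # missing value
    "sensor_a,22.5",
    "garbage line",          # malformed record
    "sensor_b,19.0",
    "sensor_a,not_a_number",  # unparseable value
]


def parse(lines: Iterable[str]) -> Iterator[tuple[str, float]]:
    """Yield (sensor, value) pairs, skipping records that cannot be used."""
    for line in lines:
        parts = line.split(",")
        if len(parts) != 2:
            continue
        name, raw_value = parts
        try:
            yield name, float(raw_value)
        except ValueError:
            continue


def running_means(pairs: Iterable[tuple[str, float]]) -> dict[str, float]:
    """Compute per-sensor means in a single pass, keeping only a running
    total and count per sensor instead of every record."""
    totals: dict[str, float] = {}
    counts: dict[str, int] = {}
    for name, value in pairs:
        totals[name] = totals.get(name, 0.0) + value
        counts[name] = counts.get(name, 0) + 1
    return {name: totals[name] / counts[name] for name in totals}


print(running_means(parse(RAW_LINES)))
```

Because `parse` is a generator, `RAW_LINES` could be swapped for a line-by-line file iterator, and memory use would then depend on the number of sensors rather than the number of records.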

5.   Ethics of AI, Data science and big data

Millions of people use the web for their social, informational, and consumer needs. In doing so, they publish information about themselves across social networks (Huberman, 2017).

The problem is that real-world data often contains information you did not intend to capture, which ends up there because of bias in the data collection process. Human beings can have very diverse motives for making something. As with any technology we use for our own benefit, we need to put checks and controls in place (Askell, 2020).
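One simple check on collection bias is to compare who appears in the data with who is in the population, and to reweight if they differ. The sketch below is a minimal illustration of that idea (post-stratification weighting) under hypothetical assumptions: a population split evenly between groups A and B, and a collected sample in which group A is over-represented 80 to 20. All numbers and helper names are invented.

```python
# Hypothetical population: half group A, half group B.
POPULATION_SHARES = {"A": 0.5, "B": 0.5}

# Hypothetical collected records (group, measured value). Group A is
# over-represented, e.g. because data was gathered through a channel it uses more.
collected = [("A", 10.0)] * 80 + [("B", 20.0)] * 20


def sample_shares(records):
    """Share of each group in the collected data."""
    counts = {}
    for group, _ in records:
        counts[group] = counts.get(group, 0) + 1
    return {group: count / len(records) for group, count in counts.items()}


def naive_mean(records):
    """Plain average, which inherits any imbalance in the sample."""
    return sum(value for _, value in records) / len(records)


def reweighted_mean(records, population_shares):
    """Weight each record by (population share / sample share) of its group."""
    shares = sample_shares(records)
    weights = [population_shares[group] / shares[group] for group, _ in records]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, records))
    return weighted_sum / sum(weights)


print(sample_shares(collected))  # the check: does the sample match the population?
print(naive_mean(collected))
print(reweighted_mean(collected, POPULATION_SHARES))
```

The naive mean comes out at 12.0 because group A dominates the sample; reweighting to the assumed population shares gives 15.0. Reweighting only corrects for bias along dimensions someone thought to check, which is why the checks themselves matter.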

Big commercial companies gather troves of private data claiming no interest in personal details, while in reality selling, exchanging, or misusing such data (Johnson, Big Data: Big Data or Big Brother, 2018).

 

References

Arslan, F. (2016). Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, by Cathy O’Neil. Journal of Information Privacy and Security, pp. 157-159.

Askell, A. (2020, 12, 1). Ethics & AI: Privacy & the Future of Work. Retrieved from YouTube: https://www.youtube.com/watch?v=zNxw5gJtHLc&list=PLzdnOPI1iJNeehd1RXhnVMBFi1WhWLx_Y&index=7

Huberman, B. (2017, 12). Big Data and the Attention Economy. ACM Digital Library, pp. 1-7.

Johnson, J. (2017, 12). Big data: big data, digitization, and social change. Ubiquity, an ACM publication.

Johnson, J. (2018, 8). Big Data: Big Data or Big Brother? Ubiquity, pp. 2-10.

Kitchin, R. (2014). Big Data, new epistemologies and paradigm shifts. Big Data & Society.

Useche, D. O. (2019, 4). Challenges of Interpreting Big Data. Retrieved from “Big Ideas”: AI to the Cloud: https://blogs.commons.georgetown.edu/cctp-607-spring2019/category/week-11/

Wikipedia. (2020). Big data. Retrieved from Wikipedia: https://en.wikipedia.org/wiki/Big_data

Big data in media view and science view

In our daily life, the media often advertises big data as collecting and using a lot of data to get objective and correct answers. In other words, within the media's framing, big data technology is described as an ideal input-output black box. Big data here is huge in volume, includes the long tail (meaning it covers everything, including minorities), and is objective (unconscious), so it seems reasonable that the results obtained from big data are objective and correct.

In some ways, this makes sense. Take the recommendation system of a music app as an example. First, a machine recognizes patterns (regularities) in all the music data and classifies the music into different types based on those patterns. Then users' behavior is treated as data, and behavior patterns are extracted from it. The system maps the user patterns onto the music patterns and constantly adjusts its output (the recommended music) in real time to get better feedback (the user clicks the like or favorite button, or downloads the music). This is a use of big data and machine learning, and the big data here fits Kitchin's definition: “huge in volume,” since all the music and user behaviors are data; “high in velocity, being created in or near real-time,” since every click, or absence of a click, is generated in real time and fed back as new data; “fine-grained in resolution and uniquely indexical in identification,” since the system recommends music and adjusts results based on each user's behavior, so each user's recommendations are effectively unique; “relational in nature, containing common fields that enable the conjoining of different data sets,” since data on user behavior, music, and so on are gathered together for real-time analysis; and “flexible, holding the traits of extensionality (can add new fields easily) and scalability (can expand in size rapidly),” since the data used for analysis can accommodate a new user, a new song, or a new user-behavior variable.
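The recommendation loop described above can be sketched with item-to-item co-occurrence, one of the simplest collaborative-filtering techniques. This is an assumption for illustration, not how any particular music app works: it skips the audio-pattern step entirely and uses only hypothetical "like" histories, and the `co_occurrence` and `recommend` helpers are invented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical "like" histories: user -> set of songs that user liked.
likes = {
    "u1": {"song_A", "song_B", "song_C"},
    "u2": {"song_A", "song_B"},
    "u3": {"song_B", "song_D"},
    "u4": {"song_A", "song_B", "song_D"},
}


def co_occurrence(histories):
    """Count how often each ordered pair of songs is liked by the same user."""
    counts = Counter()
    for songs in histories.values():
        for a, b in combinations(sorted(songs), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def recommend(user, histories, top_n=2):
    """Score songs the user has not liked by how often they co-occur
    with songs the user has liked, and return the top scorers."""
    counts = co_occurrence(histories)
    liked = histories[user]
    scores = Counter()
    for (a, b), n in counts.items():
        if a in liked and b not in liked:
            scores[b] += n
    return [song for song, _ in scores.most_common(top_n)]


before = recommend("u2", likes)
likes["u2"].add("song_D")  # feedback: the user likes a recommended song
after = recommend("u2", likes)
print(before, after)
```

The second call shows the real-time aspect: once the new like is added to the data, the next recommendation is recomputed from the updated history.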

The media perspective on big data is actually an empiricist epistemology of big data. In reality, the results of big data are not as objective and correct as the media describes, since humans participate in the process of collecting and analyzing big data, for example by choosing algorithms and models. This is what Kitchin means when he says that “data are created within a complex assemblage that actively shapes its constitution.” In fact, different music apps have different recommendation systems, and the same user will get different recommendations from different apps, which shows that big data technology is not as objective and correct as claimed. (If it were, the results would be the same.) What's more, big data only gives correlations and insights in the data; it cannot explain why they hold. From a business point of view, however, that is useful enough. Operators do not need to know why users who like song A also like song B; all they need to know is that there is a positive relationship between songs A and B. Therefore, it is reasonable for the media to use the empiricist view of big data, since it simplifies the epistemology, is easy for ordinary customers to accept, and persuades them to buy or use the service.
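The point that big data yields correlation without explanation can be shown with a few lines of arithmetic. The sketch below computes the Pearson correlation between liking song A and liking song B across eight hypothetical users; the like data is invented for illustration.

```python
import math

# Hypothetical likes for eight users: 1 = liked the song, 0 = did not.
liked_a = [1, 1, 0, 1, 0, 0, 1, 0]
liked_b = [1, 1, 0, 1, 0, 1, 1, 0]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


r = pearson(liked_a, liked_b)
print(f"correlation between liking A and liking B: {r:.2f}")
```

The result, about 0.77, tells an operator that the two likes tend to go together, which is enough to drive a recommendation, but nothing in the calculation says why they do.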

The situation is different when it comes to science. The simple input-output model and empiricist epistemology cannot meet the needs of scientific research. Take machine learning of good selfies as an example. Big data and machine learning can return a data set of good selfies, but they do not explain why those selfies are good. The output only shows the phenomenon, or a surface correlation between a good selfie and selfie patterns. It is a result of abduction, which means the machine gives the best result for a specific scenario. But for science, especially the humanities, a pattern snapshot alone is not enough. The important thing is to explain the correlation, or why the machine returned this result. From this point of view, as Kitchin said, “the pattern is not the end-point but rather a starting point for additional analysis” (Kitchin, n.d.). Big data and machine learning provide a new method for finding phenomena, and scientific research then does additional deductive or inductive work to explain them.

 

References

Johnson, J., Denning, P., Delic, K. A., & Sousa-Rodrigues, D. (2018). Big Data or Big Brother? That is the question now. Big Data, 10.

Johnson, J., Denning, P., Sousa-Rodrigues, D., & Delic, K. A. (2017). Big Data, Digitization, and Social Change. Big Data, 8.

Kitchin, R. (n.d.). Big Data, new epistemologies and paradigm shifts. Big Data, 12.

Big Data, AI/ML and Cloud Computing: The Perfect Match - Chirin Dirani

Undoubtedly, the world is undergoing a technological revolution. This revolution is changing every aspect of modern daily life and is evident in areas such as “finance, transport, housing, food, environment, industry, health, welfare, defense, education, science, and more.” According to the reading for this class, this revolution stems from the perfect match and combination of Big data, Cloud platforms, and AI/ML. In week six, we learned how AI/ML “hungry neural nets” use massive amounts of data for pattern recognition and then make predictions on new data based on the patterns they were trained on. Last week, we dug deep into the definition and architecture of Cloud computing and identified the importance of Cloud platforms to AI/ML and data systems. This week, I will delve into the world of Big data by explaining the key concepts of this revolutionary technology and elucidating how Big data exists because of Cloud computing.

Big Data is a relatively young term, first used in the 1990s. As with Cloud Computing, there is no agreed academic definition of the term. The most common definition of Big data, mentioned in Rob Kitchin’s book, “refers to handling and analysis of massive datasets” and “makes reference to the 3Vs; volume, velocity and variety.” According to these 3Vs, Big data is “huge in volume,” “high in velocity” and “diverse in variety in type.” For Johnson and Denning, the Big data revolution occurred due to the “convergence of two trends: the expansion of the internet into billions of computing devices, and the digitization of almost everything.” The internet provides access to massive amounts of data, and digitization makes almost everything digital. There is a strong relationship between Big data, AI/ML, and Cloud computing. Without Cloud computing, it is impossible for Big Data to exist. In the real world, the main Cloud service providers supply the infrastructure and services that let AI/ML and Big data thrive. These providers use the concept of convergence to combine the three in one system. Through this system, unstructured Big data is classified, sorted, and analyzed by hungry neural net algorithms provided by AI/ML technologies, and the outputs are saved in cheap storage provided by the ubiquitous servers of Cloud computing. From this quick analysis, we infer that without AI/ML algorithm training, unstructured Big data cannot be classified and sorted. Also, without the infrastructure provided by Cloud computing, AI/ML processing of Big data cannot be implemented.
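The division of labor described here can be illustrated with a MapReduce-style word count, a classic pattern for processing large unstructured text across many servers. This is a swapped-in illustration rather than how any specific Cloud provider works: the text partitions are hypothetical, and the map steps run one after another in a single process where a real cluster would run them in parallel on separate machines.

```python
from collections import Counter
from functools import reduce

# Hypothetical unstructured text, split into partitions as if each
# partition were stored on a different cloud server.
partitions = [
    ["big data needs storage", "cloud storage is cheap"],
    ["big data needs compute", "cloud compute scales"],
]


def map_partition(lines):
    """Map step: a worker counts the words in its own partition."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def reduce_counts(a, b):
    """Reduce step: merge the partial counts from two workers."""
    return a + b


# On a real cluster these map calls would run in parallel, one per server.
partial = [map_partition(p) for p in partitions]
totals = reduce(reduce_counts, partial, Counter())
print(totals["data"], totals["cloud"], totals["cheap"])
```

The design point is that each map step only needs its own partition, so adding servers lets the same program handle more data; only the small partial counts have to travel to the reduce step.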

The readings for this week varied between optimistic and pessimistic views of Big data developments: socially, technologically, educationally, and application-wise. What really resonates with me is Cathy O’Neil’s chapter “Civilian Casualties: Justice in the Age of Big Data.” O’Neil puts forward the notion that the incorrect outputs of AI/ML algorithms trained on Big data can lead to inequalities in our societies. O’Neil used examples from politics, education, and business to support her argument. The important conclusion I drew from this chapter is that Big data processes “codify the past” but do not invent the future. According to O’Neil, only human moral imagination can do so. She argues that neural net algorithms must be given human moral values to be able to produce ethical Big data. The question here is: are the “big four” Cloud service providers able to put equality ahead of their profits?

References

Bernardo A. Huberman, “Big Data and the Attention Economy,” Ubiquity 2017 (December 2017).

Cathy O’Neil, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy (New York: Crown, 2016).

Jeffrey Johnson, Peter Denning, et al., “Big Data, Digitization, and Social Change (Opening Statement),” Ubiquity 2017 (December 2017).

Rob Kitchin, The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences. London; Thousand Oaks, CA: SAGE Publications, 2014.

 

Big Data

I enjoyed Kitchin’s definition of Big Data, which seems to be the community standard: Big Data differs from other data in its volume, velocity, variety, exhaustivity, resolution, indexicality, relationality, flexibility, and scalability, among other traits that the Wikipedia article also includes. To me, Big Data is data that shares these traits and that traditional computer processors and memory cannot handle. Handling it would require non-traditional AI/ML computation to deal with the abundance, exhaustivity and variety, timeliness and dynamism, messiness and uncertainty, and high relationality that make up Big Data. My question is this: does the concept of Big Data necessarily imply the use of AI/ML on the data acquired? Kitchin distinguishes Big Data from other data by its being generated continuously, seeking to be exhaustive and fine-grained in scope, and being flexible and scalable in its production. Does that mean the Big Data we know today really only emerged from innovations in AI/ML? I believe the answer is yes, but I would like confirmation.

Furthermore, Big Data is a relatively new phenomenon under this definition because, as Denning argued, it results from two enabling changes in society. First, the expansion of the internet into billions of computing devices, i.e., the Internet of Things, which enables access to vast amounts of data. Second, the digitization of almost everything, resulting in an explosion of innovation in network-based big data applications and the automation of cognitive tasks. As a result, Big Data, having emerged from societal change, is now spurring more societal change. “Revolutions in science have often been preceded by revolutions in measurement” (Sinan Aral). Science is only one area that Big Data has changed: Kitchin argues that Big Data will move science toward a data-driven method that blends aspects of abduction, induction, and deduction, generating insights “born from data” rather than from theory. In society, Big Data is changing how social networks and content providers attract and hold consumers’ attention in the digital economy (Huberman). In government, Big Data can be a tool to enable surveillance and monitoring at unprecedented levels (Johnson). In education, Big Data is enabling new methods of learning and instruction through personalized paths based on analytics of users’ interactions with existing educational courses (Opening Statement). The applications of Big Data touch nearly every aspect of our society, but are there parts of our society that, as with cloud computing, should not use Big Data?

Big Data is important because data is viewed as a prized resource that can optimize efficiency and profits for organizations or enable surveillance and security for governments. This, together with the relative newness of the technology, has led to a wild west: a lack of regulations and limits on data collection, encroachment on consumers’ privacy and security rights, and a lack of transparency in models. So I will echo a question similar to one raised in the closing statement of the ACM Ubiquity symposium: do regulatory initiatives even have the support to confront the ethical challenges of Big Data?

 

“Big Data.” 2021. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Big_data&oldid=1016820836.
“Kitchin-Big Data-New Epistemologies and Paradigm Shifts-2014.Pdf.” n.d. Google Docs. Accessed April 10, 2021. https://drive.google.com/file/d/1MkRUzSYCu1LKXWxkR6COus26Wae1srXC/view?usp=drive_open&usp=embed_facebook.
“Kitchin-The-Data-Revolution-Big-Data-Open-Data-Data-Infrastructures-Excerpts.Pdf.” n.d. Google Docs. Accessed April 10, 2021. https://drive.google.com/file/d/1T2JGeIHWkez0ecTgkWXJd5q4wWWkCOWl/view?usp=drive_open&usp=embed_facebook.
“ONeil-Weapons of Math Destruction (2016).Pdf.” n.d. Google Docs. Accessed April 10, 2021. https://drive.google.com/file/d/1ps92pvLRVWCbno4CrVCyYwCukDUWBJsh/view?usp=drive_open&usp=embed_facebook.
“Ubiquity: Big Data.” n.d. Accessed April 10, 2021. https://ubiquity.acm.org/article.cfm?id=3158352.
“Ubiquity: Big Data and the Attention Economy.” n.d. Accessed April 10, 2021. https://ubiquity.acm.org/article.cfm?id=3158337.
“Ubiquity: Big Data, Digitization, and Social Change.” n.d. Accessed April 10, 2021. https://ubiquity.acm.org/article.cfm?id=3158335.