Category Archives: Week 11

How “Big” is Big Data?

Much as the media portrays machines in a humanlike way, there is plentiful social discourse around the term “big data” that tries to sell a hyperrealistic future fully immersed in advanced technology. “Big Data” can also be used to scare the technologically unaware into growing fearful of their technology. That fear, however, is misplaced: big data simply refers to the complex and multi-layered bodies of information that move from one point to another.

The relationship between researchers and the development of technology has shifted: researchers now find themselves trying to quickly conceptualize and write up algorithms and methods to represent the multidimensional systems that operate the technologically advanced world we live in today (Johnson, Denning, et al.). This is why data scientists are vital to the field of technology: they can develop these formulations at greater speed. Big data can be seen as an older sibling to data itself (mainly because it is), hosting and transporting large and varied bodies of information from one end of a server to another. Its inevitable presence and use in technology is a great contribution to the fields of natural language processing and machine learning, where the data collected can be coded, formulated, and synthesized for machines to learn from.

The biggest problem with big data is that, because of its sheer size, it risks causing severe negative consequences for a technological ecosystem. Regardless of what big data encompasses (IoT, IT, the cloud), big data is ‘big data’ because of how much faith has been placed not only in providing so much information, but also in allowing there to be one housing unit for that information.

Jeffrey Johnson, Peter Denning, et al., Big Data, Digitization, and Social Change (Opening Statement), Ubiquity 2017, (December 2017).

Big Data & Advertising

The purpose of advertising and marketing is to inform, educate, and ultimately persuade consumers to buy a given product or service. According to the STP (Segment, Target, Position) strategy in marketing, defining the target audience is very important, and advertisers need to analyze data in order to resonate with that audience. Different types of audience prefer and resonate with different content. Advertisers first need to group people by age, economic status, or other factors and then use a different marketing strategy to respond to each group.

All of this depends on big data. Big data enables companies to better target the core needs of customers by developing rich and informative content. Big data can be thought of as the huge amount of information that is available to anyone in the world over the internet. It can help advertisers gain essential insights into their target demographics, such as patterns, consumer habits, and trends. Using artificial intelligence to extract and analyze data from many avenues, such as subscriptions, followers, browser history, and search records, advertisers can better understand the audience’s preferences. These figures generate insights that can lead to better business decisions and strategic moves. The application of the right technology improves the quality of decision making and detailing processes. As transactions have moved from offline to online, data has become more digitized, so artificial intelligence technology can be used to analyze such huge amounts of data.


https://www.mentionlytics.com/blog/5-real-world-examples-of-how-brands-are-using-big-data-analytics/

This is similar to a recommendation system. For example, Netflix uses big data analysis for targeted advertising. In this case, the big data comes from over 100 million subscribers and their past search and viewing history. The use of artificial intelligence and big data can help uncover hidden patterns and correlations and give insights that support sound business decisions and content customized to the audience’s preferences. Modern data analytics systems allow for speedy and efficient analytical procedures. This ability to work faster and achieve agility offers a competitive advantage to businesses.

Reference

  • Bridgewater, R. (2016). O’Neil, Cathy. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. (Brief article) (Book review). Library Journal, 141(19).
  •  Kitchin, R. (2014). Big Data, new epistemologies and paradigm shifts. Big Data & Society, 1(1). https://doi.org/10.1177/2053951714528481

Air, Water and Facebook

Last week’s gnoviCon had some fascinating insights on the impact of Big Data and its potential to overturn established concepts of trust, privacy and security. The theme for this year was Big Tech, Data & Democracy and the keynote speaker, Siva Vaidhyanathan, touched on all three of those points in his talk on Facebook and its role in a democratic society.

Facebook is a forerunner in accumulating and storing Big Data, and its advertising is unparalleled, even more so as a political tool. The 2016 presidential elections marked the first time in modern history that a political party’s campaign strategy involved investing heavily not in television ads but in social media. With Facebook having 2.3 billion users, Vaidhyanathan challenged the audience to think of a company with a similar reach: “I don’t know anything that’s touched that many people, except maybe air and water. It goes air, water, Facebook!” he mused.

Recent events such as the Facebook Cambridge Analytica scandal, coupled with the affordances of targeted advertising, raise the question of whether Facebook and other social media sites are helping or hindering democracy. In this era of “fake news”, even unpopular opinions can be amplified on Facebook by the algorithm, because when your “friends” interact with such a post, it gets more traction. Panelist Ethan Porter argued, “It’s incumbent upon social media companies to invest in combating misinformation. The good news however, is that people can become more politically informed by using social media”. This is an example of by-product learning (Prior, 2013), the act of learning political information through an unintended source. In recent times, most people learn political facts as a by-product of non-political routines such as scrolling through Facebook, which circles back to the point that Big Tech companies are responsible for the security and visibility of sensitive data.

Another point raised about the advent of big tech and the pernicious side effects of ubiquitous computing is that we are so overwhelmed by “a constant barrage of stimulation – a mini Times Square in our pockets – that it habituates us to the immediacy of its call,” said Vaidhyanathan. Our investment in social media has resulted in disinvestment in social institutions such as science and health technology that help us collectively work to solve problems, adding to the vast galaxy of big data along the way.

References

Prior, Markus. 2014. Post-Broadcast Democracy. Chapter 1, “Introduction” and Chapter 3, “Broadcast television, political knowledge, and turnout.”

Big Data and Privacy

Tianyi Zhao

Figure 1. “On the Internet, nobody knows you’re a dog.” Peter Steiner, The New Yorker, 1993.

(Source: https://en.wikipedia.org/wiki/On_the_Internet,_nobody_knows_you%27re_a_dog)

The picture above illustrates an adage and meme about Internet anonymity by Peter Steiner, published in The New Yorker twenty-six years ago. That anonymity is also blamed for numerous crimes. However, in the era of big data, everyone on the Internet is so “naked” that each of our traits and habits is represented in binary, as 0s and 1s. The data owners benefit from exploiting the datasets we have produced. Privacy is no longer private unless we are completely disconnected from the Internet.

On the one hand, we seem to enjoy the comfort and convenience brought by big data, which knows us better than we know ourselves; on the other hand, we have to give up individual privacy. We detest having our IDs sold and exploited, but we do not really care that our private data is collected, analyzed, and used as long as it is not linked to specific individuals, because we tend to believe that if the government or commercial organizations collect more personal data, they will provide better public services and enable customers to buy better products at lower prices.

However, there have been a number of data abuse scandals in recent years. Facebook has been criticized because the information of more than 50 million Facebook users was accessed and exploited by the political data firm Cambridge Analytica to target users with precisely tailored advertising content in order to help Donald Trump’s campaign for President in 2016. Yahoo has been under tremendous pressure for risking the privacy of 3 billion people. Google is facing a big challenge: the launch of the GDPR in the E.U. Controlling over 90% of the market for general web searches in many European countries, compared with 68% in the U.S. market, Google has collected and analyzed more data than any other company. On Jan 21, 2019, Google was fined nearly $57 million by French regulators for violating GDPR rules because Google failed to “fully disclose to users how their personal information is collected and what happens to it” (Romm, 2019). The GDPR is clearly a significant extension of the global process of policy convergence, in which the criteria for convergence are deepening (Bennett, 2018). Before the implementation of the GDPR in 2018, around 120 countries had passed data protection statutes that meet at least the minimum standards of formal international agreements (Greenleaf, 2017). Steve Wilson coined a new term, Big Privacy, referring to the data privacy compact for the era of big data and AI. It is designed to enhance transparency about how personal data is collected and created, engender more restraint in how it is used, and grant customers appropriate control over data about them (Wilson, 2018). We can see that legal systems around the globe are improving privacy protection, and the GDPR, although it is the strictest so far, is just a starting point. The technology giants should self-regulate to further their legal compliance by drawing on the concept of Big Privacy, beginning with compliance with the GDPR.

 

Works Cited

Romm, Tony. “France fines Google nearly $57 million for first major violation of new European privacy regime.” The Washington Post, Jan. 21, 2019.

“Top Data Privacy and Security Scandals.” Datafloq, Nov. 15, 2018.

Bennett, Colin J. “The European General Data Protection Regulation: An instrument for the globalization of privacy standards?” Information Polity, 2018. https://pdfs.semanticscholar.org/3813/041fc44467933d64c54c3e39a467c2be63c3.pdf

Greenleaf, G. “Global Data Privacy Laws 2017: 120 National Data Privacy Laws, Including Indonesia and Turkey.” Privacy Laws & Business International Report, Jan 30, 2017.

Wilson, Steve. “Big Privacy: The data privacy compact for the era of big data and AI.” ZDNet, Dec 5, 2018.

What Do We See in the Digital Age

Micro-targeting is a technique used by commercial marketers, but it is also employed by political parties in campaigns to track individual voters and identify potential supporters. It would not be possible on a large scale without the development of large-scale databases containing data about as many voters as possible. These databases track voters’ habits in the same way that companies analyze consumer behavior. They collect data about an individual voter based on the information he or she shares online, consciously or unconsciously. Voters’ demographic information, along with hundreds of other variables, is gathered to be analyzed by data scientists. The outcomes are used by political parties to “better” communicate with voters through direct mail, phone calls, emails, and social media. In this way, political parties can have a significant impact on voters.

Big data and ML algorithms have penetrated our daily lives in ways that we may not even notice. It reminds me of agenda-setting theory, which was first raised by Lippmann and then formally developed by McCombs and Shaw in a study on the 1968 American presidential election. Agenda-setting theory describes the “ability (of the news media) to influence the importance placed on the topics of the public agenda”. It shows the power of media agencies: although they cannot decide how you think, they can decide what you think about. When agenda-setting theory meets big data, the whole landscape changes significantly. As soon as we talk, share, and purchase online, our information is recorded instantly. The huge amount of data about one specific person is connected into a societal system, and through the analysis of that system, companies can roughly know who you are and what you like. After that, they feed you what you may like and profit from your attention. As Bernardo A. Huberman said, attention is what we value most in the digital age. Focus is always finite and ephemeral. That is why companies try their best to obtain and hold consumers’ attention through various methods, including lurid headlines and targeted advertising.

It is easier than before to get information about the world outside of mine, but it is also getting harder for me to see the whole picture of this world in the digital age. I have to be skeptical about what I see.

 

References

 

 

big data purpose

Annaliese Blank

When I think about big data, the first thing that comes to mind is the internet, or some form of collection of user data from around the globe that is intertwined and recorded as one big unit of data exchange. The Ubiquity piece explains the terminology and gives a brief history of the term. The authors say, “The revolution is happening at the convergence of two trends: the expansion of the internet into billions of computing devices, and the digitization of almost everything. The internet gives us access to vast amounts of data. Digitization creates digital representations for many things once thought to be beyond the reach of computing technology. The result is an explosion of innovation of network-based big data applications and the automation of cognitive tasks. This revolution is introducing what Brynjolfsson and McAfee call the “Second Machine Age.” This symposium will examine this revolution from a number of angles.” (ACM Ubiquity, pg.1).

None of this was possible until the past twenty years or so. Internet expansion has opened doors of opportunity for the future of big data. This is extremely important because the transition into the tech era requires the tools and components that make universal connectivity possible. The ability to transfer data packets that hold vast amounts of information and code, and to send them wirelessly and almost instantaneously, is another good description of big data and its ideal purpose. The big takeaway here is that this is a revolution in the power of digitization.

This revolution is the foundation for new functions and operations in society, politics, education, policy, government, and science. For digital data and data science, big data capabilities not only provide the computing power to handle large volumes of data, but also aid the process of “data analysis, research, manual and automated search capabilities, and machine learning functions and modeling” (Ubiquity Big data, pg. 1). This is changing the way in which we learn information, search and record data, send data, analyze data, and compute and translate data for everyday or personal use. Big data has changed the world.

I wanted to pull in an outside source as well. The company SAS (Statistical Analysis System) provides tools and support for big data analysis. SAS describes it this way: “Big data analytics examines large amounts of data to uncover hidden patterns, correlations and other insights. With today’s technology, it’s possible to analyze your data and get answers from it almost immediately – an effort that’s slower and less efficient with more traditional business intelligence solutions” (SAS, Big data, pg.1). The main benefits of big data are speed, efficiency, and innovation. It has influenced the business world in terms of business communications and analytics, providing the tools and environment for efficiency, stability, and recognition. The concept of big data provides the “competitive edge” that big companies need (SAS, big data, pg.1). According to SAS, the importance of big data lies in “cost reduction, faster better decision making, and new products and services” (SAS, big data, pg. 1-2). The power of big data is taking the world by storm and will be unstoppable with continuous efforts and changes to its mechanics and process functions.

 

https://ubiquity.acm.org/article.cfm?id=3158335

https://ubiquity.acm.org/article.cfm?id=3158352

https://www.sas.com/en_us/insights/analytics/big-data-analytics.html

Jeffrey Johnson, Peter Denning, et al., Big Data, Digitization, and Social Change (Opening Statement), Ubiquity 2017, (December 2017).

From Paper Records to Big Data: How digitization changed the way we talk about data

When talking about “Big Data”, as Irvine explains, what we really mean by that term is the massive amounts of data generated from multiple sources (human and human-designed computational agents) and stored in massive arrays of memory accessible to software processes at other levels in the whole system. But why is this term such a big deal and so overused nowadays?

It all starts with the explosion in the amount of data we have generated since using the affordances that come with the age of the internet and the concept of digitization. This is largely due to the rise of computers, the Internet and technology capable of capturing data from the world we live in. Data in itself isn’t a new invention. Going back even before computers and databases, we had paper transaction records, customer records and archive files – all of which are data. Computers, and particularly spreadsheets and databases, gave us a way to store and manage data on a large scale, in an easily accessible way. Suddenly, information was available just with a click.

Every generation of computers since the 1950s has been confronted with problems where data was way too large for the memory and processing power available (Ubiquity Symposium: Big data: big data, digitization, and social change). So what is different about big data today?

Today, every two days we create as much data as we did from the beginning of time until 2000. And the amount of data we’re creating continues to increase rapidly; by 2020, the amount of digital information available will have grown from around 5 zettabytes today to 50 zettabytes. (Marr, 2019)

Nowadays, almost every action we take leaves a digital footprint. We generate data whenever we browse online, when we carry our GPS-equipped smartphones, when we communicate with our friends through social media, and when we shop. So in a way, we leave a digital trail every time we are connected to the internet. On top of this, the amount of machine-generated data is rapidly growing too. Data is generated and shared when our “smart” home devices communicate with each other or with their home servers.

Therefore, the term “Big Data” refers to the collection of all this data and our ability to use it to our advantage across a wide range of fields.

How does big data work?

The more data we have on a specific topic or problem that we want to solve or improve, the more accurate the predictions we can make about it. We do this by comparing data points and looking at the relationships and patterns in the data that we have. The process involves building models based on the data we can collect, and then running simulations, tweaking the value of data points each time and monitoring how that impacts our results. This process is automated: today’s advanced analytics technology will run millions of these simulations, tweaking all the possible variables until it finds a pattern, or an insight, that helps solve the problem that we’re working on.
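To make the model-and-simulation idea above a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the data is synthetic (a made-up relationship between ad spend and sales), the model is a simple least-squares line, and the names `fit_line` and `simulate` are hypothetical. Real analytics systems use far richer models and far more variables.

```python
import random


def fit_line(xs, ys):
    """Build a model: ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def simulate(slope, intercept, candidates):
    """Run simulations: tweak the input value and record the predicted result."""
    return {x: slope * x + intercept for x in candidates}


random.seed(0)
# Synthetic data (assumption): sales = 3 * spend + 10, plus random noise.
xs = [float(x) for x in range(1, 51)]
ys = [3.0 * x + 10.0 + random.gauss(0, 1.0) for x in xs]

slope, intercept = fit_line(xs, ys)
results = simulate(slope, intercept, [10.0, 20.0, 40.0])
best_input = max(results, key=results.get)
print(f"estimated slope: {slope:.2f}, best candidate input: {best_input}")
```

The fitted slope should land close to the value used to generate the data, and sweeping the candidate inputs is a tiny stand-in for the "millions of simulations" that automated analytics would run.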

Big Data Concerns

Big Data gives us insights and answers on the problems that we’re trying to solve, but it also raises concerns and questions that must be addressed:

Data privacy – The Big Data we now generate contains a lot of information about our personal lives, much of which we have a right to keep private. Increasingly, we are asked to strike a balance between the amount of personal data we divulge, and the convenience that Big Data-powered apps and services offer.
Data security – Even if we decide we are happy for someone to have our data for a particular purpose, can we trust them to keep it safe?
Data discrimination – When everything is known, will it become acceptable to discriminate against people based on data we have on their lives? We already use credit scoring to decide who can borrow money, and insurance is heavily data-driven. We can expect to be analysed and assessed in greater detail, and care must be taken that this isn’t done in a way that contributes to making life more difficult for those who already have fewer resources and access to information.

These are important issues and concerns that big corporations with access to large amounts of personal data need to address.

In conclusion, the amount of data available will only keep increasing, and therefore we’ll have more advancements in the fields of analytics and data science. We’ll become more advanced in studying data and finding patterns and answers to different problems across fields. We also need to keep in mind and be aware of the issues that come with using so much data, and we need to be better at dealing with these concerns.

References:

“Big Data”: ACM Ubiquity Symposium (2018): Jeffrey Johnson, Peter Denning, et al., Big Data, Digitization, and Social Change (Opening Statement), Ubiquity 2017, (December 2017).

Bernard Marr. What is Big Data. https://www.bernardmarr.com/default.asp?contentID=766

Rob Kitchin, The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences. London; Thousand Oaks, CA: SAGE Publications, 2014.

 

Big Data in Education Industry and Its Challenges

The revolution is happening at the convergence of two trends: the expansion of the internet into billions of computing devices, and the digitization of almost everything (Big Data, Digitization, and Social Change). Big data means more than large volumes and high-speed data exchange. Its most important characteristic is the connection and digitization of everything. There is no doubt that digital data has enormous latent value. The education industry offers an example.

The online education industry is flooded with a huge amount of data related to students, faculty, courses, results, and more. It was not long before we realized that proper study and analysis of this data can provide insights that can be used to improve the operational effectiveness of educational institutions. For example, MOOC platforms now collect and analyze every keystroke and gesture of every student, enabling the system to adjust its pace and style to individual learners (Big Data, Digitization, and Social Change). Customized programs for each individual can be created from the data collected on a student’s learning history, to the benefit of all students. This improves overall student results. An increasing number of companies now recognize certificates from online courses. Maybe one day online courses will take the place of traditional courses.

We now live in the age of the attention economy. There is too much information and news online trying to catch our attention. Businesses need to learn what their target users think and want in order to provide the most attractive information. The best way to learn about their users is to gain insight from their digital data.

Although digital data has huge latent value, extracting that value is becoming increasingly difficult. (Big Data: Big Data or Big Brother? That Is the Question Now) The process includes finding relevant data, data preparation, data cleaning, data analysis, and data visualization. Data Scientists need not only hard computing knowledge but also soft presentation and communicating skills.
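As a rough illustration of the steps listed above (preparation, cleaning, analysis, and visualization), here is a small Python sketch. The records, field names, and the functions `clean`, `analyze`, and `visualize` are all hypothetical, invented for this example; a real pipeline would work on far larger and messier data.

```python
from statistics import mean

# Hypothetical raw records, e.g. quiz results exported from a course platform.
raw = [
    {"student": "a", "course": "stats", "score": "82"},
    {"student": "b", "course": "stats", "score": ""},      # missing value
    {"student": "c", "course": "Stats ", "score": "91"},   # inconsistent label
    {"student": "d", "course": "design", "score": "77"},
    {"student": "e", "course": "design", "score": "n/a"},  # unparseable value
]


def clean(records):
    """Preparation and cleaning: normalize labels, drop rows without a numeric score."""
    cleaned = []
    for row in records:
        score = row["score"].strip()
        if not score.isdigit():
            continue
        cleaned.append({"course": row["course"].strip().lower(), "score": int(score)})
    return cleaned


def analyze(records):
    """Analysis: average score per course."""
    by_course = {}
    for row in records:
        by_course.setdefault(row["course"], []).append(row["score"])
    return {course: mean(scores) for course, scores in by_course.items()}


def visualize(averages):
    """Visualization: a plain-text bar chart, one '#' per 10 points."""
    return [
        f"{course:<8}{'#' * int(avg // 10)} {avg:.1f}"
        for course, avg in sorted(averages.items())
    ]


averages = analyze(clean(raw))
chart = visualize(averages)
for line in chart:
    print(line)
```

Even in this toy case, two of the five records have to be thrown away before any analysis can happen, which hints at why extracting value from data takes so much effort.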

Big data is a two-edged weapon, serving crime and terror with the same indifference that it serves democracy and freedom. (Big Data, Digitization, and Social Change) Therefore, we need to take care of big data usage to avoid possible misuses and manipulations.

references:

Jeffrey Johnson, Peter Denning, et al., Big Data, Digitization, and Social Change (Opening Statement), Ubiquity 2017, (December 2017).

Bernardo A. Huberman, “Big Data and the Attention Economy: Big Data,” Ubiquity 2017, (December 2017): 2:1–2:7.

Jeffrey Johnson, Peter Denning, et al., “Big Data: Big Data or Big Brother? That Is the Question Now (Concluding Statement),” Ubiquity 2018, no. August (August 2018): 2:1–2:10.

Gnovicon Response

Last week, I attended Gnovis’ academic conference, Gnovicon, which focused on the topics of Big Tech, Data, and Democracy.

Keynote speaker Siva Vaidhyanathan brought forward some important points on the future of democracy. He pointed out that Facebook was designed to share mass amounts of data; this was not a mistake but an intentional design decision. We talk about this specific topic a lot in CCT: the responsibility of tech companies for how their products and services are designed. Vaidhyanathan drew attention to the fact that, in the design process, Facebook developers did not consider what that communication power would give to people who intentionally use it for harm. The choice of algorithm is just that, a choice. The FB algorithm rewards reactions and engagement, which keeps the most reactive news at the top of your timeline. This means spreading and circulating negative messages such as hate messages, conspiracy theories, and indignation.

On another note, advertisement as a political tool works well in that you can pick exactly who sees which message and run different versions of an ad to see which one does better. For these reasons, politicians are moving their campaigns from TV to Facebook. 
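The idea of running different versions of an ad to see which one does better can be sketched in a few lines of Python. This is only a toy simulation under stated assumptions: the variant names, the click probabilities, and the functions `run_ad_test` and `click_through_rates` are made up for illustration and say nothing about how Facebook's ad tools actually work.

```python
import random


def run_ad_test(variants, impressions_per_variant, seed=0):
    """Simulate showing each ad variant and count the clicks it receives.

    `variants` maps a variant name to its (hypothetical) true click probability.
    """
    rng = random.Random(seed)
    clicks = {}
    for name, click_prob in variants.items():
        clicks[name] = sum(
            rng.random() < click_prob for _ in range(impressions_per_variant)
        )
    return clicks


def click_through_rates(clicks, impressions_per_variant):
    """Convert raw click counts into click-through rates."""
    return {name: count / impressions_per_variant for name, count in clicks.items()}


# Two hypothetical versions of the same message, shown 10,000 times each.
variants = {"message_a": 0.02, "message_b": 0.05}
clicks = run_ad_test(variants, impressions_per_variant=10_000)
rates = click_through_rates(clicks, 10_000)
winner = max(rates, key=rates.get)
print(rates, "-> keep running:", winner)
```

With enough impressions, the variant that draws more clicks stands out clearly, and a campaign can shift its budget toward it.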

The big takeaway from the keynote speech was that social media is not the root of the problem. Facebook is not where we should be focusing our attention in order to make change. According to Vaidhyanathan (and the CCT community), the real problem is that we are over-stimulated and distracted by constant noise. In the current climate, people are spending time denying serious issues instead of working together to find solutions. He claims that social media and smartphones are habituating us to immediacy, and we are just reacting to things rather than making conscious decisions and long-term plans to address large issues.

After his speech, Emily Varga offered a solution to the large amounts of misinformation online. She said education can be a big factor in helping the situation by teaching people how to distinguish between good and bad information. Additionally, Sally Hubbard emphasized the importance of competition in regulating FB: there need to be other options so that there is pressure to do better than the current algorithm. Without any competition to regulate FB, we will continue to see the effects of the current algorithm choices. This leads back to our discussion in class about how algorithms are a business choice and should be recognized as such. The overall message is that we should be looking to implement design choices that take ethical aspects and consequences into account.

Challenges of Interpreting Big Data

We live in a highly digitalized world that requires constant interaction with technology, creating massive amounts of data related to many aspects of society, from human behavior to the human body, among others. Some data is collected on purpose, with or without consent, and some is a by-product of digital interaction. The challenge is not necessarily the size of the data sets but how to process and interpret them to make sense of the world: how to design and represent systems that can process the data in spite of its volume, the velocity at which it is acquired, the variety of data/information/meaning, and the veracity of the data, that is, how accurately it represents the real world (Denning and Martell, 2015).

Johnson et al. challenge the accessibility of data on the internet, presenting various cases with different levels of public access. One passage caught my attention: “In principle, this means that huge amounts of data are available to anyone in the world over the internet. In practice much data are private and not available” (Johnson et al, 2018). I started to wonder about personal experiences in which my data was collected, with or without my consent, and for what purposes.

I deleted my original Facebook years ago and just recently opened a new one as a way to stay connected with my graduate student peers and stay on top of events in my city. I rarely interact directly (like, share, comment, etc.), but my less direct interaction (watching a video, expanding an article, etc.) is still collected and somehow interpreted together with the interactions of my friends in order to make sense of who I am and what my interests are, so as to provide attention-grabbing content on my page.

In the security page of your Facebook account you can see the patterns/labels/categories Facebook has placed you in, and you can also see the interests associated with you that inform the targeted ads on your page. When I checked mine, I was surprised by the stark contradictions among the categories I was put in. Somehow Facebook had labelled me as both a “Venezuelan ex-pat” and a “new American”, whatever that means. It placed me as “extreme liberal” but also “extreme anti communism/socialism”. No wonder I was being shown ads for guns and for signing up to the NRA while also getting articles against “the Israeli agenda”. My data and interactions are being interpreted in a binary way: “if she’s anti-socialism she must like guns”, “if she’s liberal she must be pro-Palestine”.

You can view, organize, add, delete, and modify these categories to better fit your needs and interests. You can also report or hide ads and articles that you don’t want to see. I’ve done both, with no change on my timeline. I’ve deleted all categories and associated interests by interaction, and also manually reported NRA ads, in vain.

What is interesting to me is that the large sets of data are clearly being interpreted in a way that does not match my experience. Meanwhile, the data that I am willingly providing is not being taken into account in that process. Unpacking all the issues behind that fact would require a lot more space than I have in this post.

Johnson et al (2018) raise concerns about the overoptimistic approach to big data that dismisses its challenges and misuse: “One can easily imagine what would happen if medical, financial, and behavioral data fused for the targeted individual fell into hands of bankers, insurers, politicians, or criminals. Mayhem would follow, no doubt.” (Johnson et al, 2018)

I have another personal example to illustrate this. A few years ago I randomly started receiving packages, directly addressed to me by name and address, containing information and free samples of products related to motherhood and babies: boxes of baby formula, pacifiers, pads, among others. This was a surprise to me since a) I’m not a mother, b) I was not planning to be at the time, c) no one in my household was pregnant, d) there were no babies in my house. The fact that every correspondence was addressed to me showcased that this was not a mistake.

I couldn’t figure out how this company had gotten hold of my address or why they thought I was a target for these products. When I visited the website, the only contact option available was filling out a form (I never got a reply). I received another package on my birthday with a letter saying something along the lines of “another year, it’s time to start planning on expanding your family” while also addressing the efficiency of birth control methods.

After much thinking, I remembered taking a survey related to birth control practices and medical conditions around chronic ovary illness and reproduction. After much scrolling through my email, I found the survey link and realized that it was hosted by Amazon MTurk (although I didn’t take the survey on the MTurk website). I figured that’s how they got my address, birthday, age, and other extremely detailed information about me. It seems Amazon knows my habits very well; it knows my medical conditions and thinks it’s time for me to have a baby.

I wish I had the time to unpack how the processing of all these sets of data about me determined that as a woman of my age I must absolutely either have babies or be thinking about having babies. Maybe I’ll save it for a final project.

References: