From Paper Records to Big Data: How digitization changed the way we talk about data

When we talk about “Big Data,” as Irvine explains, what we really mean is the massive amount of data generated from multiple sources (human and human-designed computational agents) and stored in massive arrays of memory accessible to software processes at other levels in the whole system. But why is this term such a big deal, and why is it so overused nowadays?

It all starts with the explosion in the amount of data we have generated since the arrival of the internet and digitization. This growth is largely due to the rise of computers, the internet, and technology capable of capturing data from the world we live in. Data in itself isn’t a new invention. Even before computers and databases, we had paper transaction records, customer records, and archive files, all of which are data. Computers, and particularly spreadsheets and databases, gave us a way to store and manage data on a large scale in an easily accessible way. Suddenly, information was available with a click.

Every generation of computers since the 1950s has been confronted with problems where the data was far too large for the memory and processing power available (Ubiquity Symposium: Big data: big data, digitization, and social change). So what is different about big data today?

Today, every two days we create as much data as we did from the beginning of time until 2000. The amount of data we’re creating also continues to increase rapidly: by 2020, the amount of digital information available will have grown from around 5 zettabytes today to 50 zettabytes (Marr, 2019).

Nowadays, almost every action we take leaves a digital footprint. We generate data whenever we browse online, when we carry our GPS-equipped smartphones, when we communicate with our friends through social media, and when we shop. So in a way, we leave a digital trail every time we are connected to the internet. On top of this, the amount of machine-generated data is rapidly growing too. Data is generated and shared when our “smart” home devices communicate with each other or with their home servers.

Therefore, the term “Big Data” refers to the collection of all this data and our ability to use it to our advantage across a wide range of fields.

How does big data work?

The more data we have on a specific topic or problem that we want to solve or improve, the more accurate our predictions about it can be. We make these predictions by comparing data points and looking at the relationships and patterns in the data. In practice, this means building models based on the data we can collect, then running simulations, tweaking the values of data points each time and monitoring how each change affects the results. This process is automated: today’s advanced analytics technology will run millions of these simulations, tweaking all the possible variables until it finds a pattern, or an insight, that helps solve the problem we’re working on.
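To make the model-then-simulate loop above concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the price and sales figures are hypothetical, the model is a simple straight-line fit, and the function names (fit_line, simulate_revenue, best_price) are not taken from any particular analytics product. Real systems use far richer models and many more variables, but the steps are the same: fit a model to collected data, tweak an input across many simulated runs, and watch how the result changes.

```python
from statistics import mean

# Hypothetical historical observations: (price charged, units sold).
# These numbers are invented purely for illustration.
OBSERVATIONS = [(2, 90), (4, 80), (6, 70), (8, 60), (10, 50)]


def fit_line(points):
    """Fit units = slope * price + intercept by ordinary least squares."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def simulate_revenue(model, candidate_prices):
    """Run one simulated 'what if' per candidate price and record the revenue."""
    slope, intercept = model
    results = {}
    for price in candidate_prices:
        predicted_units = max(0.0, slope * price + intercept)
        results[price] = price * predicted_units
    return results


def best_price(results):
    """Return the candidate price whose simulated revenue is highest."""
    return max(results, key=results.get)


if __name__ == "__main__":
    model = fit_line(OBSERVATIONS)
    # Tweak the price in small steps from 1.0 to 19.0 and monitor the outcome.
    candidates = [step / 2 for step in range(2, 39)]
    results = simulate_revenue(model, candidates)
    winner = best_price(results)
    print(f"Best simulated price: {winner}, revenue: {results[winner]}")
```

The sketch tweaks only one variable (price) over a few dozen runs. The automated analytics described above do the same kind of search over many variables and millions of runs, which is why large volumes of data and computing power matter.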

Big Data Concerns

Big Data gives us insights into, and answers to, the problems that we’re trying to solve, but it also raises concerns and questions that must be addressed:

Data privacy – The Big Data we now generate contains a lot of information about our personal lives, much of which we have a right to keep private. Increasingly, we are asked to strike a balance between the amount of personal data we divulge, and the convenience that Big Data-powered apps and services offer.
Data security – Even if we decide we are happy for someone to have our data for a particular purpose, can we trust them to keep it safe?
Data discrimination – When everything is known, will it become acceptable to discriminate against people based on data we have on their lives? We already use credit scoring to decide who can borrow money, and insurance is heavily data-driven. We can expect to be analysed and assessed in greater detail, and care must be taken that this isn’t done in a way that contributes to making life more difficult for those who already have fewer resources and access to information.

These are important issues that big corporations with access to large amounts of personal data need to address.

In conclusion, the amount of data available will only keep increasing, and with it we’ll see more advances in the fields of analytics and data science. We’ll get better at studying data and finding patterns and answers to different problems across fields. We also need to stay aware of the issues that come with using so much data, and we need to get better at dealing with these concerns.

References:

Jeffrey Johnson, Peter Denning, et al. Big Data, Digitization, and Social Change (Opening Statement). “Big Data”: ACM Ubiquity Symposium (2018). Ubiquity 2017 (December 2017).

Bernard Marr. What is Big Data? https://www.bernardmarr.com/default.asp?contentID=766

Rob Kitchin, The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences. London; Thousand Oaks, CA: SAGE Publications, 2014.