In everyday life, the media often advertise big data as the collection and use of enormous amounts of data to obtain objective, correct answers. In other words, in the media's framing, big data technology is depicted as an ideal input-output black box. Big data here is huge in volume, covers the long tail (that is, it includes even the smallest minorities), and is objective (collected without conscious bias), so it seems reasonable to conclude that the results drawn from big data are objective and correct.
To some extent, this makes sense. Take the recommendation system of a music app as an example. First, a machine recognizes patterns (regularities) in the full catalogue of music data and classifies songs into types based on those patterns. Then users' behavior is treated as data, from which behavior patterns are extracted. The system maps each user pattern onto the music patterns and constantly adjusts the output (the music recommended) in real time to obtain better feedback (the user clicking the like or favorite button, or downloading the music). This is a use of big data and machine learning, and the data involved fit Kitchin's definition. It is "huge in volume": all the music and all the user behaviors are data. It is "high in velocity, being created in or near real-time": every click, and every failure to click, is generated in real time and fed back as new data. It is "fine-grained in resolution and uniquely indexical in identification": the system recommends music and adjusts its results based on each individual user's behavior, so each user's recommendation system is, in effect, unique. It is "relational in nature, containing common fields that enable the conjoining of different data sets": data about user behavior, music, and so on are gathered together for real-time analysis. And it is "flexible, holding the traits of extensionality (can add new fields easily) and scalability (can expand in size rapidly)": the dataset used for analysis can accommodate a new user, a new song, or a new variable of user behavior.
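The feedback loop described above can be sketched in a few lines. This is a minimal toy illustration, not an actual music app's system: the song names, genre labels, and scoring rule are all hypothetical, and a real service would use learned embeddings over millions of items rather than a hand-made table.

```python
from collections import Counter

# Step 1 (assumed): songs already classified into pattern-based types.
SONG_GENRE = {
    "song_a": "lofi", "song_b": "lofi",
    "song_c": "rock", "song_d": "rock",
    "song_e": "jazz",
}

class Recommender:
    """Per-user recommender: each user's state is unique ('uniquely indexical')."""

    def __init__(self):
        self.genre_score = Counter()  # the user's behavior pattern so far

    def feedback(self, song, liked):
        # Step 2: every like or skip becomes new data in near real time.
        self.genre_score[SONG_GENRE[song]] += 1 if liked else -1

    def recommend(self, heard):
        # Step 3: map the user pattern onto the music pattern.
        if not self.genre_score:
            return next(s for s in SONG_GENRE if s not in heard)
        best = max(self.genre_score, key=self.genre_score.get)  # favorite type
        for song, genre in SONG_GENRE.items():
            if genre == best and song not in heard:
                return song
        return None  # nothing unheard left in the favorite type

r = Recommender()
r.feedback("song_a", liked=True)   # user likes a lofi track
r.feedback("song_c", liked=False)  # user skips a rock track
print(r.recommend(heard={"song_a", "song_c"}))  # → song_b (the other lofi track)
```

Adding a new user means creating a new `Recommender`, and adding a new song means adding one dictionary entry, which loosely mirrors the extensionality and scalability traits in Kitchin's definition.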
Big data in the media's perspective is, in effect, an empiricist epistemology of big data. But the results of big data analysis are not as objective and correct as the media describe, because the collection and analysis of big data involve human participation, such as the choice of algorithms and models. This is what Kitchin means when he writes that "data are created within a complex assemblage that actively shapes its constitution". In fact, the recommendation systems of different music apps differ, and the same user will get different recommendations from different apps, which shows that big data technology is not that objective and correct. (If it were, the results should be the same.) Moreover, big data only yields correlations and insights in the data; it cannot explain why. For business purposes, though, that is useful enough. The operators do not need to know why a user who likes song A will also like song B; all they need to know is that there is a positive relationship between songs A and B. It is therefore sensible for the media to adopt the empiricist view of big data: it simplifies the epistemology, makes it easy for ordinary customers to accept, and helps persuade them to buy or use the service.
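The "positive relationship" the operators rely on can be computed entirely without explaining taste. One common way (a sketch with hypothetical listening histories, not any particular app's method) is the lift of two songs' co-occurrence: how much more often they appear together than independence would predict.

```python
# Hypothetical listening histories; song names are illustrative only.
histories = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "B", "D"},
    {"C", "D"},
    {"A", "C"},
]

def lift(x, y, sessions):
    """Lift > 1 means x and y co-occur more often than chance predicts."""
    n = len(sessions)
    p_x = sum(x in s for s in sessions) / n        # P(x)
    p_y = sum(y in s for s in sessions) / n        # P(y)
    p_xy = sum(x in s and y in s for s in sessions) / n  # P(x and y)
    return p_xy / (p_x * p_y)

print(lift("A", "B", histories))  # → 1.25, a positive association
```

A lift of 1.25 says nothing about *why* listeners of A also play B; it only records that they do, which is exactly the correlation-without-explanation point made above.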
The situation is different when it comes to science. The simple input-output model and the empiricist epistemology cannot meet the needs of scientific research. Take machine learning on "good" selfies as an example. Big data and machine learning can return a dataset of good selfies, but they do not explain why those selfies are good. The output only shows the phenomenon, or at most a surface correlation between a good selfie and certain selfie patterns. It is a result of abduction, meaning that the machine gives the best available answer in a specific scenario. But for science, and especially for the humanities, a pattern snapshot alone is not enough. What matters is how to explain the correlation, or why the machine returned this result. From this point of view, as Kitchin says, "the pattern is not the end-point but rather a starting point for additional analysis" (Kitchin, n.d.). Big data and machine learning provide a new method for discovering phenomena, after which scientific research must do additional deductive or inductive work to explain them.
Johnson, J., Denning, P., Delic, K. A., & Sousa-Rodrigues, D. (2018). Big Data or Big Brother? That is the question now. Big Data, 10.
Johnson, J., Denning, P., Sousa-Rodrigues, D., & Delic, K. A. (2017). Big Data, Digitization, and Social Change. Big Data, 8.
Kitchin, R. (n.d.). Big Data, new epistemologies and paradigm shifts. Big Data, 12.