Avoid the Fallacies of Empiricism

As we have entered the internet era, big data has emerged alongside it. “The impact of digital data on society is very great and increasing. Social networks and big data determine what is noticed and acted upon” (Johnson et al., 2018). How should we use big data? How can we use it correctly? How should we deal with its consequences? These questions become more pressing as big data grows central to AI and ML. When we use big data to train AI/ML algorithms, how do we meet challenges such as data capture, storage, and analysis (Wikipedia)? On the way to de-blackboxing these systems, such technical issues should not be ignored, but this week I want to focus on attitudes toward big data and the ethical problems it raises.

Big data is not a term we can take at its literal meaning. “Big data is a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by traditional data-processing application software” (Wikipedia). In his book, Kitchin gives readers a better understanding of what data are by building a knowledge pyramid and setting out standards for classifying data. He also points out that “how data are ontologically defined and delimited is not a neutral, technical process, but a normative, political, and ethical one that is often contested and has consequences for subsequent analysis, interpretation, and action” (Kitchin, 2014).

This point also relates to the views of Cathy O’Neil, which many regard as prescient but which I find too negative. She argues that data are neither objective nor accountable and that the ‘standards of success’ are full of bias. She strongly recommends passing new laws, conducting independent audits, and establishing moral norms for practitioners (O’Neil, 2016). Her attitude toward big data is, in my view, overly pessimistic and rests on many negative examples. Data and algorithms are tools for human beings, and it is we who determine how to use them. We should develop the ability to access and analyze data instead of attributing the problems to the data itself. The examples she gives, such as gender bias and other ethical issues, should be attributed to human society rather than to the traces and data our activities leave behind. What she advocates can be seen in data infrastructure, which contains social and cognitive structures: “Data infrastructures host and link databases into a more complex sociotechnical structure” (Kitchin, 2014). Such an infrastructure is complicated and hard to achieve, but I believe it is the ultimate way to resolve all the unfair and biased cases in O’Neil’s book. We need to overcome the fallacies of empiricism to establish industry standards more quickly.

The same holds for ethics and morality: people, not algorithms, decide what to do. There will certainly be many negative and harmful cases along the way, but I believe that over time we can build a reasonable system for big data and AI/ML, just as we have built social systems since ancient times.


Johnson, J., Denning, P., Delic, K. A., & Sousa-Rodrigues, D. (2018). Ubiquity Symposium: Big data: Big data or big brother? That is the question now. Ubiquity, 2018(August), 1-10. https://doi.org/10.1145/3158352

Kitchin, R. (2014). The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences. Sage Publications.

O’Neil, C. (2016). Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Crown Publishers.

Big data. (2021, April 10). In Wikipedia. https://en.wikipedia.org/wiki/Big_data