Algorithms and machine learning models continue to proliferate through the many contexts of our lives. This makes understanding how these models work, and why they can be biased, critically important to navigating the current information age. In this paper, I explain how data is created, how models are run, and when and where biases are introduced into a model. Though this is a survey of the types of machine learning models in use, I follow a running example of how a machine learning model evaluating resumes handles data, and how difficult it can be at times to come to an objective outcome.
Algorithms are everywhere: in our phones, in our homes, and in the systems that run our everyday lives. Algorithms are the reason our digital maps know the best routes, and how Amazon always seems to know what other items we might like. These algorithms can make life easier for many people around the world, but what we usually think of as an algorithm is really the partnership between algorithms and machine learning. An algorithm is a formula for making decisions and determinations about something, whereas machine learning is a technique we use to create these algorithms (Denning & Martell, 2015). These machine learning models can be complex, containing hundreds of features and millions of rows, and produce results that are highly consistent and accurate. They also have a wide variety of uses, from determining insurance premiums (O’Neil, 2016a), writing articles (Marr, 2020), and improving the administration of medicine (Ackerman, 2021; Ngiam & Khor, 2019) to setting bail (Kleinberg, Lakkaraju, Leskovec, Ludwig, & Mullainathan, 2018). Though there are many benefits to using algorithms and machine learning, there are times when they cause more harm than good, such as when algorithms give men higher credit limits than women (Condliffe, 2019), bias hiring and recruitment practices (Hanrahan, 2020), and fail to identify Black faces (Simonite, 2019). This matters because, as these systems continue to expand into new fields, individuals rely on the judgments of these algorithms to make important decisions (sometimes even more than they rely on the judgment of other people) (Dietvorst, Simmons, & Massey, 2015; Logg, Minson, & Moore, 2019). These important decisions can become biased, as an algorithm does not eliminate systemic bias but instead multiplies it (O’Neil, 2016b).
To understand why some models work better than others, and where systemic bias comes from, I will de-black-box data, algorithms, and machine learning models, using a potential resume sorter as a running example.
We have come to live in a period of history that some refer to as the information age, as data becomes one of the most valuable commodities to have and own (Birkinshaw, 2014). Data alone is not useful; data in context and in relationship to other information is why companies spend millions of dollars a year to harvest information from as many sources as possible. A clear example is how labeling data can alter our perception of the magnitude of a specific number. Take the number 14: if I label it as 14 days versus 14 years, 14 days seems negligible compared to 14 years. But if I then add another piece of information, such as 14 days as President versus 14 years in an entry-level position, the 14 days carry more weight than the 14 years. This is how data works: by quantifying existing information in ways that let us analyze the differences in the number 14 across the various contexts in which it exists.
The first part of the data process is quantifying the phenomenon of interest. Using the example of a job application, one of the properties that needs to be quantified properly is years of experience. Just as with the previous example of number magnitudes, not all experience is weighed the same, so different features, or variables, need to be created to differentiate these types of experiences. This could be done by categorizing prior work as entry-level or managerial, or by its degree of relevance to the position being offered. As with all translations, something is lost when attempting to change from one language to another. How would one categorize freelance experience or unpaid work? These examples highlight how even capturing what appears to be the correct, objective information through quantification can take highly complex information and flatten thoughts, expertise, and experiences, ultimately biasing the outcome (O’Neil, 2016b). This standardizes information into a format that is understandable to a computer but may not accurately represent the reality from which the information is derived. This is why companies and researchers attempt to collect many different types of information: it gives a well-rounded context to the data and allows for a fuller picture. A good example of these types of profiles is checking what your Google (https://www.businessinsider.com/what-does-google-know-about-me-search-history-delete-2019-10?op=1) and Facebook (https://www.cnbc.com/2017/11/17/how-to-find-out-what-facebook-knows-about-me.html) ad profiles have to say about you (Haselton, 2017; Holmes, 2020).
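To make the flattening concrete, here is a minimal sketch of how an applicant's work history might be quantified into features. The field names, categories, and relevance scores are my own illustrative assumptions, not a real hiring schema:

```python
# Hypothetical encoding of one applicant's work history into numeric
# features. The schema (years, level, relevance) is an illustrative
# assumption, not a real resume-sorter format.
def encode_experience(jobs):
    """Flatten a list of jobs into (total_years, managerial_years, avg_relevance)."""
    total_years = sum(j["years"] for j in jobs)
    managerial_years = sum(j["years"] for j in jobs if j["level"] == "managerial")
    avg_relevance = sum(j["relevance"] for j in jobs) / len(jobs)
    return (total_years, managerial_years, avg_relevance)

applicant = [
    {"years": 3, "level": "entry", "relevance": 0.9},
    {"years": 2, "level": "managerial", "relevance": 0.5},
    # Freelance or unpaid work has no obvious slot in this schema --
    # exactly the kind of information the flattening loses.
]
features = encode_experience(applicant)  # three numbers now stand in for a career
```

Everything the model will ever know about this applicant is those three numbers; whatever did not fit the schema is gone.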
The information a company has on an individual may not be explicitly given but rather inferred from other pieces of information, especially when the target data is unavailable. This can be done by making two assumptions. First, that information does not exist in isolation. Second, that relationships between variables occur in a systematic way. Both of these points will be addressed more specifically in the next section on modeling. This is to say that the more information received, the better the inferences we can make about certain qualities of a person. Going back to the example of a job application: if someone reports working for a specific company, say Netflix, knowing other employees who have come from Netflix allows one to make inferences about the work ethic of the applicant. People do this all the time when taking suggestions from their friends on items to buy or places to eat. Though these inferences may be faulty, especially considering that people's tastes differ, by collecting more information people can make better judgments based on what is available.
There are major issues, though, when it comes to data and data relationships. One is that previous biases may be reflected in the data, and the data itself does not account for them. This has become extremely salient in the past few years, with the rise of the Black Lives Matter movement bringing to light systemic issues in understanding outcomes in relation to race. An example from a paper by Obermeyer and colleagues (2019) showed that Black patients were chronically sicker than white patients who had been assigned the same health risk score. Because health care costs were used as the measurement of the level of sickness, and Black patients generally spend less money on medicine, Black patients were rated as healthier. This does not reflect the truth about the severity of the illnesses Black and white individuals may face, but rather cultural differences in seeking standard health care. It is important when collecting data that the data represents what you believe it represents, and that a more holistic picture is understood, especially before embarking on creating your model.
“All models are wrong, but some are useful” – George Box
Modeling is how we turn data into algorithms. Each piece of data gives us important information in context, but it is how these data may interact that allows us to make predictions and inferences, and to act on them. It is important to note that models are initially agnostic to what the data is or represents; the chief concern of a model is the potential relationship the data may have with a specified result. Modeling makes the assumption that data varies in a systematic way, meaning that there is a discernible pattern that can be used to predict certain outcomes. This assumption means that data cannot occur in isolation and that there are relationships between phenomena that can explain why something is the way it is. This leads to the initial distinction between the types of models used: predictive versus inferential.
Prediction Versus Inference
Predictive models care about one thing: being accurate. This may mean that a model finds a connection between applicants with the letter Z in their name and potential hire-ability. Though this may seem a silly example, it correctly illustrates the point that these types of models only worry about the potential outcome and maximizing the predictability of that outcome. There are many benefits to this, as people may not mind how a computer generates an outcome for a cancer screening, only that the screening is accurate.
Inference, on the other hand, concentrates on the relationships between the variables. An example of this would be asking how much five years of field experience matters compared to a college degree in the subject. This type of modeling attempts to discern meaningful connections using semantically meaningful data. It is more useful when attempting to find the cause of a particular outcome and to understand how one thing relates to another.
Most modeling you find in business is predictive, whereas in academia and policy inferential models are much more important. The type of model you decide on will ultimately impact the outcome you arrive at.
Types of Modeling
Modeling is the attempt to create an algorithm that can accurately predict a certain phenomenon. Each model competes on whether it can discern outcomes better than chance, using various methods to achieve this. Most modeling in computer science involves using one part of the data to train the model and setting aside another part to test it. Modeling can be broken down into two broad categories: classification and regression.
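The train/test idea can be sketched in a few lines. This is a minimal holdout split, with a list of integers standing in for applicant records; the 70/30 split and the seed are arbitrary choices for illustration:

```python
import random

# A minimal holdout split: one part of the data fits the model, the
# held-out part measures how well it generalizes. The "records" here
# are stand-ins for real rows of data.
def train_test_split(rows, test_fraction=0.3, seed=42):
    rows = rows[:]                        # copy so the caller's list is untouched
    random.Random(seed).shuffle(rows)     # shuffle to avoid ordering effects
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]

records = list(range(100))                # stand-in for 100 applicant records
train, test = train_test_split(records)   # 70 rows to fit, 30 rows to evaluate
```

Accuracy is then reported only on the held-out rows, which the model never saw during fitting.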
Classification, also referred to as discriminant analysis, attempts to create boundaries around data in an effort to correctly sort it into predefined categories (Denning & Martell, 2015). Most of these techniques are also categorized as non-parametric, meaning that the model does not make assumptions about the structure of the data before attempting to sort it into groups. There are a few different classification techniques, but one of the most easily understood and widely used is the decision tree. Decision trees are essentially a sequence of logical statements attempting to create rules around a certain phenomenon: does the oven feel hot? If yes, then the oven might be on. Though models get more complex than this, and the number of rules may increase, the goal stays the same: to sort data into categories as accurately as possible using these rules. The program attempts to create a model that increases accuracy while reducing error as much as possible. This may be more or less possible depending on the data, and the divisions created may not be meaningful in any way.
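Applied to the resume sorter, such a sequence of rules might look like the sketch below. Note that this is a hand-written cascade for illustration; a real decision tree learns its questions and thresholds from the data, and the thresholds here are invented assumptions:

```python
# A tiny hand-written "decision tree" for the resume sorter: a cascade
# of yes/no rules. A learned tree would pick these questions and
# thresholds itself; the ones here are illustrative assumptions.
def sort_applicant(years_experience, has_relevant_degree):
    if years_experience >= 5:
        return "interview"      # rule 1: enough experience on its own
    if has_relevant_degree:
        return "interview"      # rule 2: a degree compensates for fewer years
    return "reject"             # no rule matched
```

Whether the boundaries at "5 years" or "has a degree" are meaningful is exactly the question the surrounding text raises.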
The other type of modeling is regression modeling, or linear modeling. This type of modeling makes assumptions about the data's structure, which is why it is referred to as parametric modeling (Alpaydin, 2016). These assumptions state that the data should follow a normal distribution and that it varies in a systematic, linear way.
Though there are whole courses devoted to regression, what regression attempts to do is use the variation within one variable to explain some of the variation in another. How much of the increased desire for ice cream can I explain with the increase in summer temperatures? In the example of the job applicant, greater years of experience may make a more capable candidate. The issue with this, obviously, is that it relies primarily on the data being linear, presupposing that a candidate with 20 years of experience is always better than a candidate with 10. There are ways around this assumption, but most modeling done with these techniques does not account for it. In regression, technically, the more items used the better we are able to predict the outcome, though this does not mean each variable contributes a significant amount. Different weights are placed on the different variables that make up these regression formulas, indicating that certain variables contribute more to the result than others. A statistical weight is a number that you multiply with an observed variable (e.g., years of experience) to build a formula; each weight then represents how much that variable contributes to the predicted outcome (e.g., hire-ability). There are a couple of ways to do regression, using a frequentist approach or a Bayesian approach; regardless, what you are attempting to do is explain the variation in the target variable.
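Written out, such a formula is just weights multiplied by variables and summed. In this sketch the weights are invented for illustration; a fitted regression would estimate them from data:

```python
# A linear "hire-ability" score: each observed variable multiplied by a
# statistical weight, then summed. These weights are invented for
# illustration; a fitted regression would estimate them from data.
WEIGHTS = {"intercept": 0.2, "years_experience": 0.4, "relevant_degree": 1.5}

def hireability(years_experience, relevant_degree):
    return (WEIGHTS["intercept"]
            + WEIGHTS["years_experience"] * years_experience
            + WEIGHTS["relevant_degree"] * relevant_degree)

# The linearity assumption at work: 20 years always scores higher than
# 10 years, no matter what those years actually contained.
score_10 = hireability(10, 1)
score_20 = hireability(20, 1)
```

The relative sizes of the weights are what an inferential analysis would interpret: here the (assumed) degree weight contributes as much as several years of experience.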
The last type of modeling I want to discuss is neural networks. These are a bit complicated, so I included a video explaining them in greater detail. To summarize, neural networks are a series of interconnected nodes that adjust the statistical weights of the connecting variables/nodes in an attempt to find the best possible configuration for predicting the outcome. These statistical weights start out arbitrary but adjust through iterations to create the best model possible. This type of model is being utilized to create complex networks and formulas to predict things such as heart disease (Reuter, 2021). The unfortunate part of neural networks is the nodes referred to as hidden layers: processes that occur behind the scenes and make neural networks difficult to interpret beyond the outcome they predict.
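A single forward pass through one hidden layer can be sketched briefly. The inputs and weights below are arbitrary numbers chosen for illustration, as are the layer sizes; training would iteratively adjust the weights, which this sketch omits:

```python
import math

def sigmoid(x):
    # Squash any number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, hidden_weights, output_weights):
    # Each hidden node mixes every input by its own weights. These hidden
    # activations are the "hidden layer": useful to the model, but hard
    # for a human to interpret.
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs)))
              for ws in hidden_weights]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))

# Arbitrary starting weights, as the text describes; iterative training
# would nudge them toward a better-performing configuration.
score = forward(inputs=[5.0, 1.0],                      # e.g., years, degree
                hidden_weights=[[0.1, -0.3], [0.4, 0.2]],
                output_weights=[0.7, -0.5])
```

The final score is a number between 0 and 1, but the intermediate hidden values carry no obvious meaning, which is the interpretability problem in miniature.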
All of these different types of modeling are ultimately tools to understand the relationships within the data. This brings us to the concept of "garbage in, garbage out." The most important part of the model is the information being used to create it; without good information, we can't get a useful model.
There are many different techniques for creating useful algorithms with machine learning. As previously stated, the data we feed into these models will impact the eventual outcome. This is why it is so important to understand what we are attempting to predict and to control for it. Coming back again to the resume sorter to explain why getting data right is so important: in 2016, researchers found that applicants with ethnic-sounding names were not called back at the same rate as those who had whitened their names, even with equivalent resumes (Kang, DeCelles, Tilcsik, & Jun, 2016). When creating these algorithms with machine learning, the data we provide the model will determine the result we ultimately achieve. If certain types of people or certain types of experiences are not accounted for in the data, or are misrepresented, then those biases are amplified through the process of machine learning. If an applicant, for instance, takes an opportunity that is hard to define, that opportunity may become disadvantageous when looking for a job in the future. If an applicant does not conform to the standard array of qualities, then that applicant may be rejected for being different rather than for being wrong for the position. Though all models may be wrong, models used incorrectly may cause more harm than good. This is to say the solution may be to create the world we want to see rather than the world we have. By creating data and training machine learning models on these ideals, algorithms will be created to reflect what we want, rather than amplify what already is. To get there, we have to understand the relationships between factors already existing in our data, such as how we weight certain experiences or achievements.
Algorithms have revolutionized the world for the better and will continue to do so as data becomes more abundant and machine learning models become more complex. Understanding how data is used, and why things are the way they are, gives us healthy skepticism for the next time an algorithm tells us what to do.
Ackerman, D. (2021). System detects errors when medication is self-administered. MIT News, pp. 2–5. Retrieved from https://news.mit.edu/2021/inhalers-insulin-errors-medication-0318
Alpaydin, E. (2016). Machine Learning. Cambridge, Massachusetts: The MIT Press.
Birkinshaw, J. (2014). Beyond the Information Age. Wired, 8. Retrieved from https://www.wired.com/insights/2014/06/beyond-information-age/
Condliffe, J. (2019). The Week in Tech: Algorithmic Bias Is Bad. Uncovering It Is Good. The New York Times, 13–15. Retrieved from https://www.nytimes.com/2019/11/15/technology/algorithmic-ai-bias.html
Denning, P. J., & Martell, C. H. (2015). Great Principles Of Computing. The MIT Press. Cambridge, Massachusetts: The MIT Press.
Dietvorst, B. J., Simmons, J. P., & Massey, C. (2015). Algorithm aversion: People erroneously avoid algorithms after seeing them err. Journal of Experimental Psychology: General, 144(1), 114–126. https://doi.org/10.1037/xge0000033
Hanrahan, C. (2020). Job recruitment algorithms can amplify unconscious bias favouring men, new research finds. ABC News, 10–13. Retrieved from https://www.abc.net.au/news/2020-12-02/job-recruitment-algorithms-can-have-bias-against-women/12938870
Haselton, T. (2017). How to find out what Facebook knows about you. CNBC, pp. 1–12. Retrieved from https://www.cnbc.com/2017/11/17/how-to-find-out-what-facebook-knows-about-me.html
Holmes, A. (2020). Clicking this link lets you see what Google thinks it knows about you based on your search history — and some of its predictions are eerily accurate. Business Insider, 1–8. Retrieved from https://www.businessinsider.com/what-does-google-know-about-me-search-history-delete-2019-10?op=1
Kang, S. K., DeCelles, K. A., Tilcsik, A., & Jun, S. (2016). Whitened Résumés: Race and Self-Presentation in the Labor Market. Administrative Science Quarterly, 61(3), 469–502. https://doi.org/10.1177/0001839216639577
Kleinberg, J., Lakkaraju, H., Leskovec, J., Ludwig, J., & Mullainathan, S. (2018). Human Decisions and Machine Predictions. Quarterly Journal of Economics, 237–293. https://doi.org/10.1093/qje/qjx032
Logg, J. M., Minson, J. A., & Moore, D. A. (2019). Algorithm appreciation: People prefer algorithmic to human judgment. Organizational Behavior and Human Decision Processes, 151(December 2018), 90–103. https://doi.org/10.1016/j.obhdp.2018.12.005
Marr, B. (2020). What Is GPT-3 And Why Is It Revolutionizing Artificial Intelligence? Forbes, 2–8. Retrieved from https://www.forbes.com/sites/bernardmarr/2020/10/05/what-is-gpt-3-and-why-is-it-revolutionizing-artificial-intelligence/?sh=2b41e039481a
Ngiam, K. Y., & Khor, I. W. (2019). Big data and machine learning algorithms for health-care delivery. The Lancet Oncology, 20(5), e262–e273. https://doi.org/10.1016/S1470-2045(19)30149-4
O’Neil, C. (2016a). How algorithms rule our working lives. The Guardian, pp. 1–7. Retrieved from https://www.theguardian.com/science/2016/sep/01/how-algorithms-rule-our-working-lives
O’Neil, C. (2016b). Weapons of Math Destruction (Vol. 78). New York, New York: Crown. https://doi.org/10.5860/crl.78.3.403
Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447–453. https://doi.org/10.1126/science.aax2342
Reuter, E. (2021). Mayo Clinic finds algorithm helped clinicians detect heart disease, as part of broader AI diagnostics push. MedCity News, 1–5. Retrieved from https://medcitynews.com/2021/05/mayo-clinic-finds-algorithm-helped-clinicians-detect-heart-disease-as-part-of-broader-ai-diagnostics-push/
Simonite, T. (2019). The Best Algorithms Still Struggle to Recognize Black Faces | WIRED. Wired, 1–7. Retrieved from https://www.wired.com/story/best-algorithms-struggle-recognize-black-faces-equally/