Author Archives: Matthew R Leitao

A Survey of Data, Algorithms, and Machine Learning and Their Role in Bias

Matthew Leitao

Abstract

Algorithms and machine learning models continue to proliferate through the many contexts of our lives. This makes understanding how these models work, and why they can be biased, critically important to navigating the current information age. In this paper, I explain how data is created, how models are run, and when and where biases are introduced into a model. Though this is a survey of the types of machine learning models out there, I follow the running example of a machine learning model that evaluates resumes to show how such a model handles data and how difficult it can be, at times, to come to an objective outcome.

Introduction

Algorithms are everywhere: in our phones, in our homes, and in the systems that run our everyday lives. Algorithms are the reason why our digital maps know the best routes, and how Amazon always seems to know what other items we might like. These algorithms have the ability to make life easier for many people around the world, but what we usually think of as algorithms is really the partnership between algorithms and machine learning. An algorithm is a formula for making decisions and determinations about something, whereas machine learning is a technique we use to create these algorithms (Denning & Martell, 2015). These machine learning models can be complex, containing hundreds of features and millions of rows, and come to results that are highly consistent and accurate. They also have a wide variety of uses, from determining insurance premiums (O'Neil, 2016a), writing articles (Marr, 2020), and improving the administration of medicine (Ackerman, 2021; Ngiam & Khor, 2019) to setting bail (Kleinberg, Lakkaraju, Leskovec, Ludwig, & Mullainathan, 2018). Though there are a lot of benefits to using algorithms and machine learning, there are times when they cause more harm than good, such as when algorithms give men higher credit limits than women (Condliffe, 2019), bias hiring and recruitment practices (Hanrahan, 2020), and fail to identify black faces (Simonite, 2019). This matters because, as these systems continue to expand into new fields, individuals rely on the judgments of these algorithms to make important decisions (sometimes even more than they rely on the judgment of other people) (Dietvorst, Simmons, & Massey, 2015; Logg, Minson, & Moore, 2019). These important decisions made by an algorithm can become biased because the algorithm does not eliminate systemic bias but instead multiplies it (O'Neil, 2016b). To understand why some models work better than others, and where systemic bias comes from, I will de-black-box data, algorithms, and machine learning models using a potential resume sorter as an example.

The Data

We have come to live in a period of history that some refer to as the information age, as data becomes one of the most valuable commodities to have and own (Birkinshaw, 2014). Data alone is not useful; it is data in context and in relationship to other information that explains why companies spend millions of dollars a year to harvest information from as many sources as possible. A clear example is how labeling data can alter our perception of the magnitude of a specific number. Take the number 14: if I were to label it as 14 days versus 14 years, 14 days will seem negligible compared to 14 years. But if I were to then add another piece of information, such as 14 days as President versus 14 years in an entry-level position, the 14 days will carry more weight than the 14 years. This is how data works: by quantifying existing information in ways that let us analyze the differences in the number 14 across the various contexts in which it exists.

The first part of the data process is quantifying the phenomenon of interest. Using the example of a job application, one of the properties that needs to be quantified properly is years of experience. Just as with the previous example of number magnitudes, not all experience is weighed the same, so different features or variables need to be created to differentiate these types of experiences. This could be done by categorizing the type of prior work as entry-level or managerial, or by its degree of relevance to the position being offered. As with all translations, something is lost when attempting to change from one language to another. How would one categorize freelance experience or unpaid work? These examples highlight how even capturing what appears to be the correct, objective information through the process of quantifying may end up taking highly complex information and flattening thoughts, expertise, and experiences, ultimately biasing the outcome (O'Neil, 2016b). This standardizes information into a format that is understandable to a computer but may not accurately represent the reality from which the information is derived. This is why companies and researchers attempt to collect many different types of information, as it gives a well-rounded context to the data and allows for a fuller picture. A good example of these types of profiles is checking what your Google (https://www.businessinsider.com/what-does-google-know-about-me-search-history-delete-2019-10?op=1) and Facebook (https://www.cnbc.com/2017/11/17/how-to-find-out-what-facebook-knows-about-me.html) ad profiles have to say about you (Haselton, 2017; Holmes, 2020).

Figure 1. Picture taken from Holmes (2020).
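To make the quantifying step above concrete, here is a minimal sketch of how a resume might get flattened into numbers. The categories and encodings are hypothetical, chosen only to illustrate how two very different histories can end up looking alike.

```python
# A minimal sketch of quantifying resume information into features.
# The categories here are hypothetical and deliberately crude: freelance
# work gets forced into the "entry" bucket, flattening the applicant's story.

def encode_experience(years, role_type, relevance):
    """Turn one applicant's work history into numeric features."""
    role_codes = {"entry": 0, "managerial": 1, "freelance": 0}
    relevance_codes = {"low": 0, "medium": 1, "high": 2}
    return {
        "years_experience": years,
        "is_managerial": role_codes.get(role_type, 0),
        "relevance": relevance_codes.get(relevance, 0),
    }

# Two very different histories collapse into identical-looking rows.
print(encode_experience(14, "freelance", "high"))
print(encode_experience(14, "entry", "high"))
```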

The information a company has on an individual may not be explicitly given but rather inferred from other pieces of information, especially when the target data is unavailable. This can be done by making two assumptions. First, that information does not exist in isolation. Second, that relationships between variables occur in a systematic way. Both of these points will be addressed more specifically in the next section about modeling. This is to say that the more information received, the better the inference we can make about certain qualities of the person. Going back to the example of a job application, if someone reports working for a specific company, say Netflix, knowing other employees who have come from Netflix will allow individuals to make inferences about the work ethic of the applicant. People do this all the time when taking suggestions from their friends on items to buy or places to eat. Though these inferences may be faulty, especially considering that people's tastes differ depending on the type of food, by collecting more information people can make better judgments based on what is available.

There are major issues, though, when it comes to data and data relationships.
First, the "Garbage in, garbage out" problem. Data is only as good as the quality of the information being put into it. This issue shows up most directly when the data being captured does not account for the truth, or when the question being asked doesn't accurately measure the construct it is thought to measure (O'Neil, 2016b). In the example of the job application, if someone is asked what experience they have coding in Python, their answer may be only two years, but their true understanding of coding may come from 11 years of working with JavaScript, C++, and SQL. A question attempting to get directly at that expertise may gloss over a more fundamental understanding of coding in general.

Second, previous biases may be reflected in the data, and the data does not account for them. This has become extremely salient in the past few years, with the rise of the Black Lives Matter movement bringing to light the systemic issues in understanding outcomes in relation to race. An example from a paper by Obermeyer and colleagues (2019) showed that black patients are chronically sicker than their white counterparts at the same health risk score. This is because black patients generally spend less money on medicine, and because health care costs are used as a measurement of the level of sickness, black patients are rated as being healthier. This doesn't reflect the truth about the severity of the illnesses black and white individuals may face, but rather cultural differences in seeking standard health care. It's important when collecting data that the data represents what you believe it represents and that a more holistic picture is understood, especially before embarking on creating your model.

Figure 2. Taken from Obermeyer et al., 2019

Modeling

“All models are wrong, but some are useful” – George Box

Modeling is how we turn data into algorithms. Each piece of data gives us important information in context, but it's how these data interact that allows us to make predictions, draw inferences, and act on them. It's important to note that models are initially agnostic to what the data is or represents; the chief concern of a model is the potential relationship that data may have with a specified result. Modeling makes the assumption that data varies in a systematic way, meaning that there is a discernible pattern that can be used to predict certain outcomes. This assumption means that data cannot occur in isolation and that there are relationships between phenomena that can explain why something is the way it is. The distinction between these two concerns leads to the initial distinction between the types of models used: predictive versus inferential.

Prediction Versus Inference

Predictive models care about one thing: being accurate. This may mean that a model finds a connection between applicants with the letter Z in their name and potential hire-ability. Though this may seem a silly example, it correctly illustrates the point that these types of models only worry about the outcome and maximizing the predictability of that outcome. There are many benefits to this, as people may not mind how a computer generates an outcome for a cancer screening, only that the screening is accurate.

Inference, on the other hand, concentrates on the relationships between variables. An example would be how much five years of field experience matters compared to a college degree in the subject. This type of modeling attempts to discern meaningful connections using semantically meaningful data. It is more useful when attempting to find the cause of a particular outcome and to understand how one thing may relate to another.

Most of the modeling you find in business is predictive, whereas in academia and policy inferential models are much more important. The type of model you decide on will ultimately impact the outcome you arrive at.

Types of Modeling

Modeling is the attempt to create an algorithm that can accurately predict a certain phenomenon. Each of these models is then judged on whether it is able to discern outcomes better than chance, using various methods to achieve this. Most modeling in computer science involves using one part of the data to train the model and another part of the data to test it. Modeling can be broken down into two broad categories: classification and regression.
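Before getting into those categories, here is a minimal sketch of the train/test idea mentioned above, assuming scikit-learn is available; the applicant data is synthetic and the "hired" outcome is made up.

```python
# A minimal sketch of holding out part of the data to test a model,
# assuming scikit-learn; the applicant features and outcome are synthetic.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))             # e.g. years of experience, relevance, education
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # a made-up "hired" outcome

# Fit on one portion, then judge the model on the portion it never saw.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
print(X_train.shape, X_test.shape)
```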

Classification, also referred to as discriminant analysis, attempts to create boundaries around data in an effort to correctly sort it into predefined categories (Denning & Martell, 2015). Most of these techniques are also categorized as non-parametric, meaning that the model does not make assumptions about the structure of the data before attempting to sort it into groups. There are several different classification techniques, but one that is most easily understood and widely used is the decision tree. Decision trees are essentially a sequence of logical statements attempting to create rules around a certain phenomenon: does the oven feel hot? If yes, then the oven might be on. Though models get more complex than this and the number of rules may increase, the goal stays the same: how to most accurately sort data into categories using these rules. The program attempts to create a model that increases accuracy while reducing error as much as possible. This may be more or less possible depending on the outcome in the data, and the divisions created may not be meaningful in any way.

Figure 3. An example of a decision tree for a Job Recruiter
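To make this concrete, here is a minimal sketch of training such a tree, assuming scikit-learn; the applicant features and labels are entirely fabricated for illustration.

```python
# A toy decision tree for the job-recruiter example, assuming scikit-learn.
# The features and "interview" labels are made up purely for illustration.
from sklearn.tree import DecisionTreeClassifier, export_text

# Columns: years_experience, has_degree (0/1), relevance (0-2)
X = [[1, 0, 0], [3, 1, 2], [10, 1, 1], [0, 0, 2], [7, 0, 1], [2, 1, 0]]
y = [0, 1, 1, 0, 1, 0]  # 1 = interview, 0 = reject

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# The learned rules read like the "does the oven feel hot?" statements above.
print(export_text(tree, feature_names=["years_experience", "has_degree", "relevance"]))
```

The printed rules are only as meaningful as the made-up data behind them, which is exactly the point of the paragraph above.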

The other type of modeling is regression, or linear, modeling. This type of modeling makes assumptions about the data's structure, which is why it's referred to as parametric modeling (Alpaydin, 2016). These assumptions state that the data should be represented by a normal distribution and that the data varies in a systematic, linear way.

Figure 4. Standard Normal Bell Curve taken from https://www.mathsisfun.com/data/standard-normal-distribution.html

Though there are whole courses devoted to regression, what regression attempts to do is use the variation within one variable to explain some of the variation in another. How much can I explain the increased desire to have ice cream with the increase in summer temperatures? In the example of the job applicant, greater years of experience may make a more capable candidate. The issue with this, obviously, is that it relies on the data being linear, presupposing that a candidate with 20 years of experience is always better than a candidate with 10 years of experience. There are ways around this assumption, but most modeling done with these techniques does not account for them. In regression, technically, the more items used the better we are able to predict the outcome, though this doesn't mean each variable contributes a significant amount. Different weights are then placed on the different variables that make up these regression formulas, indicating that certain variables contribute more to finding the result than others. A statistical weight is a number that you multiply by the observed variable (e.g., years of experience) in an attempt to create a formula. Each weight then represents how much a certain value contributes to finding the predicted outcome (e.g., hire-ability). There are a couple of ways to do regression, using a frequentist approach or a Bayesian approach; regardless, what you are attempting to do is explain the variation in the target variable.

Figure 5. Taken from SPSS-tutorials.com
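Here is a minimal sketch of fitting such weights in code, assuming scikit-learn; the applicant data is synthetic, with a made-up relationship baked in so the recovered weights have something to find.

```python
# A minimal regression sketch, assuming scikit-learn; the data is synthetic.
# The fitted coefficients are the "statistical weights" described above:
# hire-ability is approximately intercept + w1 * years_experience + w2 * relevance.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
years = rng.uniform(0, 20, size=100)
relevance = rng.integers(0, 3, size=100)
hireability = 0.3 * years + 1.5 * relevance + rng.normal(0, 1, size=100)  # made-up relationship

model = LinearRegression().fit(np.column_stack([years, relevance]), hireability)
print("weights:", model.coef_, "intercept:", model.intercept_)
```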

The last type of modeling I want to discuss is the neural network. These are a bit complicated, so I included a video explaining them in greater detail. To summarize, neural networks are a series of interconnected nodes that adjust the statistical weights of the connecting variables/nodes in an attempt to find the best possible configuration to predict the outcome. These statistical weights start as arbitrary values but are adjusted through iterations to create the best model possible. This type of model is being used to create complex networks and formulas to predict things such as heart disease (Reuter, 2021). The unfortunate part of neural networks is the nodes referred to as hidden layers: processes that occur behind the scenes and make neural networks difficult to interpret beyond the predicted outcome.

Figure 6. A simple Neural Network taken from https://en.wikipedia.org/wiki/Neural_network
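For readers who want to see one in code, here is a minimal sketch assuming scikit-learn; the data is synthetic, and the single hidden layer is exactly the hard-to-interpret part described above.

```python
# A minimal neural-network sketch, assuming scikit-learn; the data is synthetic.
# The hidden layer's weights start near-arbitrary and are adjusted over many
# iterations, which is also what makes the fitted model hard to interpret.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3))                 # made-up applicant features
y = (X[:, 0] * X[:, 1] > 0).astype(int)       # a nonlinear, made-up outcome

net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0).fit(X, y)
print("training accuracy:", net.score(X, y))
```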

All of these different types of modeling are ultimately tools to understand the relationships within the data. This brings us back to the concept of "Garbage in, garbage out": the most important part of the model is the information being used to create it, and without good information we can't get a useful model.

Conclusion

There are many different techniques used to build useful algorithms with machine learning. As previously stated, the data we feed into these models will impact the eventual outcome. This is why it is so important to understand what we are attempting to predict and to control for it. Coming back again to the resume sorter explains why getting data right is so important. In 2016, a study found that applicants with ethnic-sounding names were not called back at the same rate as those who had whitened their names, even with equivalent resumes (Kang, DeCelles, Tilcsik, & Jun, 2016). When creating these algorithms with machine learning, the data we provide the model will determine the result we ultimately achieve. If certain types of people or certain types of experiences are not accounted for in the data, or are misrepresented, then those biases are amplified through the process of machine learning. If an applicant, for instance, takes an opportunity that is hard to define, that opportunity may become a disadvantage when looking for a job in the future. If an applicant does not conform to the standard array of qualities, then that applicant may be rejected for being different rather than for not being right for the position. Though all models may be wrong, models used incorrectly may cause more harm than good. This is to say the solution may be to create the world we want to see rather than the world we have. By creating data and training machine learning models on these ideals, algorithms will be created to reflect what we want, rather than amplifying what already is. To get there, we have to understand the relationships between factors already existing in our data, such as how we weight certain experiences or achievements.

Algorithms have revolutionized the world for the better and will continue to do so as data becomes more abundant and machine learning models become more complex. Understanding how data is used, and why things are the way they are, gives us a healthy skepticism for the next time an algorithm tells us what to do.

Bibliography

Ackerman, D. (2021). System detects errors when medication is self- administered. MIT News, pp. 2–5. Retrieved from https://news.mit.edu/2021/inhalers-insulin-errors-medication-0318

Alpaydin, E. (2016). Machine Learning. Cambridge, Massachusetts: The MIT Press.

Birkinshaw, J. (2014). Beyond the Information Age. Wired, 8. Retrieved from https://www.wired.com/insights/2014/06/beyond-information-age/

Condliffe, J. (2019). The Week in Tech: Algorithmic Bias Is Bad. Uncovering It Is Good. The New York Times, 13–15. Retrieved from https://www.nytimes.com/2019/11/15/technology/algorithmic-ai-bias.html

Denning, P. J., & Martell, C. H. (2015). Great Principles Of Computing. The MIT Press. Cambridge, Massachusetts: The MIT Press.

Dietvorst, B. J., Simmons, J. P., & Massey, C. (2015). Algorithm aversion: People erroneously avoid algorithms after seeing them err. Journal of Experimental Psychology: General, 144(1), 114–126. https://doi.org/10.1037/xge0000033

Hanrahan, C. (2020). Job recruitment algorithms can amplify unconscious bias favouring men, new research finds. ABC News, 10–13. Retrieved from https://www.abc.net.au/news/2020-12-02/job-recruitment-algorithms-can-have-bias-against-women/12938870

Haselton, T. (2017). How to find out what Facebook knows about you. CNBC, pp. 1–12. Retrieved from https://www.cnbc.com/2017/11/17/how-to-find-out-what-facebook-knows-about-me.html

Holmes, A. (2020). Clicking this link lets you see what Google thinks it knows about you based on your search history — and some of its predictions are eerily accurate. Business Insider, 1–8. Retrieved from https://www.businessinsider.com/what-does-google-know-about-me-search-history-delete-2019-10?op=1

Kang, S. K., DeCelles, K. A., Tilcsik, A., & Jun, S. (2016). Whitened Résumés: Race and Self-Presentation in the Labor Market. Administrative Science Quarterly, 61(3), 469–502. https://doi.org/10.1177/0001839216639577

Kleinberg, J., Lakkaraju, H., Leskovec, J., Ludwig, J., & Mullainathan, S. (2018). Human Decisions and Machine Predictions. Quarterly Journal of Economics, 237–293. https://doi.org/10.1093/qje/qjx032

Logg, J. M., Minson, J. A., & Moore, D. A. (2019). Algorithm appreciation: People prefer algorithmic to human judgment. Organizational Behavior and Human Decision Processes, 151(December 2018), 90–103. https://doi.org/10.1016/j.obhdp.2018.12.005

Marr, B. (2020). What Is GPT-3 And Why Is It Revolutionizing Artificial Intelligence? Forbes, 2–8. Retrieved from https://www.forbes.com/sites/bernardmarr/2020/10/05/what-is-gpt-3-and-why-is-it-revolutionizing-artificial-intelligence/?sh=2b41e039481a

Ngiam, K. Y., & Khor, I. W. (2019). Big data and machine learning algorithms for health-care delivery. The Lancet Oncology, 20(5), e262–e273. https://doi.org/10.1016/S1470-2045(19)30149-4

O’Neil, C. (2016a). How algorithms rule our working lives. The Guardian, pp. 1–7. Retrieved from https://www.theguardian.com/science/2016/sep/01/how-algorithms-rule-our-working-lives

O’Neil, C. (2016b). Weapons of Math Destruction. New York, New York: Crown.

Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447–453. https://doi.org/10.1126/science.aax2342

Reuter, E. (2021). Mayo Clinic finds algorithm helped clinicians detect heart disease, as part of broader AI diagnostics push. MedCity News, 1–5. Retrieved from https://medcitynews.com/2021/05/mayo-clinic-finds-algorithm-helped-clinicians-detect-heart-disease-as-part-of-broader-ai-diagnostics-push/

Simonite, T. (2019). The Best Algorithms Still Struggle to Recognize Black Faces | WIRED. Wired, 1–7. Retrieved from https://www.wired.com/story/best-algorithms-struggle-recognize-black-faces-equally/

 

Synthesis: Inside The Black Box – AI

AI is a great leap forward, but in most ways it is still in its infancy. Like a tool we find in our grandfather's workshop, we may find a use for it, but we don't fully know how to use it properly. There are many mysteries concerning AI and how it works, but until we truly open the black box we won't have control over it.

There are two things I want to touch upon. First, how these technologies work. Second, the implication of their working.

AI, like most of my field of psychology, works on statistical probabilities of a phenomenon occurring based on data from the past. This rests on three foundations. First, the understanding that if something has happened before, it will most likely happen again. Second, that the past causes the present and the future to happen. Third, that information is interconnected, creating relationships between the phenomena being measured. This leads to the creation of models to determine how and why phenomena occur in the first place. This is where AI and scientific thought diverge in a lot of ways. AI uses the information fed into it to create models which, for the most part, cannot be understood by those who create them. This means that, for the most part, the models being created are focused on the end result rather than the journey to get there. It also means that as long as we believe the AI is working, it doesn't matter what biases may have been used to create the model that determines the answer. The data fed into a system directly relates to the results we get out, and so if the data set is biased, then the results too are biased. The problem with AI, then, is that there is no oversight of the biases in the data, which leads to overconfidence in the validity of the results until someone who is hurt by these systems speaks up. Ultimately, we need only look to history to see that humans are wrought with bias, and that when systems are used this way there may be a hundred people suffering in silence for every one who speaks up.

The second is the implication of these systems working. It's the detachment from the method, the exclusion of the human element, that makes us so confident in the results of the process. A type of no-holds-barred event where, as long as AI gets at the answer, we don't care about the method. We are just at the tip of the iceberg with AI, with most of its functions still on their way. Though most things may be helpful and benign, it's important to understand that AI will be used as a great sorter of things. Just as resources get distributed unevenly, so will the functionality of AI. Choice will become a luxury, and we will be faced with a facsimile of options. For most, that will work, but there will always be those left in the wake of the oncoming wave. The way AI works presupposes a sense of expertise and knowledge of you, but in truth AI has to work just like data works: it has to flatten and categorize into imperfect containers to be able to create proper results. Just like the machines themselves, we will need to operate within a harder set of parameters. Life is messy, and so are people, but making determinations based on these set parameters will further confine those who are at the bottom. What we may improve on in life we may lose in freedoms. AI isn't all bad, and not everything it does is an existential crisis, but it's important to have these conversations about AI and its implications before we get there. Given the choice, people may choose to live a life without it.

AI is a mystery and is only getting more mysterious. The future, I guarantee, will at least get more interesting. The more you know, the less you may understand, but learning about AI is important if we are to make choices about our technologies in the future.

Big Data has Big Problems and Even Bigger Solutions

Big data is something I know a lot about, because when the term started to be popularized in psychology I was enamored, like many others, with the potential of doing anything with such large data sets and the promise of finding truths that would normally be out of reach. Years later, after many studies and attempts to utilize such data, I find myself realizing that, aside from the hype, big data is just like any other technique we utilize, giving us lots of information but not a ton of knowledge.

There are a few things about big data that are problematic. First is the ability to generate information and connections between two seemingly unrelated things. This ordinarily sounds amazing, until you realize that the best and most effective application of these tools to date is to market and sell ads and products to you more effectively. Amazon and YouTube are great examples of this: they know what you want to buy before you want to buy it, or the video you want to watch before you knew you wanted to watch it. This is also what made it possible for Facebook to adjust its algorithms to improve or depress the mood of those who use the site. What the companies who use big data care about is the bottom line, which leads to the next issue.

Those who use big data sometimes don't understand the ramifications of the work that they do. There was a study I saw a number of years ago which used a data set of faces to see if a model could identify the faces of gay men. This is intriguing but also highly invasive and controversial. It isn't much of an issue as a proof of concept, but in countries like Iran, where being gay is illegal, using a system like this to determine the likelihood that someone is gay (regardless of its true accuracy) is terrible. Data needs to come with the responsibility to use it well, or else we will end up with scenarios where we do something we can't easily undo and end up harming a large group of people.

This brings me to my last point: big data used responsibly takes a lot of effort. There is a new project called the Human Screenome Project which takes a picture of what is shown on your phone every 5 seconds. It is an amazingly large data set that will reveal a lot about how people use their phones, but even parsing through the millions of pictures to derive the information for analysis will take years and thousands of hours. Big data is fantastic, but it is not some easy shortcut just because it's there. When used responsibly, a lot of time and effort needs to go into understanding what exactly it tells you and how to interpret what you've found.

Human Screenome Project

 

Challenging The Cloud Colossus

What the readings bring up is both the great unifying force of technology and its reach into everything we do. I wanted to highlight this effect specifically for Google, as it's a prime example of a company making integration easy while at the same time centralizing a lot of work around the Google platform.

It makes sense for Google to be a cloud computing company, as its primary services are online services. Still, it is a company that continues to integrate into different online markets as a consequence of its massive infrastructure and first-mover advantage.

This makes it easy to access files using Google Drive, look up information quickly, or even run complex ML/AI models for businesses. There are a couple of concerns to be had when great monoliths are created. First, unless the company is constantly incentivized to innovate, the economies-of-scale effect is achieved, which reduces cost (which is always wonderful) but also stifles innovation, which becomes more incremental as the cost/benefit for smaller players will never reach the same level as for these large tech companies. This is seen more in the effects of Amazon and Walmart, but as Google is able to out-compete smaller companies or easily buy them out, services continue to feed into these growing tech giants. This isn't all bad, as some other companies may now be able to scale up as a function of the lowered prices and added integrated services, leaving them better able to perform and to reallocate resources elsewhere, which may be more beneficial.

This also places a large emphasis on one company's ability to be financially successful and secure. On the success side, imagine if Google declared bankruptcy tomorrow: how would the economy be affected by this news? Now, this may be an unfair scenario, as Google is one of the most successful companies in the world, but imagine how many services would be at risk of going offline. How many years would it take for things to return to the levels before the news? Would the US government have to bail them out? These are all major concerns for these businesses as they are integrated into so much of our digital infrastructure. This also puts an emphasis on security: since there is so much private and critical data handled by Google, the moment they have a data breach (like Facebook just did), the amount of information now out in the open would be astronomical. This puts a huge amount of pressure on companies to do things correctly the first time and to be constantly vigilant against outside threats. Both of these are fine for now, as Google is a stable company with a very secure infrastructure. Still, the more Google does and the more we integrate into Google services, the more important it is for Google to continue to be successful.

The final thing I want to mention is the question of efficiency. As companies grow to scale, the amount of work each employee can take care of also increases. So one employee working for a company that uses Google's services may be able to do the job of 1.2 employees elsewhere. This is great because people can do more, and it would free employees up to do other work, but the major problem is that companies looking to reduce costs will not need to continue to recruit personnel. This is one of the existential crises of the rise of these integrative cloud services and their effect on productivity: the economy is not creating enough jobs to replace the jobs destroyed by the rise in productivity and ML/AI. With so much integration with Google, less needs to be done, which saves hours of time and money, but at a cost of a different kind.

Though the topics I brought up tend to bend toward the negative, it's only because these are the major questions we need to consider before jumping headlong into total integration with these services. Do the benefits outweigh the costs? What do you do after creating a growing tech colossus?

 

Paternal Beneficence and Following The Threads of Blame

There are so many aspects of artificial intelligence (AI) that call for us to pause the pursuit of progress and take a moment to ensure we do things right the first time. This powerful technology brings so much apprehension not just because of its techniques but because its rollout is quick and there is no face to blame if things go awry. This brings up the two issues I want to cover today: paternal beneficence and how to attribute blame.

How do we create a system that makes decisions for other people, especially if those people don't know that decisions are being made for them? This happened all the time even before the widespread dissemination of AI, as businesses and governments decided what is important for one group or another and what thresholds people must meet before they may have access to benefits. The problem is only exacerbated when using AI to help make decisions, or to make decisions for us. This is because each agency's definitions of goals and outcomes will differ, and because of that it is susceptible to benevolent decisions having malevolent outcomes. Say you wanted to decrease the mortality rate in a hospital. Though on the surface this seems like an ideal goal, that is because we have an assumed idea of the parameters around the decisions that should and could be made. An AI system that is agnostic to moral platitudes may simply reduce the rate of high-risk patients coming to the hospital, rerouting them elsewhere to ensure that the cases faced by the doctors have a higher likelihood of success. This would ultimately not be discovered unless someone were constantly supervising the AI or an audit of the system were conducted, and in the meantime hundreds of injuries or deaths may have occurred that could have been prevented had the system never been brought online. This goes to the process of de-black-boxing, which calls for us to be as explicit about the outcomes and parameters as possible. This, though, as with legislation, requires hard lines to be drawn, and there will be people who fall through the cracks in the system, as we can't account for everyone. It also presupposes that those who would ultimately be sorted by this system are unable to affect the system in the moment in which they come in contact with it: a type of paternalistic choice being made because we believe that the system or its administrators have the expertise to make a better, more informed decision. Conversely, if we have the user make the decision, it may slow down the decision-making process in a moment where time is scarce. Though there really isn't any best-practice approach to this, it does lead to the next problem and crisis AI has to contend with.

How do we attribute blame when a system goes awry? If in this same scenario I find myself rerouted to a different care clinic and, as a result, don't receive the level of care necessary, leaving me to require lifelong assistance, who should take the blame and be responsible for my care? The situation in this instance would have been preventable if I had gotten to the better hospital I was routed away from. This is a persistent problem as AI systems are put in charge of frameworks that can cause larger magnitudes of harm. Do we blame the administration for making the decision to decrease mortality? Do we blame the ambulance driver for following the decision of the algorithm? Do we blame the AI for the parameters it wasn't given in the first place? Do we blame the developer for not putting safeguards in the system originally? Who, then, is responsible for my care? These attributions of blame, just like the difficulty companies have in singling out one person as the cause of a problem, make legislation hard and make it harder to bring justice when tragedy strikes. We can't throw up our hands and say we don't know, either, because the stakes are already so high: people's lives are being altered by the decisions made by AI systems, and as people get further out of the loop, blame gets harder to attribute.

Though there isn't an easy solution to any of these issues, they do show the complexity of the problems with AI and why we need to be thinking about them now, before we get to the point of no return.

Siri: She may be rigid but she works

Apple's Siri hit the market in 2010, with full iOS integration in 2011, becoming the famous counterpart to Alexa and the Google Assistant. But how does it work?

Originally created and integrated as a third-party app, Siri uses many layers to attempt to understand its user's input and create a helpful reply based on that input.

It starts with the command. Siri determines how long the command statement is by using two things: a limit on how long a pause there can be in a statement, which marks the end of the statement, or the point at which the system determines the statement is too long for memory and cuts it off after a certain amount of input.

The spoken language then moves to the natural language processor, which compares what was spoken to what it believes certain words or phrases sound like, using a predictability matrix to determine the most likely candidate for both individual words and whole phrase statements. This uses a dictionary of the words and phrases most likely to be uttered before returning a reply. The process emphasizes identifying the individual words that make up the command, keeping the interpreted command as close as possible to the user's input.
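As a toy illustration of that matching step (not Apple's actual implementation), here is a sketch that compares a noisy transcription against a small dictionary of likely command phrases and picks the closest one; the phrases are made up.

```python
# A toy illustration of matching a noisy transcription against a dictionary
# of likely command phrases; this is not how Siri actually implements it.
import difflib

command_phrases = ["set a timer", "send a message", "what's the weather", "play music"]
heard = "sent a massage"  # what the recognizer thinks it heard

# Pick the dictionary phrase most similar to the noisy input.
best = difflib.get_close_matches(heard, command_phrases, n=1, cutoff=0.0)
print(best[0])  # -> "send a message"
```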

https://patentimages.storage.googleapis.com/3f/bd/28/5fba99b5c97ea6/US20120016678A1-20120119-D00009.png

This is then passed up to the cloud services to determine what to do with the transcribed statement, returning both the full transcription of what Siri believes the user to have said and a response based on that prompt.

This means that Siri attempts to ping certain apps based on the given statement and uses these networked services to provide an answer. Siri is not usually the one directly providing the answer; it may instead use text-to-speech to read a small prompt back to the user before verifying that its response is correct or asking for more or different input.

https://patentimages.storage.googleapis.com/08/8a/99/4193df7d912952/US20120016678A1-20120119-D00022.png

https://patentimages.storage.googleapis.com/49/4d/15/bebc2e600bd251/US20120016678A1-20120119-D00033.png

Siri is based on an external working system that relies on other apps and services to return the result the user wants. This means that outside of the programmed networked apps such as calendar and directions, Siri relies on the internet, Wolfram Alpha, and other integrated services to return the response to the user. This makes Siri rigid, as these commands are not handled directly by the NLP and require certain keyword statements to achieve the desired result. These keyword statements do not flow as well as natural speech; they act more as if you had typed a statement into a Google search bar and taken the top result.

Siri has improved since its inception, integrating better language processing and abilities, but it still utilizes external applications to do the bulk of the heavy lifting. Siri is mostly a virtual liaison, as it passes much of the work to other platforms, leaving it mainly to handle the natural language processing and to process and summarize the results of the inquiry.

Information retrieved from:

https://en.wikipedia.org/wiki/Siri

https://patents.google.com/patent/US20120016678A1/en

https://en.wikipedia.org/wiki/Virtual_assistant

The Statistics and The Word

For someone who has always had trouble with words, knowing my computer has had the same trouble is a great relief. English is a confusing language: the rules bend and break depending on the context. It's hard enough for people to learn the context of the language they are learning; a computer, which arguably doesn't understand the semantic meaning of a word, has to suggest and predict based on the rules we give it and what has happened before.

How does this work? Why is Google Docs becoming so good at guessing the next word in your sentence? The answer is statistics.

As we write, information flows out of our fingertips toward what we expect to be the eventual end of a sentence. Each sentence has some meaning which comes in a usually predictable way. (Subject–predicate is how I learned it.) That means that as the sentence goes on, certain elements of the prose should become more predictable with each step I take. We do this all the time when we can guess what will come at the end of a talk, or guess what someone is likely to say next. Computers, like us, build a repertoire of already-constructed sentences that gives a good idea of what is likely to come next. The computer builds models based on millions of lines of text, evaluating each and every way these words have been put together, taking into account what has already been written, and then generating the suggestion that has the highest likelihood of being correct.
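Here is a minimal sketch of that kind of statistical next-word prediction, using a tiny made-up corpus instead of millions of lines of text; real systems are far more sophisticated, but the counting idea is the same.

```python
# A minimal sketch of statistical next-word prediction: count which word
# most often follows the current one in some training text, then suggest it.
# The tiny corpus here is made up purely for illustration.
from collections import Counter, defaultdict

corpus = "the report is due on friday . the report is almost done . the meeting is on friday".split()

following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def suggest(word):
    """Return the word most often seen after `word` in the training text."""
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(suggest("report"))  # -> "is"
print(suggest("on"))      # -> "friday"
```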

We can see this in action with Google Docs: as we are writing, it makes in-the-moment suggestions as to what will come next. This does two things. First, it trains the model in real time on the language we expect it to use. If I choose to write what the AI suggests, it knows it was correct about how the sentence was structured and that it chose the right model for the future. Second, it shoehorns the user into using more predictable language, which the AI can then better predict. The more language the AI knows, and the more it knows how you write, the better it is at predicting how you will compose a document.

Writing is an arduous task; to do it well takes a tremendous amount of effort and time. I imagine that as these AI advance, writing will become easier, to the point where all we will need to do is give the AI the subject of our writing and the context of why it's being written, and the AI will be able to give us a decent first draft.

Statistical probabilities are an interesting thing: as models get better, we (people) become more predictable. This lends itself to an eerie feeling that someone knows what we will do. Though this is a topic for another time, the predictability of how you write and speak is critical to how these systems work. The only way AI can write is because what we write, and perhaps how we think, comes in a predictable way. Think about that the next time you decide to compose a document: whether your writing foreshadows what is to come next. For now, we can start to rely on machines for that next step.

The Rumble of A Roomba

I think the most interesting thing for me this week, highlighted by the Karpathy article, is how much we need to adjust the data we input into a system before creating a machine learning model. A uniformity must exist, with a reduction of noise, to accurately embody the true answer to the problem we are attempting to solve.

This most acutely reminds me of a Roomba, or robotic vacuum. People, including myself, will sing the praises of owning one of these devices, as it keeps the floor clean regularly without the need for external monitoring, fulfilling a role I would normally take on now that I have two quite furry companions. The thing about these vacuums, though, is that I have to make sure everything I don't want them to run into is off the floor, that there is no water around, and that there isn't any string to get caught up in their gears or motors. Essentially, I have made my apartment a system in which the robot vacuum can work in peace without any obstruction.

The same goes for this learning model which Karpathy uses. These photos are taken and cleaned in some way to create a uniformity which the system itself can understand and process, with anything deviating from that uniformity not being captured.

What does this tell us about the pictures? That the system knows how to pick out certain types of well-made and popular photos which people have uploaded to the internet. This is amazing! But does it teach us anything about the photos themselves or how to take photos? No, as these are techniques that could be learned by studying photography and design.

What I worry about is not the clean practices of learning but the messiness of the world. Something has to give: will the world become more orderly to accommodate the model? Or will the model eventually be strong enough to handle the messiness of the world? I am sure it's somewhere in between, but it will be interesting to see where we find ourselves on that spectrum.

Color, conversion, and data.

Data is an interesting concept to me because I work with data all day. Unlike information on its own, data is much more meaningful, as it coalesces a plethora of pieces of information and gives them relationships to each other.

The two things I took away from all this are how SQL databases work and what color pictures really are.

SQL databases sound pretty boring from the outside, but a lot of data is stored this way, especially institutional data. I never understood what SQL was or why organizations use it; from the outside it seems silly to store data in separate data frames and have to write code to retrieve it each time. But in learning about memory constraints and the way memory is written, I start to understand it. SQL is a great way to store data when there are constraints and when data can grow indefinitely in length, type, and pattern. If you have several hundred million rows of data storing multiple pieces of information, it would be unwise to store it all in one place, as retrieving it would become nightmarish for wait times and leave you with a wall of information you most likely don't want or need. It has always been an interest of mine to learn SQL, as it seems like a very fundamental data language, and understanding this makes it all the more important to learn.
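As a small illustration of that retrieval pattern, here is a sketch using Python's built-in sqlite3 module; the table and columns are hypothetical.

```python
# A minimal sketch of why relational storage helps: keep the big table in the
# database and pull back only the rows you need. Uses Python's built-in sqlite3;
# the applicants table and its columns are made up for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE applicants (id INTEGER PRIMARY KEY, name TEXT, years_experience REAL)")
conn.executemany(
    "INSERT INTO applicants (name, years_experience) VALUES (?, ?)",
    [("Ada", 12.0), ("Grace", 7.5), ("Alan", 2.0)],
)

# Retrieve only what is needed instead of loading the whole table.
for row in conn.execute("SELECT name FROM applicants WHERE years_experience > 5"):
    print(row[0])
```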

I wanted to explain pictures because they fascinate me. Pictures are broken down into smaller and smaller pieces until they reach the size of one pixel. That pixel contains 3 values which ultimately act like levers to create all the colors we can see on a screen, ranging from white (all the colors turned on all the way) to black (all the colors turned off). These values take the form of numbers. The numbers will change depending on the format you are using to read the information in and out, with some programs having more range than others. This applies to file types as well: some file types are richer than others, creating a need to adapt these higher-end files into smaller, more readable files. Whenever this translation happens, you lose something, be it color or resolution (the density of pixels).

Since the information is stored within 3 different colors (red, green, and blue), it requires 3x the space to store one pixel. These pixels ultimately take up a lot of room, as identifiable images might require 256, 3,000, or 10,000 pixels on the low end. The more pixels, the more we are able to discern smaller details in the background. It was only in the last decade that digital photography became the standard for professional photographers, as film had always been able to capture better, more vibrant images than digital cameras.
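A back-of-the-envelope sketch of what that storage looks like, using a common HD resolution as an example:

```python
# Each pixel holds three values (red, green, blue), so an uncompressed image
# needs roughly width * height * 3 bytes before any compression is applied.
white = (255, 255, 255)   # all three "levers" fully on
black = (0, 0, 0)         # all three fully off

width, height = 1920, 1080
bytes_uncompressed = width * height * 3
print(f"{bytes_uncompressed / 1_000_000:.1f} MB before any compression")
```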

The data is stored as binary, with metadata informing the structure, size, and type of the file being handled. There are many different ways to store this color information, such as hexadecimal or RGB color output, but what is important is the conversion of that information into the format you intend to use. JPG, which is widely used for documents, loses some of the information on color and resolution to make files smaller, whereas PNG files favor rich data output over shrinking the file size. It's important to note that each time you download and convert a file you are losing something, so it's important to try to download from as close to the source as possible.
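Here is a small sketch of that lossy versus lossless difference, assuming the Pillow imaging library is installed; the image is synthetic noise so the effect is easy to see.

```python
# A small sketch of lossy (JPEG) vs. lossless (PNG) saving, assuming Pillow.
# The image is random noise, generated here purely for illustration.
import os
import random
from PIL import Image

random.seed(0)
img = Image.new("RGB", (128, 128))
img.putdata([(random.randrange(256), random.randrange(256), random.randrange(256))
             for _ in range(128 * 128)])

img.save("sample.png")               # PNG: larger file, pixels preserved exactly
img.save("sample.jpg", quality=40)   # JPEG: smaller file, some detail discarded

print("png bytes:", os.path.getsize("sample.png"))
print("jpg bytes:", os.path.getsize("sample.jpg"))
print("pixel before:", img.getpixel((0, 0)), "after:", Image.open("sample.jpg").getpixel((0, 0)))
```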

Questions:

How do video games work as a sequence of colors and text, given that this is a moving and changing format?

Is the text in movies stored as text or as color?

At which point will we not need to compress information as information storage and transfer speeds will be good enough?

Signal Transmission Theory and the Brain

In doing the readings this week, what was most salient to me were the similarities between signal transmission theory, the transmission of data, and the brain. Shannon's information transmission theory, as explained by Professor Irvine in his article, requires the inception, transmission, and then interpretation of a signal by a separate entity. This means that if I wanted to send a message or even an image, what is required is the successful transmission of a message that, even when marred by error, is left interpretable by the receiving entity. This is in many ways how the brain works.
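As a toy illustration of transmission that survives error, here is the simplest possible redundancy scheme (a repetition code); it is only a sketch of the idea, not Shannon's actual coding theorems.

```python
# A toy illustration of Shannon-style transmission: encode a message with
# redundancy, corrupt some bits in the channel, and still recover it.
import random

def encode(bits):                      # repeat every bit three times
    return [b for bit in bits for b in (bit, bit, bit)]

def corrupt(bits, p=0.1, seed=42):     # flip each bit with probability p
    rng = random.Random(seed)
    return [bit ^ 1 if rng.random() < p else bit for bit in bits]

def decode(bits):                      # majority vote over each group of three
    return [1 if sum(bits[i:i + 3]) >= 2 else 0 for i in range(0, len(bits), 3)]

message = [1, 0, 1, 1, 0, 0, 1, 0]
received = corrupt(encode(message))
print(decode(received) == message)     # usually True despite the channel errors
```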

Take, for instance, when we see: there is a transmission of information from our eyes, through a nerve, to the vision area of our brain, which encodes and decodes that visual information. The brain doesn't have access to the whole visual percept but to many small percepts that are strung together to create whatever we are seeing. In the example of an apple, our eyes break down that apple, and our brain reassembles the message. Now, not all messages will come through, and some may even conflict, but our brain, using the information it has and all the other information around it, will fill in the likely gaps (not perfectly, but well enough). This is very much like how the internet works: the deconstruction and reconstruction of information to transfer it from one place to another, as mentioned by Professor Irvine and by Denning and Martell.

These signals by themselves have no semantic value, though. Even if we can see the apple at this stage, we do not know what the apple is, how it tastes, what its color is, or all the things an apple can be used for. This is a completely different stage in the brain and, in a lot of ways, a different stage of computing. This meaning-making is the next step of the process for computing and requires more than just Shannon's transference of information. That transference is important for making sure the data being received is consistent and interpretable, but beyond that point another process is required to understand whether the message itself is worthwhile.

I am excited to see where this goes and how computers are attempting to jump from information transmitters to information interpreters.

Reference

Irvine, Introducing Information Theory: The Context of Electrical Signals Engineering and Digital Encoding (2021).

Prof. Irvine, “Using the Principle of “Levels of Abstraction” to Understand “Information,” “Data,” and “Meaning” (Internet design) (2021).

Peter J. Denning and Craig H. Martell. Great Principles of Computing. Cambridge, MA: The MIT Press, 2015. 

The Information Science Revolution

It's interesting to say that most of the sciences are moving toward this concept of being an information science. Even my home discipline of psychology is rooted in the idea of being driven by the data. As the large data revolution took hold and big data became more available, a question came to linger over the discipline: how should we pursue answers to the questions we have? Should the pursuit be driven by the person or by the data?

There are many good reasons for doing either, with machine learning approaches co-opting the same tools that psychologists use to run analyses: linear regression, latent frameworks, and covariation between variables. As data has gotten richer and more complex, there has been a surge in the different types of modeling needed to meet the demands researchers place on their experiments. I believe it comes down to the problem of being able to de-black-box the journey of the data and not just how to find its solution; to be able to understand the ramifications of the question being asked and not simply run it.

It was a couple of years ago that I sat in a stuffy conference room while two scientists from Iran presented a machine learning approach for determining whether someone was gay using photographs and profile pictures. It was an interesting application of data that already seems problematic, but especially so when considering that being gay in Iran is illegal. Lots of questions came up about the validity of the project and whether the data was valid, but these are the things we are going to have to contend with. As we get more sophisticated models and richer data, even though each piece of the data may contribute only a small margin to the greater statistical story, when adding 10,000 variables with 100,000,000 rows we can start to predict just about anything. The question is: should we?

 

Questions – There were many, but these are the ones I am going to start with:

How do we try to understand the data which goes through these computational models?

Does network security, like physical security measures (e.g., locks on doors), play more of a role of security theatre and deterrence rather than being fully secure?

Citations –

Peter J. Denning and Craig H. Martell. Great Principles of Computing. Cambridge, MA: The MIT Press, 2015. 

John D. Kelleher, Deep Learning (Cambridge, MA: MIT Press, 2019).

The Following of Man By Machine – Matthew Leitao

There are two things that I got out of the readings, and both have to do with the purpose of AI generally. AI has had a long and interesting history, with many great minds (whom I would have loved to meet and talk to) attempting to push forward the design and progress of code and the idea of artificial intelligence. What is interesting to me is that there is a split between trying to imitate people and trying to solve problems at the same time.

I think Wooldridge's explanation of the Imitation Game used by Turing is a prime example of this conflict. Is the purpose to imitate or to embody? These are two very different tasks. It's like teaching a system how to beat someone in chess or Go: we give the system the rules, but what is best will be determined by the computer and not necessarily the operator. This is why computers can leverage their data to solve problems in a different way, as AlphaGo did by analyzing final board states, which would be meaningless to individuals for the most part.

This also brought me to wanting to understand the ways we attempt to make AI through the different approaches explained by Boden. Each of those systems would work perfectly for a specialized AI system, as it works in a format that is conditional and two-dimensional, especially considering we expect such a high success rate from these programs. I can tell you from having a psychological background that people are prone to making mistakes all the time, but we manage because there are consequences when we get things wrong. A computer is agnostic to the right and wrong process, as there is no programmed "suffering." People and animals are machines built for the ambiguous goal of surviving to propagate (as posited by Richard Dawkins). I wonder, if we were to create a computer using these unsupervised methods and give it an ambiguous goal, positive and negative inputs, and needs, whether it too would be a full "human" by the time it is around 18-25 years old.

 

Questions:
Questions:
Would making stimuli meaningful to AI make it better or worse at solving problems? If AI knew what an apple is the way we know what an apple is, would it improve?

Would forgoing the idea of trying to make computers like humans actually be more beneficial?