How to Eliminate Bias in AI

Daniela Moody of Arturo shares the secret to helping machines unlearn and relearn.

Written by Erik Fassnacht
Published on May 25, 2021

Anaïs Nin once wrote, “We do not see things as they are, we see things as we are.” When it comes to machine learning, there are few concepts more dangerous.

Take the home appraisal market, for instance. As reported by ABC7 in San Francisco, a Black family from the Bay Area recently put $400,000 of renovations into their home, including adding an extra level. Yet when the house was appraised, it was valued at nearly the same price as before the renovation.

The family then got a second appraisal, but this time had a white neighbor stand in and pretend to be the owner. The result: the home’s value immediately went up by 50 percent. 

This incident is not isolated. The New York Times and the Indy Star have reported separate cases of Black owners raising their home values by 40 percent, but only after they stripped their houses of cultural identifiers and used white friends or loved ones as stand-ins for themselves.

Home appraisal studies have shown that racial bias devalues Black-owned homes across the country, and the danger for machine learning is that this biased data might be fed into previously unbiased algorithms at face value. Once that happens, the algorithm can incorporate that bias into its predictive model and become little more than a faster, more efficient version of the racially biased human system that already exists.

The answer to this conundrum, in the housing industry and many others, lies in carefully examining the training data behind a machine learning model and making the adjustments necessary to keep the model both accurate and unbiased. To understand how to accomplish this delicate task, we sat down with AI expert Daniela Moody of Arturo, who has firm ideas on how to help machine learning eliminate bias and finally “see things as they are.”
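To make "examining the training data" a little more concrete, here is a minimal sketch of one such check, written as an illustration rather than as Arturo's actual pipeline: it summarizes how appraisal-style labels are distributed across a demographic grouping column, where the DataFrame, column names and file name are all assumptions for the example.

```python
import pandas as pd


def audit_label_bias(df: pd.DataFrame, label_col: str, group_col: str) -> pd.DataFrame:
    """Summarize how training labels are distributed across groups.

    Large gaps in mean or median label values (or in group counts) are a
    signal that the raw data may encode historical bias and needs closer
    review before it is used to train a model.
    """
    summary = (
        df.groupby(group_col)[label_col]
        .agg(count="count", mean="mean", median="median")
        .assign(share=lambda s: s["count"] / s["count"].sum())
    )
    return summary.sort_values("mean")


# Hypothetical usage: appraisal records with a neighborhood-demographics column.
# records = pd.read_csv("appraisals.csv")
# print(audit_label_bias(records, label_col="appraised_value", group_col="neighborhood_group"))
```

A report like this does not fix anything by itself, but it surfaces the kind of systematic gap, such as one group's homes being labeled consistently lower, that would otherwise be learned silently by the model.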

 

Daniela Moody
VP of AI • Arturo

Daniela Moody is vice president of AI at Arturo, a platform for AI-powered property analytics. She believes that the key to removing bias from machine learning is to have more than one person working on both model development and testing at the same time. With two skilled viewpoints and methodologies used to seek out and eliminate bias, Moody thinks that much greater clarity and success are possible.

 

What’s the most important thing to consider when choosing the right learning model for a given problem? And how does this help you get ahead of bias early on in the process?

The key to choosing the right learning model for a problem lies in understanding your data. Does it have sufficient size, variability and class representation? Is it sparse or not? What is the quality of the data — noise, processing artifacts and so on — and is the feature you’re trying to “learn” visually discernible in the data? I think of this effort generally as the art and science of data conditioning, and it is absolutely critical to any learning problem.

Once you have satisfactory answers to all of these questions, the arduous task of model selection and hyperparameter optimization begins. At this stage you want to be careful about machine learning model bias. Some ML models might have an unequal decision risk between false alarms and missed detections, some may be seeking to identify rare classes in large volumes of data, and some may only “remember” a small set of training data at a time as they constantly adapt to new inputs — and these are just a few examples.

At the end of the day, we have to assume we bring conscious and unconscious bias to our ML problems, and our ability to identify and mitigate those biases will determine the success or failure of our ML models.
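One way to make the false-alarm versus missed-detection trade-off Moody mentions explicit is shown in the sketch below. It is an illustration under assumed data, not Arturo's code: it trains a class-weighted classifier on a synthetic, heavily imbalanced dataset and reports per-class precision and recall instead of a single accuracy number.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic, heavily imbalanced data standing in for a "rare class" problem.
X, y = make_classification(
    n_samples=5_000, n_features=20, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# class_weight="balanced" raises the cost of missing the rare class,
# shifting the model's false-alarm / missed-detection balance.
model = LogisticRegression(max_iter=1_000, class_weight="balanced")
model.fit(X_train, y_train)

# Per-class precision and recall expose behavior that overall accuracy hides:
# a model that never predicts the rare class can still score 95% accuracy here.
print(classification_report(y_test, model.predict(X_test)))
```

The specific classifier and weighting scheme are placeholders; the point is that the evaluation metric and the decision costs are chosen deliberately rather than inherited by default.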


What steps do you take to ensure your training data set is diverse and representative of different groups?

I like to employ Python data exploration and visualization tools and drive toward objective measurements of correlations, non-linear dependencies and class separation. From a qualitative perspective, it is important to visualize random samples from each class in their native domain, such as image chips, and verify that classes don’t include partial or occluded data, and that labels are not conflicting.
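The sketch below illustrates the kind of Python exploration Moody describes, assuming a labeled feature table plus the raw image chips each row came from; the file names, column names and loading steps are hypothetical stand-ins.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical inputs: a feature table with a "label" column, plus the raw
# image chips each row was derived from (row i corresponds to chips[i]).
features = pd.read_csv("features.csv")   # tabular features + "label"
chips = np.load("image_chips.npy")       # shape: (n_samples, height, width)

# 1) Class representation: is any class badly under-represented?
print(features["label"].value_counts(normalize=True))

# 2) Linear (Pearson) and rank-based (Spearman) dependencies between features;
#    Spearman picks up monotonic, non-linear relationships that Pearson misses.
numeric = features.drop(columns=["label"])
print(numeric.corr(method="pearson"))
print(numeric.corr(method="spearman"))

# 3) Qualitative check: random chips per class, looking for occlusion,
#    partial objects, or labels that contradict what the image shows.
rng = np.random.default_rng(0)
classes = features["label"].unique()
fig, axes = plt.subplots(len(classes), 4, figsize=(8, 2 * len(classes)), squeeze=False)
for row, cls in enumerate(classes):
    idx = rng.choice(features.index[features["label"] == cls], size=4, replace=False)
    for col, i in enumerate(idx):
        axes[row, col].imshow(chips[i], cmap="gray")
        axes[row, col].set_title(str(cls))
        axes[row, col].axis("off")
plt.tight_layout()
plt.show()
```

The quantitative summaries and the eyeball check complement each other: the numbers catch class imbalance and redundant features, while the random chips catch labeling mistakes no statistic will flag.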

 

Past experiences are a great way to create not a bag of ML tools, but rather a robust and repeatable methodology for approaching any end-to-end ML problem.

 

When it comes to testing and monitoring your model over time, what’s a strategy you’ve found to be particularly useful for identifying and eliminating bias?

I think my go-to strategy is to always have more than one person working on model development and testing. The reality is all of us perceive data and models through the lens of our past experiences. It is part of our nature, I believe, to continuously try to build upon that and avoid starting over every time a new ML problem presents itself. And now we might have just introduced the first source of bias to our new problem: we expect it to work out the same way as the previous one. 

On the other hand, past experiences are a great way to create not a bag of ML tools, but rather a robust and repeatable methodology for approaching any end-to-end ML problem. If we now leverage a team of at least two, with their respective individual methodologies for tackling an ML problem, we might be pleasantly surprised by what we can all learn from one another and become perhaps a bit more self-aware about biases we didn’t think of before. We are very driven in the tech world to create great teams that are cognitively diverse and inclusive to mitigate individual bias, so what prevents us from leveraging that kind of teamwork when it comes to model development and testing?

 
