4 Ways to Maintain Machine Learning Model Accuracy
By Josh Poduska | 2021-12-15 | 7 min read
Algorithms may be the toast of today’s high-performance technology races, but sometimes proponents forget that, like cars, models also need a regular tune-up. A highly visible and catastrophic AI model failure recently shamed Zillow, the online real estate company that was forced to shutter its home-buying business.
As reported in The Wall Street Journal and other sources, the company's attempt to apply its real-estate algorithm to house flipping failed and was shut down early. The company’s shares plunged 25% as it announced a quarterly loss of $328 million and a 25% reduction of its workforce (about 2,000 people) due to closing its “Zillow Offers” service. CEO Rich Barton told investors, “We’ve been unable to accurately forecast future home prices at different times in both directions by much more than we modeled as possible.”
What is Machine Learning Model Accuracy?
Machine learning model accuracy measures how well a model performs on new data, which makes it central to assessing the model's usefulness. There are a variety of ways to measure accuracy, and the most appropriate metric depends on the specifics of the problem. For a classification problem, accuracy is the share of examples the model classifies correctly; its complement, classification error, is the number of misclassified examples divided by the total number of examples. Other metrics, such as precision and recall, may be a better fit depending on the type of problem. Select the right metric for the task at hand, and understand how to interpret the results, in order to make informed decisions about machine learning models.
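As a rough illustration of the metrics mentioned above, here is a minimal sketch for a binary classifier. The function name `classification_metrics` and the sample labels are made up for this example; in practice a library such as scikit-learn would typically be used instead.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, classification error, precision, and recall."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    correct = sum(1 for t, p in pairs if t == p)
    # True positives, false positives, and false negatives for the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": correct / n,
        # Misclassified examples divided by total examples.
        "error": (n - correct) / n,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Accuracy and error always sum to one, while precision and recall can diverge sharply from accuracy when the classes are imbalanced, which is one reason the choice of metric matters.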
What's the Difference Between Model Accuracy and Model Performance?
The terms model accuracy and model performance are often used interchangeably, but they refer to two different things. Model accuracy measures how often a model predicts the correct output for a given input. Model performance measures how well the model carries out the specific task it was built for. For example, if you were trying to predict whether or not a person would vote for a certain candidate, model accuracy would measure how often the model correctly predicted which candidate each person would vote for. Model performance would measure how well the model predicted the outcome of the election (i.e., how many people actually voted for the candidate).
Improving Model Accuracy
Now that we have a clear understanding of model accuracy, let's explore some strategies for maintaining it. With a few deliberate practices, we can maintain and even improve the accuracy of our models. Let's get started!
Planning for Model Risk
While there is no public consensus yet on why Zillow’s model did not work as planned, this blog is not about Zillow, per se. Our topic is the lesson that all of us relying on data science should take to heart: Never assume a production model is “done”; something can always go wrong!
Even the best-performing model will eventually degrade for a variety of reasons: changes to products or policies can affect how customers behave; adversarial actors can adapt their behavior; data pipelines can break; and sometimes the world simply evolves. Any of these factors can lead to data drift and concept drift, which can result in a drop in predictive accuracy.
To meet such challenges, a model-driven business must adopt a model monitoring policy designed to continuously improve model accuracy. Here are four ways that model monitoring can help you fix bad algorithms.
Retrain the Model
If a model has drifted, you can often restore its accuracy by retraining it with fresher data, along with the associated ground truth labels, that is more representative of the prediction data. In cases where ground truth data is not yet available, the training data set can instead be curated to mimic the distribution of the prediction data, thereby reducing drift.
Watch for two types of drift: data drift and concept drift. With data drift, the patterns in the production data that a deployed model uses for predictions gradually diverge from the patterns in the model’s original training data, which lowers the predictive power of the model. Concept drift occurs when expectations of what constitutes a correct prediction change over time, despite there being no change in the input data distribution.
Roll Back the Model
Sometimes rolling back to a previous version of the model can fix performance issues. To enable this form of continuous improvement, you need an archive of each version of the model. You can then evaluate the performance of each prior model version against the current production version by simulating how it would have performed with the same inference data. If you find a prior version that performs better than the current model version, you can then deploy it as the champion model in production.
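The comparison described above can be sketched as replaying one set of labeled inference data through every archived version. Everything named here is hypothetical: the `archive` dictionary, the `pick_champion` helper, the `min_gain` margin, and the toy threshold "models" stand in for whatever model registry and evaluation metric you actually use.

```python
def accuracy(model, inputs, labels):
    """Share of inputs for which the model's prediction matches the label."""
    hits = sum(1 for x, y in zip(inputs, labels) if model(x) == y)
    return hits / len(labels)


def pick_champion(archive, current_version, inputs, labels, min_gain=0.0):
    """Score every archived version on the same inference data.

    Returns (champion_version, scores). The current version stays champion
    unless a prior version beats it by more than min_gain.
    """
    scores = {version: accuracy(model, inputs, labels)
              for version, model in archive.items()}
    best = max(scores, key=scores.get)
    if scores[best] - scores[current_version] <= min_gain:
        best = current_version
    return best, scores


# Toy archive: each "model" is a simple threshold rule (illustration only).
archive = {
    "v1": lambda x: int(x > 50),
    "v2": lambda x: int(x > 70),  # current production version
}
inputs = [40, 55, 60, 65, 80, 90]
labels = [0, 1, 1, 1, 1, 1]

champion, scores = pick_champion(archive, "v2", inputs, labels)
print(champion, scores)
```

The `min_gain` margin is a design choice: requiring a prior version to win by a clear margin avoids swapping models over differences that are likely noise.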
Fix the Model Pipeline
While drift may occur because the ground truth data has changed, sometimes it happens when unforeseen changes occur in the upstream data pipeline feeding prediction data into a model. Retraining with fresher data sourced from that pipeline may fix the model, but in some cases fixing the data pipeline itself is easier.
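One way to catch upstream pipeline changes before they reach the model is to validate each incoming batch against the expectations the model was trained under. The sketch below is an assumption-laden illustration: the `validate_batch` helper, the schema format of `(type, min, max)`, the 5% missing-value tolerance, and the housing-style field names are all invented for the example.

```python
def validate_batch(rows, schema, max_missing_rate=0.05):
    """Return a list of human-readable problems found in a batch of feature rows."""
    if not rows:
        return ["batch is empty"]
    problems = []
    for name, (expected_type, lo, hi) in schema.items():
        values = [row.get(name) for row in rows]
        missing = sum(1 for v in values if v is None)
        if missing / len(rows) > max_missing_rate:
            problems.append(f"{name}: {missing}/{len(rows)} values missing")
        present = [v for v in values if v is not None]
        wrong_type = sum(1 for v in present if not isinstance(v, expected_type))
        if wrong_type:
            problems.append(f"{name}: {wrong_type} values are not {expected_type.__name__}")
        # Range check only applies to values of the expected type.
        out_of_range = sum(1 for v in present
                           if isinstance(v, expected_type) and not lo <= v <= hi)
        if out_of_range:
            problems.append(f"{name}: {out_of_range} values outside [{lo}, {hi}]")
    return problems


# Hypothetical schema: field -> (expected type, minimum, maximum).
schema = {"sqft": (float, 100.0, 20000.0), "bedrooms": (int, 0, 20)}

good_batch = [{"sqft": 1500.0, "bedrooms": 3}, {"sqft": 2100.0, "bedrooms": 4}]
# Simulates an upstream change: sqft arrives as text or is dropped entirely.
bad_batch = [{"sqft": "1500", "bedrooms": 3}, {"sqft": None, "bedrooms": 40}]

good_problems = validate_batch(good_batch, schema)
bad_problems = validate_batch(bad_batch, schema)
print(good_problems)
print(bad_problems)
```

When a check like this fails, the problem usually lies in the pipeline rather than the model, so repairing the upstream source is often the better first step than retraining.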
Repair the Model
To keep improving model accuracy, you may sometimes need to repair a model in a development environment. To diagnose the cause of the model degradation, it helps to use a platform that supports reproducibility, where you can effectively simulate the production environment in a development setting. Once a suspected cause is identified, you can choose the best method for repairing the model, whether that means modifying hyperparameters or something more invasive.
Model monitoring is a critical, ongoing process that is essential for a model-driven business. An unmonitored model can lead to disastrous business results. If you’d like to learn more about how to create a rigorous model-monitoring process for your data science program, read our new white paper, Don’t Let Models Derail You: Strategies to Control Risk with Model Monitoring.
Josh Poduska is the Chief Field Data Scientist at Domino Data Lab and has 20+ years of experience in analytics. Josh has built data science solutions across domains including manufacturing, public sector, and retail. Josh has also managed teams and led data science strategy at multiple companies, and he currently manages Domino’s Field Data Science team. Josh has a Master's in Applied Statistics from Cornell University. You can connect with Josh at https://www.linkedin.com/in/joshpoduska/