The importance of machine learning model validation and how it works

David Weedmark | 2022-11-15 | 6 min read

Model validation is a core component of developing machine learning or artificial intelligence (ML/AI) models. While it’s separate from training and deployment, it should pervade the entire data science lifecycle.

What is model validation?

Model validation is a set of processes and activities designed to ensure that an ML/AI model is performing as it should, meaning that it meets both its design objectives and the needs of the end user. An important part of validation is testing the model, but validation doesn’t end there.

As an integral part of model risk management, validation is designed to ensure that the model doesn’t create more problems than it solves, and that it conforms to governance requirements. In addition to testing, the validation process includes examining the construction of the model, the tools used to create it and the data it used, to ensure that the model will run effectively.

The role of model validation

After a model has been trained, a process is required to ensure that the model is performing the way it was intended and that it solves the problem it was designed to solve. This is the purpose of model validation.

It’s important that the validation be done in an unbiased manner. For this reason, the validation team is usually independent of both the data-science team that trained the model and the people who will use it. Smaller organizations often contract model validation out to a third party. In highly regulated sectors, the validators are often a team that is familiar with the relevant regulations, which helps ensure compliance.

Model validation vs. model evaluation

Model validation is completely separate from model evaluation. Evaluation is part of the training phase. It is done with the training data: You select an algorithm, train it on the training data and then compare its performance to other models.

Once evaluation is done, you can then move on to validation of the winning models. Validation is never done on the training data, but rather on a fresh data set: the testing data.
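To make the separation between training data and testing data concrete, here is a minimal sketch of a holdout split using only the Python standard library. The function name `holdout_split`, the 20% test fraction and the fixed seed are illustrative assumptions, not something the article prescribes; real projects often use a library routine for this.

```python
import random


def holdout_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows and return (training_rows, testing_rows) with no overlap.

    Hypothetical helper: the name and defaults are assumptions for illustration.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                      # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)      # seeded for a reproducible split
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Toy data set: 100 record identifiers.
rows = list(range(100))
train_rows, test_rows = holdout_split(rows, test_fraction=0.2, seed=42)

# The testing rows are set aside and never shown to the model during training.
print(len(train_rows), len(test_rows))
```

The point of the sketch is only that the two sets are disjoint: anything used to fit or compare models stays in `train_rows`, and validation draws on `test_rows`.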

How to validate a model

Model validation should be done after model testing but before deployment. Model validation may also be required right after deployment, particularly if any changes have been made to the model. Additionally, validation should be done routinely after deployment, such as on an annual basis, as part of the monitoring process.

There are three primary areas of validation: input, calculation and output.

Input

The input component includes the assumptions and data used in model calculations. The data should be reconciled to its source and measured against industry benchmarks, as well as the team’s experience with this model or similar ones. If the data comes from another model, the parent model’s output used should also be validated.
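As one possible illustration of reconciling data to its source, the sketch below compares the records a model received against the records in the source system. The function `reconcile`, the field names `id` and `amount`, and the sample records are all hypothetical; an actual reconciliation would depend on the data and the source system involved.

```python
def reconcile(source_rows, model_rows, key, field):
    """Report records that differ between the source and the model's input.

    Hypothetical helper: assumes each row is a dict and `key` is unique per row.
    """
    source = {row[key]: row[field] for row in source_rows}
    loaded = {row[key]: row[field] for row in model_rows}
    shared = source.keys() & loaded.keys()
    return {
        "missing_from_model": sorted(source.keys() - loaded.keys()),
        "unexpected_in_model": sorted(loaded.keys() - source.keys()),
        "value_mismatches": sorted(k for k in shared if source[k] != loaded[k]),
    }


# Made-up records for illustration only.
source_rows = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 20},
    {"id": 3, "amount": 30},
]
model_rows = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 25},   # value differs from the source
    {"id": 4, "amount": 40},   # record not present in the source
]

reconciliation = reconcile(source_rows, model_rows, key="id", field="amount")
print(reconciliation)
```

An empty report on all three counts would suggest the input matches its source; any entries are items for the validation team to investigate.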

Back-testing is often a part of this validation. In a predictive model, for example, the model can be provided with test data and then the model results can be compared to the actual outcomes from the tested data.
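A back-test along these lines can be sketched in a few lines of Python. The names `backtest` and `toy_model`, the choice of absolute error as the metric, and the sample numbers are assumptions made for illustration; the article does not specify a particular metric.

```python
def backtest(predict, historical_inputs, actual_outcomes):
    """Run a model over historical inputs and compare against known outcomes.

    Hypothetical helper: `predict` is any callable that maps one input to a number.
    """
    if len(historical_inputs) != len(actual_outcomes):
        raise ValueError("inputs and outcomes must have the same length")
    predictions = [predict(x) for x in historical_inputs]
    errors = [abs(p - a) for p, a in zip(predictions, actual_outcomes)]
    return {
        "mean_absolute_error": sum(errors) / len(errors),
        "max_absolute_error": max(errors),
    }


def toy_model(x):
    """Stand-in for a trained predictive model (an assumption for this sketch)."""
    return 2.0 * x + 1.0


# Made-up historical data: what went in, and what actually happened.
historical_inputs = [1.0, 2.0, 3.0, 4.0]
actual_outcomes = [3.0, 5.5, 7.0, 8.0]

report = backtest(toy_model, historical_inputs, actual_outcomes)
print(report)
```

Whether a given error level is acceptable is a judgment for the validation team, ideally set before the back-test is run.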

Calculation

It’s important to examine the model logic, to ensure that calculations are reasonable and stable and that input is incorporated correctly. Two common ways of testing the calculation component of a model are sensitivity testing and dynamic validation.

Sensitivity testing involves quantifying how the uncertainty in the model’s output corresponds to the uncertainty in its input data. Dynamic validation involves changing attributes while the model is running, to test its response.
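One simple form of sensitivity testing is to change one input at a time by a fixed percentage and record how far the output moves. The sketch below assumes that approach; the names `sensitivity` and `toy_pricing_model`, the input names and the 10% perturbation are hypothetical, and more thorough methods exist for models whose inputs interact.

```python
def sensitivity(model, baseline, relative_change=0.01):
    """Perturb each input in turn and return the resulting change in output.

    Hypothetical helper: `model` takes a dict of named numeric inputs.
    """
    base_output = model(baseline)
    shifts = {}
    for name, value in baseline.items():
        perturbed = dict(baseline)                       # leave other inputs at baseline
        perturbed[name] = value * (1 + relative_change)  # nudge only this input
        shifts[name] = model(perturbed) - base_output
    return shifts


def toy_pricing_model(inputs):
    """Stand-in for the model under validation (an assumption for this sketch)."""
    return 100.0 * inputs["rate"] + 0.5 * inputs["volume"]


baseline = {"rate": 2.0, "volume": 10.0}
shifts = sensitivity(toy_pricing_model, baseline, relative_change=0.10)

# A much larger shift for one input indicates the output is more sensitive to it.
for name, shift in shifts.items():
    print(f"{name}: output changes by {shift:.2f}")
```

In this toy case the output moves far more in response to `rate` than to `volume`, which is the kind of finding that tells validators where input uncertainty matters most.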

Output

A model’s output includes not just the results of the calculations but also the format and presentation of the data. The output needs to be clear without risk of misleading users. A good way to test a model’s output is to compare it to the output of a similar model, if one is available. Two other testing methods are historical back-testing and version control.
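Comparing output against a similar model can be sketched as running both models on the same inputs and flagging the cases where they disagree by more than a chosen tolerance. Everything named here (`compare_outputs`, `candidate_model`, `benchmark_model`, the tolerance of 0.1) is a hypothetical stand-in; a disagreement does not by itself show which model is wrong.

```python
def compare_outputs(candidate, benchmark, inputs, tolerance):
    """Return (input, difference) pairs where the two models disagree.

    Hypothetical helper: both models are callables that return a number.
    """
    flagged = []
    for x in inputs:
        difference = abs(candidate(x) - benchmark(x))
        if difference > tolerance:
            flagged.append((x, difference))
    return flagged


def candidate_model(x):
    """Stand-in for the model being validated (an assumption for this sketch)."""
    return x * x


def benchmark_model(x):
    """Stand-in for a similar, already trusted model; diverges for x above 2."""
    return x * x + (0.5 if x > 2 else 0.0)


flagged = compare_outputs(candidate_model, benchmark_model, [1, 2, 3, 4], tolerance=0.1)
for x, difference in flagged:
    print(f"input {x}: outputs differ by {difference}")
```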

Final thoughts

Without competent validation, your organization runs the potentially catastrophic risk of deploying a flawed model into production. In business, this can put a company in jeopardy and even expose it to legal liability. A single flawed model used repeatedly in the same organization can cost millions, if not billions, in losses. In sectors like health and science, a poorly validated model puts human lives at risk.

Beyond the model itself, many organizations that leverage machine learning models, including KPMG and Axene Health Partners, include model governance, model risk management and documentation oversight as fundamental components of the validation process. Of course, the validation process itself must also be thoroughly documented.

Model-driven organizations that take their data-science initiatives seriously rely on enterprise-grade MLOps platforms like Domino’s. Not only does Domino provide the libraries and collaboration tools that data-science teams need, it also offers a wealth of documentation and governance tools that can direct the entire data science lifecycle.