Model drift is the decay of a model's predictive power caused by changes in the real-world environment it operates in, such as shifts in the digital landscape and the ensuing changes in relationships between variables. There are two types of model drift: data drift, where the distribution of the model's input data changes (P(x) shifts), and concept drift, where the relationship between the inputs and the target variable changes (P(y|x) shifts).
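To make the distinction concrete, here is a minimal numpy sketch with made-up numbers (the distributions and decision rules are purely illustrative): under data drift the feature distribution moves while the labeling rule stays fixed, and under concept drift the feature distribution stays put while the labeling rule changes.

```python
import numpy as np

rng = np.random.default_rng(42)

# Reference period: feature x ~ N(0, 1), labeling rule y = 1 if x > 0
x_ref = rng.normal(0.0, 1.0, 10_000)
y_ref = (x_ref > 0).astype(int)

# Data drift: P(x) shifts (the mean moves), but the labeling rule is unchanged
x_data = rng.normal(1.5, 1.0, 10_000)
y_data = (x_data > 0).astype(int)

# Concept drift: P(x) is unchanged, but P(y|x) changes (the boundary moves)
x_concept = rng.normal(0.0, 1.0, 10_000)
y_concept = (x_concept > 1.0).astype(int)

print(f"reference positive rate:     {y_ref.mean():.2f}")      # ~0.50
print(f"data-drift positive rate:    {y_data.mean():.2f}")     # ~0.93
print(f"concept-drift positive rate: {y_concept.mean():.2f}")  # ~0.16
```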
Let’s understand this using a real-world scenario. Natural language processing algorithms have been used for spam filtering since their early days, classifying emails as spam or non-spam to defend users from spammers' attacks. A typical older spam email announces that the recipient has won a lottery and must act quickly to claim a very high amount of prize money.
Models learn features such as "very high amount," "lottery," and the like to identify this kind of spam. But over the years spammers have adapted, introducing many new spamming methods the model has never seen before, and model performance degrades as a result. For example, phishing emails that impersonate a bank or online service and ask the recipient to verify account credentials share almost no vocabulary with old-style lottery spam.
A model trained on data collected five or ten years ago can't perform well against such new forms of spam. Here, one of the basic assumptions of machine learning has been violated: that future data will follow the same distribution as the training data. It is therefore necessary to identify these changes and update the model accordingly.
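As a hedged illustration of that violated assumption, the sketch below uses scikit-learn on a tiny made-up corpus (the emails, labels, and vocabulary are all invented for the example): a naive Bayes filter fit on old-style lottery spam fails on newer phishing messages whose vocabulary it has never seen.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data in the style the model was originally fit on
old_emails = [
    "You have won a lottery of a very high amount, claim your prize now",  # spam
    "Congratulations, claim your lottery prize money today",               # spam
    "Meeting rescheduled to 3pm, see attached agenda",                     # ham
    "Please review the quarterly report before Friday",                    # ham
]
old_labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(old_emails, old_labels)

# Newer spam relies on vocabulary the model has never seen
new_spam = [
    "Account suspended due to unusual activity, verify credentials immediately",
    "Urgent security alert, reset password using the link below",
]
print(model.predict(new_spam))  # likely [0 0]: both phishing emails slip past the filter
```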
Models and data distributions should be continuously monitored to detect model drift. Popular methods for quantifying drift include the two-sample Kolmogorov-Smirnov (KS) test, the Population Stability Index (PSI), and divergence measures such as Kullback-Leibler or Jensen-Shannon divergence.
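As a sketch of two of these checks (on synthetic data, with illustrative parameter choices), the snippet below compares a feature's training-time sample against a production sample using scipy's two-sample KS test and a hand-rolled PSI:

```python
import numpy as np
from scipy.stats import ks_2samp

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference and a current sample."""
    # Bin edges taken from the reference distribution's quantiles
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch values outside the reference range
    e_pct = np.histogram(expected, edges)[0] / len(expected)
    a_pct = np.histogram(actual, edges)[0] / len(actual)
    # Floor the proportions to avoid log(0) and division by zero
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
train_sample = rng.normal(0.0, 1.0, 5_000)  # feature values at training time
live_sample = rng.normal(0.4, 1.2, 5_000)   # same feature in production, shifted

res = ks_2samp(train_sample, live_sample)
print(f"KS statistic: {res.statistic:.3f} (p-value: {res.pvalue:.2g})")
print(f"PSI: {psi(train_sample, live_sample):.3f}")
```

A common rule of thumb reads a PSI below 0.1 as stable, 0.1 to 0.2 as moderate shift, and above 0.2 as significant drift, though in practice thresholds should be calibrated per feature.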
Setting up pipelines and thresholds to detect model drift in production can be a tedious job. Domino's integrated model monitoring tool helps you detect drift in operationalized models: Domino automatically alerts you when drift, divergence, and data quality checks exceed their thresholds, and makes it easy to drill down to model features so you can modify, retrain, and redeploy models quickly. Integrated model monitoring streamlines monitoring setup with simplified configuration and reproducible development environments, saving time when diagnosing, rebuilding, and redeploying models.