Also known as distributional shift, data drift occurs when changes in the underlying data cause a model to become less accurate over time. Models are typically trained on an initial set of data and then put into production, so if the data seen in production diverges from the training data, the model's predictions become inaccurate.
There are many potential causes of data drift.
→ The process that produces the data could change. Example: An IoT sensor could fall out of calibration.
→ Unexpected changes to the data infrastructure. Example: A team serving data in an enterprise setting changes their data definitions.
→ General exogenous changes to the data. Example: Behavioral changes that cause people to drive faster or slower would change a traffic dataset.
To combat data drift, it's important to monitor the data and frequently retrain models.
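One simple way to monitor a numeric feature is to compare the values seen in production against the values used for training. The sketch below uses the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical distributions) as the drift signal. It is only an illustration: the function names, the simulated sensor readings, and the 0.1 threshold are assumptions made for this example, and a real threshold would need to be tuned per feature.

```python
import bisect
import random


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the empirical CDFs of the
    two samples: 0.0 means identical distributions, 1.0 means no overlap.
    """
    ref = sorted(reference)
    cur = sorted(current)
    max_gap = 0.0
    # The gap between two step functions peaks at an observed value,
    # so it is enough to check every value from both samples.
    for x in ref + cur:
        ref_cdf = bisect.bisect_right(ref, x) / len(ref)
        cur_cdf = bisect.bisect_right(cur, x) / len(cur)
        max_gap = max(max_gap, abs(ref_cdf - cur_cdf))
    return max_gap


def has_drifted(reference, current, threshold=0.1):
    """Flag drift when the KS statistic exceeds a threshold (assumed value)."""
    return ks_statistic(reference, current) > threshold


# Simulated data (hypothetical): sensor readings at training time,
# a later batch from the same process, and a batch from a sensor
# whose calibration has shifted the readings upward.
random.seed(0)
training = [random.gauss(50, 5) for _ in range(2000)]
same_process = [random.gauss(50, 5) for _ in range(2000)]
miscalibrated = [random.gauss(53, 5) for _ in range(2000)]

print("same process drifted:", has_drifted(training, same_process))
print("miscalibrated drifted:", has_drifted(training, miscalibrated))
```

A check like this could run on each new batch of production data, with a positive result triggering an investigation or a retraining job. It only looks at one feature at a time, so it would not catch changes in the relationships between features.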