(If you haven’t read Part 1 yet, check it out here.)
Missing data is a recurring problem in time-series analysis.
As we explored in Part 1, simple imputation techniques and even regression-based models (linear regression, decision trees) can get us a long way.
But what if we need to handle subtler patterns and capture the fine-grained fluctuations in complex time-series data?
In this article, we will explore K-Nearest Neighbors (KNN). A key strength of this model is that it makes few assumptions about the shape of the relationships in your data, so it can capture nonlinear patterns. This makes it a flexible and robust option for missing data imputation.
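To make the idea concrete, here is a minimal from-scratch sketch of KNN imputation. It is an illustration, not the exact method used later in this series. It assumes the data is a 2-D NumPy array (rows are time steps, columns are features) with gaps marked as `NaN`. For each gap, it finds the `k` rows closest to the incomplete row, comparing only the columns both rows have observed, and fills the gap with the mean of those neighbors' values. The helper name `knn_impute`, the toy columns, and the choice of `k` are hypothetical.

```python
import numpy as np

def knn_impute(X, k=3):
    """Hypothetical helper: fill NaNs with the mean of the k nearest rows.

    Distance between two rows uses only the columns observed in both,
    scaled up by (total columns / shared columns), similar in spirit to
    a NaN-aware Euclidean distance.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    n_rows, n_cols = X.shape
    observed = ~np.isnan(X)

    # Visit every missing cell (i, j).
    for i, j in zip(*np.where(~observed)):
        candidates = []
        for r in range(n_rows):
            # A donor row must be different and must have column j observed.
            if r == i or not observed[r, j]:
                continue
            shared = observed[i] & observed[r]
            if not shared.any():
                continue  # no common columns, so no meaningful distance
            diff = X[i, shared] - X[r, shared]
            # Compensate for the columns that could not be compared.
            d = np.sqrt(np.sum(diff ** 2) * n_cols / shared.sum())
            candidates.append((d, r))
        if not candidates:
            continue  # leave the NaN if no usable neighbor exists
        candidates.sort()  # nearest first; ties broken by row index
        nearest = [r for _, r in candidates[:k]]
        filled[i, j] = X[nearest, j].mean()
    return filled

# Toy example with made-up values: (hour, temperature, energy output).
data = np.array([
    [0.0, 10.0, 5.0],
    [1.0, 11.0, 6.0],
    [2.0, 12.0, np.nan],
    [3.0, 13.0, 8.0],
    [4.0, 14.0, 9.0],
])
result = knn_impute(data, k=2)
print(result)
```

Here the two rows closest to the incomplete one are its immediate neighbors in time, so the gap is filled with the average of their outputs. In practice, scikit-learn's `KNNImputer` (whose default metric is `nan_euclidean`) implements the same idea and is usually more convenient; the hand-written version simply makes each step visible.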
We will use the same mock energy production dataset you already saw in Part 1, with 10% of its values removed at random.
Because you can easily generate this dataset yourself, you can follow along and apply each technique in real time as we work through the method step by step!
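Part 1's exact generator isn't reproduced here, so the following is a hypothetical stand-in: one year of daily "energy production" built from a baseline, a yearly sine cycle, and Gaussian noise, after which 10% of the days (rounded down) are blanked out at random positions. The seed, date range, and every numeric parameter are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the example is reproducible

# Hypothetical mock series: one year of daily energy production.
dates = np.arange("2023-01-01", "2024-01-01", dtype="datetime64[D]")
n = dates.size  # number of days in the range
day = np.arange(n)

# Baseline + yearly seasonal cycle + Gaussian noise (all values are assumptions).
production = 200 + 50 * np.sin(2 * np.pi * day / 365) + rng.normal(0, 10, n)

# Blank out 10% of the days (rounded down), sampled without replacement.
n_missing = n // 10
missing_idx = rng.choice(n, size=n_missing, replace=False)
production_missing = production.copy()
production_missing[missing_idx] = np.nan

print(f"{np.isnan(production_missing).sum()} of {n} values missing")
```

Sampling indices without replacement guarantees an exact number of distinct gaps; masking each value independently with probability 0.1 would instead give roughly, not exactly, 10% missing.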