Cleaning and Preparing Time-Based Data for Building Experts: A Beginner’s Guide to Timeseries Cleaning and Data Imputation

Christian Baghai
Feb 8, 2023

Today, we’re going to dive into the first tutorial of our series on “Building Experts”. In this tutorial, we’ll focus on two key aspects of data cleaning and preparation: timeseries cleaning and data imputation. These two processes are critical for the success of any data science project, especially when working with time-based data.

First, let’s take a look at timeseries cleaning. Timeseries data is a sequence of observations recorded in time order, usually with a timestamp attached to each value. It’s often used in building science to understand things like energy consumption, weather patterns, and occupancy levels. However, when working with timeseries data, it’s common to encounter issues such as missing values, irregular intervals between readings, and outliers.
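To make one of these issues concrete, here is a minimal sketch of how irregular intervals could be detected. It uses only the Python standard library; the function name `find_irregular_gaps`, the hourly spacing, and the sample timestamps are all assumptions made up for illustration, not part of any particular dataset.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def find_irregular_gaps(
    timestamps: List[datetime], expected: timedelta
) -> List[Tuple[datetime, datetime]]:
    """Return each (previous, current) pair whose spacing differs from `expected`."""
    return [
        (prev, curr)
        for prev, curr in zip(timestamps, timestamps[1:])
        if curr - prev != expected
    ]


# Hypothetical hourly meter readings where the 03:00 reading never arrived.
start = datetime(2023, 2, 8, 0, 0)
timestamps = [start + timedelta(hours=h) for h in (0, 1, 2, 4, 5)]

gaps = find_irregular_gaps(timestamps, expected=timedelta(hours=1))
for prev, curr in gaps:
    print(f"Irregular step: {prev} -> {curr} ({curr - prev})")
```

A check like this assumes the timestamps are already sorted and that you know the interval the sensor was supposed to report at.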

To clean timeseries data, we need to first identify these issues and then remove or correct them. To identify missing values, we can plot the data and visually inspect it for gaps. To correct missing values, we can use methods such as forward and backward filling, linear interpolation, or spline interpolation. Forward and backward filling copy the nearest known value from before or after the gap, while linear and spline interpolation estimate the missing values from a line or smooth curve fitted through the surrounding data.
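The fill methods mentioned above can be sketched in a few lines of plain Python. This is an illustrative sketch rather than a production implementation: it assumes evenly spaced readings, represents a missing value as `None`, and uses a made-up `hourly_kwh` series. In practice, pandas offers `ffill`, `bfill`, and `interpolate` for the same job. Spline interpolation is left out because it needs a numerical library.

```python
from typing import List, Optional

Series = List[Optional[float]]


def forward_fill(values: Series) -> Series:
    """Replace each gap with the most recent known value."""
    filled: Series = []
    last: Optional[float] = None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # stays None if no earlier value exists
    return filled


def backward_fill(values: Series) -> Series:
    """Replace each gap with the next known value."""
    return forward_fill(values[::-1])[::-1]


def linear_interpolate(values: Series) -> Series:
    """Fill interior gaps on a straight line between the known neighbours."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled  # leading and trailing gaps are left as None


# Hypothetical hourly energy readings (kWh) with three missing values.
hourly_kwh = [10.0, None, None, 16.0, None, 20.0]

print(forward_fill(hourly_kwh))        # [10.0, 10.0, 10.0, 16.0, 16.0, 20.0]
print(backward_fill(hourly_kwh))       # [10.0, 16.0, 16.0, 16.0, 20.0, 20.0]
print(linear_interpolate(hourly_kwh))  # [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
```

Forward filling suits values that tend to hold steady, such as a thermostat setpoint, while linear interpolation is usually a better guess for quantities that change gradually between readings.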

Next, let’s talk about data imputation. Data imputation is the process of filling in missing or incorrect values in a dataset. There are several methods to do this, including mean imputation, median imputation, and multiple imputation. The method you choose will depend on the nature of your data and the problem you’re trying to solve.

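Mean imputation replaces missing values with the mean of the observed data, while median imputation replaces them with the median, which is less sensitive to outliers. Multiple imputation, on the other hand, uses statistical models to generate several plausible versions of the completed dataset. Each version is analyzed separately and the results are then combined, so the final estimate reflects the uncertainty about the missing values instead of treating a single guess as fact.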
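Here is a small sketch of mean and median imputation using Python’s built-in `statistics` module. The helper `impute` and the `occupancy` readings are hypothetical, chosen so that one outlier shows how differently the two methods behave. Multiple imputation is not shown, since it needs a statistical model that is beyond a short example.

```python
from statistics import mean, median
from typing import Callable, List, Optional


def impute(
    values: List[Optional[float]],
    statistic: Callable[[List[float]], float],
) -> List[float]:
    """Replace every None with `statistic` computed over the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistic(observed)
    return [fill if v is None else v for v in values]


# Hypothetical occupancy counts; 45.0 is an outlier, None marks a missing reading.
occupancy = [4.0, 6.0, None, 5.0, 45.0, None]

mean_imputed = impute(occupancy, mean)      # gaps become 15.0
median_imputed = impute(occupancy, median)  # gaps become 5.5

print(mean_imputed)    # [4.0, 6.0, 15.0, 5.0, 45.0, 15.0]
print(median_imputed)  # [4.0, 6.0, 5.5, 5.0, 45.0, 5.5]
```

The single outlier pulls the mean up to 15.0, far from the typical reading, while the median stays at 5.5. That is one reason to look for outliers before choosing an imputation method.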

In conclusion, timeseries cleaning and data imputation are two critical steps in preparing data for analysis. By correcting errors, handling outliers, and filling in missing values, we make our data more accurate and reliable, which is essential for building experts who want to make informed decisions based on data.

We hope you enjoyed this tutorial on timeseries cleaning and data imputation. Stay tuned for our next tutorial in the “Building Experts” series where we’ll dive into more advanced topics and explore new ways to use data science to improve the built environment. Until then, happy cleaning and imputing! 🔍🧼
