In this section we're going to cover what I consider to be two of the most important topics of this course: data management and preprocessing. You probably won't be too excited to hear this, but the unfortunate truth is that a lot of the real-world data that you'll be working with is going to be incomplete, inconsistent or, at best, just mismanaged. A few years ago, there was even a study published in Harvard Business Review that reported only 3% of the companies the authors worked with had data that met basic quality standards. They also found that 47% of newly created data records had at least one critical error.
That's why I believe preprocessing is so consequential. It allows us to address those errors and turn raw data into data we can actually work with.
So over the course of the next few guides, we'll be discussing the importance of proper data management and how to import and segment a data frame, and we'll finish with a high-level dissection of each step involved in preprocessing.
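To give you a rough idea of where we're headed, here is a minimal sketch of importing and segmenting a data frame. It assumes we're using the pandas library, and the data, the column names, and the choice of `city` as the column to predict are all made up for illustration. "Segment" is taken here to mean splitting the columns into features and a target, which may not match exactly how the later guides do it.

```python
import io

import pandas as pd

# Hypothetical raw data standing in for a real CSV file. Note the missing
# age and the inconsistent capitalization in the city column.
RAW_CSV = """name,age,city
Ana,34,Austin
Ben,,austin
Cara,29,Boston
"""

# Import: read the CSV text into a data frame.
df = pd.read_csv(io.StringIO(RAW_CSV))

# Inspect: count the missing values in each column.
missing_per_column = df.isna().sum()

# Segment: separate the feature columns from the target column
# (treating "city" as the target is an arbitrary choice for this sketch).
features = df[["name", "age"]]
target = df["city"]

# One small cleanup step: normalize capitalization so "austin" matches "Austin".
target = target.str.title()

print(missing_per_column)
print(target.unique())
```

Even in three rows of data we run into a missing value and an inconsistent label, which is the kind of problem preprocessing is meant to catch.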