#7: Using the previous or next value
· 7 min read · Nov 11, 2021
All the images were created by the author unless stated otherwise.
Missing values are probably the least welcome values in data science. We would rather not have them, yet they are always around.
Since it is not reasonable to ignore missing values, we need to find ways to handle them efficiently and properly.
Pandas, being one of the best data analysis and manipulation libraries, is quite flexible in handling missing values.
In this article, we will go over 8 different methods to make the missing values go away without causing a lot of trouble. Which method best fits a particular situation depends on the data and the task.
Let’s start by creating a sample data frame and adding some missing values to it.
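A minimal sketch of such a data frame could look like the following. The column names (`date`, `item`, `measure_1` to `measure_4`), the constant item code, and the random values are assumptions made for illustration; only the shape and the presence of integer columns follow the text.

```python
import numpy as np
import pandas as pd

# Seeded generator so the illustrative values are reproducible.
rng = np.random.default_rng(seed=0)

# Hypothetical sample data: 10 daily rows, 6 columns.
df = pd.DataFrame({
    "date": pd.date_range(start="2021-10-01", periods=10, freq="D"),
    "item": 1014,                                  # integer column (scalar is broadcast)
    "measure_1": rng.integers(1, 10, size=10),     # integer column
    "measure_2": rng.random(10).round(2),          # float column
    "measure_3": rng.random(10).round(2),          # float column
    "measure_4": rng.standard_normal(10),          # float column
})

print(df.shape)
print(df.dtypes)
```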
We have a data frame with 10 rows and 6 columns.
The next step is to add the missing values. We will use the loc indexer to select row and column combinations and set them to np.nan, which is one of the standard missing value representations.
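The assignment might be sketched as below. The small stand-in frame, its column names, and the chosen row labels are hypothetical. One assumption worth flagging: recent pandas releases warn about (and may reject) writing np.nan into an integer column, whereas older releases upcast silently, so this sketch casts the integer columns to float explicitly before assigning.

```python
import numpy as np
import pandas as pd

# Small stand-in frame (hypothetical columns and values).
df = pd.DataFrame({
    "item": [1014] * 10,
    "measure_1": np.arange(1, 11),
    "measure_2": np.linspace(0.1, 1.0, 10).round(2),
})

# np.nan is a float, so make the integer columns float first.
# Older pandas versions perform this upcast implicitly on assignment.
df = df.astype({"item": "float64", "measure_1": "float64"})

# loc takes row labels and a column label; assigning np.nan marks them missing.
df.loc[[2, 9], "item"] = np.nan
df.loc[[2, 7, 9], "measure_1"] = np.nan
df.loc[[2, 3], "measure_2"] = np.nan

# Count the missing values per column.
print(df.isna().sum())
```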
Here is how the data frame looks now:
The item and measure 1 columns held integer values, but they have been upcast to float because of the missing values.
Pandas 1.0 introduced pd.NA (displayed as <NA>), a missing value marker that works with the nullable integer data types, so we can have missing values in integer columns as well. However, we need to explicitly declare the data type.
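As a short illustration of declaring the data type explicitly, the sketch below builds the same values twice, once with the default inference and once with the nullable `"Int64"` dtype (note the capital I). The sample values are made up.

```python
import pandas as pd

# Without an explicit dtype, the missing entry forces a float column.
default = pd.Series([1, 2, None, 4])

# With the nullable integer dtype, the values stay integers
# and the missing entry is stored as pd.NA.
nullable = pd.Series([1, 2, None, 4], dtype="Int64")

print(default.dtype)
print(nullable.dtype)
print(nullable)
```

The same dtype string can be passed to `astype` to convert an existing column, provided its non-missing values are whole numbers.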