pandas: Replace NaN (missing values) with fillna() | note.nkmk.me (2024)

You can replace NaN in pandas.DataFrame and pandas.Series with any value using the fillna() method.

Contents

  • Replace NaN with the same value
  • Replace NaN with different values for each column
  • Replace NaN with mean, median, mode, etc., for each column
  • Replace NaN with previous/following valid values: method, limit
  • Update the original object: inplace
  • For pandas.Series

While this article primarily deals with NaN (Not a Number), it's important to note that in pandas, None is also treated as a missing value.

  • Missing values in pandas (nan, None, pd.NA)

To fill missing values with linear or spline interpolation, consider using the interpolate() method.

  • pandas: Interpolate NaN (missing values) with interpolate()

See the following articles on extracting, removing, and counting missing values.

  • pandas: Find rows/columns with NaN (missing values)
  • pandas: Remove NaN (missing values) with dropna()
  • pandas: Detect and count NaN (missing values) with isnull(), isna()

The sample code in this article uses pandas version 2.0.3. As an example, read a CSV file with missing values.

import pandas as pd

print(pd.__version__)
# 2.0.3

df = pd.read_csv('data/src/sample_pandas_normal_nan.csv')
print(df)
#       name   age state  point  other
# 0    Alice  24.0    NY    NaN    NaN
# 1      NaN   NaN   NaN    NaN    NaN
# 2  Charlie   NaN    CA    NaN    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen   NaN    CA   88.0    NaN
# 5    Frank  30.0   NaN    NaN    NaN
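The CSV file is not distributed with this text. As a minimal sketch, assuming the file holds exactly the values printed above, the same DataFrame can be built inline so the examples can be tried without it:

```python
import numpy as np
import pandas as pd

# Assumption: values typed in by hand from the printed output of
# sample_pandas_normal_nan.csv, not read from the actual file.
df = pd.DataFrame({
    'name': ['Alice', np.nan, 'Charlie', 'Dave', 'Ellen', 'Frank'],
    'age': [24.0, np.nan, np.nan, 68.0, np.nan, 30.0],
    'state': ['NY', np.nan, 'CA', 'TX', 'CA', np.nan],
    'point': [np.nan, np.nan, np.nan, 70.0, 88.0, np.nan],
    'other': [np.nan] * 6,
})

# Number of missing values per column, to check the frame against the CSV output.
print(df.isna().sum())
```

A column holding only NaN, such as other, is created as float64, just as read_csv() would infer it.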

Replace NaN with the same value

By passing a scalar as the first argument, value, of fillna(), all NaN values are replaced with that scalar.

print(df.fillna(0))
#       name   age state  point  other
# 0    Alice  24.0    NY    0.0    0.0
# 1        0   0.0     0    0.0    0.0
# 2  Charlie   0.0    CA    0.0    0.0
# 3     Dave  68.0    TX   70.0    0.0
# 4    Ellen   0.0    CA   88.0    0.0
# 5    Frank  30.0     0    0.0    0.0

Note that a numeric column containing NaN has a float data type (dtype), so the dtype remains float even if you replace NaN with an integer. To convert the column to int, use astype() after filling.

  • pandas: How to use astype() to cast dtype of DataFrame
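A short sketch of the dtype behavior described above, using the age values from the sample data: filling with an integer keeps float64, and an explicit astype() is needed afterwards.

```python
import numpy as np
import pandas as pd

age = pd.Series([24, np.nan, np.nan, 68, np.nan, 30], name='age')
print(age.dtype)  # float64, because NaN forces a float column

filled = age.fillna(0)
print(filled.dtype)  # still float64 even though the fill value 0 is an int

# Cast only after every NaN is gone; casting NaN to int64 raises an error.
age_int = filled.astype('int64')
print(age_int.tolist())  # [24, 0, 0, 68, 0, 30]
```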

Replace NaN with different values for each column

By passing a dictionary (dict) as the first argument, value, of fillna(), you can assign a different fill value to each column.

You can specify a dictionary in the form {column_name: value}.

NaN in unspecified columns are not replaced and thus remain as they are. Furthermore, any key not matching a column name is simply ignored.

print(df.fillna({'name': 'XXX', 'age': 20, 'ZZZ': 100}))
#       name   age state  point  other
# 0    Alice  24.0    NY    NaN    NaN
# 1      XXX  20.0   NaN    NaN    NaN
# 2  Charlie  20.0    CA    NaN    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen  20.0    CA   88.0    NaN
# 5    Frank  30.0   NaN    NaN    NaN
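To isolate the two rules in the previous paragraph, here is a small sketch on a trimmed, three-row version of the sample data (an assumption made for brevity): a column missing from the dict keeps its NaN, and a key that matches no column does not create one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', np.nan, 'Charlie'],
    'age': [24.0, np.nan, np.nan],
    'state': ['NY', np.nan, 'CA'],
})

# 'state' is not a key, so its NaN remains; 'ZZZ' is not a column, so it is ignored.
result = df.fillna({'name': 'XXX', 'age': 20, 'ZZZ': 100})
print(result)
```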

You can also pass a Series. Its index labels play the role of the dict keys.

s_for_fill = pd.Series(['XXX', 20, 100], index=['name', 'age', 'ZZZ'])
print(s_for_fill)
# name    XXX
# age      20
# ZZZ     100
# dtype: object

print(df.fillna(s_for_fill))
#       name   age state  point  other
# 0    Alice  24.0    NY    NaN    NaN
# 1      XXX  20.0   NaN    NaN    NaN
# 2  Charlie  20.0    CA    NaN    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen  20.0    CA   88.0    NaN
# 5    Frank  30.0   NaN    NaN    NaN
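One way to see that the dict keys and the Series labels play the same role is to build the Series directly from the dict and compare both results. This is a sketch on a reduced two-column frame, assumed for brevity:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', np.nan, 'Charlie'],
    'age': [24.0, np.nan, 68.0],
})

fill_dict = {'name': 'XXX', 'age': 20}
# pd.Series(dict) turns the dict keys into the index labels.
fill_series = pd.Series(fill_dict)

from_dict = df.fillna(fill_dict)
from_series = df.fillna(fill_series)
print(from_dict.equals(from_series))  # True
```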

The mean() method calculates the mean of each column and returns a Series. NaN values are skipped, but a column in which every element is NaN yields NaN. Setting numeric_only=True restricts the calculation to numeric columns; since pandas 2.0 its default is False, so without it the string columns in this example would raise an error.

print(df.mean(numeric_only=True))
# age      40.666667
# point    79.000000
# other          NaN
# dtype: float64
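As a sketch, assuming the sample data rebuilt inline (state omitted for brevity), numeric_only=True gives the same result as selecting the numeric columns first with select_dtypes(), and the age mean can be checked by hand: (24 + 68 + 30) / 3 = 122 / 3.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', np.nan, 'Charlie', 'Dave', 'Ellen', 'Frank'],
    'age': [24.0, np.nan, np.nan, 68.0, np.nan, 30.0],
    'point': [np.nan, np.nan, np.nan, 70.0, 88.0, np.nan],
    'other': [np.nan] * 6,
})

means = df.mean(numeric_only=True)
# Equivalent: keep only numeric columns, then take the mean.
means_alt = df.select_dtypes(include='number').mean()

# NaN is skipped, so only the three valid ages are averaged.
print(means['age'], 122 / 3)
print(means.equals(means_alt))  # True
```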

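Passing this Series as value to fillna() replaces NaN in each matching column with that column's mean. Columns that are not in the Series index, such as name and state, are left unchanged, and other stays NaN because its mean is NaN.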

print(df.fillna(df.mean(numeric_only=True)))
#       name        age state  point  other
# 0    Alice  24.000000    NY   79.0    NaN
# 1      NaN  40.666667   NaN   79.0    NaN
# 2  Charlie  40.666667    CA   79.0    NaN
# 3     Dave  68.000000    TX   70.0    NaN
# 4    Ellen  40.666667    CA   88.0    NaN
# 5    Frank  30.000000   NaN   79.0    NaN
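A compact sketch of which columns are affected, on a hypothetical three-row frame: age receives its mean (24 + 68) / 2 = 46.0, other has a NaN mean and so stays NaN, and name is not in the Series index at all.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', np.nan, 'Charlie'],
    'age': [24.0, np.nan, 68.0],
    'other': [np.nan] * 3,
})

filled = df.fillna(df.mean(numeric_only=True))
print(filled)
```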

Similarly, to replace NaN with the median, use the median() method. If the number of valid elements is even, the average of the two middle values is returned.

print(df.fillna(df.median(numeric_only=True)))
#       name   age state  point  other
# 0    Alice  24.0    NY   79.0    NaN
# 1      NaN  30.0   NaN   79.0    NaN
# 2  Charlie  30.0    CA   79.0    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen  30.0    CA   88.0    NaN
# 5    Frank  30.0   NaN   79.0    NaN
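The odd and even cases can be seen directly in the sample columns: age has three valid values, so the middle one is used, while point has two, so their average is used.

```python
import numpy as np
import pandas as pd

# Odd count of valid values: sorted 24, 30, 68, so the middle value is 30.
age = pd.Series([24.0, np.nan, np.nan, 68.0, np.nan, 30.0])
print(age.median())  # 30.0

# Even count of valid values: (70 + 88) / 2.
point = pd.Series([np.nan, np.nan, np.nan, 70.0, 88.0, np.nan])
print(point.median())  # 79.0
```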

The mode can be obtained with the mode() method. Since mode() returns a DataFrame, iloc[0] is used in this example to retrieve the first row as a Series. Note that mode() also works with strings.

print(df.fillna(df.mode().iloc[0]))
#       name   age state  point  other
# 0    Alice  24.0    NY   70.0    NaN
# 1    Alice  24.0    CA   70.0    NaN
# 2  Charlie  24.0    CA   70.0    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen  24.0    CA   88.0    NaN
# 5    Frank  30.0    CA   70.0    NaN
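Why iloc[0] is needed becomes clearer when looking at df.mode() itself. In this sketch (three sample columns rebuilt inline), every name occurs once, so all five are modes; columns with fewer modes are padded with NaN, and the modes in each column are sorted, so row 0 holds the first one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', np.nan, 'Charlie', 'Dave', 'Ellen', 'Frank'],
    'age': [24.0, np.nan, np.nan, 68.0, np.nan, 30.0],
    'state': ['NY', np.nan, 'CA', 'TX', 'CA', np.nan],
})

modes = df.mode()
print(modes)

# Row 0: the first sorted mode of each column, which fillna() then uses.
print(modes.iloc[0].tolist())  # ['Alice', 24.0, 'CA']
```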

Replace NaN with previous/following valid values: method, limit

The method argument of fillna() can be used to replace NaN with previous/following valid values.

If method is set to 'ffill' or 'pad', NaN are replaced with previous valid values (= forward fill), and if 'bfill' or 'backfill', they are replaced with the following valid values (= backward fill).

print(df.fillna(method='ffill'))
#       name   age state  point  other
# 0    Alice  24.0    NY    NaN    NaN
# 1    Alice  24.0    NY    NaN    NaN
# 2  Charlie  24.0    CA    NaN    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen  68.0    CA   88.0    NaN
# 5    Frank  30.0    CA   88.0    NaN

print(df.fillna(method='bfill'))
#       name   age state  point  other
# 0    Alice  24.0    NY   70.0    NaN
# 1  Charlie  68.0    CA   70.0    NaN
# 2  Charlie  68.0    CA   70.0    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen  30.0    CA   88.0    NaN
# 5    Frank  30.0   NaN    NaN    NaN
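To make the fill rule concrete, here is a plain-Python sketch of forward and backward fill. The helpers forward_fill() and backward_fill() are hypothetical illustrations, not pandas functions; pandas does not implement them this way internally, but they reproduce the age column results shown above.

```python
import math

def forward_fill(values, limit=None):
    """Hypothetical helper: each NaN takes the last valid value before it.

    If limit is given, at most `limit` consecutive NaN after a valid value
    are filled. Leading NaN have no previous value and stay NaN.
    """
    result = []
    last_valid = math.nan
    run = 0  # number of NaN seen since the last valid value
    for v in values:
        if isinstance(v, float) and math.isnan(v):
            run += 1
            if limit is None or run <= limit:
                result.append(last_valid)
            else:
                result.append(math.nan)
        else:
            last_valid = v
            run = 0
            result.append(v)
    return result

def backward_fill(values, limit=None):
    """Hypothetical helper: backward fill is forward fill on the reversed list."""
    return forward_fill(values[::-1], limit)[::-1]

nan = math.nan
age = [24.0, nan, nan, 68.0, nan, 30.0]
print(forward_fill(age))   # [24.0, 24.0, 24.0, 68.0, 68.0, 30.0]
print(backward_fill(age))  # [24.0, 68.0, 68.0, 68.0, 30.0, 30.0]
```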

By default, as in the example above, every NaN in a run of consecutive NaN is filled. The limit argument sets the maximum number of consecutive NaN to fill.

print(df.fillna(method='ffill', limit=1))
#       name   age state  point  other
# 0    Alice  24.0    NY    NaN    NaN
# 1    Alice  24.0    NY    NaN    NaN
# 2  Charlie   NaN    CA    NaN    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen  68.0    CA   88.0    NaN
# 5    Frank  30.0    CA   88.0    NaN

print(df.fillna(method='bfill', limit=1))
#       name   age state  point  other
# 0    Alice  24.0    NY    NaN    NaN
# 1  Charlie   NaN    CA    NaN    NaN
# 2  Charlie  68.0    CA   70.0    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen  30.0    CA   88.0    NaN
# 5    Frank  30.0   NaN    NaN    NaN
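The limit applies separately to each run of NaN, counted from the valid value being propagated. A sketch on a hypothetical Series with two gaps, using ffill() and bfill(), which accept the same limit argument:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0, np.nan, np.nan])

# Forward: at most two NaN after each valid value are filled.
print(s.ffill(limit=2).tolist())
# [1.0, 1.0, 1.0, nan, 5.0, 5.0, 5.0]

# Backward: only the NaN directly before 5.0 is filled; the trailing NaN
# have no following valid value.
print(s.bfill(limit=1).tolist())
# [1.0, nan, nan, 5.0, 5.0, nan, nan]
```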

Although it is not a common use case, you can set the axis argument to 1 or 'columns' to replace NaN with values from the neighboring column on the left or right within each row.

print(df.fillna(method='ffill', axis=1))
#       name      age state  point  other
# 0    Alice     24.0    NY     NY     NY
# 1      NaN      NaN   NaN    NaN    NaN
# 2  Charlie  Charlie    CA     CA     CA
# 3     Dave     68.0    TX   70.0   70.0
# 4    Ellen    Ellen    CA   88.0   88.0
# 5    Frank     30.0  30.0   30.0   30.0

print(df.fillna(method='bfill', axis=1))
#       name   age state  point  other
# 0    Alice  24.0    NY    NaN    NaN
# 1      NaN   NaN   NaN    NaN    NaN
# 2  Charlie    CA    CA    NaN    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen    CA    CA   88.0    NaN
# 5    Frank  30.0   NaN    NaN    NaN
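Row-wise filling is easier to follow on an all-numeric frame, where no strings end up mixed into number columns. A sketch with hypothetical columns a, b, c:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1.0, np.nan],
    'b': [np.nan, 2.0],
    'c': [np.nan, np.nan],
})

# Along each row: NaN takes the value from the left (ffill) or right (bfill).
print(df.ffill(axis=1).values.tolist())  # [[1.0, 1.0, 1.0], [nan, 2.0, 2.0]]
print(df.bfill(axis=1).values.tolist())  # [[1.0, nan, nan], [2.0, 2.0, nan]]
```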

Dedicated methods corresponding to the method argument are also available.

ffill() is equivalent to fillna(method='ffill'), and bfill() is equivalent to fillna(method='bfill'). Both accept limit. In pandas 2.1.0 and later, the method argument of fillna() is deprecated in favor of these methods.

print(df.ffill())
#       name   age state  point  other
# 0    Alice  24.0    NY    NaN    NaN
# 1    Alice  24.0    NY    NaN    NaN
# 2  Charlie  24.0    CA    NaN    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen  68.0    CA   88.0    NaN
# 5    Frank  30.0    CA   88.0    NaN

print(df.bfill(limit=1))
#       name   age state  point  other
# 0    Alice  24.0    NY    NaN    NaN
# 1  Charlie   NaN    CA    NaN    NaN
# 2  Charlie  68.0    CA   70.0    NaN
# 3     Dave  68.0    TX   70.0    NaN
# 4    Ellen  30.0    CA   88.0    NaN
# 5    Frank  30.0   NaN    NaN    NaN

pad() and backfill() are also provided, but have been deprecated since version 2.0.0.

Update the original object: inplace

By default, as shown above, a new object is returned and the original is left unchanged. If inplace=True, the original object is modified in place and the method returns None.

df.fillna(0, inplace=True)
print(df)
#       name   age state  point  other
# 0    Alice  24.0    NY    0.0    0.0
# 1        0   0.0     0    0.0    0.0
# 2  Charlie   0.0    CA    0.0    0.0
# 3     Dave  68.0    TX   70.0    0.0
# 4    Ellen   0.0    CA   88.0    0.0
# 5    Frank  30.0     0    0.0    0.0
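A small sketch contrasting the two styles on a hypothetical one-column frame. Because the inplace call returns None, writing df = df.fillna(0, inplace=True) would replace df with None; use either reassignment or inplace=True, not both.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'age': [24.0, np.nan, 68.0]})

# Default: a new DataFrame is returned and df still contains NaN.
filled = df.fillna(0)
print(df['age'].isna().sum(), filled['age'].isna().sum())  # 1 0

# inplace=True: df itself is modified and the call returns None.
ret = df.fillna(0, inplace=True)
print(ret)  # None
print(df['age'].tolist())  # [24.0, 0.0, 68.0]
```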

For pandas.Series

fillna() can be applied to Series in the same way as in the DataFrame examples above. For a Series, the keys of a dict passed as value are matched against the index labels.

s = pd.read_csv('data/src/sample_pandas_normal_nan.csv')['age']
print(s)
# 0    24.0
# 1     NaN
# 2     NaN
# 3    68.0
# 4     NaN
# 5    30.0
# Name: age, dtype: float64

print(s.fillna(100))
# 0     24.0
# 1    100.0
# 2    100.0
# 3     68.0
# 4    100.0
# 5     30.0
# Name: age, dtype: float64

print(s.fillna({1: 100, 4: -100}))
# 0      24.0
# 1     100.0
# 2       NaN
# 3      68.0
# 4    -100.0
# 5      30.0
# Name: age, dtype: float64

print(s.fillna(method='bfill', limit=1))
# 0    24.0
# 1     NaN
# 2    68.0
# 3    68.0
# 4    30.0
# 5    30.0
# Name: age, dtype: float64
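As with the dict above, a Series can also be passed as value to Series.fillna(), and it is aligned by index label. A sketch with the age values rebuilt inline and a hypothetical fill_values Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([24.0, np.nan, np.nan, 68.0, np.nan, 30.0], name='age')

# Labels 1 and 4 get their own fill values; label 2 is absent, so it stays NaN.
fill_values = pd.Series([100.0, -100.0], index=[1, 4])
result = s.fillna(fill_values)
print(result.tolist())  # [24.0, 100.0, nan, 68.0, -100.0, 30.0]
```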

The dedicated ffill() and bfill() methods are also available for Series.

print(s.bfill(limit=1))
# 0    24.0
# 1     NaN
# 2    68.0
# 3    68.0
# 4    30.0
# 5    30.0
# Name: age, dtype: float64

pad() and backfill() are also provided, but have been deprecated since version 2.0.0.

Article information

Author: Arielle Torp
