NumPy: Replace NaN (np.nan) using np.nan_to_num() and np.isnan() | note.nkmk.me (2024)

In NumPy, to replace NaN (np.nan) in an array (ndarray) with another value such as 0, use np.nan_to_num(). np.isnan() is primarily used to identify NaN, but its result can also be used to replace NaN. With either approach, you can also replace NaN with the mean of the non-NaN values.

Contents

  • NaN (np.nan) in NumPy
  • Replace NaN using np.genfromtxt() with filling_values
  • Replace NaN using np.nan_to_num()
  • Identify and replace NaN using np.isnan()

To delete rows or columns containing NaN instead of replacing the values, see the following article.

  • NumPy: Remove NaN (np.nan) from an array

For handling missing values in pandas, see the following article.

  • Missing values in pandas (nan, None, pd.NA)

The NumPy version used in this article is as follows. Note that functionality may vary between versions.

import numpy as np

print(np.__version__)
# 1.26.1

NaN (np.nan) in NumPy

When you read a CSV file with np.genfromtxt(), by default, missing data is represented as NaN (Not a Number). These are displayed as nan when output with print().

  • sample_nan.csv
  • NumPy: Read and write CSV files (np.loadtxt, np.genfromtxt, np.savetxt)

a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')
print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]

If you want to generate NaN explicitly, use np.nan or float('nan'). You can also import the math module of the standard library and use math.nan. They are all the same.

  • What is nan in Python (float('nan'), math.nan, np.nan)
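
As a quick check of the claim that these spellings are interchangeable, the following sketch builds NaN all three ways and confirms that each one is an ordinary Python float that math.isnan() recognizes. The variable names are only for illustration.

```python
import math

import numpy as np

# Three ways to write NaN; the list name is just for this illustration.
candidates = [np.nan, float('nan'), math.nan]

# Each one is a plain Python float...
all_floats = all(isinstance(x, float) for x in candidates)

# ...and each one is recognized as NaN.
all_nan = all(math.isnan(x) for x in candidates)

print(all_floats, all_nan)
# True True
```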

Since comparing NaN with == returns False, use np.isnan() to check if the value is NaN.

print(np.nan == np.nan)
# False

print(np.isnan(np.nan))
# True

np.isnan() can also check if each element of an ndarray is NaN.

a_nan = np.array([0, 1, np.nan, float('nan')])

print(a_nan == np.nan)
# [False False False False]

print(np.isnan(a_nan))
# [False False  True  True]
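
Because np.isnan() returns a boolean ndarray, its result can be summarized before deciding how to replace anything. A minimal sketch, using a small hand-written array (the name values and its contents are assumptions for illustration):

```python
import numpy as np

values = np.array([0, 1, np.nan, float('nan')])  # illustrative data
mask = np.isnan(values)

has_nan = mask.any()                 # is there at least one NaN?
nan_count = np.count_nonzero(mask)   # how many NaN values are there?

print(has_nan, nan_count)
# True 2
```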

Replace NaN using np.genfromtxt() with filling_values

To fill missing data in a CSV file, use the filling_values argument with np.genfromtxt().

For example, fill NaN with 0:

a_fill = np.genfromtxt('data/src/sample_nan.csv', delimiter=',', filling_values=0)
print(a_fill)
# [[11. 12.  0. 14.]
#  [21.  0.  0. 24.]
#  [31. 32. 33. 34.]]

Note that filling with the mean of the non-NaN values is not possible during the initial read with np.genfromtxt(). For this, refer to the method described below.
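
To try filling_values without the sample file, np.genfromtxt() can also read from a file-like object. The sketch below uses io.StringIO with CSV text whose layout is assumed from the printed output above; it is a stand-in, not the actual contents of sample_nan.csv.

```python
import io

import numpy as np

# Stand-in for a CSV file with empty fields (assumed layout, not the real file).
csv_text = "11,12,,14\n21,,,24\n31,32,33,34\n"

# Default: empty fields are read as NaN.
a_default = np.genfromtxt(io.StringIO(csv_text), delimiter=',')

# With filling_values, empty fields are filled during the read.
a_filled = np.genfromtxt(io.StringIO(csv_text), delimiter=',', filling_values=0)

print(np.isnan(a_default).sum())
# 3

print(a_filled)
# [[11. 12.  0. 14.]
#  [21.  0.  0. 24.]
#  [31. 32. 33. 34.]]
```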

Replace NaN using np.nan_to_num()

You can use np.nan_to_num() to replace NaN.

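Note that np.nan_to_num() also replaces infinity (inf): by default, positive infinity becomes the largest finite value of the dtype and negative infinity becomes the most negative one. See the following article for details.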

  • Infinity (inf) in Python
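
The effect on infinity is easy to overlook, so here is a small sketch of it. The array a_inf is illustrative data; posinf and neginf are the np.nan_to_num() arguments (available from NumPy 1.17, like nan) that choose the replacement values.

```python
import numpy as np

a_inf = np.array([1.0, np.nan, np.inf, -np.inf])  # illustrative data

# Defaults: NaN -> 0, +inf -> largest finite float, -inf -> most negative finite float.
default = np.nan_to_num(a_inf)
print(default[2] == np.finfo(np.float64).max)
# True
print(default[3] == np.finfo(np.float64).min)
# True

# posinf and neginf set the replacements for infinity explicitly.
custom = np.nan_to_num(a_inf, nan=0.0, posinf=999.0, neginf=-999.0)
print(custom.tolist())
# [1.0, 0.0, 999.0, -999.0]
```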

When you specify the array (ndarray) as the first argument to np.nan_to_num(), by default, a new ndarray is generated with NaN replaced by 0. The original ndarray remains unchanged.

a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')
print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]

print(np.nan_to_num(a))
# [[11. 12.  0. 14.]
#  [21.  0.  0. 24.]
#  [31. 32. 33. 34.]]

print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]

Setting the second argument (copy) to False modifies the original ndarray.

np.nan_to_num(a, copy=False)
print(a)
# [[11. 12.  0. 14.]
#  [21.  0.  0. 24.]
#  [31. 32. 33. 34.]]

From NumPy version 1.17, the third argument (nan) allows you to specify the value to replace NaN.

a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')
print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]

print(np.nan_to_num(a, nan=-1))
# [[11. 12. -1. 14.]
#  [21. -1. -1. 24.]
#  [31. 32. 33. 34.]]

You can use np.nanmean() to replace NaN with the mean of non-NaN values. This replacement can be done for the entire array or separately for each row or column.

  • NumPy: Functions ignoring NaN (np.nansum, np.nanmean, etc.)

print(np.nanmean(a))
# 23.555555555555557

print(np.nan_to_num(a, nan=np.nanmean(a)))
# [[11.         12.         23.55555556 14.        ]
#  [21.         23.55555556 23.55555556 24.        ]
#  [31.         32.         33.         34.        ]]

print(np.nanmean(a, axis=0, keepdims=True))
# [[21. 22. 33. 24.]]

print(np.nan_to_num(a, nan=np.nanmean(a, axis=0, keepdims=True)))
# [[11. 12. 33. 14.]
#  [21. 22. 33. 24.]
#  [31. 32. 33. 34.]]

print(np.nanmean(a, axis=1, keepdims=True))
# [[12.33333333]
#  [22.5       ]
#  [32.5       ]]

print(np.nan_to_num(a, nan=np.nanmean(a, axis=1, keepdims=True)))
# [[11.         12.         12.33333333 14.        ]
#  [21.         22.5        22.5        24.        ]
#  [31.         32.         33.         34.        ]]

If you specify an ndarray as the third argument (nan) in np.nan_to_num(), it will be broadcast to match the shape of the ndarray specified as the first argument.

  • NumPy: Broadcasting rules and examples

If keepdims is set to True in np.nanmean(), the result keeps the reduced axis with length 1, so it broadcasts correctly against the original array. keepdims=False (the default) happens to work for axis=0, but it is less error-prone to always set keepdims=True regardless of the axis.

  • NumPy: Meaning of the axis parameter (0, 1, -1)
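
The shapes involved make the keepdims advice concrete. The sketch below writes the array out by hand (values assumed from the printed output above) and uses np.broadcast_to() only as a way to test whether a shape is compatible; the flag name flat_ok is illustrative.

```python
import numpy as np

# Values assumed from the printed output of sample_nan.csv above.
a = np.array([[11, 12, np.nan, 14],
              [21, np.nan, np.nan, 24],
              [31, 32, 33, 34]], dtype=float)

row_keep = np.nanmean(a, axis=1, keepdims=True)  # one mean per row, as a column
row_flat = np.nanmean(a, axis=1)                 # same values, reduced axis dropped

print(row_keep.shape, row_flat.shape)
# (3, 1) (3,)

# Shape (3, 1) broadcasts against (3, 4); shape (3,) does not.
try:
    np.broadcast_to(row_flat, a.shape)
    flat_ok = True
except ValueError:
    flat_ok = False

print(flat_ok)
# False
```

With axis=0 the reduced result has shape (4,), which lines up with the last axis of a (3, 4) array, which is why the default works in that case.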

For versions before 1.17, where the nan argument is not implemented, use the following method to replace NaN with values other than 0.

Identify and replace NaN using np.isnan()

You can use np.isnan() to check if values in an ndarray are NaN.

a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')
print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]

print(np.isnan(a))
# [[False False  True False]
#  [False  True  True False]
#  [False False False False]]

With the result from np.isnan(), you can assign a specific value to replace NaN.

a[np.isnan(a)] = 0
print(a)
# [[11. 12.  0. 14.]
#  [21.  0.  0. 24.]
#  [31. 32. 33. 34.]]
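
Unlike the default behavior of np.nan_to_num(), assignment through a boolean mask changes the array in place. If the original must be preserved, one option is to copy first. A minimal sketch; the helper name fill_nan is hypothetical, not a NumPy function, and the array values are assumed from the printed output above.

```python
import numpy as np


def fill_nan(arr, value):
    """Return a copy of arr with NaN replaced by value (hypothetical helper)."""
    out = arr.copy()
    out[np.isnan(out)] = value
    return out


# Values assumed from the printed output of sample_nan.csv above.
a = np.array([[11, 12, np.nan, 14],
              [21, np.nan, np.nan, 24],
              [31, 32, 33, 34]], dtype=float)

filled = fill_nan(a, -1)

# The original still has its three NaN values; the copy has none.
print(np.isnan(a).sum(), np.isnan(filled).sum())
# 3 0
```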

You can also use np.nanmean() to replace NaN with the mean of the non-missing values.

a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')
a[np.isnan(a)] = np.nanmean(a)
print(a)
# [[11.         12.         23.55555556 14.        ]
#  [21.         23.55555556 23.55555556 24.        ]
#  [31.         32.         33.         34.        ]]

To replace with the mean value for each row or column, use np.where().

  • numpy.where(): Manipulate elements depending on conditions

a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')

print(np.where(np.isnan(a), np.nanmean(a, axis=0, keepdims=True), a))
# [[11. 12. 33. 14.]
#  [21. 22. 33. 24.]
#  [31. 32. 33. 34.]]

print(np.where(np.isnan(a), np.nanmean(a, axis=1, keepdims=True), a))
# [[11.         12.         12.33333333 14.        ]
#  [21.         22.5        22.5        24.        ]
#  [31.         32.         33.         34.        ]]
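
np.where() returns a new array. If an in-place update is preferred, the column means can instead be looked up by the column index of each NaN. This is a sketch of that alternative, not a method from the original article, with the array written out by hand (values assumed from the printed output above).

```python
import numpy as np

# Values assumed from the printed output of sample_nan.csv above.
a = np.array([[11, 12, np.nan, 14],
              [21, np.nan, np.nan, 24],
              [31, 32, 33, 34]], dtype=float)

col_means = np.nanmean(a, axis=0)        # shape (4,): one mean per column
rows, cols = np.nonzero(np.isnan(a))     # row and column index of each NaN

# Assign each NaN the mean of the column it sits in, modifying a in place.
a[rows, cols] = col_means[cols]

print(a)
# [[11. 12. 33. 14.]
#  [21. 22. 33. 24.]
#  [31. 32. 33. 34.]]
```

For row means, the same pattern applies with np.nanmean(a, axis=1) indexed by rows instead of cols.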