In NumPy, to replace NaN (np.nan) in an array (ndarray) with another value such as 0, use np.nan_to_num(). Additionally, while np.isnan() is primarily used to identify NaN, its result can be used to replace NaN. You can also replace NaN with the mean of the non-NaN values.
Contents
- NaN (np.nan) in NumPy
- Replace NaN using np.genfromtxt() with filling_values
- Replace NaN using np.nan_to_num()
- Identify and replace NaN using np.isnan()
To delete rows or columns containing NaN instead of replacing the values, see the following article.
- NumPy: Remove NaN (np.nan) from an array
For handling missing values in pandas, see the following article.
- Missing values in pandas (nan, None, pd.NA)
The NumPy version used in this article is as follows. Note that functionality may vary between versions.
import numpy as np

print(np.__version__)
# 1.26.1
source: numpy_nan_replace.py
NaN (np.nan) in NumPy
When you read a CSV file with np.genfromtxt(), by default, missing data is represented as NaN (Not a Number). These are displayed as nan when output with print().
- sample_nan.csv
- NumPy: Read and write CSV files (np.loadtxt, np.genfromtxt, np.savetxt)
a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')
print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]
source: numpy_nan_replace.py
If you want to generate NaN explicitly, use np.nan or float('nan'). You can also import the math module of the standard library and use math.nan. All of these produce the same floating-point NaN value.
- What is nan in Python (float('nan'), math.nan, np.nan)
a_nan = np.array([0, 1, np.nan, float('nan')])
print(a_nan)
# [ 0.  1. nan nan]
Since comparing NaN with == returns False, use np.isnan() to check if the value is NaN.
print(np.nan == np.nan)
# False

print(np.isnan(np.nan))
# True
source: numpy_nan_replace.py
np.isnan() can also check if each element of an ndarray is NaN.
print(a_nan == np.nan)
# [False False False False]

print(np.isnan(a_nan))
# [False False  True  True]
source: numpy_nan_replace.py
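Because np.isnan() returns a boolean ndarray, its result can also be used to check whether an array contains any NaN, or to count them. The following is a minimal sketch; a_demo is an illustrative hand-written array (not part of the original script) that mirrors the sample data.

```python
import numpy as np

# Illustrative array mirroring the sample data (assumption, not read from the CSV file)
a_demo = np.array([[11.0, 12.0, np.nan, 14.0],
                   [21.0, np.nan, np.nan, 24.0],
                   [31.0, 32.0, 33.0, 34.0]])

mask = np.isnan(a_demo)  # boolean array, True where the element is NaN

has_nan = bool(mask.any())        # does the array contain any NaN?
n_nan = int(mask.sum())           # total number of NaN (True counts as 1)
n_nan_per_col = mask.sum(axis=0)  # number of NaN in each column

print(has_nan)
# True
print(n_nan)
# 3
print(n_nan_per_col)
# [0 1 2 0]
```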
Replace NaN using np.genfromtxt() with filling_values
To fill missing data in a CSV file, use the filling_values argument with np.genfromtxt().
For example, fill NaN with 0:
a_fill = np.genfromtxt('data/src/sample_nan.csv', delimiter=',', filling_values=0)
print(a_fill)
# [[11. 12.  0. 14.]
#  [21.  0.  0. 24.]
#  [31. 32. 33. 34.]]
source: numpy_nan_replace.py
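Note that filling with the mean of the non-NaN values is not possible during the initial read with np.genfromtxt(), because the mean is not known until the data has been read. To do that, read the file first and then replace NaN with np.nan_to_num() or np.isnan(), as described in the following sections.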
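As a rough illustration of the difference between filling at read time and filling afterwards, the sketch below uses io.StringIO as a stand-in for the CSV file so that it runs without sample_nan.csv. The csv_text string is an assumption that mirrors the sample data, and the variable names are illustrative.

```python
import io

import numpy as np

# In-memory stand-in for the CSV file (assumption: same layout as sample_nan.csv)
csv_text = "11,12,,14\n21,,,24\n31,32,33,34\n"

# A fixed value can be filled in while reading
a_zero = np.genfromtxt(io.StringIO(csv_text), delimiter=',', filling_values=0)
print(a_zero)
# [[11. 12.  0. 14.]
#  [21.  0.  0. 24.]
#  [31. 32. 33. 34.]]

# The mean needs two steps: read with NaN first, then replace
a_raw = np.genfromtxt(io.StringIO(csv_text), delimiter=',')
a_mean = np.where(np.isnan(a_raw), np.nanmean(a_raw), a_raw)
print(np.isnan(a_mean).any())
# False
```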
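Replace NaN using np.nan_to_num()
You can use np.nan_to_num() to replace NaN.
Note that np.nan_to_num() also replaces infinity (inf): by default, positive and negative infinity become the largest and the most negative finite values of the array's data type. See the following article for details.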
- Infinity (inf) in Python
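The handling of infinity can be sketched as follows. The array x and the replacement values 999.0 and -999.0 are arbitrary choices for illustration; the posinf and neginf arguments, like nan, are available from NumPy 1.17. If inf should be left untouched, one option is to replace only the positions reported by np.isnan().

```python
import numpy as np

x = np.array([1.0, np.nan, np.inf, -np.inf])

# Default: NaN -> 0, inf -> largest finite float64, -inf -> most negative finite float64
default = np.nan_to_num(x)
print(default[2] == np.finfo(np.float64).max)
# True
print(default[3] == np.finfo(np.float64).min)
# True

# Explicit replacement values for NaN, positive infinity, and negative infinity
custom = np.nan_to_num(x, nan=0.0, posinf=999.0, neginf=-999.0)
print(custom.tolist())
# [1.0, 0.0, 999.0, -999.0]

# Replace only NaN and keep infinity as it is
only_nan = np.where(np.isnan(x), 0.0, x)
print(only_nan.tolist())
# [1.0, 0.0, inf, -inf]
```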
When you specify the array (ndarray) as the first argument to np.nan_to_num(), by default, a new ndarray is generated with NaN replaced by 0. The original ndarray remains unchanged.
a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')
print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]

print(np.nan_to_num(a))
# [[11. 12.  0. 14.]
#  [21.  0.  0. 24.]
#  [31. 32. 33. 34.]]

print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]
source: numpy_nan_replace.py
Setting the second argument (copy) to False modifies the original ndarray in place.
np.nan_to_num(a, copy=False)
print(a)
# [[11. 12.  0. 14.]
#  [21.  0.  0. 24.]
#  [31. 32. 33. 34.]]
source: numpy_nan_replace.py
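The difference between the default copy=True and copy=False can be checked with np.shares_memory(). This is a small sketch on an illustrative one-dimensional float array (a_demo is not part of the original script).

```python
import numpy as np

a_demo = np.array([1.0, np.nan, 3.0])

# Default (copy=True): a new array is returned and a_demo still contains NaN
b = np.nan_to_num(a_demo)
print(np.isnan(a_demo).any())
# True
print(np.shares_memory(a_demo, b))
# False

# copy=False: a_demo itself is modified, and the returned array uses the same memory
c = np.nan_to_num(a_demo, copy=False)
print(a_demo.tolist())
# [1.0, 0.0, 3.0]
print(np.shares_memory(a_demo, c))
# True
```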
From NumPy version 1.17, the third argument (nan) allows you to specify the value to replace NaN.
a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')
print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]

print(np.nan_to_num(a, nan=-1))
# [[11. 12. -1. 14.]
#  [21. -1. -1. 24.]
#  [31. 32. 33. 34.]]
source: numpy_nan_replace.py
You can use np.nanmean() to compute the mean of the non-NaN values and pass it as the nan argument to replace NaN with that mean. This replacement can be done for the entire array or separately for each row or column.
- NumPy: Functions ignoring NaN (np.nansum, np.nanmean, etc.)
print(np.nanmean(a))
# 23.555555555555557

print(np.nan_to_num(a, nan=np.nanmean(a)))
# [[11.         12.         23.55555556 14.        ]
#  [21.         23.55555556 23.55555556 24.        ]
#  [31.         32.         33.         34.        ]]

print(np.nanmean(a, axis=0, keepdims=True))
# [[21. 22. 33. 24.]]

print(np.nan_to_num(a, nan=np.nanmean(a, axis=0, keepdims=True)))
# [[11. 12. 33. 14.]
#  [21. 22. 33. 24.]
#  [31. 32. 33. 34.]]

print(np.nanmean(a, axis=1, keepdims=True))
# [[12.33333333]
#  [22.5       ]
#  [32.5       ]]

print(np.nan_to_num(a, nan=np.nanmean(a, axis=1, keepdims=True)))
# [[11.         12.         12.33333333 14.        ]
#  [21.         22.5        22.5        24.        ]
#  [31.         32.         33.         34.        ]]
source: numpy_nan_replace.py
If you specify an ndarray as the third argument (nan) in np.nan_to_num(), it will be broadcast to match the shape of the ndarray specified as the first argument.
- NumPy: Broadcasting rules and examples
If keepdims is set to True in np.nanmean(), the result keeps the reduced axis with length 1, so it is broadcast correctly against the original array. While keepdims=False (default) happens to work for axis=0, it is less error-prone to always set keepdims=True regardless of the axis.
- NumPy: Meaning of the axis parameter (0, 1, -1)
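The effect of keepdims on broadcasting can be checked by comparing shapes. In the sketch below, a_demo is an illustrative array mirroring the sample data, and broadcastable() is a hypothetical helper built on np.broadcast_shapes(), which requires NumPy 1.20 or later.

```python
import numpy as np

# Illustrative 3x4 array mirroring the sample data (assumption)
a_demo = np.array([[11.0, 12.0, np.nan, 14.0],
                   [21.0, np.nan, np.nan, 24.0],
                   [31.0, 32.0, 33.0, 34.0]])


def broadcastable(shape_a, shape_b):
    """Hypothetical helper: True if the two shapes can be broadcast together."""
    try:
        np.broadcast_shapes(shape_a, shape_b)
        return True
    except ValueError:
        return False


row_mean_flat = np.nanmean(a_demo, axis=1)                 # shape (3,)
row_mean_kept = np.nanmean(a_demo, axis=1, keepdims=True)  # shape (3, 1)
col_mean_flat = np.nanmean(a_demo, axis=0)                 # shape (4,)

print(broadcastable(a_demo.shape, row_mean_flat.shape))
# False
print(broadcastable(a_demo.shape, row_mean_kept.shape))
# True
print(broadcastable(a_demo.shape, col_mean_flat.shape))
# True

# With keepdims=True, each NaN is replaced by the mean of its own row
filled = np.nan_to_num(a_demo, nan=row_mean_kept)
print(filled[1].tolist())
# [21.0, 22.5, 22.5, 24.0]
```

Note that for a square array, the row means without keepdims=True would have a broadcastable shape and would be applied along the wrong axis without raising an error, which is another reason to set keepdims=True.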
For versions before 1.17, where the nan argument is not implemented, use np.isnan() as described in the next section to replace NaN with values other than 0.
Identify and replace NaN using np.isnan()
You can use np.isnan() to check if values in an ndarray are NaN.
a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')
print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]

print(np.isnan(a))
# [[False False  True False]
#  [False  True  True False]
#  [False False False False]]
source: numpy_nan_replace.py
Using the result of np.isnan() as a boolean mask, you can assign a specific value to the NaN positions. This modifies the original ndarray in place.
a[np.isnan(a)] = 0
print(a)
# [[11. 12.  0. 14.]
#  [21.  0.  0. 24.]
#  [31. 32. 33. 34.]]
source: numpy_nan_replace.py
You can also use np.nanmean() to replace NaN with the mean of the non-missing values.
a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')
a[np.isnan(a)] = np.nanmean(a)
print(a)
# [[11.         12.         23.55555556 14.        ]
#  [21.         23.55555556 23.55555556 24.        ]
#  [31.         32.         33.         34.        ]]
source: numpy_nan_replace.py
To replace NaN with the mean of each row or column, use np.where(), which broadcasts the means to the shape of the original array and returns a new ndarray.
- numpy.where(): Manipulate elements depending on conditions
a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')

print(np.where(np.isnan(a), np.nanmean(a, axis=0, keepdims=True), a))
# [[11. 12. 33. 14.]
#  [21. 22. 33. 24.]
#  [31. 32. 33. 34.]]

print(np.where(np.isnan(a), np.nanmean(a, axis=1, keepdims=True), a))
# [[11.         12.         12.33333333 14.        ]
#  [21.         22.5        22.5        24.        ]
#  [31.         32.         33.         34.        ]]
source: numpy_nan_replace.py
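If the same replacement is needed in several places, the np.isnan(), np.nanmean(), and np.where() steps above could be wrapped in a small function. The following is only a sketch: fill_nan_with_mean() is a hypothetical helper name, not a NumPy function, and a_demo is an illustrative array mirroring the sample data.

```python
import numpy as np


def fill_nan_with_mean(arr, axis=None):
    """Hypothetical helper: return a new array with NaN replaced by the mean
    of the non-NaN values (whole array if axis is None, else along axis)."""
    arr = np.asarray(arr, dtype=float)
    # keepdims=True keeps the reduced axes so the means broadcast against arr
    means = np.nanmean(arr, axis=axis, keepdims=True)
    return np.where(np.isnan(arr), means, arr)


a_demo = np.array([[11.0, 12.0, np.nan, 14.0],
                   [21.0, np.nan, np.nan, 24.0],
                   [31.0, 32.0, 33.0, 34.0]])

filled_cols = fill_nan_with_mean(a_demo, axis=0)  # column means
filled_rows = fill_nan_with_mean(a_demo, axis=1)  # row means

print(filled_cols[:, 2].tolist())
# [33.0, 33.0, 33.0]
print(filled_rows[1].tolist())
# [21.0, 22.5, 22.5, 24.0]
```

Note that if a row or column consists entirely of NaN, np.nanmean() returns NaN for it (with a RuntimeWarning), so those positions remain NaN after replacement.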