Pandas缺失值
1 | import numpy as np |
1 | dct = {'name':['Python','Java','C++','Sql','Html','Css'], |
| name | score | pop | |
|---|---|---|---|
| 0 | Python | 90.0 | 65.0 |
| 1 | Java | 30.0 | 43.0 |
| 2 | C++ | NaN | NaN |
| 3 | Sql | NaN | 88.0 |
| 4 | Html | 78.0 | NaN |
| 5 | Css | NaN | 90.0 |
检查缺失值
- isnull()
- isna
1 | df.isnull().sum() |
name 0
score 3
pop 2
dtype: int64
name 0
score 3
pop 2
dtype: int64
处理缺失值
平均值填充缺失值
- fillna(Series.mean())
- fillna(Series.interpolate())
1 | # 填充缺失值:该列的平均值 |
| name | score | pop | |
|---|---|---|---|
| 0 | Python | 90.0 | 65.0 |
| 1 | Java | 30.0 | 43.0 |
| 2 | C++ | 66.0 | 65.5 |
| 3 | Sql | 66.0 | 88.0 |
| 4 | Html | 78.0 | 89.0 |
| 5 | Css | 66.0 | 90.0 |
1 | dct = {'name':['Python','Java','C++','Sql','Html','Css'], |
| name | score | pop | |
|---|---|---|---|
| 0 | Python | 90.0 | 65.0 |
| 1 | Java | 30.0 | 43.0 |
| 2 | C++ | NaN | NaN |
| 3 | Sql | NaN | 88.0 |
| 4 | Html | 78.0 | NaN |
| 5 | Css | NaN | 90.0 |
| name | score | pop | |
|---|---|---|---|
| 0 | Python | 90.0 | 65.0 |
| 1 | Java | 30.0 | 43.0 |
| 2 | C++ | 46.0 | 65.5 |
| 3 | Sql | 62.0 | 88.0 |
| 4 | Html | 78.0 | 89.0 |
| 5 | Css | 78.0 | 90.0 |
缺失值前后的值填充
- fillna(axis=0/1,method = ‘ffill/bfill/pad’)
- axis : 0: 纵向, 1:横向
- method: ffill: 填充前值,bfill:填充后值,pad:优先取前值,否则取后值
1 | dct = {'name':['Python','Java','C++','Sql','Html','Css'], |
| name | score | pop | |
|---|---|---|---|
| 0 | Python | 90.0 | 65.0 |
| 1 | Java | 30.0 | 43.0 |
| 2 | C++ | NaN | NaN |
| 3 | Sql | NaN | 88.0 |
| 4 | Html | 78.0 | NaN |
| 5 | Css | NaN | 90.0 |
| name | score | pop | |
|---|---|---|---|
| 0 | Python | 90.0 | 65.0 |
| 1 | Java | 30.0 | 43.0 |
| 2 | C++ | 30.0 | 43.0 |
| 3 | Sql | 30.0 | 88.0 |
| 4 | Html | 78.0 | 88.0 |
| 5 | Css | 78.0 | 90.0 |
1 | dct = {'name':['Python','Java','C++','Sql','Html','Css'], |
| name | score | pop | |
|---|---|---|---|
| 0 | Python | 90.0 | 65.0 |
| 1 | Java | 30.0 | 43.0 |
| 2 | C++ | NaN | NaN |
| 3 | Sql | NaN | 88.0 |
| 4 | Html | 78.0 | NaN |
| 5 | Css | NaN | 90.0 |
| name | score | pop | |
|---|---|---|---|
| 0 | Python | 90.0 | 65.0 |
| 1 | Java | 30.0 | 43.0 |
| 2 | C++ | 78.0 | 88.0 |
| 3 | Sql | 78.0 | 88.0 |
| 4 | Html | 78.0 | 90.0 |
| 5 | Css | NaN | 90.0 |
1 | dct = {'name':['Python','Java','C++','Sql','Html','Css'], |
| name | score | pop | |
|---|---|---|---|
| 0 | Python | 90.0 | 65.0 |
| 1 | Java | 30.0 | 43.0 |
| 2 | C++ | NaN | NaN |
| 3 | Sql | NaN | 88.0 |
| 4 | Html | 78.0 | NaN |
| 5 | Css | NaN | 90.0 |
| name | score | pop | |
|---|---|---|---|
| 0 | Python | 90.0 | 65.0 |
| 1 | Java | 30.0 | 43.0 |
| 2 | C++ | 30.0 | 43.0 |
| 3 | Sql | 30.0 | 88.0 |
| 4 | Html | 78.0 | 88.0 |
| 5 | Css | 78.0 | 90.0 |
删除缺失值
df.dropna(axis,how,threst,subset,inplace)- axis:0为行 1为列,default 0,数据删除维度
- how:{‘any’, ‘all’}, default ‘any’,any:删除带有nan的行;all:删除全为nan的行
- threst:int,保留至少 int 个非nan行
- subset:list,在特定列缺失值处理
- inplace:bool,是否修改源文件
1 | dct = {'name':['Python','Java','C++','Sql','Html','Css'], |
| name | score | pop | |
|---|---|---|---|
| 0 | Python | 90.0 | 65.0 |
| 1 | Java | 30.0 | 43.0 |
| 2 | C++ | NaN | NaN |
| 3 | Sql | NaN | 88.0 |
| 4 | Html | 78.0 | NaN |
| 5 | Css | NaN | 90.0 |
| name | score | pop | |
|---|---|---|---|
| 0 | Python | 90.0 | 65.0 |
| 1 | Java | 30.0 | 43.0 |
| name | |
|---|---|
| 0 | Python |
| 1 | Java |
| 2 | C++ |
| 3 | Sql |
| 4 | Html |
| 5 | Css |
1 | # thresh 和 how 不能同时使用 |
| name | score | pop | |
|---|---|---|---|
| 0 | Python | 90.0 | 65.0 |
| 1 | Java | 30.0 | 43.0 |
| 3 | Sql | NaN | 88.0 |
| 4 | Html | 78.0 | NaN |
| 5 | Css | NaN | 90.0 |
1 | df.dropna(axis=0, |
| name | score | pop | |
|---|---|---|---|
| 0 | Python | 90.0 | 65.0 |
| 1 | Java | 30.0 | 43.0 |
| 4 | Html | 78.0 | NaN |
na_action
- 主要应用在 map函数下
- None:默认值 忽略NA
- ignore:NA值将不会被传递给映射函数
- raise:如果存在缺失值,则抛出异常
1 | dct = {'name':['Python','Java','C++','Sql','Html','Css'], |
| name | score | pop | |
|---|---|---|---|
| 0 | Python | 90.0 | 65.0 |
| 1 | Java | 30.0 | 43.0 |
| 2 | C++ | NaN | NaN |
| 3 | Sql | NaN | 88.0 |
| 4 | Html | 78.0 | NaN |
| 5 | Css | NaN | 90.0 |
| name | score | pop | |
|---|---|---|---|
| 0 | Python | 90.00 | 65 |
| 1 | Java | 30.00 | 43 |
| 2 | C++ | nan | NaN |
| 3 | Sql | nan | 88 |
| 4 | Html | 78.00 | NaN |
| 5 | Css | nan | 90 |