Working with Null Values

Real-world data is rarely clean. Often, you will encounter missing records, blank cells, or corrupted data. In Python/Pandas, these are represented as NaN (Not a Number) or None.

In this tutorial, we will learn how to detect, remove, and fill these missing values using the provided dataset.

🔹 1. Setup: Creating Data with Nulls

To simulate missing data, we use np.nan from the NumPy library.

🧪 Code Example


import pandas as pd
import numpy as np

# Creating a dataset with intentional missing values (np.nan)
data = {
    'name': ['Atul', 'Aman', 'Riya', 'Sneha', 'Dev'],
    'class': [10, 10, 11, 12, np.nan],       
    'roll': [1, 2, 3, 4, np.nan],
    'subject': ['Math', 'Science', 'English', np.nan, np.nan],
    'status': ['pass', 'fail', np.nan, 'pass', 'fail']
}

table = pd.DataFrame(data)
print(table)

📤 Output


    name  class  roll  subject status
0   Atul   10.0   1.0     Math   pass
1   Aman   10.0   2.0  Science   fail
2   Riya   11.0   3.0  English    NaN
3  Sneha   12.0   4.0      NaN   pass
4    Dev    NaN   NaN      NaN   fail

📝 Explanation

We imported numpy as np to access np.nan.
Pandas automatically detects np.nan as a missing value (displayed as NaN).
Notice that rows 2, 3, and 4 have holes in the data.

🔹 2. Finding Null Values

Before fixing data, we need to know exactly how much data is missing.

A. The `isna()` Method

This checks every single cell in the table. If it's null, it returns True; otherwise, False.

🧪 Code Example


print(table.isna())

📤 Output


    name  class   roll  subject  status
0  False  False  False    False   False
1  False  False  False    False   False
2  False  False  False    False    True
3  False  False  False     True   False
4  False   True   True     True   False

B. The `isna().sum()` Method (Count)

The boolean table above is hard to read. Instead, we sum up the "True" values to see the total errors per column.

🧪 Code Example


# Sums up the True values for each column
print(table.isna().sum())

📤 Output


name       0
class      1
roll       1
subject    2
status     1
dtype: int64

📝 Explanation

subject: 2 means there are 2 missing values in the "subject" column.
name: 0 means the name column is perfect (no missing data).

🔹 3. Removing Null Values (`dropna`)

One strategy to handle nulls is to simply delete the rows that contain them.

However, if we delete every row with a null, we might lose too much data. We use the thresh (Threshold) parameter to be smarter.

Rule: thresh=3 means "Keep the row ONLY if it has at least 3 valid (non-null) values."

🧪 Code Example


# Drop rows that have less than 3 valid values
print(table.dropna(thresh=3))

📤 Output


    name  class  roll  subject status
0   Atul   10.0   1.0     Math   pass
1   Aman   10.0   2.0  Science   fail
2   Riya   11.0   3.0  English    NaN
3  Sneha   12.0   4.0      NaN   pass

📝 Explanation

Row 4 (Dev) was deleted. Why?
- Valid values: "Dev" (Name), "fail" (Status).
- Total Valid = 2.
- Since 2 < 3, it is dropped.
Row 3 (Sneha) was kept. Why?
- Valid values: Name, Class, Roll, Status. (Subject is missing).
- Total Valid = 4.
- Since 4 >= 3, it is saved.

🔹 4. Filling Null Values (`fillna`)

Deleting data is risky. A safer option is Imputation: filling the empty cells with a meaningful default value.

A. Simple Fill (Fill All)

This replaces every single NaN in the entire DataFrame with the same value.

🧪 Code Example


# Replace ALL nulls with "Khali Hai"
print(table.fillna(value="Khali Hai"))

📤 Output


    name      class       roll    subject     status
0   Atul       10.0        1.0       Math       pass
1   Aman       10.0        2.0    Science       fail
2   Riya       11.0        3.0    English  Khali Hai
3  Sneha       12.0        4.0  Khali Hai       pass
4    Dev  Khali Hai  Khali Hai  Khali Hai       fail

B. Smart Fill (Dictionary Method)

Usually, you need different default values for different columns. (e.g., "Class" should be 0, but "Subject" should be text). We use a dictionary to map column names to their specific fill values.

🧪 Code Example


# Define rules for specific columns
fill_values = {
    "name": "name not found",
    "class": "class not found",
    "roll": "roll no not found",
    "subject": "no subject selected"
}

# Apply these rules
print(table.fillna(value=fill_values))

📤 Output


    name             class               roll              subject status
0   Atul              10.0                1.0                 Math   pass
1   Aman              10.0                2.0              Science   fail
2   Riya              11.0                3.0              English    NaN
3  Sneha              12.0                4.0  no subject selected   pass
4    Dev   class not found  roll no not found  no subject selected   fail

📝 Explanation

Row 4 (Dev): Class was NaN → became "class not found". Roll was NaN → became "roll no not found".
Row 3 (Sneha): Subject was NaN → became "no subject selected".
Row 2 (Riya): Status was NaN. Since we did not include "status" in our dictionary, it remains NaN.

📌 Summary

Method	Description
`df.isna().sum()`	Counts missing values per column.
`df.dropna(thresh=N)`	Keeps row ONLY if it has N valid values.
`df.fillna("Val")`	Fills all missing cells with "Val".
`df.fillna(dict)`	Fills specific columns with specific values.

Trending

Extract Text from Image using Python - python project

AI and MACHINE LEARNING ROADMAP: From Basic to Advanced

Text-to-Speech (TTS) in Python using pyttsx3

Text-to-Speech Converter using gTTS in Python

Build Your Own Gemini AI Chatbot in Python

Handling Null Values

Working with Null Values

🔹 1. Setup: Creating Data with Nulls

🧪 Code Example

📤 Output

📝 Explanation

🔹 2. Finding Null Values

A. The `isna()` Method

🧪 Code Example

📤 Output

B. The `isna().sum()` Method (Count)

🧪 Code Example

📤 Output

📝 Explanation

🔹 3. Removing Null Values (`dropna`)

🧪 Code Example

📤 Output

📝 Explanation

🔹 4. Filling Null Values (`fillna`)

A. Simple Fill (Fill All)

🧪 Code Example

📤 Output

B. Smart Fill (Dictionary Method)

🧪 Code Example

📤 Output

📝 Explanation

📌 Summary

Post a Comment

Contact Form

Handling Null Values

Working with Null Values

🔹 1. Setup: Creating Data with Nulls

🧪 Code Example

📤 Output

📝 Explanation

🔹 2. Finding Null Values

A. The isna() Method

🧪 Code Example

📤 Output

B. The isna().sum() Method (Count)

🧪 Code Example

📤 Output

📝 Explanation

🔹 3. Removing Null Values (dropna)

🧪 Code Example

📤 Output

📝 Explanation

🔹 4. Filling Null Values (fillna)

A. Simple Fill (Fill All)

🧪 Code Example

📤 Output

B. Smart Fill (Dictionary Method)

🧪 Code Example

📤 Output

📝 Explanation

📌 Summary

Post a Comment

Contact Form

A. The `isna()` Method

B. The `isna().sum()` Method (Count)

🔹 3. Removing Null Values (`dropna`)

🔹 4. Filling Null Values (`fillna`)