Pandas Advanced Selection & Filtering

Pandas Advanced Selection & Filtering

In the previous tutorial, we learned how to select rows and columns using manual indices (like row 0 or row 1). However, in real-world data analysis, we rarely know the row numbers.

Instead, we usually want to select data based on logic (e.g., "Show me all students who passed"). In this tutorial, we will learn how to chain selections and use Boolean Indexing to filter data.


๐Ÿ”น 1. Chaining Selections (Rows + Columns)

Sometimes you need to grab specific rows and immediately select specific columns from them. There are two ways to do this: the "One-Liner" way and the "Step-by-Step" way.

Method A: The "One-Liner" (Chaining)

You can chain the commands together. First, we select the rows using .loc, and then immediately select the column using [[ ]].

๐Ÿงช Code Example


import pandas as pd

data = {
    "Name": ["Atul", "Aman", "Harsh", "Rishi", "Devansh"],
    "Class": [11, 39, 47, 56, 76],
    "Status": ["pass", "fail", "pass", "pass", "fail"]
}
table = pd.DataFrame(data)

# Select rows 1 and 4, then keep only the "Status" column
print(table.loc[[1, 4]][["Status"]])

๐Ÿ“ค Output


  Status
1   fail
4   fail

Method B: The "Smart Way" (Step-by-Step)

While one-liners are fast, breaking the code into steps makes it easier to debug and understand. You create an intermediate DataFrame first.

๐Ÿงช Code Example


# Step 1: Select the specific rows first
selected_row = table.loc[[1, 4]]
print("--- Selected Rows ---")
print(selected_row)

print("*" * 30)

# Step 2: Select the column from that result
selected_column = selected_row[["Status"]]
print("--- Final Result ---")
print(selected_column)

๐Ÿ“ค Output


--- Selected Rows ---
      Name  Class Status
1     Aman     39   fail
4  Devansh     76   fail
******************************
--- Final Result ---
  Status
1   fail
4   fail

๐Ÿ”น 2. Filtering by Condition (Boolean Indexing)

This is the most essential skill in Pandas. Instead of asking for "Row 0", we ask for "Rows where Status is pass". This is called Filtering.

A. Basic Filtering

We create a condition inside the brackets: table["Status"] == "pass". Pandas checks every row, and if the condition is True, it keeps the row.

๐Ÿงช Code Example


# Select only rows where Status is "pass"
pass_table = table[table["Status"] == "pass"]
print(pass_table)

๐Ÿ“ค Output


    Name  Class Status
0   Atul     11   pass
2  Harsh     47   pass
3  Rishi     56   pass

B. Filter + Column Selection

Often, you want to filter the data first, and then only see specific columns (e.g., "Get the Names of everyone who passed").

๐Ÿงช Code Example


# Step 1: Filter the table
pass_table = table[table["Status"] == "pass"]

# Step 2: Select the Name column from the filtered table
pass_name = pass_table[["Name"]]

print(pass_name)

๐Ÿ“ค Output


    Name
0   Atul
2  Harsh
3  Rishi

๐Ÿ”น 3. Filtering with Multiple Conditions

What if you have two requirements? For example, you want students who Passed AND are in Class 11.

In Pandas, we use the & symbol for "AND".

⚠ Important Syntax Rule: You must wrap each condition in parentheses ( ).

๐Ÿงช Code Example


# Filter: Status is "pass" AND Class is 11
pass_11_table = table[(table["Status"] == "pass") & (table["Class"] == 11)]

# Select just the Name column
pass_11_name = pass_11_table[["Name"]]

print(pass_11_name)

๐Ÿ“ค Output


   Name
0  Atul

๐Ÿ“ Explanation

  • (table["Status"] == "pass"): Finds all students who passed.
  • &: The logic operator for "AND".
  • (table["Class"] == 11): Finds all students in class 11.
  • Pandas returns only the rows where both statements are True.

๐Ÿ“Œ Summary

Technique Syntax Code Use Case
Chaining df.loc[[1,2]][["Col"]] Quick extraction of specific cells.
Condition df[df["Col"] == "Val"] Filtering rows by value.
Multi-Condition df[(Cond1) & (Cond2)] Strict filtering with AND logic.

Post a Comment

Do Leave Your Comments...

Previous Post Next Post

Contact Form