The Crosstab (Frequency Table)

While pivot_table is great for summing and averaging numbers, sometimes you just want to count things.
"How many males vs. females are in each department?"
"How many students passed vs. failed in Class A?"

For this, Pandas gives us a specialized function called pd.crosstab().

🔹 Syntax

It is simpler than Pivot Table because it doesn't strictly need a DataFrame. You can pass columns directly.

pd.crosstab(index, columns, margins=True, normalize=...)

🔹 Example 1: Basic Frequency Count

Goal: Count how many employees of each Gender are in each Department.

🧪 The Code

import pandas as pd

# 1. Create Data
data = {
    'Department': ['HR', 'IT', 'HR', 'IT', 'Sales', 'Sales', 'IT'],
    'Gender': ['F', 'M', 'F', 'F', 'M', 'F', 'M']
}
df = pd.DataFrame(data)

# 2. Create Crosstab
# We just pass the two columns we want to compare
count_table = pd.crosstab(df['Department'], df['Gender'])

print(count_table)

📤 The Output

Gender F M Department HR 2 0 IT 1 2 Sales 1 1

📝 How it works:
Pandas simply counts the occurrences.

HR: Has 2 Females (F) and 0 Males (M).
IT: Has 1 Female and 2 Males.

🔹 Example 2: Percentages (Normalization)

This is the superpower of Crosstab. Instead of raw counts, you often want percentages.
"What % of the IT department is Male?"

We use the normalize parameter:
'index' = Calculate % across the row (Row total = 100%).
'columns' = Calculate % down the column (Col total = 100%).

🧪 The Code

# Show percentages per Department (Row)
percent_table = pd.crosstab(
    df['Department'], 
    df['Gender'], 
    normalize='index' # Important!
)

# Multiply by 100 to make it readable
print(percent_table * 100)

📤 The Output

Gender F M Department HR 100.000000 0.000000 IT 33.333333 66.666667 Sales 50.000000 50.000000

📝 Insight:
Look at the IT row.
Instead of saying "1 Female, 2 Males", it tells us "33% Female, 66% Male". This is much better for comparing large datasets.

🔹 Example 3: Adding Totals (Margins)

Just like Pivot Tables, we can add "All" row/column totals.

🧪 The Code

full_table = pd.crosstab(
    df['Department'], 
    df['Gender'], 
    margins=True, 
    margins_name="Total Employees"
)

print(full_table)

📤 The Output

Gender F M Total Employees Department HR 2 0 2 IT 1 2 3 Sales 1 1 2 Total Employees 4 3 7

⚔️ Pivot Table vs. Crosstab: The Difference

Since both functions create similar tables, beginners often get confused. Here is the definitive difference.

1. pd.pivot_table() is for AGGREGATION

Use this when you have Values (numbers) that you want to calculate (Sum, Mean, Max).

Example: "Calculate the Average Salary of Employees."

2. pd.crosstab() is for COUNTING

Use this when you just want to count Frequencies (How many times did X happen?).

Example: "Count how many Male vs Female employees exist."

🔍 Side-by-Side Code Comparison

Feature	pd.pivot_table()	pd.crosstab()
Primary Goal	Aggregation (Sum, Mean, Max)	Counting (Frequency)
Input Data	Must be a DataFrame	Can be Lists, Arrays, or Columns
Normalization	Difficult (Manual math needed)	Easy (Has `normalize` param)
Default Behavior	Calculates the Mean (Average)	Calculates the Count

💡 The Golden Rule:

Use Pivot Table if you are asking "How much?" (Money, Score, Height).
Use Crosstab if you are asking "How many?" (People, Items, Events).

Trending

Extract Text from Image using Python - python project

AI and MACHINE LEARNING ROADMAP: From Basic to Advanced

Text-to-Speech (TTS) in Python using pyttsx3

Text-to-Speech Converter using gTTS in Python

Build Your Own Gemini AI Chatbot in Python

Pandas Crosstab: Analyzing Frequencies

The Crosstab (Frequency Table)

🔹 Syntax

🔹 Example 1: Basic Frequency Count

🧪 The Code

📤 The Output

🔹 Example 2: Percentages (Normalization)

🧪 The Code

📤 The Output

🔹 Example 3: Adding Totals (Margins)

🧪 The Code

📤 The Output

⚔️ Pivot Table vs. Crosstab: The Difference

1. pd.pivot_table() is for AGGREGATION

2. pd.crosstab() is for COUNTING

🔍 Side-by-Side Code Comparison

Post a Comment

Contact Form