Pandas Crosstab: Analyzing Frequencies

The Crosstab (Frequency Table)

While pivot_table is great for summing and averaging numbers, sometimes you just want to count things.
"How many males vs. females are in each department?"
"How many students passed vs. failed in Class A?"

For this, Pandas gives us a specialized function called pd.crosstab().


๐Ÿ”น Syntax

It is simpler than Pivot Table because it doesn't strictly need a DataFrame. You can pass columns directly.

pd.crosstab(index, columns, margins=True, normalize=...)

๐Ÿ”น Example 1: Basic Frequency Count

Goal: Count how many employees of each Gender are in each Department.

๐Ÿงช The Code

import pandas as pd

# 1. Create Data
data = {
    'Department': ['HR', 'IT', 'HR', 'IT', 'Sales', 'Sales', 'IT'],
    'Gender': ['F', 'M', 'F', 'F', 'M', 'F', 'M']
}
df = pd.DataFrame(data)

# 2. Create Crosstab
# We just pass the two columns we want to compare
count_table = pd.crosstab(df['Department'], df['Gender'])

print(count_table)

๐Ÿ“ค The Output

Gender F M Department HR 2 0 IT 1 2 Sales 1 1
๐Ÿ“ How it works:
Pandas simply counts the occurrences.
  • HR: Has 2 Females (F) and 0 Males (M).
  • IT: Has 1 Female and 2 Males.

๐Ÿ”น Example 2: Percentages (Normalization)

This is the superpower of Crosstab. Instead of raw counts, you often want percentages.
"What % of the IT department is Male?"

We use the normalize parameter:
'index' = Calculate % across the row (Row total = 100%).
'columns' = Calculate % down the column (Col total = 100%).

๐Ÿงช The Code

# Show percentages per Department (Row)
percent_table = pd.crosstab(
    df['Department'], 
    df['Gender'], 
    normalize='index' # Important!
)

# Multiply by 100 to make it readable
print(percent_table * 100)

๐Ÿ“ค The Output

Gender F M Department HR 100.000000 0.000000 IT 33.333333 66.666667 Sales 50.000000 50.000000
๐Ÿ“ Insight:
Look at the IT row.
Instead of saying "1 Female, 2 Males", it tells us "33% Female, 66% Male". This is much better for comparing large datasets.

๐Ÿ”น Example 3: Adding Totals (Margins)

Just like Pivot Tables, we can add "All" row/column totals.

๐Ÿงช The Code

full_table = pd.crosstab(
    df['Department'], 
    df['Gender'], 
    margins=True, 
    margins_name="Total Employees"
)

print(full_table)

๐Ÿ“ค The Output

Gender F M Total Employees Department HR 2 0 2 IT 1 2 3 Sales 1 1 2 Total Employees 4 3 7


⚔️ Pivot Table vs. Crosstab: The Difference

Since both functions create similar tables, beginners often get confused. Here is the definitive difference.

1. pd.pivot_table() is for AGGREGATION

Use this when you have Values (numbers) that you want to calculate (Sum, Mean, Max).

Example: "Calculate the Average Salary of Employees."

2. pd.crosstab() is for COUNTING

Use this when you just want to count Frequencies (How many times did X happen?).

Example: "Count how many Male vs Female employees exist."

๐Ÿ” Side-by-Side Code Comparison

Feature pd.pivot_table() pd.crosstab()
Primary Goal Aggregation (Sum, Mean, Max) Counting (Frequency)
Input Data Must be a DataFrame Can be Lists, Arrays, or Columns
Normalization Difficult (Manual math needed) Easy (Has `normalize` param)
Default Behavior Calculates the Mean (Average) Calculates the Count
๐Ÿ’ก The Golden Rule:
  • Use Pivot Table if you are asking "How much?" (Money, Score, Height).
  • Use Crosstab if you are asking "How many?" (People, Items, Events).

Post a Comment

Do Leave Your Comments...

Previous Post Next Post

Contact Form