The Crosstab (Frequency Table)
While pivot_table is great for summing and averaging numbers, sometimes you just want to count things.
"How many males vs. females are in each department?"
"How many students passed vs. failed in Class A?"
For this, Pandas gives us a specialized function called pd.crosstab().
๐น Syntax
It is simpler than Pivot Table because it doesn't strictly need a DataFrame. You can pass columns directly.
pd.crosstab(index, columns, margins=True, normalize=...)
๐น Example 1: Basic Frequency Count
Goal: Count how many employees of each Gender are in each Department.
๐งช The Code
import pandas as pd
# 1. Create Data
data = {
'Department': ['HR', 'IT', 'HR', 'IT', 'Sales', 'Sales', 'IT'],
'Gender': ['F', 'M', 'F', 'F', 'M', 'F', 'M']
}
df = pd.DataFrame(data)
# 2. Create Crosstab
# We just pass the two columns we want to compare
count_table = pd.crosstab(df['Department'], df['Gender'])
print(count_table)
๐ค The Output
Pandas simply counts the occurrences.
- HR: Has 2 Females (F) and 0 Males (M).
- IT: Has 1 Female and 2 Males.
๐น Example 2: Percentages (Normalization)
This is the superpower of Crosstab.
Instead of raw counts, you often want percentages.
"What % of the IT department is Male?"
We use the normalize parameter:
'index' = Calculate % across the row (Row total = 100%).
'columns' = Calculate % down the column (Col total = 100%).
๐งช The Code
# Show percentages per Department (Row)
percent_table = pd.crosstab(
df['Department'],
df['Gender'],
normalize='index' # Important!
)
# Multiply by 100 to make it readable
print(percent_table * 100)
๐ค The Output
Look at the IT row.
Instead of saying "1 Female, 2 Males", it tells us "33% Female, 66% Male". This is much better for comparing large datasets.
๐น Example 3: Adding Totals (Margins)
Just like Pivot Tables, we can add "All" row/column totals.
๐งช The Code
full_table = pd.crosstab(
df['Department'],
df['Gender'],
margins=True,
margins_name="Total Employees"
)
print(full_table)
๐ค The Output
⚔️ Pivot Table vs. Crosstab: The Difference
Since both functions create similar tables, beginners often get confused. Here is the definitive difference.
1. pd.pivot_table() is for AGGREGATION
Use this when you have Values (numbers) that you want to calculate (Sum, Mean, Max).
Example: "Calculate the Average Salary of Employees."
2. pd.crosstab() is for COUNTING
Use this when you just want to count Frequencies (How many times did X happen?).
Example: "Count how many Male vs Female employees exist."
๐ Side-by-Side Code Comparison
| Feature | pd.pivot_table() | pd.crosstab() |
|---|---|---|
| Primary Goal | Aggregation (Sum, Mean, Max) | Counting (Frequency) |
| Input Data | Must be a DataFrame | Can be Lists, Arrays, or Columns |
| Normalization | Difficult (Manual math needed) | Easy (Has `normalize` param) |
| Default Behavior | Calculates the Mean (Average) | Calculates the Count |
- Use Pivot Table if you are asking "How much?" (Money, Score, Height).
- Use Crosstab if you are asking "How many?" (People, Items, Events).