Mastering Pivot Tables in Pandas
In the previous tutorial, we learned the theory. Now, let's write the code!
The function we use is pd.pivot_table().
๐น The Syntax
The code follows this structure:
pd.pivot_table(data, values=..., index=..., columns=..., aggfunc=...)
- data: Your DataFrame.
- index: Keys to group by on the vertical axis (Rows).
- columns: Keys to group by on the horizontal axis.
- values: The column containing the numbers to calculate.
- aggfunc: The math function (
'sum','mean','count', etc.).
๐น Example 1: Sales Data (Summing)
Goal: Calculate the Total Sales for each Product in each City.
๐งช The Code
import pandas as pd
import numpy as np
# 1. Create the DataFrame
data = {
'City': ['Delhi', 'Mumbai', 'Delhi', 'Mumbai', 'Delhi', 'Mumbai'],
'Product': ['Pen', 'Pencil', 'Pen', 'Pen', 'Pencil', 'Pencil'],
'Sales': [100, 50, 200, 150, 20, 30]
}
df = pd.DataFrame(data)
# 2. Create the Pivot Table
report = pd.pivot_table(
df,
index='City',
columns='Product',
values='Sales',
aggfunc='sum'
)
print(report)
๐ค The Output
Product Pen Pencil
City
Delhi 300 20
Mumbai 150 80
๐ How it works:
1. index='City': Puts "Delhi" and "Mumbai" as row labels.
2. columns='Product': Puts "Pen" and "Pencil" as column headers.
3. values='Sales': Tells Pandas to look at the Sales numbers.
4. aggfunc='sum': Adds up the numbers (e.g., Delhi Pen: 100 + 200 = 300).
1. index='City': Puts "Delhi" and "Mumbai" as row labels.
2. columns='Product': Puts "Pen" and "Pencil" as column headers.
3. values='Sales': Tells Pandas to look at the Sales numbers.
4. aggfunc='sum': Adds up the numbers (e.g., Delhi Pen: 100 + 200 = 300).
๐น Example 2: Student Grades (Averaging)
Goal: Calculate the Average Score for each Subject.
๐งช The Code
# 1. Create Data
data2 = {
'Student': ['Riya', 'Aman', 'Riya', 'Aman'],
'Subject': ['Math', 'Math', 'Science', 'Science'],
'Score': [90, 80, 70, 60]
}
df2 = pd.DataFrame(data2)
# 2. Pivot Table (Average)
# Note: We don't use 'columns' here because we just want a simple list
grades = pd.pivot_table(
df2,
index='Subject',
values='Score',
aggfunc='mean'
)
print(grades)
๐ค The Output
Score
Subject
Math 85
Science 65
๐ How it works:
1. aggfunc='mean': Instead of adding, it calculates the average.
2. (90 + 80) / 2 = 85 for Math.
3. Since we skipped
1. aggfunc='mean': Instead of adding, it calculates the average.
2. (90 + 80) / 2 = 85 for Math.
3. Since we skipped
columns=, it created a simple vertical table.
๐น Example 3: Attendance (Counting)
Goal: Count how many days each employee was Present.
๐งช The Code
# 1. Create Data
data3 = {
'Name': ['John', 'John', 'Alice', 'John'],
'Date': ['Jan 1', 'Jan 2', 'Jan 1', 'Jan 3'],
'Status': ['Present', 'Present', 'Present', 'Absent']
}
df3 = pd.DataFrame(data3)
# 2. Pivot Table (Count)
attendance = pd.pivot_table(
df3,
index='Name',
columns='Status',
values='Date', # We count the dates
aggfunc='count',
fill_value=0 # Replace NaN with 0
)
print(attendance)
๐ค The Output
Status Absent Present
Name
Alice 0 1
John 1 2
๐ How it works:
1. aggfunc='count': Counts the number of entries.
2. fill_value=0: Crucial! Alice was never "Absent", so normally that cell would be
1. aggfunc='count': Counts the number of entries.
2. fill_value=0: Crucial! Alice was never "Absent", so normally that cell would be
NaN (Empty). This forces it to show 0 instead.
๐น Bonus: Adding Totals (Margins)
Often, you want a "Grand Total" row at the bottom. We use margins=True.
๐งช The Code
total_report = pd.pivot_table(
df,
index='City',
columns='Product',
values='Sales',
aggfunc='sum',
margins=True, # Add Totals
margins_name='Grand Total' # Rename the total label
)
print(total_report)
๐ค The Output
Product Pen Pencil Grand Total
City
Delhi 300 20 320
Mumbai 150 80 230
Grand Total 450 100 550
๐ Summary Cheat Sheet
| Parameter | Description | Example |
|---|---|---|
data |
The DataFrame to use | df |
index |
Rows (Vertical grouping) | 'City' |
columns |
Columns (Horizontal grouping) | 'Product' |
values |
The numbers to calculate | 'Sales' |
aggfunc |
The math logic | 'sum', 'mean', 'count', 'max' |
fill_value |
Replace empty cells | 0 |
margins |
Add Grand Totals | True |