Working with Pandas Pivot Tables

Mastering Pivot Tables in Pandas

In the previous tutorial, we learned the theory. Now, let's write the code! The function we use is pd.pivot_table().


๐Ÿ”น The Syntax

The code follows this structure:

pd.pivot_table(data, values=..., index=..., columns=..., aggfunc=...)
  • data: Your DataFrame.
  • index: Keys to group by on the vertical axis (Rows).
  • columns: Keys to group by on the horizontal axis.
  • values: The column containing the numbers to calculate.
  • aggfunc: The math function ('sum', 'mean', 'count', etc.).

๐Ÿ”น Example 1: Sales Data (Summing)

Goal: Calculate the Total Sales for each Product in each City.

๐Ÿงช The Code

import pandas as pd
import numpy as np

# 1. Create the DataFrame
data = {
    'City': ['Delhi', 'Mumbai', 'Delhi', 'Mumbai', 'Delhi', 'Mumbai'],
    'Product': ['Pen', 'Pencil', 'Pen', 'Pen', 'Pencil', 'Pencil'],
    'Sales': [100, 50, 200, 150, 20, 30]
}
df = pd.DataFrame(data)

# 2. Create the Pivot Table
report = pd.pivot_table(
    df, 
    index='City', 
    columns='Product', 
    values='Sales', 
    aggfunc='sum'
)

print(report)

๐Ÿ“ค The Output

Product Pen Pencil City Delhi 300 20 Mumbai 150 80
๐Ÿ“ How it works:
1. index='City': Puts "Delhi" and "Mumbai" as row labels.
2. columns='Product': Puts "Pen" and "Pencil" as column headers.
3. values='Sales': Tells Pandas to look at the Sales numbers.
4. aggfunc='sum': Adds up the numbers (e.g., Delhi Pen: 100 + 200 = 300).

๐Ÿ”น Example 2: Student Grades (Averaging)

Goal: Calculate the Average Score for each Subject.

๐Ÿงช The Code

# 1. Create Data
data2 = {
    'Student': ['Riya', 'Aman', 'Riya', 'Aman'],
    'Subject': ['Math', 'Math', 'Science', 'Science'],
    'Score': [90, 80, 70, 60]
}
df2 = pd.DataFrame(data2)

# 2. Pivot Table (Average)
# Note: We don't use 'columns' here because we just want a simple list
grades = pd.pivot_table(
    df2, 
    index='Subject', 
    values='Score', 
    aggfunc='mean'
)

print(grades)

๐Ÿ“ค The Output

Score Subject Math 85 Science 65
๐Ÿ“ How it works:
1. aggfunc='mean': Instead of adding, it calculates the average.
2. (90 + 80) / 2 = 85 for Math.
3. Since we skipped columns=, it created a simple vertical table.

๐Ÿ”น Example 3: Attendance (Counting)

Goal: Count how many days each employee was Present.

๐Ÿงช The Code

# 1. Create Data
data3 = {
    'Name': ['John', 'John', 'Alice', 'John'],
    'Date': ['Jan 1', 'Jan 2', 'Jan 1', 'Jan 3'],
    'Status': ['Present', 'Present', 'Present', 'Absent']
}
df3 = pd.DataFrame(data3)

# 2. Pivot Table (Count)
attendance = pd.pivot_table(
    df3, 
    index='Name', 
    columns='Status', 
    values='Date',    # We count the dates
    aggfunc='count', 
    fill_value=0      # Replace NaN with 0
)

print(attendance)

๐Ÿ“ค The Output

Status Absent Present Name Alice 0 1 John 1 2
๐Ÿ“ How it works:
1. aggfunc='count': Counts the number of entries.
2. fill_value=0: Crucial! Alice was never "Absent", so normally that cell would be NaN (Empty). This forces it to show 0 instead.

๐Ÿ”น Bonus: Adding Totals (Margins)

Often, you want a "Grand Total" row at the bottom. We use margins=True.

๐Ÿงช The Code

total_report = pd.pivot_table(
    df, 
    index='City', 
    columns='Product', 
    values='Sales', 
    aggfunc='sum',
    margins=True,              # Add Totals
    margins_name='Grand Total' # Rename the total label
)

print(total_report)

๐Ÿ“ค The Output

Product Pen Pencil Grand Total City Delhi 300 20 320 Mumbai 150 80 230 Grand Total 450 100 550

๐Ÿ“Œ Summary Cheat Sheet

Parameter Description Example
data The DataFrame to use df
index Rows (Vertical grouping) 'City'
columns Columns (Horizontal grouping) 'Product'
values The numbers to calculate 'Sales'
aggfunc The math logic 'sum', 'mean', 'count', 'max'
fill_value Replace empty cells 0
margins Add Grand Totals True

Post a Comment

Do Leave Your Comments...

Previous Post Next Post

Contact Form