Pandas Joining Tables by Index

๐Ÿ“Š Pandas join() Method

While merge() is used to combine tables based on a shared column (like "Employee ID"), the join() method is used to combine tables based on their Index (Row Labels).

Think of join() as pasting two tables side-by-side, matching them strictly by their row numbers.


๐Ÿ”น 1. Basic Joining (Default Indices)

Let's create two DataFrames.
Table 1: Employee Details.
Table 2: Employee Addresses.

๐Ÿงช Code Example


import pandas as pd
import numpy as np

# Table 1 Data
data1 = {
    'employ id': [1, 2, 3, 4, 5],
    'name': ['Atul', 'Aman', 'Riya', 'Sneha', 'Dev'],
    'class': [1, 2, 3, 4, 5],
    'roll': [1, 2, 3, 4, 5],
    'subject': ['Atul_subject', 'Aman_subject', 'Riya_subject', 'Sneha_subject', 'Dev_subject'],
    'status': ['Atul_status', 'Aman_status', 'Riya_status', 'Sneha_status', 'Dev_status']
}

# Table 2 Data
data2 = {
    'address': ['Atu_Delhi', 'Aman_Mumbai', 'Riya_Pune', 'Sneha_Chennai', 'Dev_Kolkata'],
}

# Creating DataFrames (Default Index: 0, 1, 2, 3, 4)
table1 = pd.DataFrame(data1)
table2 = pd.DataFrame(data2)

# Join table2 to the right side of table1
print(table1.join(table2))

๐Ÿ“ค Output


   employ id   name  ...       status        address
0          1   Atul  ...  Atul_status      Atu_Delhi
1          2   Aman  ...  Aman_status    Aman_Mumbai
2          3   Riya  ...  Riya_status      Riya_Pune
3          4  Sneha  ... Sneha_status  Sneha_Chennai
4          5    Dev  ...   Dev_status    Dev_Kolkata

Result: Since both tables have the exact same index (0 to 4), they align perfectly.


๐Ÿ”น 2. Important Rules for .join()

1. Column Names Must Be Unique

If both tables have a column with the same name (e.g., both have a "status" column), Pandas will raise an error. Unlike merge(), join() does not automatically rename columns unless you provide a suffix.

2. It Matches by Index, Not Position

This is the most critical difference. It does not just paste Row 1 next to Row 1. It looks at the Index Label. If Row 1 is labeled "A" in Table 1, it looks for "A" in Table 2.


๐Ÿ”น 3. The Index Mismatch Problem

What happens if the indices are different? Let's manually change the index of Table 2 to start from 1 instead of 0.

⚠ Setup

  • Table 1 Index: 0, 1, 2, 3, 4 (Default)
  • Table 2 Index: 1, 2, 3, 4, 5 (Manually set)

๐Ÿงช Code Example


# Re-create Table 2 with a shifted index
table2 = pd.DataFrame(data2, index=[1, 2, 3, 4, 5])

print("--- Joining with Mismatched Indices ---")
print(table1.join(table2))

๐Ÿ“ค Output


   employ id   name  ...       status        address
0          1   Atul  ...  Atul_status            NaN
1          2   Aman  ...  Aman_status      Atu_Delhi
2          3   Riya  ...  Riya_status    Aman_Mumbai
3          4  Sneha  ... Sneha_status      Riya_Pune
4          5    Dev  ...   Dev_status  Sneha_Chennai

๐Ÿ“ Explanation of the Output

Row 0 (Atul):

  • Table 1 has Index 0.
  • Pandas looks for Index 0 in Table 2.
  • Result: It does not exist in Table 2! So, we get NaN.

Row 1 (Aman):

  • Table 1 has Index 1.
  • Pandas looks for Index 1 in Table 2.
  • Result: It finds "Atu_Delhi" (which is at index 1 in Table 2).

Row 5 (Table 2):

  • Table 2 has Index 5 ("Dev_Kolkata").
  • Since join() acts like a Left Join by default, it only cares about indices in Table 1. Index 5 is ignored.

๐Ÿ“Œ Summary: Merge vs. Join

Feature Merge Join
Alignment Basis Column Values (e.g., "ID") Index Labels (Row Numbers)
Use Case Database-style relations Quickly combining datasets
Handling Duplicates Auto-renames columns (x, y) Throws error (unless suffix used)

Post a Comment

Do Leave Your Comments...

Previous Post Next Post

Contact Form