๐ Pandas join() Method
While merge() is used to combine tables based on a shared column (like "Employee ID"), the join() method is used to combine tables based on their Index (Row Labels).
Think of join() as pasting two tables side-by-side, matching them strictly by their row numbers.
๐น 1. Basic Joining (Default Indices)
Let's create two DataFrames.
Table 1: Employee Details.
Table 2: Employee Addresses.
๐งช Code Example
import pandas as pd
import numpy as np
# Table 1 Data
data1 = {
'employ id': [1, 2, 3, 4, 5],
'name': ['Atul', 'Aman', 'Riya', 'Sneha', 'Dev'],
'class': [1, 2, 3, 4, 5],
'roll': [1, 2, 3, 4, 5],
'subject': ['Atul_subject', 'Aman_subject', 'Riya_subject', 'Sneha_subject', 'Dev_subject'],
'status': ['Atul_status', 'Aman_status', 'Riya_status', 'Sneha_status', 'Dev_status']
}
# Table 2 Data
data2 = {
'address': ['Atu_Delhi', 'Aman_Mumbai', 'Riya_Pune', 'Sneha_Chennai', 'Dev_Kolkata'],
}
# Creating DataFrames (Default Index: 0, 1, 2, 3, 4)
table1 = pd.DataFrame(data1)
table2 = pd.DataFrame(data2)
# Join table2 to the right side of table1
print(table1.join(table2))
๐ค Output
employ id name ... status address
0 1 Atul ... Atul_status Atu_Delhi
1 2 Aman ... Aman_status Aman_Mumbai
2 3 Riya ... Riya_status Riya_Pune
3 4 Sneha ... Sneha_status Sneha_Chennai
4 5 Dev ... Dev_status Dev_Kolkata
Result: Since both tables have the exact same index (0 to 4), they align perfectly.
๐น 2. Important Rules for .join()
1. Column Names Must Be Unique
If both tables have a column with the same name (e.g., both have a "status" column), Pandas will raise an error.
Unlike merge(), join() does not automatically rename columns unless you provide a suffix.
2. It Matches by Index, Not Position
This is the most critical difference. It does not just paste Row 1 next to Row 1. It looks at the Index Label. If Row 1 is labeled "A" in Table 1, it looks for "A" in Table 2.
๐น 3. The Index Mismatch Problem
What happens if the indices are different? Let's manually change the index of Table 2 to start from 1 instead of 0.
⚠ Setup
- Table 1 Index:
0, 1, 2, 3, 4(Default) - Table 2 Index:
1, 2, 3, 4, 5(Manually set)
๐งช Code Example
# Re-create Table 2 with a shifted index
table2 = pd.DataFrame(data2, index=[1, 2, 3, 4, 5])
print("--- Joining with Mismatched Indices ---")
print(table1.join(table2))
๐ค Output
employ id name ... status address
0 1 Atul ... Atul_status NaN
1 2 Aman ... Aman_status Atu_Delhi
2 3 Riya ... Riya_status Aman_Mumbai
3 4 Sneha ... Sneha_status Riya_Pune
4 5 Dev ... Dev_status Sneha_Chennai
๐ Explanation of the Output
Row 0 (Atul):
- Table 1 has Index
0. - Pandas looks for Index
0in Table 2. - Result: It does not exist in Table 2! So, we get
NaN.
Row 1 (Aman):
- Table 1 has Index
1. - Pandas looks for Index
1in Table 2. - Result: It finds "Atu_Delhi" (which is at index 1 in Table 2).
Row 5 (Table 2):
- Table 2 has Index
5("Dev_Kolkata"). - Since
join()acts like a Left Join by default, it only cares about indices in Table 1. Index 5 is ignored.
๐ Summary: Merge vs. Join
| Feature | Merge | Join |
|---|---|---|
| Alignment Basis | Column Values (e.g., "ID") | Index Labels (Row Numbers) |
| Use Case | Database-style relations | Quickly combining datasets |
| Handling Duplicates | Auto-renames columns (x, y) | Throws error (unless suffix used) |