Merging Tables (merge)
In real-world projects, data is often split across multiple files or database tables to save space. For example, one table might have "Employee Personal Details" and another might have "Employee Contact Info".
To analyze them together, we need to Merge (or Join) them based on a common "Key" (like an ID number).
๐น 1. Setup: Creating Two DataFrames
Let's create two separate tables.
Table 1: Contains Name, Class, and Status.
Table 2: Contains Address and Mobile No.
Common Key: Both tables share the employ id column. This is the bridge that connects them.
๐งช Code Example
import pandas as pd
import numpy as np
# Table 1: Personal Details
data1 = {
'employ id': [1, 2, 3, 10, 11],
'name': ['Atul', 'Aman', 'Riya', 'Sneha', 'Dev'],
'class': [10, 0, 11, 12, 12],
'roll': [1, 2, 3, 4, 0],
'subject': ['Math', np.nan, 'English', np.nan, np.nan],
'status': ['pass', 'fail', np.nan, 'pass', np.nan]
}
# Table 2: Contact Details
data2 = {
'employ id': [1, 2, 3, 4, 5, 6, 7, 8],
'address': ['Delhi', 'Mumbai', 'Pune', 'Chennai', 'Kolkata', 'Pune', 'Chennai', 'Kolkata'],
'mobile no': ['9876543210', '8765432109', np.nan, '7654321098', '6543210987', np.nan, '7654321098', '6543210987']
}
table1 = pd.DataFrame(data1)
table2 = pd.DataFrame(data2)
print("--- Table 1 (Personal) ---")
print(table1)
print("\n--- Table 2 (Contact) ---")
print(table2)
There must be at least one common column (unique identifier) present in both tables. If you try to merge tables that share no column names, Pandas will throw a
MergeError.
๐น 2. Inner Merge (Intersection)
This is the default merge type. It returns only the rows where the ID exists in BOTH tables.
- IDs in Table 1:
1, 2, 3, 10, 11 - IDs in Table 2:
1, 2, 3, 4, 5, 6... - Common IDs:
1, 2, 3(Only these will show up).
๐งช Code Example
# how="inner" is optional (it is the default)
print(pd.merge(table1, table2, on="employ id", how="inner"))
๐ค Output
employ id name class ... status address mobile no
0 1 Atul 10 ... pass Delhi 9876543210
1 2 Aman 0 ... fail Mumbai 8765432109
2 3 Riya 11 ... NaN Pune NaN
Notice that IDs 10, 11 (from T1) and 4, 5, 6... (from T2) are gone because they didn't match.
๐น 3. Outer Merge (Union)
This keeps ALL rows from BOTH tables.
- If an ID exists in Table 1 but not Table 2 (like ID 10), Pandas keeps it and fills the missing contact info with
NaN. - If an ID exists in Table 2 but not Table 1 (like ID 5), Pandas keeps it and fills the missing name info with
NaN.
๐งช Code Example
print(pd.merge(table1, table2, on="employ id", how="outer"))
๐ค Output
employ id name class ... address mobile no
0 1 Atul 10.0 ... Delhi 9876543210 (Matched)
...
3 10 Sneha 12.0 ... NaN NaN (In T1 only)
4 11 Dev 12.0 ... NaN NaN (In T1 only)
5 4 NaN NaN ... Chennai 7654321098 (In T2 only)
6 5 NaN NaN ... Kolkata 6543210987 (In T2 only)
...
๐น 4. Left Merge (Priority to Table 1)
This keeps ALL rows from the Left Table (Table 1).
It matches data from the Right Table (Table 2) where possible. If no match is found, it puts NaN.
- We care about all our Employees (Table 1).
- We don't care about extra contact info in Table 2 if it belongs to an unknown person.
๐งช Code Example
# Table 1 is Left, Table 2 is Right
print(pd.merge(table1, table2, on="employ id", how="left"))
๐ค Output
employ id name class ... address mobile no
0 1 Atul 10 ... Delhi 9876543210
1 2 Aman 0 ... Mumbai 8765432109
2 3 Riya 11 ... Pune NaN
3 10 Sneha 12 ... NaN NaN
4 11 Dev 12 ... NaN NaN
Notice IDs 10 and 11 are kept (because they are in the Left table), but IDs 4, 5, 6 are dropped (because they are only in the Right table).
๐ Summary
| Merge Type | Venn Diagram | Description |
|---|---|---|
| Inner | Intersection (A ∩ B) | Only common IDs. No NaNs generated from mismatch. |
| Outer | Union (A ∪ B) | All IDs from both. Lots of NaNs. |
| Left | Left Circle (A) | All rows from Left table. NaNs for missing Right data. |