Pandas Merging Tables

Merging Tables (merge)

In real-world projects, data is often split across multiple files or database tables to save space. For example, one table might have "Employee Personal Details" and another might have "Employee Contact Info".

To analyze them together, we need to Merge (or Join) them based on a common "Key" (like an ID number).


๐Ÿ”น 1. Setup: Creating Two DataFrames

Let's create two separate tables.
Table 1: Contains Name, Class, and Status.
Table 2: Contains Address and Mobile No.

Common Key: Both tables share the employ id column. This is the bridge that connects them.

๐Ÿงช Code Example


import pandas as pd
import numpy as np

# Table 1: Personal Details
data1 = {
    'employ id': [1, 2, 3, 10, 11],
    'name': ['Atul', 'Aman', 'Riya', 'Sneha', 'Dev'],
    'class': [10, 0, 11, 12, 12],
    'roll': [1, 2, 3, 4, 0],
    'subject': ['Math', np.nan, 'English', np.nan, np.nan],
    'status': ['pass', 'fail', np.nan, 'pass', np.nan]
}

# Table 2: Contact Details
data2 = {
    'employ id': [1, 2, 3, 4, 5, 6, 7, 8],
    'address': ['Delhi', 'Mumbai', 'Pune', 'Chennai', 'Kolkata', 'Pune', 'Chennai', 'Kolkata'],
    'mobile no': ['9876543210', '8765432109', np.nan, '7654321098', '6543210987', np.nan, '7654321098', '6543210987']
}

table1 = pd.DataFrame(data1)
table2 = pd.DataFrame(data2)

print("--- Table 1 (Personal) ---")
print(table1)
print("\n--- Table 2 (Contact) ---")
print(table2)
⚠ Prerequisite for Merging:
There must be at least one common column (unique identifier) present in both tables. If you try to merge tables that share no column names, Pandas will throw a MergeError.

๐Ÿ”น 2. Inner Merge (Intersection)

This is the default merge type. It returns only the rows where the ID exists in BOTH tables.

  • IDs in Table 1: 1, 2, 3, 10, 11
  • IDs in Table 2: 1, 2, 3, 4, 5, 6...
  • Common IDs: 1, 2, 3 (Only these will show up).

๐Ÿงช Code Example


# how="inner" is optional (it is the default)
print(pd.merge(table1, table2, on="employ id", how="inner"))

๐Ÿ“ค Output


   employ id  name  class  ... status address   mobile no
0          1  Atul     10  ...   pass   Delhi  9876543210
1          2  Aman      0  ...   fail  Mumbai  8765432109
2          3  Riya     11  ...    NaN    Pune         NaN

Notice that IDs 10, 11 (from T1) and 4, 5, 6... (from T2) are gone because they didn't match.


๐Ÿ”น 3. Outer Merge (Union)

This keeps ALL rows from BOTH tables.

  • If an ID exists in Table 1 but not Table 2 (like ID 10), Pandas keeps it and fills the missing contact info with NaN.
  • If an ID exists in Table 2 but not Table 1 (like ID 5), Pandas keeps it and fills the missing name info with NaN.

๐Ÿงช Code Example


print(pd.merge(table1, table2, on="employ id", how="outer"))

๐Ÿ“ค Output


   employ id   name  class ... address   mobile no
0          1   Atul   10.0 ...   Delhi  9876543210  (Matched)
...
3         10  Sneha   12.0 ...     NaN         NaN  (In T1 only)
4         11    Dev   12.0 ...     NaN         NaN  (In T1 only)
5          4    NaN    NaN ... Chennai  7654321098  (In T2 only)
6          5    NaN    NaN ... Kolkata  6543210987  (In T2 only)
...

๐Ÿ”น 4. Left Merge (Priority to Table 1)

This keeps ALL rows from the Left Table (Table 1). It matches data from the Right Table (Table 2) where possible. If no match is found, it puts NaN.

  • We care about all our Employees (Table 1).
  • We don't care about extra contact info in Table 2 if it belongs to an unknown person.

๐Ÿงช Code Example


# Table 1 is Left, Table 2 is Right
print(pd.merge(table1, table2, on="employ id", how="left"))

๐Ÿ“ค Output


   employ id   name  class ... address   mobile no
0          1   Atul     10 ...   Delhi  9876543210
1          2   Aman      0 ...  Mumbai  8765432109
2          3   Riya     11 ...    Pune         NaN
3         10  Sneha     12 ...     NaN         NaN
4         11    Dev     12 ...     NaN         NaN

Notice IDs 10 and 11 are kept (because they are in the Left table), but IDs 4, 5, 6 are dropped (because they are only in the Right table).


๐Ÿ“Œ Summary

Merge Type Venn Diagram Description
Inner Intersection (A ∩ B) Only common IDs. No NaNs generated from mismatch.
Outer Union (A ∪ B) All IDs from both. Lots of NaNs.
Left Left Circle (A) All rows from Left table. NaNs for missing Right data.
โ„น Note: Merging is a binary operation. You can only merge two DataFrames at a time. If you have 3 tables, you must merge T1 and T2 first, and then merge the result with T3.

Post a Comment

Do Leave Your Comments...

Previous Post Next Post

Contact Form