Concatenating Tables
While merge() and join() are about connecting data based on logic (IDs or Indices),
Concatenation is simpler: it's about stacking data.
Think of it like stacking Lego blocks. You can stack them on top of each other (Row-wise) or place them side-by-side (Column-wise).
๐น 1. Setup: Creating Two DataFrames
Let's create two simple tables to demonstrate this.
๐งช Code Example
import pandas as pd
import numpy as np
# Table 1
data1 = {
'name': ['Atul', 'Aman'],
'class': [10, 12]
}
# Table 2
data2 = {
'name': ['Riya', 'Sneha'],
'class': [11, 10],
'address': ['Pune', 'Chennai']
}
table1 = pd.DataFrame(data1)
table2 = pd.DataFrame(data2)
print("--- Table 1 ---")
print(table1)
print("\n--- Table 2 ---")
print(table2)
๐น 2. Column-wise Concatenation (axis=1)
This places the two tables side-by-side. It is useful when you have two separate datasets for the same rows (e.g., student grades in one file, student addresses in another).
๐งช Code Example
# axis=1 means "Along Columns" (Side-by-Side)
result_col = pd.concat([table1, table2], axis=1)
print(result_col)
๐ค Output
name class name class address
0 Atul 10 Riya 11 Pune
1 Aman 12 Sneha 10 Chennai
๐ What Happened?
- We now have duplicate column names (two "name" columns).
- If we try to select row 1, we get data from BOTH original tables.
# Select row 1
print(result_col.loc[[1]])
name class name class address
1 Aman 12 Sneha 10 Chennai
๐น 3. Row-wise Concatenation (axis=0)
This places one table on top of the other. This is useful when you have data split into multiple files (e.g., "Sales_January.csv" and "Sales_February.csv") and want to combine them into one big list.
๐งช Code Example
# axis=0 means "Along Rows" (Top-to-Bottom)
# ignore_index=True resets the index to 0, 1, 2, 3...
result_row = pd.concat([table1, table2], axis=0)
print(result_row)
๐ค Output
name class address
0 Atul 10 NaN
1 Aman 12 NaN
0 Riya 11 Pune
1 Sneha 10 Chennai
๐ Explanation
- Indices are repeated: Notice we have index
0, 1followed by another0, 1. This can cause issues later. - NaN Values: Table 1 didn't have an "address" column, so Pandas filled those cells with
NaN.
⚠ Fixing the Index
If we select row 1 now, we get TWO rows because the index is duplicated!
print(result_row.loc[1])
name class address
1 Aman 12 NaN
1 Sneha 10 Chennai
To fix this, we usually use ignore_index=True during concatenation so the new table counts 0, 1, 2, 3 smoothly.
๐ Summary
| Axis | Direction | Analogy | Key Issue |
|---|---|---|---|
| axis=0 | Row-wise (Vertical) | Stacking boxes | Indices might duplicate. |
| axis=1 | Col-wise (Horizontal) | Books on a shelf | Column names might duplicate. |