Pandas Concatenating Tables

Concatenating Tables

While merge() and join() are about connecting data based on logic (IDs or Indices), Concatenation is simpler: it's about stacking data.

Think of it like stacking Lego blocks. You can stack them on top of each other (Row-wise) or place them side-by-side (Column-wise).


๐Ÿ”น 1. Setup: Creating Two DataFrames

Let's create two simple tables to demonstrate this.

๐Ÿงช Code Example


import pandas as pd
import numpy as np

# Table 1
data1 = {
    'name': ['Atul', 'Aman'],
    'class': [10, 12]
}

# Table 2
data2 = {
    'name': ['Riya', 'Sneha'],
    'class': [11, 10],
    'address': ['Pune', 'Chennai']
}

table1 = pd.DataFrame(data1)
table2 = pd.DataFrame(data2)

print("--- Table 1 ---")
print(table1)
print("\n--- Table 2 ---")
print(table2)

๐Ÿ”น 2. Column-wise Concatenation (axis=1)

This places the two tables side-by-side. It is useful when you have two separate datasets for the same rows (e.g., student grades in one file, student addresses in another).

⚠ Important Rule: Pandas aligns data based on the Index. If Table 1 has indices 0, 1 and Table 2 has indices 0, 1, they sit next to each other perfectly.

๐Ÿงช Code Example


# axis=1 means "Along Columns" (Side-by-Side)
result_col = pd.concat([table1, table2], axis=1)

print(result_col)

๐Ÿ“ค Output


   name  class   name  class  address
0  Atul     10   Riya     11     Pune
1  Aman     12  Sneha     10  Chennai

๐Ÿ“ What Happened?

  • We now have duplicate column names (two "name" columns).
  • If we try to select row 1, we get data from BOTH original tables.

# Select row 1
print(result_col.loc[[1]])

   name  class   name  class  address
1  Aman     12  Sneha     10  Chennai

๐Ÿ”น 3. Row-wise Concatenation (axis=0)

This places one table on top of the other. This is useful when you have data split into multiple files (e.g., "Sales_January.csv" and "Sales_February.csv") and want to combine them into one big list.

๐Ÿงช Code Example


# axis=0 means "Along Rows" (Top-to-Bottom)
# ignore_index=True resets the index to 0, 1, 2, 3...
result_row = pd.concat([table1, table2], axis=0)

print(result_row)

๐Ÿ“ค Output


    name  class  address
0   Atul     10      NaN
1   Aman     12      NaN
0   Riya     11     Pune
1  Sneha     10  Chennai

๐Ÿ“ Explanation

  • Indices are repeated: Notice we have index 0, 1 followed by another 0, 1. This can cause issues later.
  • NaN Values: Table 1 didn't have an "address" column, so Pandas filled those cells with NaN.

⚠ Fixing the Index

If we select row 1 now, we get TWO rows because the index is duplicated!


print(result_row.loc[1])

    name  class  address
1   Aman     12      NaN
1  Sneha     10  Chennai

To fix this, we usually use ignore_index=True during concatenation so the new table counts 0, 1, 2, 3 smoothly.


๐Ÿ“Œ Summary

Axis Direction Analogy Key Issue
axis=0 Row-wise (Vertical) Stacking boxes Indices might duplicate.
axis=1 Col-wise (Horizontal) Books on a shelf Column names might duplicate.

Post a Comment

Do Leave Your Comments...

Previous Post Next Post

Contact Form