Creating DataFrame Structure

📊 Pandas Level 2: Creating DataFrames

In this tutorial, we will cover the foundational skill of creating Pandas DataFrames by hand. Whether you are preparing data for machine learning or simply analyzing sales figures, knowing how to structure your data manually is a useful first step in most data workflows.


🔹 What is a DataFrame?

A DataFrame is the primary data structure in Pandas. It is a 2-dimensional labeled data structure with columns of potentially different types.

Think of it like a programmable Excel spreadsheet.
  • Rows: Horizontal entries, automatically indexed (0, 1, 2...) unless specified otherwise.
  • Columns: Vertical stacks of data, usually identified by a label (Header).
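To make the description above concrete, here is a minimal sketch that builds a tiny DataFrame and inspects its shape, row index, and column types. The variable name demo and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical two-column table: one text column, one integer column
demo = pd.DataFrame({"Item": ["pen", "book", "bag"], "Price": [10, 250, 900]})

print(demo.shape)        # (number of rows, number of columns)
print(list(demo.index))  # default row labels start at 0
print(demo.dtypes)       # each column keeps its own type
```

The text column and the numeric column report different types, which is what "columns of potentially different types" means in practice.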

1. Creating a DataFrame via Dictionary

Best for: Column-oriented data (when you have lists of values for each category).

When using a dictionary:

  • Keys → Become Column Headers.
  • Values → Become the column data (one entry per row).

🧪 Code Example

import pandas as pd

# Define the dictionary
data = {
    "Name": ["Atul", "Aman", "Harsh", "Rishi", "Devansh"],
    "Class": [11, 39, 47, 56, 76],
    "Status": ["pass", "fail", "pass", "pass", "fail"]
}

# Create the DataFrame
df = pd.DataFrame(data)
print(df)

📤 Output

      Name  Class Status
0     Atul     11   pass
1     Aman     39   fail
2    Harsh     47   pass
3    Rishi     56   pass
4  Devansh     76   fail

⚠ The Golden Rule: Equal Length
Pandas enforces strict structural integrity. All lists within your dictionary must have the same length. If "Name" has 5 items but "Class" has 4, Pandas will raise a ValueError.
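The rule can be checked with a short sketch. The dictionary bad_data below is a hypothetical variation of the earlier example with one "Class" value removed on purpose. Counting the items per key beforehand is one simple way to spot the mismatch.

```python
import pandas as pd

# Hypothetical data: "Class" is one item short on purpose
bad_data = {
    "Name": ["Atul", "Aman", "Harsh", "Rishi", "Devansh"],
    "Class": [11, 39, 47, 56],
}

# Count the items per column before building the DataFrame
lengths = {key: len(values) for key, values in bad_data.items()}
print(lengths)

try:
    pd.DataFrame(bad_data)
    error_message = None
except ValueError as exc:
    # Pandas refuses to build a table from columns of unequal length
    error_message = str(exc)
    print("Could not build DataFrame:", error_message)
```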

2. Creating a DataFrame via List of Lists

Best for: Row-oriented data (when you are scraping data row by row).

When using a list of lists, each inner list represents a single record (row).

🧪 Code Example (Without Headers)

import pandas as pd

# Define a list of lists (Rows)
data_rows = [
    ['Anna', 30, 'Paris', 70000],
    ['Peter', 32, 'Berlin', 62000],
    ['Linda', 28, 'New York', 65000],
    ['Berlin', 62, 'London', 85000]
]

print(pd.DataFrame(data_rows))

📤 Output

        0   1         2      3
0    Anna  30     Paris  70000
1   Peter  32    Berlin  62000
2   Linda  28  New York  65000
3  Berlin  62    London  85000

Note: Since we didn't provide headers, Pandas assigned default integers (0, 1, 2...) to the columns.
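As a small sketch of what those default labels mean in practice, the columns really are named with the integers 0, 1, ... and are selected with those integers. The rows below are a shortened, hypothetical version of the data above.

```python
import pandas as pd

# Two short rows, no headers supplied
rows = [["Anna", 30], ["Peter", 32]]
unnamed = pd.DataFrame(rows)

print(list(unnamed.columns))  # the column labels are integers
print(unnamed[0].tolist())    # select the first column by its label 0
```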


3. Adding Custom Headers

To make the "List of Lists" method readable, we must explicitly define the column names using the columns parameter.

🧪 Code Example

# Reuses pd and data_rows from the previous example
# Define headers matching the order of data in the rows
headers = ["Name", "Age", "Location", "Salary"]

# Pass the data AND the columns argument
df_rows = pd.DataFrame(data_rows, columns=headers)
print(df_rows)

📤 Output

     Name  Age  Location  Salary
0    Anna   30     Paris   70000
1   Peter   32    Berlin   62000
2   Linda   28  New York   65000
3  Berlin   62    London   85000
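Two related details are sketched below, using a shortened, hypothetical copy of the rows. First, the labels can also be assigned after creation through the columns attribute, an alternative this tutorial does not otherwise cover. Second, the number of headers has to match the number of values per row, otherwise pandas raises a ValueError.

```python
import pandas as pd

rows = [["Anna", 30, "Paris", 70000], ["Peter", 32, "Berlin", 62000]]
headers = ["Name", "Age", "Location", "Salary"]

# Route used in the tutorial: pass columns= at creation time
named = pd.DataFrame(rows, columns=headers)

# Alternative: build first, then assign the labels afterwards
late = pd.DataFrame(rows)
late.columns = headers
same_table = named.equals(late)
print(same_table)

# Too few headers for four values per row
try:
    pd.DataFrame(rows, columns=["Name", "Age"])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)
    print("Header mismatch:", mismatch_error)
```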

📌 Summary: Dictionary vs. List

Feature    | Dictionary Method                        | List of Lists Method
Structure  | Column-wise                              | Row-wise
Headers    | Automatic (from Keys)                    | Manual (must specify columns=)
Best Use   | When you have full columns of data ready | When processing data record-by-record
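The two methods are different routes to the same result. The sketch below builds one small, made-up table both ways and compares them. The names by_column and by_row are illustrative only.

```python
import pandas as pd

# Column-wise: one list per column, headers come from the keys
by_column = pd.DataFrame({"Name": ["Anna", "Peter"], "Age": [30, 32]})

# Row-wise: one list per record, headers supplied through columns=
by_row = pd.DataFrame([["Anna", 30], ["Peter", 32]], columns=["Name", "Age"])

identical = by_column.equals(by_row)
print(identical)
```

Which one to use depends mostly on the shape your data already has when it reaches you.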

Key Takeaways

  1. Indices: Pandas generates row numbers (0 to N-1) automatically.
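  2. Consistency: Keep every column list in a dictionary the same length (unequal lists raise a ValueError), and give every row in a list of lists the same number of elements (short rows are padded with NaN rather than rejected).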
  3. Readability: Always use column headers to make your data self-explanatory.
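On the consistency point, it helps to know that a short row in a list of lists does not necessarily raise an error. In the sketch below, which uses hypothetical data with one salary missing, pandas fills the gap with NaN, so the problem can pass unnoticed unless you look for it.

```python
import pandas as pd

# Hypothetical rows: the second record is missing its salary
ragged = [["Anna", 30, 70000], ["Peter", 32]]
df_ragged = pd.DataFrame(ragged, columns=["Name", "Age", "Salary"])

print(df_ragged)

# Flag the rows where the salary was never supplied
missing_salary = df_ragged["Salary"].isna().tolist()
print(missing_salary)
```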
