Pandas Level 2: Creating DataFrames
In this tutorial, we will master the foundational skill of creating Pandas DataFrames. Whether you are preparing data for machine learning or simply analyzing sales figures, knowing how to build a DataFrame by hand is a common first step in a data workflow.
What is a DataFrame?
A DataFrame is the primary data structure in Pandas. It is a 2-dimensional labeled data structure with columns of potentially different types.
Think of it like a programmable Excel spreadsheet.
- Rows: Horizontal entries, automatically indexed (0, 1, 2...) unless specified otherwise.
- Columns: Vertical stacks of data, usually identified by a label (Header).
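The row and column labels described above can be inspected directly. The following is a minimal sketch, assuming a hypothetical two-row sample and hypothetical row labels "s1" and "s2" that are not part of the tutorial's data:

```python
import pandas as pd

# Hypothetical two-row sample, used only to inspect the structure.
sample = pd.DataFrame({"Name": ["Atul", "Aman"], "Class": [11, 39]})

print(sample.shape)          # (number of rows, number of columns)
print(list(sample.index))    # default row labels
print(list(sample.columns))  # column labels
print(sample.dtypes)         # each column has its own dtype

# Row labels can be set explicitly with the index parameter.
labeled = pd.DataFrame(
    {"Name": ["Atul", "Aman"], "Class": [11, 39]},
    index=["s1", "s2"],  # hypothetical labels
)
print(labeled.loc["s2", "Name"])
```

Passing index= is one way to replace the automatic 0, 1, 2... row labels with your own.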
1. Creating a DataFrame via Dictionary
Best for: Column-oriented data (when you have lists of values for each category).
When using a dictionary:
- Keys → Become Column Headers.
- Values → Become the column data (rows).
Code Example
import pandas as pd
# Define the dictionary
data = {
    "Name": ["Atul", "Aman", "Harsh", "Rishi", "Devansh"],
    "Class": [11, 39, 47, 56, 76],
    "Status": ["pass", "fail", "pass", "pass", "fail"]
}
# Create the DataFrame
df = pd.DataFrame(data)
print(df)
Output
      Name  Class  Status
0     Atul     11    pass
1     Aman     39    fail
2    Harsh     47    pass
3    Rishi     56    pass
4  Devansh     76    fail
All lists in the dictionary must have the same length. If "Name" has 5 items but "Class" has 4, Pandas raises a ValueError instead of padding the shorter column.
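To see the length rule in action, here is a small sketch that deliberately leaves one value out of "Class". The bad_data dictionary is a hypothetical variation of the example above, and the exact error text may differ between Pandas versions:

```python
import pandas as pd

# Hypothetical mismatched data: 5 names but only 4 class values.
bad_data = {
    "Name": ["Atul", "Aman", "Harsh", "Rishi", "Devansh"],
    "Class": [11, 39, 47, 56],
}

try:
    pd.DataFrame(bad_data)
    raised = False
except ValueError as err:
    raised = True
    print("Could not build DataFrame:", err)
```

Catching the error like this is useful when the lists come from an external source and you want a clear message rather than a crash.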
2. Creating a DataFrame via List of Lists
Best for: Row-oriented data (when you are scraping data row by row).
When using a list of lists, each inner list represents a single record (row).
Code Example (Without Headers)
import pandas as pd
# Define a list of lists (Rows)
data_rows = [
    ['Anna', 30, 'Paris', 70000],
    ['Peter', 32, 'Berlin', 62000],
    ['Linda', 28, 'New York', 65000],
    ['Berlin', 62, 'London', 85000]
]
print(pd.DataFrame(data_rows))
Output
        0   1         2      3
0    Anna  30     Paris  70000
1   Peter  32    Berlin  62000
2   Linda  28  New York  65000
3  Berlin  62    London  85000
Note: Since we didn't provide headers, Pandas assigned default integers (0, 1, 2...) to the columns.
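Those default labels are real column names, so a column can be selected with its integer. A minimal sketch, assuming a shortened two-row version of the data (the names rows and df_default are illustrative):

```python
import pandas as pd

# Shortened, hypothetical version of the rows above.
rows = [
    ['Anna', 30, 'Paris', 70000],
    ['Peter', 32, 'Berlin', 62000],
]
df_default = pd.DataFrame(rows)

print(list(df_default.columns))  # integer labels assigned by Pandas
first_column = df_default[0]     # select a column by its integer label
print(first_column.tolist())
```

This works, but df_default[0] says nothing about what the column holds, which is why named headers are usually preferable.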
3. Adding Custom Headers
To make the "List of Lists" method readable, pass the column names explicitly through the columns parameter.
Code Example
# Reuses data_rows from the previous example
# Define headers matching the order of data in the rows
headers = ["Name", "Age", "Location", "Salary"]
# Pass the data AND the columns argument
df_rows = pd.DataFrame(data_rows, columns=headers)
print(df_rows)
Output
     Name  Age  Location  Salary
0    Anna   30     Paris   70000
1   Peter   32    Berlin   62000
2   Linda   28  New York   65000
3  Berlin   62    London   85000
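Once the columns have names, later steps can refer to them by label. The sketch below repeats the tutorial's data so that it runs on its own; average_salary and over_30 are illustrative names, not part of the original example:

```python
import pandas as pd

data_rows = [
    ['Anna', 30, 'Paris', 70000],
    ['Peter', 32, 'Berlin', 62000],
    ['Linda', 28, 'New York', 65000],
    ['Berlin', 62, 'London', 85000],
]
headers = ["Name", "Age", "Location", "Salary"]
df_rows = pd.DataFrame(data_rows, columns=headers)

# Columns are now addressed by name instead of position.
average_salary = df_rows["Salary"].mean()
over_30 = df_rows[df_rows["Age"] > 30]["Name"].tolist()

print(average_salary)
print(over_30)
```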
Summary: Dictionary vs. List
| Feature | Dictionary Method | List of Lists Method |
|---|---|---|
| Structure | Column-wise | Row-wise |
| Headers | Automatic (from Keys) | Manual (must specify columns=) |
| Best Use | When you have full columns of data ready | When processing data record-by-record |
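The two methods describe the same table from different directions. As a rough sketch, assuming a hypothetical two-row sample, row-wise data can be transposed into a dictionary and should produce an equivalent DataFrame:

```python
import pandas as pd

headers = ["Name", "Age"]
rows = [["Anna", 30], ["Peter", 32]]  # hypothetical sample rows

# List of Lists method: one inner list per record.
from_rows = pd.DataFrame(rows, columns=headers)

# Transpose the rows into columns, then pair each header with its column.
columns = {header: list(values) for header, values in zip(headers, zip(*rows))}

# Dictionary method: one list per column.
from_dict = pd.DataFrame(columns)

print(from_rows.equals(from_dict))
```

Which form to use mostly depends on how the data arrives, not on what the resulting DataFrame can do.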
Key Takeaways
- Indices: Pandas generates row numbers (0 to N-1) automatically.
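- Consistency: Dictionary columns must all have the same length. With a list of lists, a short row is filled with missing values rather than rejected, so check row lengths yourself.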
- Readability: Always use column headers to make your data self-explanatory.
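The consistency point is worth testing on row-wise data. This is a minimal sketch, assuming hypothetical ragged input (short_rows and long_rows are illustrative), of how a row that is too short and a row that is too long behave when headers are supplied:

```python
import pandas as pd

headers = ["Name", "Age", "Location"]

# Hypothetical ragged input: the second row is missing its Location.
short_rows = [["Anna", 30, "Paris"], ["Peter", 32]]
df_short = pd.DataFrame(short_rows, columns=headers)
print(df_short["Location"].isna().tolist())  # which rows lack a Location

# A row with more values than headers is rejected.
long_rows = [["Anna", 30, "Paris"], ["Peter", 32, "Berlin", 62000]]
try:
    pd.DataFrame(long_rows, columns=headers)
    rejected = False
except ValueError as err:
    rejected = True
    print("Could not build DataFrame:", err)
```

Because a short row does not raise an error, checking row lengths before building the DataFrame is a reasonable precaution when scraping.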