Pandas is a popular data analysis library in Python that provides the DataFrame structure for working with tabular data. DataFrames come with indexes by default to identify the rows, but sometimes we may want to remove these indexes for further analysis or due to compatibility issues with other libraries.
In this comprehensive guide, we'll explore the ins and outs of removing indexes from Pandas DataFrames using practical examples. Whether you need to drop a single-level or multi-level index, reset the index while keeping the data, or delete the index altogether, this post has you covered.
Overview of Pandas DataFrame Indexes
A Pandas DataFrame index is the set of labels used to identify the rows of data. By default, Pandas assigns integer labels from 0 up to the number of rows minus one.
import pandas as pd
data = {"name":["John", "Mary", "Steve"],
"age":[25, 30, 28]}
df = pd.DataFrame(data)
print(df)
This prints:
name age
0 John 25
1 Mary 30
2 Steve 28
Here the leftmost values 0, 1, 2 are the index labels automatically assigned by Pandas.
We can also set a specific column as the index using set_index():
df = df.set_index("name")
print(df)
Now the name column becomes the DataFrame index:
age
name
John 25
Mary 30
Steve 28
Indexes in Pandas can be single-level or multi-level (hierarchical) using multiple columns. Indexes make it convenient to access the rows and slice data.
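As a small illustration of label-based access, the sketch below rebuilds the sample data with name as the index and looks rows up with .loc. The variable names (people, mary_age, subset) are only illustrative.

```python
import pandas as pd

people = pd.DataFrame(
    {"name": ["John", "Mary", "Steve"], "age": [25, 30, 28]}
).set_index("name")

# Look up a single value by row label and column name
mary_age = people.loc["Mary", "age"]

# Slice by label; unlike positional slicing, both endpoints are included
subset = people.loc["John":"Mary"]

print(mary_age)
print(subset)
```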
Removing a Single-Level Index
The easiest way to remove an index from a Pandas DataFrame is to use the reset_index() method.
1. Reset Index and Keep Data
To reset the index but keep the index data as a column, set the drop parameter to False (this is the default):
df = df.reset_index(drop=False)
print(df)
name age
0 John 25
1 Mary 30
2 Steve 28
This resets the index to the default numeric labels and moves the old index ("name") into a regular column.
2. Reset Index and Drop Data
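To delete the index data altogether, set drop to True. Starting again from the DataFrame indexed by name:
df = df.set_index("name") # restore "name" as the index
df = df.reset_index(drop=True)
print(df)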
age
0 25
1 30
2 28
Now the index is numeric, starting from 0, and the old index data (the names) is gone.
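To compare the two options side by side, here is a minimal self-contained sketch. It also shows that reset_index() returns a new DataFrame rather than changing the original. The names kept and dropped are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [25, 30, 28]},
    index=pd.Index(["John", "Mary", "Steve"], name="name"),
)

kept = df.reset_index()               # drop=False is the default: "name" becomes a column
dropped = df.reset_index(drop=True)   # the "name" labels are discarded

print(list(kept.columns))
print(list(dropped.columns))

# The original frame still has "name" as its index
print(df.index.name)
```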
3. Specify Index Level to Remove
If your DataFrame has a multi-level index, you can specify which level to remove by level number or name. Assuming df has name and country columns:
df = df.set_index(["name", "country"])
df = df.reset_index(level="country") # move the 2nd level back into a column
You can also pass integer positions such as 0 or 1 to select the level to reset.
This gives you precise control over which index levels to reset.
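A runnable version of this idea, using sample data with a country column added for illustration, might look like the following. It checks that selecting the level by name and by position gives the same result.

```python
import pandas as pd

data = {
    "name": ["John", "Mary", "Steve"],
    "country": ["USA", "UK", "Japan"],
    "age": [25, 30, 28],
}
df = pd.DataFrame(data).set_index(["name", "country"])

by_name = df.reset_index(level="country")  # select the level by name
by_pos = df.reset_index(level=1)           # select the same level by position

print(by_name.equals(by_pos))
print(by_name)
```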
Deleting MultiIndex from DataFrame
MultiIndexes (hierarchical indexes) are indexes made up of multiple levels, typically built from several columns of a DataFrame. Here's an example:
data = {"name":["John", "Mary", "Steve"],
"country":["USA", "UK", "Japan"],
"age":[25, 30, 28]}
df = pd.DataFrame(data).set_index(["name", "country"])
print(df)
This prints a DataFrame with a 2-level MultiIndex:
age
name country
John USA 25
Mary UK 30
Steve Japan 28
To remove a MultiIndex, use the same reset_index() method. Calling it without drop=True moves every index level back into a regular column:
df = df.reset_index()
print(df)
name country age
0 John USA 25
1 Mary UK 30
2 Steve Japan 28
This replaces the MultiIndex with a simple numeric index and keeps both former levels as columns.
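Alternatively, specify level=0 to reset only the first index level. Starting again from the MultiIndexed DataFrame:
df = pd.DataFrame(data).set_index(["name", "country"])
df = df.reset_index(level=0)
print(df)
name age
country
USA John 25
UK Mary 30
Japan Steve 28
The second-level index (country) is preserved, while the first level (name) becomes a regular column.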
Note: By default, reset_index() moves the index levels into DataFrame columns. To discard the index data instead, pass drop=True.
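The difference matters most with a MultiIndex, where drop=True can also be combined with level to discard a single level. The sketch below uses the same sample data; the result names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["John", "Mary", "Steve"],
        "country": ["USA", "UK", "Japan"],
        "age": [25, 30, 28],
    }
).set_index(["name", "country"])

moved = df.reset_index()                                 # both levels become columns
discarded = df.reset_index(drop=True)                    # both levels are thrown away
one_level = df.reset_index(level="country", drop=True)   # only "country" is thrown away

print(list(moved.columns))
print(list(discarded.columns))
print(one_level)
```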
Inserting Old Index Values as Column
When resetting a DataFrame index, you may also want to retain the index values as an additional column. This can be done by:
- Not passing drop=True in reset_index().
- Specifying a custom column name for the index using the names parameter (available in pandas 1.5 and later):
df = df.reset_index(names="former_index")
print(df)
Here the index is reset to the default numeric labels, and the original index values are stored in a new former_index column. Note that names has no effect together with drop=True, because the index values are then discarded.
You can pass a list of column names if resetting a MultiIndex.
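For a MultiIndex, a minimal sketch could look like this. It assumes pandas 1.5 or later, where reset_index() accepts names, and the new column names person and nation are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["John", "Mary", "Steve"],
        "country": ["USA", "UK", "Japan"],
        "age": [25, 30, 28],
    }
).set_index(["name", "country"])

# One new column name per index level, in level order (requires pandas >= 1.5)
renamed = df.reset_index(names=["person", "nation"])

print(renamed)
```

On older pandas versions, calling reset_index() and then rename(columns=...) should give the same result.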
Example Use Cases
Now let's go through some examples highlighting situations where removing DataFrame indexes helps:
1. Index Reset Before Concatenation
pd.concat() does not require matching indexes. It stacks the rows and keeps each input's row labels, so the combined DataFrame can end up with gaps or duplicate labels. Resetting the index after concatenating gives a clean, continuous index:
df1 = pd.DataFrame({"A":[1, 2], "B":[3,4]}, index=[1, 2])
df2 = pd.DataFrame({"A":[5, 6], "B":[7, 8]}, index=[3, 4])
df_concat = pd.concat([df1, df2]).reset_index(drop=True)
print(df_concat)
This prints:
A B
0 1 3
1 2 4
2 5 7
3 6 8
The rows are now labeled 0 through 3 instead of carrying over the original labels 1, 2, 3, 4.
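As an alternative sketch, pd.concat() can renumber the rows itself through its ignore_index parameter, which avoids a separate reset step. The variable name combined is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=[1, 2])
df2 = pd.DataFrame({"A": [5, 6], "B": [7, 8]}, index=[3, 4])

# ignore_index=True discards the input labels and numbers the rows 0..n-1
combined = pd.concat([df1, df2], ignore_index=True)

print(combined)
```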
2. Fixing Index Issues when Plotting
When plot() is given x=, it looks for a column with that name, not the index. To plot index labels through x=, first move the index into a column with reset_index(). Because this index has no name, the new column is called "index":
df = pd.DataFrame({"Sales" : [5,8,6]}, index = ["Jan", "Feb", "Mar"])
df = df.reset_index()
df.plot(x="index", y="Sales", kind="bar")
Now Pandas uses "Jan", "Feb", "Mar" as the x-axis labels for the bar plot.
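If a more descriptive column name than "index" is wanted for the x-axis, one option is to name the index with rename_axis() before resetting it. This sketch leaves out the plotting call and only shows the resulting column names; month is an illustrative name.

```python
import pandas as pd

sales = pd.DataFrame({"Sales": [5, 8, 6]}, index=["Jan", "Feb", "Mar"])

unnamed = sales.reset_index()                      # unnamed index -> column called "index"
named = sales.rename_axis("month").reset_index()   # named index -> column called "month"

print(list(unnamed.columns))
print(list(named.columns))
```

The named frame could then be plotted with x="month" instead of x="index".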
3. Simplifying DataFrame before Export
Moving MultiIndex levels back into ordinary columns flattens the DataFrame structure, which helps when exporting the data to a CSV file:
df = df.reset_index()
df.to_csv("output.csv", index=False)
The exported CSV file stores the former index levels as ordinary columns and has no separate index column, making it ready for use in other apps like Excel.
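To inspect what such an export contains without touching the file system, the sketch below writes to an in-memory buffer instead of output.csv. The sample data and the buffer are assumptions for illustration.

```python
import io

import pandas as pd

df = pd.DataFrame(
    {
        "name": ["John", "Mary", "Steve"],
        "country": ["USA", "UK", "Japan"],
        "age": [25, 30, 28],
    }
).set_index(["name", "country"])

buffer = io.StringIO()
# reset_index() keeps the level values as columns; index=False skips the new 0..n-1 labels
df.reset_index().to_csv(buffer, index=False)
csv_text = buffer.getvalue()

print(csv_text)
```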
As you can see, judiciously removing DataFrame indexes using reset_index() provides flexibility in many analytical situations.
Alternative: Set New Index from Column
While we've focused on removing indexes so far, another option is to replace the index directly with an existing column using set_index():
df = df.set_index("name") #1 Set "name" column as new index
df = df.reset_index() #2 Move the index back into a column
print(df)
Step 1 removes the old index and sets a new one based on a data column. Step 2 reverses it.
The main difference is that reset_index() keeps the old index values as a column, while set_index() discards the old index and replaces it.
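The contrast can be seen in a short sketch. The row labels "a" and "b" are made up so that the old index is easy to spot, and append=True is shown as one way to keep it.

```python
import pandas as pd

df = pd.DataFrame({"name": ["John", "Mary"], "age": [25, 30]}, index=["a", "b"])

replaced = df.set_index("name")                # old labels "a", "b" are discarded
appended = df.set_index("name", append=True)   # old labels kept as the first index level
restored = replaced.reset_index()              # "name" is a column again, index is 0, 1

print(replaced)
print(appended)
print(restored)
```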
Conclusion
Pandas DataFrames come equipped with row indexes for intuitive access and analysis. But to connect Pandas with other Python libraries, plot data, simplify outputs or fix issues, removing these indexes is often required.
We learned how to:
- Reset index while keeping or deleting old index data
- Remove single-level or multi-level indexes
- Insert index columns using custom names
- Use cases like concatenation and plotting where index reset helps
The reset_index() method makes it very convenient to manage DataFrame indexes. Mastering index manipulation unlocks the full potential of Pandas for data science and analytics applications.