How to get the column names of a dataframe in Python

Need to get column names from a DataFrame in Python? Learn different methods, tips, real-world applications, and how to debug common errors.

Published on: Fri, Feb 20, 2026
Updated on: Mon, Apr 6, 2026
The Replit Team

To work with a pandas DataFrame, you need its column names. This is a crucial first step in data analysis that lets you understand a dataset's structure and select specific data.

In this article, we'll cover several techniques to retrieve column names. We'll also share practical tips, explore real-world applications, and offer advice to help you debug common issues.

Using df.columns to get column names

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
column_names = df.columns
print(column_names)
# Output: Index(['A', 'B', 'C'], dtype='object')

The df.columns attribute is the most straightforward way to get column names from a DataFrame. It directly accesses the column labels, which the code then assigns to the column_names variable. It works the same way no matter how you created the DataFrame in Python.

Notice the output isn't a standard Python list—it's a pandas Index object. This object is optimized for performance and holds the axis labels. While it behaves like a list, it's immutable, meaning you can't change its elements directly. This design helps maintain your DataFrame's integrity. If you need a regular list, you can convert it using list(df.columns).
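A quick way to see the difference between an Index and a list is to try both. This sketch reuses the DataFrame from above; the variable names are illustrative, and it assumes a recent pandas release, where assigning to an item of an Index raises a TypeError.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
cols = df.columns

# Read-only, list-like operations work on an Index
print(cols[0])      # A
print(len(cols))    # 3
print('B' in cols)  # True

# Item assignment is rejected because an Index is immutable
try:
    cols[0] = 'X'
    assignment_failed = False
except TypeError as exc:
    assignment_failed = True
    print(f"TypeError: {exc}")

# A list copy is mutable and independent of the DataFrame
names = list(cols)
names[0] = 'X'
print(names)                # ['X', 'B', 'C']
print(df.columns.tolist())  # ['A', 'B', 'C'] (unchanged)
```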

Standard approaches to column name access

Beyond the direct df.columns attribute, you can also convert the column Index to a list or select columns based on their data type.

Converting column index to a list with tolist()

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
column_list = df.columns.tolist()
print(column_list)
# Output: ['A', 'B', 'C']

The tolist() method is a clean way to convert the column Index into a standard Python list. While the Index object is great for performance within pandas, you'll often need a regular list for other tasks, like passing column names to other functions or libraries that expect a list.

  • A Python list is mutable, so you can modify it by adding, removing, or reordering column names as needed. This is something you can't do with the original Index object.
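Because the list is mutable, you can rearrange it and then use it to select columns in a new order. A minimal sketch; moving 'C' to the front is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

cols = df.columns.tolist()   # ['A', 'B', 'C']
cols.remove('C')             # lists support remove()...
cols.insert(0, 'C')          # ...and insert()
reordered = df[cols]         # select the columns in the new order

print(reordered.columns.tolist())  # ['C', 'A', 'B']
print(df.columns.tolist())         # ['A', 'B', 'C'] (original untouched)
```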

Using list() conversion for column names

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
column_list = list(df.columns)
print(column_list)
# Output: ['A', 'B', 'C']

You can also use Python's built-in list() constructor to get a list of column names. This approach is functionally identical to using tolist(), but it leverages a standard Python feature that might feel more familiar if you're used to general Python programming.

  • The result is a standard, mutable Python list that you can easily modify.
  • Choosing between list(df.columns) and df.columns.tolist() is often a matter of coding style—both are valid and widely used.

Getting column names by data type with select_dtypes()

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, 2, 3], 'B': ['x', 'y', 'z'], 'C': [1.1, 2.2, 3.3]})
numeric_columns = df.select_dtypes(include=np.number).columns.tolist()
print(f"Numeric columns: {numeric_columns}")
# Output: Numeric columns: ['A', 'C']

Sometimes you only need columns of a specific data type. The select_dtypes() method is perfect for this, as it filters your DataFrame to include or exclude columns based on their type. By using include=np.number, you're telling pandas to select all numeric columns—like integers and floats.

  • This technique is great for separating numerical data for calculations. Once the DataFrame is filtered, you can chain .columns.tolist() to get the final list of column names.
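select_dtypes() also accepts exclude= and narrower NumPy type families such as np.integer and np.floating. The sketch below assumes the same three-column DataFrame; the column means at the end only illustrate one way to use the resulting names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['x', 'y', 'z'], 'C': [1.1, 2.2, 3.3]})

# exclude= keeps every column that is NOT of the listed types
non_numeric = df.select_dtypes(exclude=np.number).columns.tolist()
# Narrower NumPy type families work with include=
integer_cols = df.select_dtypes(include=np.integer).columns.tolist()
float_cols = df.select_dtypes(include=np.floating).columns.tolist()

print(non_numeric)   # ['B']
print(integer_cols)  # ['A']
print(float_cols)    # ['C']

# Use the numeric column names to compute per-column means
numeric_cols = df.select_dtypes(include=np.number).columns.tolist()
means = df[numeric_cols].mean().round(2).to_dict()
print(means)         # {'A': 2.0, 'C': 2.2}
```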

Advanced column name techniques

Beyond simple retrieval, you can also tackle more advanced tasks like filtering names with patterns, renaming columns, and navigating complex MultiIndex structures.

Filtering column names with pattern matching

import pandas as pd
df = pd.DataFrame({'A_1': [1, 2], 'A_2': [3, 4], 'B_1': [5, 6], 'B_2': [7, 8]})
a_columns = [col for col in df.columns if col.startswith('A')]
print(a_columns)
# Output: ['A_1', 'A_2']

When your DataFrame has many columns with a consistent naming scheme, you can use a list comprehension to filter them. The example code builds a new list, a_columns, by checking if each column name starts with 'A' using the startswith() method. It’s a common and efficient Pythonic way to select columns that follow a specific pattern.

  • This technique is highly flexible. You can easily adapt it by using other string methods like endswith() or checking for substrings with in.
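The same filtering can be written with other string checks, or with pandas helpers instead of a list comprehension. The sketch below shows endswith() and in, plus two pandas alternatives: DataFrame.filter(regex=...) and a boolean mask built from df.columns.str.startswith(). Which one reads best is largely a matter of style.

```python
import pandas as pd

df = pd.DataFrame({'A_1': [1, 2], 'A_2': [3, 4], 'B_1': [5, 6], 'B_2': [7, 8]})

# Other string checks inside the same list comprehension
ends_with_1 = [col for col in df.columns if col.endswith('_1')]
contains_b = [col for col in df.columns if 'B' in col]
print(ends_with_1)  # ['A_1', 'B_1']
print(contains_b)   # ['B_1', 'B_2']

# pandas alternative 1: DataFrame.filter matches names against a regex
regex_cols = df.filter(regex=r'^A_').columns.tolist()
print(regex_cols)   # ['A_1', 'A_2']

# pandas alternative 2: boolean mask from the .str accessor on the Index
mask = df.columns.str.startswith('B')
b_cols = df.columns[mask].tolist()
print(b_cols)       # ['B_1', 'B_2']
```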

Renaming columns and retrieving updated names

import pandas as pd
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df.columns = [col.upper() for col in df.columns]
print(f"New column names: {df.columns.tolist()}")
# Output: New column names: ['A', 'B']

You can rename columns by assigning a new list of names directly to the df.columns attribute. The code uses a list comprehension to generate this new list, applying the upper() method to each column name. It’s an efficient way to perform bulk renaming, like standardizing all column names to uppercase.

  • This direct assignment overwrites all existing column names, so you must provide a new list with the same number of elements as there are columns in the DataFrame.
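When you only need to change a few names, DataFrame.rename() with a dictionary avoids rebuilding the whole list. The sketch also shows a direct assignment with the wrong number of names; in the pandas versions we're aware of, that raises a ValueError and leaves the existing columns in place.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# rename() maps old names to new ones and leaves unmapped columns alone
renamed = df.rename(columns={'a': 'alpha'})
print(renamed.columns.tolist())  # ['alpha', 'b']

# Assigning a list of the wrong length is rejected
try:
    df.columns = ['x', 'y', 'z']
    length_error = False
except ValueError as exc:
    length_error = True
    print(f"ValueError: {exc}")

print(df.columns.tolist())  # ['a', 'b'] (unchanged)
```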

Working with MultiIndex columns

import pandas as pd
multi_cols = pd.MultiIndex.from_tuples([('A', 'one'), ('A', 'two'), ('B', 'one')])
df = pd.DataFrame(columns=multi_cols, data=[[1, 2, 3]])
top_level = df.columns.get_level_values(0).unique().tolist()
print(f"Top level columns: {top_level}")
# Output: Top level columns: ['A', 'B']

When your data has hierarchical columns, pandas uses a MultiIndex for nested labels. To get names from a specific layer of this hierarchy, you need a more targeted approach than just accessing df.columns.

  • The get_level_values(0) method is your key tool here. It lets you grab all the names from a specific level—in this case, the top level (level 0).
  • After selecting the level, chaining .unique().tolist() is a common pattern to remove duplicates and convert the result into a standard Python list.
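The same MultiIndex supports a few other lookups. This sketch, reusing the example DataFrame above, lists the raw tuples, reads the second level, lists the sub-columns under one top-level label, and flattens each tuple into a single string with '_'.join(); the underscore separator is an arbitrary choice.

```python
import pandas as pd

multi_cols = pd.MultiIndex.from_tuples([('A', 'one'), ('A', 'two'), ('B', 'one')])
df = pd.DataFrame(columns=multi_cols, data=[[1, 2, 3]])

# tolist() on a MultiIndex gives one tuple per column
print(df.columns.tolist())  # [('A', 'one'), ('A', 'two'), ('B', 'one')]

# Unique names from the second level (level 1), in order of appearance
second_level = df.columns.get_level_values(1).unique().tolist()
print(second_level)         # ['one', 'two']

# Sub-column names under a single top-level label
under_a = df['A'].columns.tolist()
print(under_a)              # ['one', 'two']

# Flatten each tuple into one string, e.g. for tools that need flat names
flat_names = ['_'.join(col) for col in df.columns]
print(flat_names)           # ['A_one', 'A_two', 'B_one']
```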

Move faster with Replit

Replit is an AI-powered development platform that comes with all Python dependencies pre-installed, so you can skip setup and start coding instantly. It’s designed to help you move from learning individual techniques to building complete applications. With Agent 4, you can describe what you want to build, and it will handle the code, databases, APIs, and deployment.

Instead of manually piecing together methods like df.columns.tolist() or select_dtypes(), you can describe the final product you need. The Agent can then build it for you. For example:

  • A data filtering tool that reads a CSV and extracts only columns starting with a specific prefix, like 'user_' or 'prod_', into a new file.
  • A data cleaning utility that automatically renames columns to a standard format and separates numeric columns for statistical analysis.
  • A report generator that flattens a hierarchical MultiIndex DataFrame by extracting top-level column names to prepare data for a charting library.

Simply describe your app, and Replit will write the code, test it, and fix issues automatically, all within your browser.

Common errors and challenges

Even straightforward tasks have pitfalls, so here’s how to handle common errors when working with column names in your DataFrame.

Fixing errors when accessing non-existent column names with df['column']

Trying to access a column that doesn't exist using square brackets, like df['non_existent'], will raise a KeyError and stop your script. This is a frequent issue, especially when dealing with unfamiliar datasets or typos in column names. KeyError isn't limited to DataFrames, either; the same defensive techniques apply to dictionary lookups and other key-based access in Python.

  • To prevent this, you can check if a column exists before trying to access it with a simple conditional: if 'column_name' in df.columns:.
  • A more direct approach is to use the df.get('column_name') method. Instead of raising an error, it safely returns None if the column isn't found, allowing your code to continue running.
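Here's a small sketch of df.get() alongside a check for several required columns at once. The required list is a hypothetical example; substitute whatever columns your own code depends on.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# get() returns None for a missing column instead of raising KeyError
missing = df.get('C')
print(missing is None)  # True

# ...and the column itself when it exists
col_b = df.get('B')
print(col_b.tolist())   # [4, 5, 6]

# Hypothetical list of columns the rest of the script depends on
required = ['A', 'B', 'C']
absent = [col for col in required if col not in df.columns]
print(absent)           # ['C']
```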

Handling duplicate column names in DataFrames

Pandas allows duplicate column names, which can lead to confusing results. When you select a column that has a duplicate name, pandas returns a DataFrame containing all columns with that name—not a single Series as you might expect. This can cause issues later in your code if it assumes a one-dimensional data structure.

  • You can quickly check for duplicates by running df.columns.duplicated().any(), which returns True if any duplicates exist.
  • It's best practice to resolve duplicates by renaming or removing them to ensure your column references are unambiguous and your code behaves predictably.
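To see which names are duplicated, and optionally keep only the first occurrence, you can index df.columns with the boolean array returned by duplicated(). A minimal sketch; dropping the later duplicates is only appropriate when they hold redundant data.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3]], columns=['A', 'B', 'A'])

# duplicated() flags every repeat after the first occurrence
flags = df.columns.duplicated()
print(flags.any())      # True
print(flags.tolist())   # [False, False, True]

# Names that appear more than once
dup_names = df.columns[flags].unique().tolist()
print(dup_names)        # ['A']

# One way to resolve them: keep only the first column with each name
deduped = df.loc[:, ~flags]
print(deduped.columns.tolist())  # ['A', 'B']
```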

Common mistake when modifying column names with .str methods

A common mistake is trying to modify column names without reassigning the result. The Index object stored in df.columns is immutable, so its values can't be changed in place. Calling a method like df.columns.str.lower() returns a new Index but has no effect on your DataFrame's columns.

  • When using string methods available through the .str accessor, you must assign the modified names back to df.columns.
  • The correct syntax is df.columns = df.columns.str.lower(). This creates a new index with the modified names and replaces the old one.

Fixing errors when accessing non-existent column names with df['column']

Using square brackets like df['column'] is direct, but it will raise a KeyError and halt your program if the column doesn't exist. This is a frequent roadblock, often caused by a simple typo. The code below shows this error in action.

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
# This will cause an error
value = df['C'][0]

This code triggers a KeyError because the DataFrame doesn't contain a column named 'C', halting the script instantly. The next example shows how to prevent this crash and handle the missing column gracefully.

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
# Check if column exists before accessing
if 'C' in df.columns:
    value = df['C'][0]
else:
    print("Column 'C' does not exist")

This code avoids a KeyError by first checking if the column exists. It's a simple but effective defensive programming technique that keeps your script from crashing unexpectedly.

  • The key is the conditional check if 'C' in df.columns:, which verifies the column's presence before any operation is attempted.

This is especially useful when you're working with unfamiliar datasets or when column names might contain typos.

Handling duplicate column names in DataFrames

While pandas permits duplicate column names, this flexibility can introduce ambiguity. When you select a column by a name that appears more than once, you get a DataFrame with all matching columns, not the expected single Series. The code below shows this in action.

import pandas as pd
# Creating a DataFrame with duplicate column names
df = pd.DataFrame([[1, 2, 3]], columns=['A', 'B', 'A'])
# This returns both columns named 'A', causing ambiguity
value = df['A']
print(value)

Since the DataFrame contains two columns named 'A', selecting df['A'] returns both. This ambiguity can break any subsequent operations that expect a single column. The code below shows how to resolve the duplicates by renaming them.

import pandas as pd
# Creating a DataFrame with duplicate column names
df = pd.DataFrame([[1, 2, 3]], columns=['A', 'B', 'A'])
# Rename columns to make them unique
df.columns = ['A_1', 'B', 'A_2']
# Now we can access them unambiguously
value = df['A_1']
print(value)

The best way to fix duplicate columns is to rename them. The code assigns a new list of unique names like ['A_1', 'B', 'A_2'] directly to df.columns. This makes each column distinct, so you can select df['A_1'] without any confusion.

  • You should watch for this issue when merging datasets or reading files where column names might not be unique, as it ensures your selections are predictable and error-free.
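Hard-coding the new names works for small DataFrames, but you can also generate suffixes automatically. The make_unique() helper below is hypothetical, not part of pandas: it appends _1, _2, and so on to every repeated name. It assumes the suffixed names don't collide with existing columns (for example, a column already named 'A_1').

```python
from collections import Counter

import pandas as pd


def make_unique(columns):
    """Hypothetical helper: suffix every repeated name with _1, _2, ..."""
    totals = Counter(columns)   # how many times each name appears
    seen = Counter()            # running count per repeated name
    result = []
    for col in columns:
        if totals[col] > 1:
            seen[col] += 1
            result.append(f"{col}_{seen[col]}")
        else:
            result.append(col)
    return result


df = pd.DataFrame([[1, 2, 3]], columns=['A', 'B', 'A'])
df.columns = make_unique(df.columns)
print(df.columns.tolist())  # ['A_1', 'B', 'A_2']
print(df['A_1'].tolist())   # [1]
```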

Common mistake when modifying column names with .str methods


A frequent error is calling a .str method on df.columns without reassigning the result. Because the Index object is immutable, a method like df.columns.str.upper() returns a new Index instead of changing the existing one, so your DataFrame isn't updated.

The code below shows what happens when you forget to reassign the modified names—the original column names remain unchanged.

import pandas as pd
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
# This doesn't actually change the DataFrame
df.columns.str.upper()
print(df.columns) # Still shows lowercase column names

The df.columns.str.upper() method returns a new Index of uppercase names, but it doesn't alter the DataFrame directly. Since the code doesn't reassign this new object, the original column names remain unchanged. The correct implementation is shown below.

import pandas as pd
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
# Correctly modify column names by assignment
df.columns = df.columns.str.upper()
print(df.columns) # Now shows uppercase column names

The fix is to reassign the modified names back to the DataFrame. The code works because df.columns = df.columns.str.upper() creates a new Index object with the uppercase names and replaces the old one. This is the correct way to perform bulk changes, like standardizing all your column names to a consistent format.

  • Remember this reassignment step any time you use .str methods on columns, or your changes won't actually be applied to the DataFrame.
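A related pitfall: the .str accessor only works when the column labels are strings. A DataFrame built without names gets integer labels, and in current pandas versions df.columns.str then raises an AttributeError. This sketch converts the labels to strings first; the col_ prefix is an arbitrary choice.

```python
import pandas as pd

df = pd.DataFrame([[1, 2]])  # no names given, so the labels are the integers 0 and 1

try:
    df.columns.str.upper()
    str_failed = False
except AttributeError as exc:
    str_failed = True
    print(f"AttributeError: {exc}")

# Turn the labels into strings first; then .str methods are available
df.columns = [f"col_{c}" for c in df.columns]
df.columns = df.columns.str.upper()
print(df.columns.tolist())  # ['COL_0', 'COL_1']
```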

Real-world applications

Now that you can troubleshoot common column issues, you can apply those skills to real-world data preparation like cleaning and feature selection.

Cleaning and standardizing messy column names with .strip() and .replace()

Real-world datasets often come with messy column names full of extra spaces, inconsistent capitalization, and mixed separators, especially when reading CSV files in Python. You can quickly clean them up by chaining string methods like strip(), lower(), and replace().

import pandas as pd
# DataFrame with messy column names
df = pd.DataFrame({' User ID ': [101, 102], 'First Name': ['Alice', 'Bob'],
                   'LAST_NAME': ['Smith', 'Jones'], 'email-address': ['a@example.com', 'b@example.com']})
# Standardize column names
df.columns = [col.strip().lower().replace(' ', '_').replace('-', '_') for col in df.columns]
print(df.columns.tolist())
# Output: ['user_id', 'first_name', 'last_name', 'email_address']

This code uses a list comprehension to transform messy column names into a consistent, predictable format. It processes each name through a sequence of operations to ensure uniformity across the DataFrame.

  • First, .strip() removes any leading or trailing whitespace from a name.
  • Next, .lower() converts all characters to lowercase.
  • Finally, .replace() is used twice to substitute both spaces and hyphens with underscores.

The resulting list of clean names is then assigned back to df.columns, making the data much easier to work with.
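The same cleanup can be done with the vectorized .str methods on df.columns instead of a list comprehension. This sketch swaps the two replace() calls for a single regular expression, r'[\s\-]+', so a run of spaces or hyphens collapses to one underscore; that regex is an assumption about how you want such runs handled.

```python
import pandas as pd

df = pd.DataFrame({' User ID ': [101, 102], 'First Name': ['Alice', 'Bob'],
                   'LAST_NAME': ['Smith', 'Jones'], 'email-address': ['a@example.com', 'b@example.com']})

# Each .str call returns a new Index, so the whole chain is assigned back
df.columns = (
    df.columns.str.strip()
    .str.lower()
    .str.replace(r'[\s\-]+', '_', regex=True)  # spaces/hyphens -> one underscore
)
print(df.columns.tolist())  # ['user_id', 'first_name', 'last_name', 'email_address']
```

Unlike replace(' ', '_'), the regex version turns a name like 'First  Name' (two spaces) into 'first_name' rather than 'first__name'.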

Using in operator for feature selection with column names

For feature selection, you can use the in operator to easily pick out columns whose names contain specific keywords, creating a focused subset of your data for analysis.

import pandas as pd
# Sales dataset with various metrics
df = pd.DataFrame({
    'product_id': [1, 2, 3], 'product_name': ['Widget', 'Gadget', 'Tool'],
    'price_usd': [19.99, 24.99, 14.99], 'cost_usd': [8.50, 10.75, 6.25],
    'sales_q1': [150, 200, 100], 'sales_q2': [160, 210, 90],
    'returns_q1': [5, 8, 3], 'returns_q2': [6, 7, 4]
})
# Select price and sales columns for analysis
price_sales_cols = [col for col in df.columns if 'price' in col or 'sales' in col]
analysis_df = df[price_sales_cols]
print(analysis_df)

This code creates a focused DataFrame by selecting columns based on their names. It uses a list comprehension to build a new list, price_sales_cols, by checking each column name from the original DataFrame.

  • The condition 'price' in col finds columns related to pricing.
  • The or 'sales' in col part adds columns related to sales figures.

Finally, this new list of names is used to slice the original DataFrame, resulting in a smaller analysis_df that contains only the desired price and sales data.
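If the list of keywords grows, any() keeps the condition readable. This sketch rebuilds the sales DataFrame and compares the list comprehension with DataFrame.filter(regex='price|sales'), a pandas alternative that matches column names against a regular expression.

```python
import pandas as pd

df = pd.DataFrame({
    'product_id': [1, 2, 3], 'product_name': ['Widget', 'Gadget', 'Tool'],
    'price_usd': [19.99, 24.99, 14.99], 'cost_usd': [8.50, 10.75, 6.25],
    'sales_q1': [150, 200, 100], 'sales_q2': [160, 210, 90],
    'returns_q1': [5, 8, 3], 'returns_q2': [6, 7, 4],
})

# Keep the keywords in a list so adding one is a one-line change
keywords = ['price', 'sales']
selected = [col for col in df.columns if any(k in col for k in keywords)]
print(selected)  # ['price_usd', 'sales_q1', 'sales_q2']

# pandas alternative: DataFrame.filter with an alternation regex
via_filter = df.filter(regex='price|sales').columns.tolist()
print(via_filter == selected)  # True
```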

Get started with Replit

Put what you've learned into practice by building a real tool. Just tell Replit Agent what you want: “Build a script to standardize messy column names” or “Create a utility that filters a CSV for sales columns.”

It will write the necessary code, test for errors, and deploy your new application directly from your browser. Start building with Replit.

Build your first app today

Describe what you want to build, and Replit Agent writes the code, handles the infrastructure, and ships it live. Go from idea to real product, all in your browser.
