How to extract a column from a dataframe in Python

Learn how to extract a column from a Python dataframe. We cover various methods, tips, real-world uses, and common error debugging.

Published on: Tue, Feb 24, 2026

Updated on: Mon, Apr 6, 2026

The Replit Team

You will often need to extract a column from a DataFrame for data analysis in Python. This is a fundamental skill for anyone who works with tabular data, from data scientists to engineers.

In this article, we'll cover several techniques to select columns, like using brackets [] or the .loc accessor. You'll get practical tips, see real-world applications, and learn to debug common errors.

Using bracket notation to extract a column

    import pandas as pd

    data = {'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 24, 35]}
    df = pd.DataFrame(data)

    age_column = df['Age']
    print(age_column)

Output:

    0    28
    1    24
    2    35
    Name: Age, dtype: int64

Using bracket notation is the most direct way to select a single column in pandas. By passing the column name as a string, as in df['Age'], you can quickly isolate the data you need. This method is highly readable and is often the go-to for simple column extraction.

This operation returns a pandas Series, not a new DataFrame. A Series is a one-dimensional labeled array, which is why the output shows the index alongside the values from the 'Age' column. The distinction matters for later steps, because a Series and a DataFrame expose different methods and behave differently in further operations.
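As a small sketch of what "one-dimensional labeled array" means in practice, the snippet below inspects the extracted Series. It reuses the sample data from above; the printed attributes are standard pandas, but which ones you need depends on your own workflow.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 24, 35]})
age_column = df['Age']

print(age_column.name)          # Age
print(list(age_column.index))   # [0, 1, 2]
print(age_column.to_list())     # [28, 24, 35]
print(age_column.ndim)          # 1
```

The name of the Series is the original column label, and the index is carried over from the DataFrame.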

Basic column extraction techniques

While bracket notation is a great starting point, you can also use dot notation or the more advanced .iloc and .loc accessors for extraction.

Using dot notation with `.` attribute access

    import pandas as pd

    data = {'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 24, 35]}
    df = pd.DataFrame(data)

    age_column = df.Age
    print(age_column)

Output:

    0    28
    1    24
    2    35
    Name: Age, dtype: int64

Dot notation, like df.Age, offers a convenient shortcut for accessing columns. It treats the column name as an attribute of the DataFrame, which can make your code look cleaner and more concise than bracket notation.

However, this method has some important limitations:

It won't work if your column name contains spaces or special characters, as it must be a valid Python identifier.
It can cause conflicts if the column name is the same as a built-in DataFrame method, like count or mean.

Because of these potential issues, it's often safer to stick with bracket notation, especially in production code.
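To make the method-name conflict concrete, here is a minimal sketch using a hypothetical DataFrame with a column called 'count' (the data is invented for illustration). Dot access finds the DataFrame method first, while bracket access returns the column.

```python
import pandas as pd

# Hypothetical data: the column name 'count' collides with DataFrame.count
df = pd.DataFrame({'item': ['pen', 'cup'], 'count': [3, 5]})

print(callable(df.count))      # True: this is the method, not the column
print(df['count'].to_list())   # [3, 5]
```

No error is raised in the first case, which is what makes this mistake easy to miss.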

Using `iloc` for position-based extraction

    import pandas as pd

    data = {'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 24, 35]}
    df = pd.DataFrame(data)

    # Extract the second column (index 1)
    second_column = df.iloc[:, 1]
    print(second_column)

Output:

    0    28
    1    24
    2    35
    Name: Age, dtype: int64

When you need to select data by its integer position rather than its name, you'll use the .iloc accessor. It's a powerful tool for position-based indexing. In the expression df.iloc[:, 1], the colon : tells pandas to grab all rows, while the 1 specifies the column at index 1.

This method is especially useful when column names are dynamic or when you're iterating through columns by their position.
Keep in mind that Python uses zero-based indexing, so index 1 always refers to the second column.
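The following sketch shows a few other position-based selections, assuming a three-column version of the sample data (the 'City' column is added here for illustration): a negative index for the last column, a list of positions, and a loop over positions.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter'],
    'Age': [28, 24, 35],
    'City': ['New York', 'Paris', 'Berlin'],
})

last_column = df.iloc[:, -1]      # negative positions count from the end
first_two = df.iloc[:, [0, 1]]    # a list of positions returns a DataFrame

# Walk the columns by position
for position in range(df.shape[1]):
    column = df.iloc[:, position]
    print(position, column.name)
```

Passing a single integer returns a Series, while passing a list of integers returns a DataFrame.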

Using `loc` for label-based extraction

    import pandas as pd

    data = {'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 24, 35]}
    df = pd.DataFrame(data)

    age_column = df.loc[:, 'Age']
    print(age_column)

Output:

    0    28
    1    24
    2    35
    Name: Age, dtype: int64

The .loc accessor is your tool for label-based selection, which means you select data by its name instead of its position. In the expression df.loc[:, 'Age'], the colon : grabs every row, and 'Age' specifies the column by its actual label. This approach is often preferred for its clarity and reliability.

Your code becomes more readable because you're using explicit names.
It's less likely to break if the order of columns changes, making it more robust than position-based methods.
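The robustness point can be illustrated with a short sketch: after the columns are reordered, label-based selection still returns 'Age', while the same position now points at a different column. The reordering step is only there to simulate a change in the incoming data.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 24, 35]})

# Simulate a source whose column order has changed
reordered = df[['Age', 'Name']]

by_label = reordered.loc[:, 'Age']
by_position = reordered.iloc[:, 1]

print(by_label.name)      # Age
print(by_position.name)   # Name
```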

Advanced column extraction techniques

Building on those foundational methods, you can now combine them to handle more sophisticated tasks like conditional filtering, multi-column selection, and pattern-based extraction. AI-powered Python development can accelerate this process by suggesting optimal data manipulation patterns.

Extracting columns with boolean filtering

    import pandas as pd

    data = {'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 24, 35]}
    df = pd.DataFrame(data)

    # Extract ages greater than 25
    filtered_ages = df[df['Age'] > 25]['Age']
    print(filtered_ages)

Output:

    0    28
    2    35
    Name: Age, dtype: int64

Boolean filtering lets you extract a column based on a condition. The expression df['Age'] > 25 generates a boolean mask—a series of True or False values. This mask is then used to select only the rows from the DataFrame that meet the condition.

This is a powerful way to isolate specific data points by first filtering the rows based on a logical test.
The final ['Age'] then selects the desired column from that filtered result, giving you a Series with only the values that satisfied your condition.
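The same filter-then-select result can also be written as a single .loc call, which takes the row mask and the column label together. This is a sketch of an equivalent form, not a replacement for the example above.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 24, 35]})

# Row condition and column label in one indexing operation
filtered_ages = df.loc[df['Age'] > 25, 'Age']

print(filtered_ages.to_list())       # [28, 35]
print(list(filtered_ages.index))     # [0, 2]
```

The original row labels (0 and 2) are preserved, so the result can still be aligned with the source DataFrame.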

Extracting multiple columns

    import pandas as pd

    data = {'Name': ['John', 'Anna', 'Peter'],
            'Age': [28, 24, 35],
            'City': ['New York', 'Paris', 'Berlin']}
    df = pd.DataFrame(data)

    subset = df[['Name', 'Age']]
    print(subset)

Output:

        Name  Age
    0   John   28
    1   Anna   24
    2  Peter   35

To select multiple columns, you pass a list of column names inside the selection brackets. This is why you see double brackets in df[['Name', 'Age']]; the inner brackets define a Python list of the columns you want, and the outer brackets apply that selection to the DataFrame. This technique is especially useful when working with data loaded from external sources such as CSV files, where you often need only a few of the available columns.

The key difference here is that this operation returns a new DataFrame, not a Series, which is perfect for creating subsets of your data for further analysis.

Using column filtering with regex patterns

    import pandas as pd

    data = {'name': ['John'], 'age': [28], 'height_cm': [175], 'weight_kg': [70]}
    df = pd.DataFrame(data)

    measurement_cols = df.filter(regex='weight|height')
    print(measurement_cols)

Output:

       height_cm  weight_kg
    0        175         70

When you need to select columns based on a naming pattern, the filter() method is your best friend. It's especially powerful when combined with the regex parameter, allowing you to use regular expressions to find columns that match your criteria.

In the expression df.filter(regex='weight|height'), the pattern looks for column names that contain either "weight" or "height".
The pipe character | acts as an "OR" operator, making this a flexible way to grab related columns without having to type out each name individually.
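Besides regex, filter() also accepts like (substring match) and items (exact names). The sketch below applies all three to the same sample data; the specific patterns are illustrative choices, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'name': ['John'], 'age': [28],
                   'height_cm': [175], 'weight_kg': [70]})

kg_cols = df.filter(like='_kg')                # names containing '_kg'
suffix_cols = df.filter(regex=r'_(cm|kg)$')    # names ending in _cm or _kg
exact_cols = df.filter(items=['name', 'age'])  # exact column names

print(list(kg_cols.columns))       # ['weight_kg']
print(list(suffix_cols.columns))   # ['height_cm', 'weight_kg']
print(list(exact_cols.columns))    # ['name', 'age']
```

Anchoring the pattern with $ avoids accidentally matching a column that merely contains the text somewhere in the middle of its name.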

Move faster with Replit

Replit is an AI-powered development platform that comes with all Python dependencies pre-installed, so you can skip setup and start coding instantly. This means you can go from reading about a technique to trying it out in seconds, without any environment configuration.

While knowing how to extract columns is useful, Agent 4 helps you move from piecing together individual techniques to building complete applications. Instead of just writing code snippets, you can describe the entire tool you want to build, and Agent will handle the rest.

A data utility that extracts specific columns from an uploaded CSV and generates a new, filtered file.
A simple dashboard that pulls columns based on a pattern—like _total or _avg—and displays their summary statistics.
A data validation tool that checks a specific column for entries that meet a certain condition, like age > 21, and flags the rest.

Simply describe your app, and Replit will write the code, test it, and fix issues automatically, all within your browser.

Common errors and challenges

Even with the right techniques, you might run into a few common roadblocks when extracting columns from a DataFrame.

Handling `KeyError` when accessing non-existent columns

A KeyError is one of the most frequent errors you'll see. It pops up when you try to access a column that doesn't exist, which is usually caused by a simple typo or incorrect capitalization in the column name. To avoid this, it's a good habit to print df.columns to get an exact list of available columns before making your selection.

Avoiding problems with `.` notation for columns with spaces or reserved names

While dot notation (.) looks clean, it can fail silently. If a column name clashes with a built-in DataFrame attribute or method, such as count or max, then df.count returns the method rather than your column, and no error is raised. This ambiguity is why bracket notation is the safer choice: it always refers to the column.

Avoiding `SettingWithCopyWarning` when modifying filtered data

The SettingWithCopyWarning often appears when you try to change values on a filtered subset of your data. This happens because chained indexing—like df[condition][column]—can create a temporary copy, so your changes might not be saved to the original DataFrame. To modify data reliably, use the .loc accessor with a single operation, such as df.loc[condition, column] = new_value. This guarantees you're working on the original data and not a copy.
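A minimal sketch of the single-step .loc assignment, using the sample data from earlier sections. The 'Status' column and its 'Adult' value are illustrative; rows that do not match the condition are left as missing values.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 24, 35]})

# One indexing operation: row condition and target column together
df.loc[df['Age'] > 25, 'Status'] = 'Adult'

print(df)
```

Because the assignment goes through df.loc directly, the change lands on df itself rather than on an intermediate object.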

Handling `KeyError` when accessing non-existent columns

A KeyError is one of the most common roadblocks you'll encounter. It happens when you try to select a column that doesn't exist, often due to a simple typo. The code below shows this error in action.

    import pandas as pd

    data = {'Name': ['John', 'Anna'], 'Age': [28, 24]}
    df = pd.DataFrame(data)

    # This will raise a KeyError
    salary_column = df['Salary']

The code attempts to select a 'Salary' column, but the DataFrame was only created with 'Name' and 'Age'. This mismatch is what triggers the error. The following code demonstrates how to prevent this from happening.

    import pandas as pd

    data = {'Name': ['John', 'Anna'], 'Age': [28, 24]}
    df = pd.DataFrame(data)

    # Safe column access
    if 'Salary' in df.columns:
        salary_column = df['Salary']
    else:
        print("Column 'Salary' does not exist")

To prevent a KeyError, you can first check if a column exists. The expression if 'Salary' in df.columns: safely confirms the column's presence before you try to access it. This is a crucial defensive check, especially when working with data from external sources where column names can be inconsistent. By verifying the column name against the df.columns attribute, you can build more robust code that won’t crash unexpectedly.
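Two related patterns, sketched under the assumption that a missing column should be handled rather than treated as fatal: DataFrame.get() returns None (or a default you supply) instead of raising, and a list comprehension can report every missing column at once. The required list is a hypothetical example.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [28, 24]})

# get() returns None when the column is absent instead of raising KeyError
salary_column = df.get('Salary')
age_column = df.get('Age')

# Hypothetical list of columns the rest of a script expects
required = ['Name', 'Age', 'Salary']
missing = [name for name in required if name not in df.columns]

print(salary_column is None)   # True
print(missing)                 # ['Salary']
```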

Avoiding problems with `.` notation for columns with spaces or reserved names

While dot notation is convenient, it breaks when column names aren't valid Python identifiers. This occurs with names containing spaces, like 'First Name', or when a name is a reserved keyword, like 'class'. The code below shows how both cases cause errors.

    import pandas as pd

    data = {'First Name': ['John', 'Anna'], 'class': [101, 102]}
    df = pd.DataFrame(data)

    # These will cause errors
    name_column = df.First Name  # Syntax error
    class_column = df.class      # Reserved keyword issue

The expression df.First Name is invalid syntax because of the space, and df.class clashes with a reserved Python keyword. This is why both attempts fail. The following code shows how to access these columns correctly.

    import pandas as pd

    data = {'First Name': ['John', 'Anna'], 'class': [101, 102]}
    df = pd.DataFrame(data)

    # Use bracket notation instead
    name_column = df['First Name']
    class_column = df['class']

Bracket notation is the reliable fix. By passing the column name as a string, like df['First Name'], you sidestep the syntax rules that break dot notation. This approach works every time, even if a column name contains spaces or matches a reserved keyword.

It’s a good habit to default to bracket notation, especially when working with data from external files where column names can be unpredictable. This makes your code more robust and prevents unexpected errors.

Avoiding `SettingWithCopyWarning` when modifying filtered data

The SettingWithCopyWarning is a common heads-up from pandas. It appears when you try to modify a slice of a DataFrame, as pandas isn't sure if you're changing the original data or a temporary copy. The code below shows how this warning can occur.

    import pandas as pd

    data = {'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 24, 35]}
    df = pd.DataFrame(data)

    # The filtered result is a new object that pandas still tracks as derived from df
    adults = df[df['Age'] > 25]

    # May trigger SettingWithCopyWarning
    adults['Status'] = 'Adult'

The operation adults = df[df['Age'] > 25] returns a filtered subset that pandas still treats as derived from df. When you then assign a new column to it, pandas cannot tell whether you meant to change the original df or only the subset, so it warns you. The assignment itself does not propagate back to df. See how to perform this modification safely.

    import pandas as pd

    data = {'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 24, 35]}
    df = pd.DataFrame(data)

    # Create an explicit copy
    adults = df[df['Age'] > 25].copy()

    # Now it's safe to modify
    adults['Status'] = 'Adult'

To safely modify a filtered DataFrame, create an explicit copy. Adding .copy() to the filtering operation, as in df[df['Age'] > 25].copy(), produces a new, independent DataFrame and makes that intent clear to pandas. Any subsequent changes, like adding a new column, apply only to the copy and leave the original df untouched. This simple step prevents the SettingWithCopyWarning and makes your code's behavior predictable.
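A quick way to confirm that the copy is independent is to check both DataFrames after the assignment. This sketch reuses the sample data and only adds two membership checks.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 24, 35]})

adults = df[df['Age'] > 25].copy()
adults['Status'] = 'Adult'

print('Status' in adults.columns)   # True
print('Status' in df.columns)       # False
print(adults['Name'].to_list())     # ['John', 'Peter']
```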

Real-world applications

Moving beyond troubleshooting, these extraction methods are fundamental to practical business analysis, from spending patterns to profit calculations. With vibe coding, you can quickly build interactive data analysis tools using natural language.

Analyzing customer spending with `==` filtering and `.mean()`

By combining boolean filtering with the == operator, you can isolate specific data segments—like sales from one category—and then run calculations such as .mean() to uncover key insights.

    import pandas as pd

    # Sample customer purchase data
    data = {'Customer_ID': [101, 102, 103, 104],
            'Purchase_Amount': [120.50, 85.25, 320.00, 76.50],
            'Category': ['Electronics', 'Clothing', 'Electronics', 'Groceries']}
    df = pd.DataFrame(data)

    # Extract electronics purchases and calculate statistics
    electronics = df[df['Category'] == 'Electronics']['Purchase_Amount']
    print(f"Average electronics purchase: ${electronics.mean():.2f}")
    print(f"Total electronics revenue: ${electronics.sum():.2f}")

This code demonstrates a common data analysis workflow. It first creates a boolean mask with df['Category'] == 'Electronics' to identify all rows for electronics sales. This mask is then used to filter the DataFrame, and ['Purchase_Amount'] is immediately chained to extract just that column from the filtered results. The final step uses aggregate functions on the resulting Series to find key metrics:

.mean() calculates the average purchase amount.
.sum() finds the total revenue from electronics sales.
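If you need the same metrics for every category rather than just one, a possible extension is to extract the column after a groupby. This goes slightly beyond the filtering approach described above and is offered only as a sketch on the same sample data.

```python
import pandas as pd

df = pd.DataFrame({
    'Customer_ID': [101, 102, 103, 104],
    'Purchase_Amount': [120.50, 85.25, 320.00, 76.50],
    'Category': ['Electronics', 'Clothing', 'Electronics', 'Groceries'],
})

# Select the column from the grouped data, then aggregate per category
summary = df.groupby('Category')['Purchase_Amount'].agg(['mean', 'sum'])

print(summary)
```

The 'Electronics' row of summary holds the same average and total that the filtered example computes.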

Creating a profit analysis with column extraction and `[]` operations

By extracting columns like Revenue and Expenses using bracket notation, you can perform arithmetic operations between them to create a new Profit column.

    import pandas as pd

    # Sample sales data by month
    data = {'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
            'Revenue': [10500, 12600, 11400, 15200, 14100, 16800],
            'Expenses': [8200, 8700, 9100, 10200, 10800, 11500]}
    df = pd.DataFrame(data)

    # Extract and calculate profit
    df['Profit'] = df['Revenue'] - df['Expenses']
    print(df[['Month', 'Profit']])

This code demonstrates how pandas handles column-wide arithmetic. It creates a new 'Profit' column by performing a vectorized subtraction—it subtracts the entire 'Expenses' column from the 'Revenue' column in a single, efficient operation. This is much faster than looping through rows manually.

The result of df['Revenue'] - df['Expenses'] is a new pandas Series.
This Series is then assigned to df['Profit'], adding it to the DataFrame.

Finally, the code prints a two-column view showing the calculated profit for each month. For analysis workflows like this, you might want to preserve the results by saving the DataFrame to a CSV file.
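Once the Profit column exists, the same extraction techniques can answer follow-up questions. The sketch below looks up the month with the highest profit and derives a margin column; 'Margin' is a name introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'Revenue': [10500, 12600, 11400, 15200, 14100, 16800],
    'Expenses': [8200, 8700, 9100, 10200, 10800, 11500],
})
df['Profit'] = df['Revenue'] - df['Expenses']

# idxmax() gives the row label of the largest profit; .loc extracts the month
best_month = df.loc[df['Profit'].idxmax(), 'Month']

# Another vectorized operation between two extracted columns
df['Margin'] = df['Profit'] / df['Revenue']

print(best_month)                # Jun
print(df['Profit'].to_list())    # [2300, 3900, 2300, 5000, 3300, 5300]
```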

Get started with Replit

Turn your knowledge into a real tool with Replit Agent. Try prompts like "Build a CSV utility that extracts columns by name" or "Create a dashboard that calculates the mean of columns ending in _total".

Replit Agent writes the code, tests for errors, and deploys the app, turning your description into a working tool. Start building with Replit.

Build your first app today

Describe what you want to build, and Replit Agent writes the code, handles the infrastructure, and ships it live. Go from idea to real product, all in your browser.

Get started free
