How to plot a linear regression in Python

Learn how to plot linear regression in Python. Discover different methods, tips, real-world applications, and how to debug common errors.

Published on: Tue, Mar 3, 2026
Updated on: Thu, Mar 5, 2026
The Replit Team

A linear regression plot visualizes the relationship between variables, a key step in data analysis. Python offers powerful libraries to create these plots with clarity and precision for any dataset.

Here, you'll explore different techniques to create these plots effectively. The article covers practical tips, shows real-world applications, and provides advice to help you debug common errors and refine your visualizations.

Basic linear regression plot with NumPy and Matplotlib

import numpy as np
import matplotlib.pyplot as plt

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3.5, 5, 6.2, 7.5])
m, b = np.polyfit(x, y, 1)
plt.scatter(x, y)
plt.plot(x, m*x + b, color='red')
plt.show()

Output: a scatter plot with blue dots representing the data points and a red line showing the linear regression fit.

This approach combines NumPy’s calculation power with Matplotlib’s visualization tools. The core of the regression is the np.polyfit(x, y, 1) function. It computes the slope (m) and intercept (b) for a line of best fit. The final argument, 1, specifies a first-degree polynomial, which is simply a straight line.

Once you have the slope and intercept, you can visualize the results. First, plt.scatter() plots your original data points. Then, plt.plot(x, m*x + b) draws the regression line by applying the calculated coefficients to the x-values, effectively overlaying the trend on your data.

Common libraries for regression visualization

While the NumPy and Matplotlib approach is fundamental, libraries like pandas, seaborn, and scikit-learn offer more direct and powerful methods for regression plotting.

Using pandas for linear regression plots

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [2, 3.5, 5, 6.2, 7.5]})
m, b = np.polyfit(df.x, df.y, 1)
plt.scatter(df.x, df.y)
plt.plot(df.x, m * df.x + b, color='green')
plt.xlabel('X values')
plt.ylabel('Y values')
plt.show()

Output: a scatter plot with data points and a green regression line, with labeled X and Y axes.

Using pandas organizes your data into a DataFrame, a common practice that simplifies data handling. From there, you can plot columns directly with matplotlib, as seen with plt.scatter(df.x, df.y).

  • np.polyfit() accepts DataFrame columns just as it does plain arrays, so the fit is computed from the data rather than hardcoded.
  • This example also adds clarity by labeling the axes using plt.xlabel() and plt.ylabel().

Creating regression plots with seaborn

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3.5, 5, 6.2, 7.5])
sns.regplot(x=x, y=y, line_kws={"color": "purple"})
plt.title("Linear Regression with Seaborn")
plt.show()

Output: a scatter plot with data points, a purple regression line, and a shaded confidence interval region.

Seaborn streamlines regression plotting with its regplot() function. It’s a high-level tool that combines the scatter plot and regression line fitting into a single command, so you don't need to calculate the slope and intercept yourself.

  • The function automatically draws both the data points and the line of best fit.
  • A key feature is the shaded confidence interval it adds around the regression line, which visualizes the uncertainty in the model's fit.
  • You can easily style the line using the line_kws parameter to pass a dictionary of keyword arguments, such as color.
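Beyond line_kws, regplot() accepts a few other useful knobs: ci controls the confidence band and scatter_kws styles the points. A minimal sketch (the specific style values here are arbitrary choices, not recommendations):

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3.5, 5, 6.2, 7.5])

# ci=None hides the confidence band; scatter_kws and line_kws
# pass styling through to the underlying matplotlib calls
ax = sns.regplot(x=x, y=y, ci=None,
                 scatter_kws={"s": 60, "alpha": 0.7},
                 line_kws={"color": "purple", "linestyle": "--"})
ax.set_title("Styled regplot without confidence band")
plt.show()
```

Turning the confidence band off is handy when the shading distracts from the trend itself, for example on dense datasets.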

Using scikit-learn for regression visualization

from sklearn.linear_model import LinearRegression
import numpy as np
import matplotlib.pyplot as plt

X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 3.5, 5, 6.2, 7.5])
model = LinearRegression().fit(X, y)
plt.scatter(X, y)
plt.plot(X, model.predict(X), color='orange')
plt.text(1, 7, f'R² = {model.score(X, y):.3f}')
plt.show()

Output: a scatter plot with data points, an orange regression line, and an R-squared value displayed.

scikit-learn frames regression as a machine learning task. It’s a powerful library where you first create and train a LinearRegression model using the .fit(X, y) method. This prepares the model to make predictions from your data.

  • Your feature data X must be reshaped with .reshape(-1, 1), as scikit-learn expects a 2D array.
  • The regression line is drawn using model.predict(X), which applies the trained model to generate the line's points.
  • The .score() method conveniently calculates the R-squared value—a metric showing how well the line fits the data—which is then displayed on the plot.
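After fitting, the slope and intercept are available directly as model attributes, which is handy for annotating plots or reporting the regression equation. A sketch using the same data as above:

```python
from sklearn.linear_model import LinearRegression
import numpy as np

X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 3.5, 5, 6.2, 7.5])
model = LinearRegression().fit(X, y)

slope = model.coef_[0]        # one entry per feature; here just one
intercept = model.intercept_
r_squared = model.score(X, y)
print(f"y = {slope:.2f}x + {intercept:.2f}, R² = {r_squared:.3f}")
```

These attributes are what you would embed in a plt.text() annotation if you want the full equation, not just R², shown on the plot.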

Advanced regression plotting techniques

With the fundamentals covered, you're ready to tackle more complex visualizations, such as plotting multiple variables, showing uncertainty, and building interactive regression plots.

Visualizing multiple regression with 3D plots

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn.linear_model import LinearRegression

x1 = np.random.rand(100)
x2 = np.random.rand(100)
y = 2*x1 + 3*x2 + np.random.randn(100)*0.5
X = np.column_stack((x1, x2))
model = LinearRegression().fit(X, y)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x1, x2, y)
x1_range = np.linspace(0, 1, 10)
x2_range = np.linspace(0, 1, 10)
X1, X2 = np.meshgrid(x1_range, x2_range)
Z = model.predict(np.column_stack((X1.ravel(), X2.ravel()))).reshape(X1.shape)
ax.plot_surface(X1, X2, Z, alpha=0.3)
plt.show()

Output: a 3D scatter plot with data points and a semi-transparent surface representing the multiple regression plane.

When your outcome depends on two variables, you move from a regression line to a regression plane. This code visualizes that relationship in 3D using matplotlib and scikit-learn. A LinearRegression model is trained on two independent variables (x1, x2) to predict a dependent one (y).

  • First, ax.scatter() plots the raw data points in 3D space.
  • Then, np.meshgrid() creates a coordinate grid, and model.predict() calculates the corresponding Z-values to define the regression plane.
  • Finally, ax.plot_surface() draws this plane over the data.

Adding confidence intervals to regression plots

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3.5, 5, 6.2, 7.5])
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
y_pred = intercept + slope * x
plt.scatter(x, y)
plt.plot(x, y_pred, 'r-')
plt.fill_between(x, y_pred - std_err*2, y_pred + std_err*2, alpha=0.2)
plt.show()

Output: a scatter plot with data points, a red regression line, and a light red shaded region representing the confidence interval.

This approach uses SciPy to visualize the uncertainty in your regression model. It's a quick way to show roughly how much the fitted line might vary.

  • The stats.linregress() function is the workhorse here. It returns several statistical values, including std_err, the standard error of the estimated slope.
  • You then use plt.fill_between() to draw a shaded band around the regression line. Keep in mind that y_pred ± 2*std_err is only a rough visual approximation: an exact confidence interval also accounts for the intercept's uncertainty and widens as you move away from the mean of x.

Creating interactive regression plots with plotly

import plotly.express as px
import pandas as pd
import numpy as np

# Create sample data
np.random.seed(42)
x = np.arange(1, 101)
y = 2*x + 10*np.random.randn(100)
df = pd.DataFrame({'x': x, 'y': y})

# Create interactive regression plot
fig = px.scatter(df, x='x', y='y', trendline='ols',
               trendline_color_override='red')
fig.update_layout(title='Interactive Linear Regression')
fig.show()

Output: an interactive scatter plot with data points and a red regression line, with hover capabilities showing point values.

Plotly Express makes creating interactive visualizations incredibly straightforward. The px.scatter() function handles both the scatter plot and the regression line in a single command, which is a significant time saver.

  • The key is the trendline='ols' argument. It tells Plotly to automatically compute and draw an Ordinary Least Squares regression line.
  • The resulting plot is fully interactive. You can hover over data points to see their values, making it perfect for data exploration.

Move faster with Replit

Replit is an AI-powered development platform that transforms natural language into working applications. Describe what you want to build, and Replit Agent creates it—complete with databases, APIs, and deployment.

For the regression techniques covered in this article, Replit Agent can turn them into production-ready tools.

  • Build a financial forecasting tool that uses linear regression to predict stock prices from historical data and visualizes the trendline.
  • Create a sales dashboard that plots monthly revenue and displays a regression line with confidence intervals to project future growth.
  • Deploy a scientific analysis utility that generates interactive 3D regression plots for researchers to explore relationships between multiple experimental variables.

You can turn any of these concepts into a working application. Try Replit Agent by describing your idea, and it will write, test, and deploy the code for you.

Common errors and challenges

Plotting regression models can be tricky, but most errors have straightforward fixes you can master quickly.

Dealing with NaN values in regression data

Missing data, often represented as NaN (Not a Number), can stop a regression analysis in its tracks. Functions like np.polyfit() or LinearRegression().fit() can't operate on incomplete datasets, which will usually raise an error.

  • You can remove rows with missing values using the dropna() method in pandas.
  • Alternatively, you could fill them with a calculated value, such as the column’s mean or median, using fillna().
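Both options are one-liners in pandas. A quick sketch on a small frame with gaps:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'x': [1, 2, np.nan, 4, 5],
    'y': [2, np.nan, 5, 6.2, 7.5]
})

# Option 1: drop any row containing a missing value
dropped = data.dropna()

# Option 2: fill gaps with each column's mean
filled = data.fillna(data.mean())

print(len(dropped))                # rows that survived dropna
print(filled.isna().sum().sum())   # remaining NaNs after fillna
```

Dropping rows discards information but keeps the data honest; mean-filling keeps every row but can flatten the trend, so choose based on how much data you can afford to lose.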

Correcting array shapes for sklearn regression models

When using scikit-learn, you might encounter a ValueError because your data has the wrong shape. The library’s models expect the feature data, X, to be a 2D array, even if you only have one feature. A 1D array or pandas Series won't work on its own.

The fix is to reshape your data. Calling .reshape(-1, 1) on your feature array converts it into a single-column 2D array, satisfying scikit-learn's input requirements and allowing the model to train correctly.

Fixing axis limits for proper regression visualization

Sometimes your plot's axes might not adjust properly, cutting off data points or extending the regression line awkwardly beyond your data's range. This can make the visualization confusing or misleading. You can take control by setting the axis boundaries yourself.

Use matplotlib functions like plt.xlim() and plt.ylim() after creating your plot. This lets you define the exact visual window, ensuring your data and trendline are framed clearly and effectively.

Dealing with NaN values in regression data

Missing data, or NaN values, are a common roadblock in regression analysis. Most libraries can't perform calculations on incomplete datasets, which will cause the code to fail. The example below shows what happens when np.polyfit() encounters NaN values.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Dataset with missing values
data = pd.DataFrame({
   'x': [1, 2, np.nan, 4, 5],
   'y': [2, np.nan, 5, 6.2, 7.5]
})

# Will fail with missing values
plt.scatter(data['x'], data['y'])
m, b = np.polyfit(data['x'], data['y'], 1)
plt.plot(data['x'], m*data['x'] + b, color='red')
plt.show()

The calculation fails because np.polyfit() receives columns containing np.nan values, which are mathematically undefined. The corrected code below demonstrates how to prepare the data before plotting.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Dataset with missing values
data = pd.DataFrame({
   'x': [1, 2, np.nan, 4, 5],
   'y': [2, np.nan, 5, 6.2, 7.5]
})

# Fix: drop missing values before plotting
clean_data = data.dropna()
plt.scatter(clean_data['x'], clean_data['y'])
m, b = np.polyfit(clean_data['x'], clean_data['y'], 1)
plt.plot(clean_data['x'], m*clean_data['x'] + b, color='red')
plt.show()

The fix is to clean the data before analysis. By calling data.dropna(), you create a new DataFrame that excludes any rows with missing values. This clean dataset can then be used by np.polyfit() without causing an error.

  • Always check for and handle NaN values before performing calculations, especially when working with data from external sources, as it's a common source of errors.

Correcting array shapes for sklearn regression models

You'll often hit a ValueError with scikit-learn if your data isn't shaped correctly. The library expects a 2D array for your features, but it's easy to accidentally pass a 1D array. The following code demonstrates this common mistake.

import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Incorrect shape for sklearn
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3.5, 5, 6.2, 7.5])

# This will raise an error
model = LinearRegression()
model.fit(x, y)  # x needs to be 2D
plt.scatter(x, y)
plt.plot(x, model.predict(x), color='red')
plt.show()

The error occurs because the model.fit() method receives the x data as a simple, one-dimensional array. It's expecting a columnar format instead. The following code demonstrates the necessary adjustment.

import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Correcting shape for sklearn
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3.5, 5, 6.2, 7.5])

# Fix: reshape x to be 2D
x_2d = x.reshape(-1, 1)
model = LinearRegression()
model.fit(x_2d, y)
plt.scatter(x, y)
plt.plot(x, model.predict(x_2d), color='red')
plt.show()

The fix is to reshape your feature data before passing it to the fit() method. scikit-learn requires a 2D array for features, even when you only have one. This is because the library is designed to handle multiple features by default.

  • The code x.reshape(-1, 1) converts your 1D array into a 2D array with a single column.

This simple change aligns the data with the library's input requirements, allowing the model to train without error.
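reshape(-1, 1) is the most common fix, but a couple of equivalent alternatives are worth knowing, especially when your data starts in a DataFrame. A sketch:

```python
import numpy as np
import pandas as pd

x = np.array([1, 2, 3, 4, 5])
df = pd.DataFrame({'x': x})

a = x.reshape(-1, 1)       # explicit reshape to one column
b = x[:, np.newaxis]       # insert a new axis; same result
c = df[['x']].to_numpy()   # double brackets select a 2D sub-frame

print(a.shape, b.shape, c.shape)  # all (5, 1)
```

With pandas, df[['x']] (a one-column DataFrame) can be passed to fit() directly, while df['x'] (a 1D Series) triggers the same ValueError as a 1D array.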

Fixing axis limits for proper regression visualization

By default, a regression line in matplotlib only spans the range of your data points. This can make the trend look abrupt or incomplete. The code below illustrates this common visualization issue, where the line stops short at the first and last points.

import numpy as np
import matplotlib.pyplot as plt

# Data points
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3.5, 5, 6.2, 7.5])

# Linear regression
m, b = np.polyfit(x, y, 1)
plt.scatter(x, y)
plt.plot(x, m*x + b, color='red')
# Line only spans the x range of data points
plt.show()

The issue arises because plt.plot() is only given the original x values to draw upon. As a result, the line doesn't extend beyond your data's boundaries. The following code shows how to correct this.

import numpy as np
import matplotlib.pyplot as plt

# Data points
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3.5, 5, 6.2, 7.5])

# Linear regression
m, b = np.polyfit(x, y, 1)
plt.scatter(x, y)

# Fix: extend the line beyond data points
x_line = np.array([0, 6])  # Extended range
plt.plot(x_line, m*x_line + b, color='red')
plt.xlim(0, 6)  # Set explicit axis limits
plt.show()

The fix is to manually extend the line's range. Instead of plotting with your original x values, you create a new array, x_line, that spans a wider area. This new array is then used to draw the regression line with plt.plot().

  • To ensure the entire line is visible, you can adjust the plot's boundaries using plt.xlim().
  • This makes your trendline look more complete and is helpful for visualizing extrapolations beyond your dataset.

Real-world applications

With these common errors solved, you can apply regression plotting to practical scenarios like real estate analysis and model diagnostics.

Using numpy to predict real estate prices

This practical example shows how to model the relationship between house size and price, allowing you to visualize and predict property values with a simple regression line.

import numpy as np
import matplotlib.pyplot as plt

sizes = np.array([750, 850, 950, 1050, 1150, 1250])
prices = np.array([150, 170, 195, 215, 235, 260])
m, b = np.polyfit(sizes, prices, 1)
plt.scatter(sizes, prices)
plt.plot(sizes, m*sizes + b, 'r-')
plt.xlabel('House Size (sq ft)')
plt.ylabel('Price ($1000s)')
plt.show()

This snippet uses NumPy to perform the core math for a linear regression. After defining the sizes and prices arrays, it calls np.polyfit() to compute the slope (m) and intercept (b) of the trendline. Matplotlib then handles the visualization, and labeling the axes with plt.xlabel() and plt.ylabel() makes the final plot easy to interpret.

  • The plt.scatter() function displays the original data as individual points.
  • The plt.plot() function overlays the calculated regression line, styled as a solid red line with 'r-'.
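Once the coefficients are computed, the same equation estimates prices for sizes you haven't observed (the 1000 sq ft query below is an arbitrary example value):

```python
import numpy as np

sizes = np.array([750, 850, 950, 1050, 1150, 1250])
prices = np.array([150, 170, 195, 215, 235, 260])
m, b = np.polyfit(sizes, prices, 1)

# Apply the fitted line to a size not in the dataset
new_size = 1000
predicted = m * new_size + b
print(f"Estimated price for {new_size} sq ft: ${predicted:.0f}k")
```

This is interpolation within the observed range, where the linear fit is most trustworthy; extrapolating far beyond 1250 sq ft would be much less reliable.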

Creating residual plots to diagnose model fit

A residual plot visualizes the errors in your model’s predictions, which is a great way to diagnose how well the regression line fits your data.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

np.random.seed(0)  # make the noise, and therefore the plot, reproducible
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([2, 4, 5, 4, 6, 8, 7, 10, 11, 14]) + np.random.randn(10)
model = LinearRegression().fit(x.reshape(-1, 1), y)
residuals = y - model.predict(x.reshape(-1, 1))
plt.scatter(x, residuals)
plt.axhline(y=0, color='red')
plt.title('Residual Plot')
plt.show()

This code trains a LinearRegression model and calculates the residuals—the difference between the actual y values and the model's predictions. It then creates a scatter plot to visualize how these residuals are distributed.

  • The residuals are calculated by subtracting the output of model.predict() from the original y array.
  • plt.scatter() plots these residuals against the original x values.
  • A horizontal line at zero is added with plt.axhline(), representing where the model's prediction perfectly matches the actual data.

Get started with Replit

Turn these techniques into a real tool with Replit Agent. Describe what you want, like “a web app that predicts housing prices from square footage and plots the regression line,” or “a dashboard visualizing sales data with a trendline.”

The agent writes the code, tests for errors, and deploys the app, turning your prompt into a finished product. Start building with Replit.

Get started free

Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.
