How to calculate RMSE in Python

Learn how to calculate RMSE in Python. This guide covers different methods, real-world applications, common errors, and debugging tips.

Published on: Tue, Apr 21, 2026
Updated on: Tue, Apr 21, 2026
The Replit Team

Calculating Root Mean Square Error (RMSE) is a key step in evaluating prediction models. Python offers powerful, straightforward libraries for this task, making it simple to measure your model's accuracy.

In this article, you'll explore techniques for calculating RMSE in Python. You'll find practical tips, real-world applications, and debugging advice to help you implement this metric effectively in your projects.

Basic calculation of RMSE using NumPy

import numpy as np

actual = np.array([3, -0.5, 2, 7])
predicted = np.array([2.5, 0.0, 2, 8])
rmse = np.sqrt(np.mean((actual - predicted)**2))
print(f"RMSE: {rmse}")
# Output: RMSE: 0.6123724356957945

NumPy's strength is its ability to perform vectorized operations, which lets you apply calculations across entire arrays at once. The core logic, np.sqrt(np.mean((actual - predicted)**2)), directly translates the RMSE formula into a single, efficient line of code.

Here’s how it breaks down the math:

  • First, (actual - predicted) calculates the error for each prediction.
  • The **2 operator squares these errors, ensuring all values are positive and giving more weight to larger mistakes.
  • np.mean() finds the average of these squared errors.
  • Finally, np.sqrt() takes the square root, converting the error metric back into the original units of your data.
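To make the four steps concrete, the short sketch below computes each intermediate value separately, using the same arrays as above. The variable names (errors, squared_errors, mse) are illustrative only.

```python
import numpy as np

actual = np.array([3, -0.5, 2, 7])
predicted = np.array([2.5, 0.0, 2, 8])

# Step 1: per-prediction error
errors = actual - predicted            # [ 0.5, -0.5,  0. , -1. ]

# Step 2: square each error so every value is non-negative
squared_errors = errors ** 2           # [0.25, 0.25, 0.  , 1.  ]

# Step 3: average the squared errors (this is the MSE)
mse = np.mean(squared_errors)          # 0.375

# Step 4: the square root returns the result to the original units
rmse = np.sqrt(mse)

print(f"MSE: {mse}, RMSE: {rmse:.4f}")
```

Printing the intermediate arrays like this is a quick way to spot which prediction contributes most to the final score.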

Common implementations of RMSE

Building on the NumPy foundation, you can also calculate RMSE using popular data science libraries or even by defining your own custom functions.

Using sklearn.metrics for RMSE calculation

from sklearn.metrics import mean_squared_error
import numpy as np

actual = [3, -0.5, 2, 7]
predicted = [2.5, 0.0, 2, 8]
rmse = np.sqrt(mean_squared_error(actual, predicted))
print(f"RMSE: {rmse}")
# Output: RMSE: 0.6123724356957945

Scikit-learn, a go-to library for machine learning, offers a more direct approach. The mean_squared_error function streamlines the process by handling the intermediate steps for you.

  • It automatically calculates the squared differences between your actual and predicted values and finds their average.
  • Because the function returns the Mean Squared Error (MSE), you still need to take the square root with np.sqrt() to get the final RMSE.

This method is often preferred for its readability and easy integration into larger machine learning workflows.
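Recent scikit-learn releases (1.4 and later) also provide a root_mean_squared_error function that returns the RMSE directly. The sketch below uses it when it can be imported and otherwise falls back to the NumPy formula. The fallback helper is written for this example and is not part of scikit-learn.

```python
import numpy as np

try:
    # Available in scikit-learn 1.4 and later
    from sklearn.metrics import root_mean_squared_error
except ImportError:
    # Illustrative fallback for older versions or missing installs
    def root_mean_squared_error(y_true, y_pred):
        diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
        return float(np.sqrt(np.mean(diff ** 2)))

actual = [3, -0.5, 2, 7]
predicted = [2.5, 0.0, 2, 8]

rmse = root_mean_squared_error(actual, predicted)
print(f"RMSE: {rmse:.4f}")
```

Using the dedicated function removes the separate np.sqrt() step, at the cost of requiring a newer library version.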

Creating a custom RMSE function

import numpy as np

def calculate_rmse(actual, predicted):
    """Calculate Root Mean Squared Error between two arrays"""
    mse = np.mean(np.square(np.array(actual) - np.array(predicted)))
    return np.sqrt(mse)

print(f"RMSE: {calculate_rmse([3, -0.5, 2, 7], [2.5, 0.0, 2, 8])}")
# Output: RMSE: 0.6123724356957945

Defining a custom calculate_rmse function packages the logic into a reusable and readable block. This is especially useful when you need to calculate RMSE multiple times throughout a project.

  • The function accepts actual and predicted values as arguments.
  • Inside, it uses NumPy to perform the core calculation—finding the mean of the squared errors.
  • Finally, it returns the square root of the result, completing the RMSE calculation.

This approach keeps your main script clean and makes your calculations easy to call on demand.
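If NumPy isn't available, the same function can be sketched with only the standard library. The name calculate_rmse_pure and its length checks are assumptions made for this example.

```python
import math

def calculate_rmse_pure(actual, predicted):
    """RMSE for two equal-length sequences using only the standard library."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one value is required")
    # Square each pairwise error, average, then take the square root
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

result = calculate_rmse_pure([3, -0.5, 2, 7], [2.5, 0.0, 2, 8])
print(f"RMSE: {result:.4f}")
```

This version is slower than NumPy on large arrays, but it has no dependencies and fails loudly on mismatched inputs.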

Calculating RMSE with pandas

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'actual': [3, -0.5, 2, 7],
    'predicted': [2.5, 0.0, 2, 8]
})
df['squared_error'] = (df['actual'] - df['predicted'])**2
rmse = np.sqrt(df['squared_error'].mean())
print(f"RMSE: {rmse}")
# Output: RMSE: 0.6123724356957945

When your data lives in a pandas DataFrame, you can calculate RMSE in a way that’s easy to follow. This approach breaks the formula down into visible, intermediate steps within the DataFrame itself, which is great for debugging.

  • First, a new squared_error column is created by calculating (df['actual'] - df['predicted'])**2 for each row.
  • Next, you find the average of this new column using the built-in .mean() method.
  • Finally, you take the square root of that average with np.sqrt() to get the final RMSE value.
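When the intermediate column isn't needed, the same steps can be chained into one expression. The sketch below adds a row with a missing value (made up for this example) to show that pandas' .mean() skips NaN by default, so the row is left out of the average instead of turning the result into NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'actual': [3, -0.5, 2, 7, np.nan],   # last row has a missing actual value
    'predicted': [2.5, 0.0, 2, 8, 6.0]
})

# Series.mean() uses skipna=True by default, so the NaN row is ignored
rmse = np.sqrt(((df['actual'] - df['predicted']) ** 2).mean())
print(f"RMSE: {rmse:.4f}")
```

Because rows are dropped silently, it's worth checking how many missing values the DataFrame holds before trusting the score.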

Advanced techniques and optimizations

With the foundational methods covered, you can now tackle more complex scenarios, such as evaluating multiple models, using deep learning frameworks, or implementing weighted RMSE.

Computing RMSE for multiple models

import numpy as np

actual = np.array([3, -0.5, 2, 7])
predictions = np.array([
    [2.5, 0.0, 2, 8],       # Model 1
    [2.7, -0.3, 2.1, 7.5],  # Model 2
    [3.2, -0.6, 1.9, 6.8]   # Model 3
])

rmse_values = np.sqrt(np.mean((predictions - actual)**2, axis=1))
for i, rmse in enumerate(rmse_values, 1):
    print(f"RMSE for Model {i}: {rmse:.4f}")
# Output:
# RMSE for Model 1: 0.6124
# RMSE for Model 2: 0.3122
# RMSE for Model 3: 0.1581

When comparing multiple models, NumPy's broadcasting feature is a huge time saver. It automatically subtracts the single actual array from each row in your predictions array, creating an array of errors for all models in one step.

  • The key is the axis=1 argument in the np.mean() function. It instructs NumPy to calculate the mean squared error independently for each row, giving you a separate result for each model.

This leaves you with an array of RMSE values, making it simple to see which model performed best.
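Picking the winner can also be automated. A minimal sketch, reusing the same arrays, uses np.argmin to find the row with the lowest RMSE. The best_index variable is a name chosen for this example.

```python
import numpy as np

actual = np.array([3, -0.5, 2, 7])
predictions = np.array([
    [2.5, 0.0, 2, 8],       # Model 1
    [2.7, -0.3, 2.1, 7.5],  # Model 2
    [3.2, -0.6, 1.9, 6.8]   # Model 3
])

rmse_values = np.sqrt(np.mean((predictions - actual) ** 2, axis=1))

# np.argmin returns the zero-based index of the smallest RMSE
best_index = int(np.argmin(rmse_values))
print(f"Best model: Model {best_index + 1} (RMSE {rmse_values[best_index]:.4f})")
```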

Using PyTorch for RMSE calculation

import torch

actual = torch.tensor([3.0, -0.5, 2.0, 7.0])
predicted = torch.tensor([2.5, 0.0, 2.0, 8.0])

mse = torch.mean((actual - predicted)**2)
rmse = torch.sqrt(mse)
print(f"RMSE: {rmse.item()}")
# Output: RMSE: 0.6123724579811096

If you're working with deep learning models, calculating RMSE in PyTorch feels very familiar. The process mirrors the NumPy approach but uses PyTorch's native data structure, the torch.tensor.

  • The calculation uses torch.mean() and torch.sqrt(), which function just like their NumPy equivalents.
  • To get the final number from the resulting tensor, you call the .item() method. This is a common step when you need to extract a single value from a PyTorch tensor for use in other parts of your code.

Implementing weighted RMSE

import numpy as np

actual = np.array([3, -0.5, 2, 7])
predicted = np.array([2.5, 0.0, 2, 8])
# Assign higher weights to the data points that matter most
weights = np.array([1, 1, 1, 2]) # Last prediction is more important

squared_errors = (actual - predicted)**2
weighted_mse = np.sum(weights * squared_errors) / np.sum(weights)
weighted_rmse = np.sqrt(weighted_mse)
print(f"Weighted RMSE: {weighted_rmse}")
# Output: Weighted RMSE: 0.7071067811865476

Sometimes, not all prediction errors are created equal. Weighted RMSE is useful when you need to penalize certain mistakes more heavily. By creating a weights array, you can assign a higher importance to specific data points. In this example, the last prediction is treated as twice as important as the others.

  • The calculation multiplies each squared error by its corresponding weight before summing them up.
  • This result is then divided by the sum of all weights—np.sum(weights)—to get the weighted mean.
  • Taking the square root gives you the final weighted RMSE.
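The weighted mean in the steps above can also be written with np.average, which accepts a weights argument. This sketch reuses the same arrays and should give the same result as the manual sum-and-divide version.

```python
import numpy as np

actual = np.array([3, -0.5, 2, 7])
predicted = np.array([2.5, 0.0, 2, 8])
weights = np.array([1, 1, 1, 2])

# np.average computes sum(weights * values) / sum(weights)
weighted_mse = np.average((actual - predicted) ** 2, weights=weights)
weighted_rmse = np.sqrt(weighted_mse)
print(f"Weighted RMSE: {weighted_rmse:.4f}")
```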

Move faster with Replit

Replit is an AI-powered development platform that comes with all Python dependencies pre-installed, so you can skip setup and start coding instantly. This lets you move from learning individual techniques to building complete applications with tools like Agent 4.

Instead of piecing together functions, you can describe the entire application you want to build, and Agent 4 will take it from idea to working product:

  • A performance dashboard that calculates and visualizes the RMSE for your machine learning models in real time.
  • A model evaluation utility that compares prediction sets from different algorithms and ranks them by their accuracy.
  • A financial forecasting tool that uses weighted RMSE to more heavily penalize larger prediction errors on critical data points.

Simply describe your app, and Replit will write the code, test it, and fix issues automatically, all within your browser.

Common errors and challenges

Even with powerful libraries, you might run into a few common pitfalls when calculating RMSE in Python.

If your data contains missing values, represented as NaN, the standard np.mean() function will return NaN and break your calculation. The easiest fix is to use np.nanmean() instead. This function is designed to ignore missing values, allowing you to compute the mean from the available data points without any extra cleanup steps.

A ValueError is a classic sign that your actual and predicted arrays don't match. This error typically occurs when their shapes or lengths are different, making element-wise subtraction impossible. Before calculating the error, always confirm that both arrays have the same dimensions by checking their .shape attribute.

Floating-point precision is rarely a problem with the default float64 type, which keeps about 15 to 16 significant digits regardless of how small the values are. Trouble appears when data is stored in a lower-precision type such as float32, or when errors are so small that their squares underflow to zero. In those cases, cast the arrays to float64 before computing, or rescale the data. Extended types such as np.float128 exist only on some platforms, so they aren't a portable fix.

Handling missing values in np.mean() calculation

Missing data, often represented as NaN (Not a Number), can silently break your RMSE calculation. When NumPy's standard np.mean() function encounters even a single NaN value, it causes the entire operation to result in NaN. See what happens in the code below.

import numpy as np

actual = np.array([3, -0.5, 2, np.nan, 7])
predicted = np.array([2.5, 0.0, 2, 8, 7.5])

rmse = np.sqrt(np.mean((actual - predicted)**2))
print(f"RMSE: {rmse}")

Because one of the actual values is np.nan, the element-wise subtraction (actual - predicted) produces a NaN. This single invalid value propagates through the entire calculation. See the correct way to handle this below.

import numpy as np

actual = np.array([3, -0.5, 2, np.nan, 7])
predicted = np.array([2.5, 0.0, 2, 8, 7.5])

mask = ~np.isnan(actual) & ~np.isnan(predicted)
rmse = np.sqrt(np.mean((actual[mask] - predicted[mask])**2))
print(f"RMSE: {rmse}")

The solution is to filter out NaN values before the calculation. A boolean mask, created using ~np.isnan(), identifies all valid numbers in both arrays. Applying this mask ensures you only compare corresponding, non-missing data points. This prevents NaN from corrupting the result and gives you an accurate RMSE based on the available data. It's a crucial step when working with real-world datasets, which often contain missing entries.
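For comparison, here is a sketch of the np.nanmean() route applied to the same arrays. It skips the explicit mask and also counts how many pairs contributed to the result. The n_valid variable is a name chosen for this example.

```python
import numpy as np

actual = np.array([3, -0.5, 2, np.nan, 7])
predicted = np.array([2.5, 0.0, 2, 8, 7.5])

# Squared errors are NaN wherever either input is NaN
squared_errors = (actual - predicted) ** 2

# np.nanmean averages only the non-NaN entries
rmse = np.sqrt(np.nanmean(squared_errors))
print(f"RMSE: {rmse:.4f}")

# Report how many pairs were actually used
n_valid = int(np.count_nonzero(~np.isnan(squared_errors)))
print(f"Pairs used: {n_valid} of {len(actual)}")
```

Both approaches drop the same rows, so they should agree. Reporting the number of pairs used makes it obvious when a score rests on very little data.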

Fixing array dimension mismatch in RMSE calculation

A ValueError often signals that your actual and predicted arrays have different lengths. Because RMSE calculation relies on element-wise subtraction, the operation fails if the arrays don't match. The code below demonstrates what happens when their dimensions are mismatched.

import numpy as np

actual = np.array([3, -0.5, 2, 7])
predicted = np.array([2.5, 0.0, 2])

rmse = np.sqrt(np.mean((actual - predicted)**2))
print(f"RMSE: {rmse}")

The ValueError occurs because the actual array has four elements while the predicted array has three. This size difference makes the element-wise subtraction impossible. The code below shows how to address this issue.

import numpy as np

actual = np.array([3, -0.5, 2, 7])
predicted = np.array([2.5, 0.0, 2])

min_length = min(len(actual), len(predicted))
rmse = np.sqrt(np.mean((actual[:min_length] - predicted[:min_length])**2))
print(f"RMSE with truncated arrays: {rmse}")

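One way to make the code run is to find the shortest length with min(len(actual), len(predicted)) and slice both arrays to this min_length, so that element-wise subtraction works. Truncation is only valid if the remaining elements still line up, meaning the extra value really does sit at the end. Otherwise you compare unrelated pairs and get a misleading RMSE.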

You'll often encounter this ValueError when your datasets come from different sources or undergo separate processing steps. It’s a good practice to always verify array shapes before performing calculations to prevent unexpected errors.
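A shape check can be built into the calculation itself. The sketch below uses a hypothetical helper, checked_rmse, that raises a clear error instead of relying on NumPy to fail. It also covers a case where NumPy does not fail at all: a column vector of shape (4, 1) and a flat array of shape (4,) broadcast to a 4x4 matrix and produce a wrong number without any error.

```python
import numpy as np

def checked_rmse(actual, predicted):
    """RMSE that refuses to run on arrays with different shapes."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if actual.shape != predicted.shape:
        raise ValueError(
            f"Shape mismatch: actual {actual.shape} vs predicted {predicted.shape}"
        )
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

column = np.array([[3], [-0.5], [2], [7]])   # shape (4, 1)
flat = np.array([2.5, 0.0, 2, 8])            # shape (4,)

# Without the check, column - flat would broadcast to shape (4, 4)
try:
    checked_rmse(column, flat)
except ValueError as exc:
    print(exc)

# Flattening the column vector makes the shapes match
print(f"RMSE: {checked_rmse(column.ravel(), flat):.4f}")
```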

Avoiding precision issues with very small values

Very small values can cause trouble when their squared errors underflow to zero or when the data is stored in a low-precision type such as float32. With the default float64 type, values around 1e-10 are still handled accurately, as the code below illustrates.

import numpy as np

actual = np.array([1e-10, 2e-10, 3e-10, 4e-10])
predicted = np.array([1.1e-10, 2.1e-10, 3.1e-10, 4.1e-10])

rmse = np.sqrt(np.mean((actual - predicted)**2))
print(f"RMSE: {rmse}")

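With float64, this prints an RMSE of about 1e-11: the squared differences are around 1e-22, which float64 represents without difficulty. The risk is real with much smaller magnitudes or lower-precision types, where squared differences can underflow to zero. The following code shows one way to keep intermediate values in a comfortable range by rescaling.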

import numpy as np

actual = np.array([1e-10, 2e-10, 3e-10, 4e-10])
predicted = np.array([1.1e-10, 2.1e-10, 3.1e-10, 4.1e-10])

scale_factor = 1e10
scaled_actual = actual * scale_factor
scaled_predicted = predicted * scale_factor

rmse = np.sqrt(np.mean((scaled_actual - scaled_predicted)**2)) / scale_factor
print(f"RMSE: {rmse}")

The approach is to temporarily scale your data. Multiplying both arrays by a scale_factor brings the tiny values into a range where squaring them cannot underflow. Once you compute the RMSE on the scaled data, you divide the result by the same scale_factor to return to the original units, which works because RMSE scales linearly with the data. Keep this technique in mind when your model predicts values that are extremely close to zero, especially if they are stored as float32.
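Casting to float64 is another option when the data arrives as float32. The sketch below uses made-up float32 values whose squared errors are too small for float32 to represent, then repeats the calculation after a cast. It assumes NumPy's default error settings, which ignore underflow.

```python
import numpy as np

# Hypothetical float32 data with extremely small values
actual32 = np.array([1e-25, 2e-25, 3e-25], dtype=np.float32)
predicted32 = np.array([2e-25, 3e-25, 4e-25], dtype=np.float32)

# Squared errors are about 1e-50, below float32's smallest positive
# value, so they underflow to zero
rmse32 = np.sqrt(np.mean((actual32 - predicted32) ** 2))

# Casting to float64 first keeps the squared errors representable
actual64 = actual32.astype(np.float64)
predicted64 = predicted32.astype(np.float64)
rmse64 = np.sqrt(np.mean((actual64 - predicted64) ** 2))

print(f"float32 RMSE: {rmse32}")
print(f"float64 RMSE: {rmse64}")
```

The float32 result collapses to zero, which would wrongly suggest a perfect model, while the float64 result stays close to the true error of about 1e-25.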

Real-world applications

With the mechanics of RMSE calculation covered, you can now see its practical value in real-world weather and time series forecasting.

Evaluating a weather prediction model with RMSE

Calculating the RMSE for a weather model tells you roughly how many degrees off your temperature predictions typically were, with larger misses weighted more heavily.

import numpy as np

# Actual and predicted temperatures (°C) for a week
actual_temps = np.array([15, 17, 16, 18, 21, 19, 14])
predicted_temps = np.array([14, 16, 17, 19, 20, 18, 15])

# Calculate RMSE
rmse = np.sqrt(np.mean((actual_temps - predicted_temps)**2))
print(f"Weather prediction RMSE: {rmse}°C")

This example applies the RMSE formula to a practical dataset. The code sets up two NumPy arrays, actual_temps and predicted_temps, to hold a week's worth of temperature data.

  • The calculation directly compares these two arrays to find the error for each day.
  • It then squares, averages, and takes the square root of these errors in a single line.

This process condenses the model's performance across the entire week into a single, quantifiable error metric, which is then printed.

Using RMSE for time series forecast evaluation

A rolling RMSE is particularly useful for time series data, as it shows how your model's prediction accuracy changes over specific time windows.

import numpy as np
import pandas as pd

# Generate sample time series data (e.g., daily sales)
dates = pd.date_range('2023-01-01', periods=10)
actual_sales = np.array([120, 132, 145, 135, 140, 150, 142, 155, 160, 165])
predicted_sales = np.array([118, 130, 141, 138, 143, 147, 145, 150, 158, 162])

# Calculate rolling RMSE with a 3-day window
window_size = 3
rolling_rmse = []

for i in range(len(actual_sales) - window_size + 1):
    window_actual = actual_sales[i:i+window_size]
    window_pred = predicted_sales[i:i+window_size]
    window_rmse = np.sqrt(np.mean((window_actual - window_pred)**2))
    rolling_rmse.append(window_rmse)

print(f"Rolling RMSE values: {rolling_rmse}")
print(f"Average rolling RMSE: {np.mean(rolling_rmse)}")

This code calculates RMSE over a sliding window. A for loop moves through the actual_sales and predicted_sales arrays. In each step, it slices both arrays to create a three-day "window" of data using a specified window_size.

  • The RMSE is calculated only for the data within this current window.
  • Each result is appended to the rolling_rmse list.

This process repeats until it has moved across the entire dataset, creating a list of RMSE values—one for each window. The final output shows this list and its average.
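The same rolling calculation can be sketched without an explicit loop. This version assumes NumPy 1.20 or later, where sliding_window_view is available, and should produce the same values as the loop above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

actual_sales = np.array([120, 132, 145, 135, 140, 150, 142, 155, 160, 165])
predicted_sales = np.array([118, 130, 141, 138, 143, 147, 145, 150, 158, 162])
window_size = 3

squared_errors = (actual_sales - predicted_sales) ** 2

# Each row of `windows` is one 3-day slice of the squared errors
windows = sliding_window_view(squared_errors, window_size)

# Mean over each row, then square root, gives one RMSE per window
rolling_rmse = np.sqrt(windows.mean(axis=1))

print(f"Rolling RMSE values: {np.round(rolling_rmse, 3)}")
print(f"Average rolling RMSE: {rolling_rmse.mean():.3f}")
```

For long series this avoids Python-level looping, while the loop version remains easier to step through when debugging.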

Get started with Replit

Now, turn your knowledge into a real tool. Tell Replit Agent to build "a web app that calculates rmse from two CSV files" or "a dashboard that visualizes rolling rmse for a stock price model."

Replit Agent writes the code, tests for errors, and deploys your app from a single prompt. Start building with Replit.

Get started free

Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.
