How to normalize data in Python

Learn how to normalize data in Python with our guide. We cover different methods, tips, real-world applications, and how to debug errors.

Published on: Fri, Feb 6, 2026
Updated on: Mon, Apr 13, 2026
The Replit Team

Data normalization is a key preprocessing step in machine learning. It scales numeric features to a common range, which improves model performance and ensures all variables contribute equally to the analysis.

In this article, we'll cover several normalization techniques, complete with practical examples. You'll also find valuable implementation tips, see real-world applications, and get debugging advice for common issues you might encounter.

Using simple min-max normalization

import numpy as np

data = np.array([2, 5, 10, 12, 18])

# Min-Max scaling to range [0, 1]
normalized_data = (data - np.min(data)) / (np.max(data) - np.min(data))
print("Original data:", data)
print("Normalized data:", normalized_data)

Output:

Original data: [ 2  5 10 12 18]
Normalized data: [0.     0.1875 0.5    0.625  1.    ]

This snippet demonstrates one of the most straightforward normalization methods. The core logic lies in the formula (data - np.min(data)) / (np.max(data) - np.min(data)), which rescales every number in your dataset to a value between 0 and 1.

The process is simple yet effective:

  • First, data - np.min(data) shifts all data points so the smallest value becomes zero.
  • Then, dividing by the range—calculated with np.max(data) - np.min(data)—scales everything down, making the largest value equal to 1.
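
The two steps above can be wrapped in a small reusable helper. This is just a sketch; the function name min_max_normalize is ours, not part of NumPy:

```python
import numpy as np

def min_max_normalize(values):
    """Rescale an array linearly so its minimum maps to 0 and its maximum to 1."""
    values = np.asarray(values, dtype=float)
    return (values - values.min()) / (values.max() - values.min())

data = np.array([2, 5, 10, 12, 18])
print(min_max_normalize(data))  # [0.     0.1875 0.5    0.625  1.    ]
```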

Basic normalization techniques

Beyond the simple min-max scaler, you can also customize its output range or use other common approaches like z-score normalization and decimal scaling.

Using min-max scaling with custom range

import numpy as np

data = np.array([2, 5, 10, 12, 18])

# Min-Max scaling to range [-1, 1]
min_val, max_val = np.min(data), np.max(data)
normalized_data = 2 * (data - min_val) / (max_val - min_val) - 1
print("Normalized to [-1, 1]:", normalized_data)

Output:

Normalized to [-1, 1]: [-1.    -0.625  0.     0.25   1.   ]

You can also adapt min-max scaling for a custom range, like [-1, 1]. This is a common requirement for certain algorithms. The logic builds directly on the standard formula you saw earlier.

  • The core expression, (data - min_val) / (max_val - min_val), first scales your data to the [0, 1] range.
  • Multiplying the result by 2 expands this range to [0, 2].
  • Subtracting 1 then shifts the entire dataset, locking it into the final [-1, 1] range.
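
The same stretch-and-shift idea generalizes to any target range. A sketch of a helper for an arbitrary [low, high] range (the function name scale_to_range is ours, purely illustrative):

```python
import numpy as np

def scale_to_range(values, low, high):
    """Min-max scale an array into an arbitrary [low, high] range."""
    values = np.asarray(values, dtype=float)
    unit = (values - values.min()) / (values.max() - values.min())  # first to [0, 1]
    return unit * (high - low) + low  # then stretch and shift

data = np.array([2, 5, 10, 12, 18])
print(scale_to_range(data, -1, 1))   # matches the [-1, 1] result above
print(scale_to_range(data, 0, 100))  # any other range works the same way
```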

Using z-score normalization (standardization)

import numpy as np

data = np.array([2, 5, 10, 12, 18])

# Z-score normalization
normalized_data = (data - np.mean(data)) / np.std(data)
print("Original data:", data)
print("Z-score normalized:", normalized_data)

Output:

Original data: [ 2  5 10 12 18]
Z-score normalized: [-1.3282226  -0.78975398  0.10769372  0.4666728   1.54361004]

Z-score normalization, or standardization, rescales your data to have a mean of 0 and a standard deviation of 1. Unlike min-max scaling, it doesn't constrain your data to a fixed range. Each resulting value represents how many standard deviations that data point is from the mean. For a deeper understanding of the underlying statistics, learn more about calculating standard deviation in Python.

  • First, data - np.mean(data) centers the data around zero.
  • Then, dividing by the standard deviation with np.std(data) scales each value accordingly.
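
A quick sanity check confirms the two defining properties of standardized data:

```python
import numpy as np

data = np.array([2, 5, 10, 12, 18])
z = (data - np.mean(data)) / np.std(data)

# After standardization the mean is (numerically) 0 and the std is 1
print(np.isclose(z.mean(), 0.0))  # True
print(np.isclose(z.std(), 1.0))   # True
```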

Using decimal scaling

import numpy as np

data = np.array([200, 500, 1000, 1200, 1800])

# Decimal scaling
j = int(np.ceil(np.log10(np.max(np.abs(data)))))
normalized_data = data / 10**j
print("Original data:", data)
print("Decimal scaled:", normalized_data)

Output:

Original data: [ 200  500 1000 1200 1800]
Decimal scaled: [0.02 0.05 0.1  0.12 0.18]

Decimal scaling normalizes data by moving the decimal point. The goal is to find the smallest power of 10 that, when used as a divisor, scales all values to be within the range of -1 to 1. It's a straightforward method for handling numbers with varying orders of magnitude.

  • The code determines the scaling factor by finding the number of digits in the largest absolute value. It does this using np.log10 and np.ceil.
  • Each data point is then divided by 10 raised to this power, effectively shifting the decimal place to the left.
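
These two steps can be packaged into a small helper and verified: after scaling, the largest absolute value should be at most 1. The function name decimal_scale is ours, for illustration only:

```python
import numpy as np

def decimal_scale(values):
    """Divide by the smallest power of 10 that brings every |value| into [-1, 1]."""
    values = np.asarray(values, dtype=float)
    j = int(np.ceil(np.log10(np.max(np.abs(values)))))  # digits in the largest magnitude
    return values / 10**j

data = np.array([200, 500, 1000, 1200, 1800])
scaled = decimal_scale(data)
print(scaled)                       # [0.02 0.05 0.1  0.12 0.18]
print(np.max(np.abs(scaled)) <= 1)  # True
```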

Advanced normalization techniques

While the basics cover a lot of ground, you'll sometimes need more robust or specialized methods to handle outliers and streamline your preprocessing workflow.

Using robust scaling with median and IQR

import numpy as np

data = np.array([2, 5, 10, 12, 18, 100]) # With outlier

# Robust scaling using median and IQR
median = np.median(data)
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
robust_scaled = (data - median) / iqr
print("Original data:", data)
print("Robust scaled:", robust_scaled)

Output:

Original data: [  2   5  10  12  18 100]
Robust scaled: [-0.87804878 -0.58536585 -0.09756098  0.09756098  0.68292683  8.68292683]

Robust scaling is your go-to when your data has outliers, like the 100 in the example. It's less affected by extreme values because it uses the median and Interquartile Range (IQR) instead of the mean and standard deviation, which can be easily skewed.

  • First, the data is centered by subtracting the median.
  • It's then scaled by dividing by the iqr, which represents the spread of the middle 50% of your data.

This method effectively minimizes the impact of outliers on your normalized dataset, giving you a more reliable result.
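
To see the difference concretely, the sketch below scales the same outlier-laden array with both z-scores and the robust formula, then compares the spread of the five non-outlier points under each method:

```python
import numpy as np

data = np.array([2, 5, 10, 12, 18, 100])  # 100 is the outlier

# Z-score scaling: the outlier inflates the standard deviation,
# compressing the ordinary points into a narrow band
z = (data - np.mean(data)) / np.std(data)

# Robust scaling: the median and IQR barely react to the extreme value
q1, q3 = np.percentile(data, [25, 75])
robust = (data - np.median(data)) / (q3 - q1)

# Spread (max - min) of the non-outlier points under each method
print("z-score spread:", np.ptp(z[:-1]))
print("robust spread:", np.ptp(robust[:-1]))
```

The robust version keeps noticeably more separation between the ordinary points, which is exactly what you want when outliers are present.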

Using scikit-learn preprocessing modules

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

data = np.array([[2, 200], [5, 500], [10, 1000], [12, 1200], [18, 1800]])

mm_scaler = MinMaxScaler()
std_scaler = StandardScaler()
print("MinMax scaled:\n", mm_scaler.fit_transform(data))
print("Standard scaled:\n", std_scaler.fit_transform(data))--OUTPUT--MinMax scaled:
[[0. 0. ]
[0.1875 0.1875 ]
[0.5 0.5 ]
[0.625 0.625 ]
[1. 1. ]]
Standard scaled:
[[-1.30384048 -1.30384048]
[-0.84749831 -0.84749831]
[ 0.06519218 0.06519218]
[ 0.37958772 0.37958772]
[ 1.70655889 1.70655889]]

For a more streamlined approach, you can use scikit-learn's preprocessing module. It offers ready-to-use classes that handle the math for you, which is especially useful when working with multi-column datasets like the one in the example. This is part of the broader process of scaling data in Python.

  • MinMaxScaler applies the min-max logic you saw earlier, locking your data into the default [0, 1] range.
  • StandardScaler performs z-score normalization, giving your data a mean of 0 and a standard deviation of 1.

The fit_transform() method is a handy shortcut that first learns the scaling parameters from your data and then immediately applies the transformation in one go.
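
Because fitting and transforming are separate steps, you can also learn the scaling parameters from training data alone and reuse them on new samples, which prevents information from the new data leaking into the scaler. A minimal sketch of that pattern:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

train = np.array([[2.0], [5.0], [10.0], [18.0]])
scaler = MinMaxScaler()
scaler.fit(train)  # learns min=2 and max=18 from the training data only

# A new sample is scaled with the training parameters, not its own
new_sample = np.array([[12.0]])
print(scaler.transform(new_sample))  # [[0.625]]
```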

Creating custom normalization with mathematical functions

import numpy as np

def normalize_with_function(data, func=np.tanh):
    return func(data / np.max(data))

data = np.array([2, 5, 10, 12, 18])
tanh_normalized = normalize_with_function(data)
sigmoid_normalized = normalize_with_function(data, lambda x: 1/(1+np.exp(-x)))
print("Tanh normalized:", tanh_normalized)
print("Sigmoid normalized:", sigmoid_normalized)

Output:

Tanh normalized: [0.11065611 0.27084712 0.5046724  0.58278295 0.76159416]
Sigmoid normalized: [0.52774926 0.56900131 0.63542356 0.66075637 0.73105858]

You can also create your own normalization logic using mathematical functions. It's a flexible approach that lets you apply non-linear transformations that might better suit your data's distribution. The example defines a reusable function, normalize_with_function, to show how it works. For vector-specific applications, you might also need techniques for normalizing a vector in Python.

  • The function first scales the data by dividing each point by the maximum value.
  • It then applies a mathematical function—like the default np.tanh or a custom lambda for sigmoid—to transform the scaled data into its final form.
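
The same pattern works with any monotonic transform. For instance, a log-based function can compress right-skewed data; np.log1p computes log(1 + x), which is safe at zero. Dividing by np.log1p(1.0) is our own rescaling choice so the largest value still lands exactly on 1.0:

```python
import numpy as np

def normalize_with_function(data, func=np.tanh):
    return func(data / np.max(data))

data = np.array([2, 5, 10, 12, 18])

# Log-style transform: scale to [0, 1], map through log1p, rescale to end at 1.0
log_normalized = normalize_with_function(data, lambda x: np.log1p(x) / np.log1p(1.0))
print(log_normalized)
```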

Move faster with Replit

Replit is an AI-powered development platform where you can start coding instantly. It comes with all Python dependencies pre-installed, so you can skip the setup and get straight to work.

Learning individual techniques is one thing, but building a complete application is another. This is where Agent 4 comes in. Instead of piecing together techniques, you can describe the app you want to build, and the Agent will take it from an idea to a working product. For example, you could ask the Agent to build:

  • A data normalization utility that lets you paste a list of numbers and get the min-max or z-score scaled results.
  • A feature comparison tool that normalizes different metrics, like price and user ratings, to a common scale for a fair comparison dashboard.
  • A data preparation script that uses robust scaling to clean a dataset with outliers before feeding it into an analytics model.

Simply describe your app, and Replit will write the code, test it, and fix issues automatically, all within your browser.

Common errors and challenges

Even with the right formulas, you can run into issues like division by zero or invalid values during normalization.

  • Handling division by zero in min-max normalization: This error occurs when all values in your dataset are identical, making the denominator in the formula, max(data) - min(data), equal to zero. A simple check that the minimum and maximum values differ before you normalize prevents this.
  • Fixing NaN values in z-score normalization: You'll get NaN (Not a Number) values with z-score normalization if your data has no variability, meaning the standard deviation is zero. This is another division-by-zero error. Before you apply the formula, verify that the standard deviation is not zero.
  • Handling missing values before normalization: Normalization functions expect clean, numeric data and can't handle missing values. If your dataset contains NaNs, address them before scaling, either by removing the affected data points or by filling them in with a strategy like imputation.

Handling division by zero in min-max normalization

A division-by-zero error is a classic snag you'll hit with min-max normalization. It happens when your dataset has no variation, meaning every value is identical. Since np.max(data) and np.min(data) are the same, the denominator becomes zero. The following code demonstrates this exact scenario, which results in NaN values because you can't divide by zero.

import numpy as np

# All values are the same
data = np.array([5, 5, 5, 5, 5])
# This will cause division by zero
normalized_data = (data - np.min(data)) / (np.max(data) - np.min(data))
print(normalized_data)

The expression (data - np.min(data)) results in an array of zeros. Dividing this by the range, np.max(data) - np.min(data), which is also zero, produces NaN values. The following snippet demonstrates a safe way to proceed.

import numpy as np

data = np.array([5, 5, 5, 5, 5])
# Check if max equals min to avoid division by zero
if np.max(data) == np.min(data):
    normalized_data = np.zeros_like(data)
else:
    normalized_data = (data - np.min(data)) / (np.max(data) - np.min(data))
print(normalized_data)

The solution is to add a simple check before normalizing. By comparing np.max(data) and np.min(data), you can catch cases where all values are identical. If they match, it means there's no variability, so you can safely return an array of zeros using np.zeros_like(data). Otherwise, you can proceed with the standard formula. This is a crucial guardrail when processing data where some features might be constant across all samples.

Fixing NaN values in z-score normalization

You'll encounter a similar division-by-zero issue with z-score normalization, resulting in NaN values. This occurs when all your data points are identical, making the standard deviation zero. The following code snippet shows exactly how this happens.

import numpy as np

# All values are identical
data = np.array([10, 10, 10, 10])
# Standard deviation is zero, leading to NaN values
normalized_data = (data - np.mean(data)) / np.std(data)
print(normalized_data)

The expression data - np.mean(data) produces an array of zeros. Dividing this by the standard deviation, which is also zero, results in NaN. The following snippet demonstrates a safe way to handle this.

import numpy as np

data = np.array([10, 10, 10, 10])
# Check if std is zero to avoid NaN values
std = np.std(data)
if std == 0:
    normalized_data = np.zeros_like(data)
else:
    normalized_data = (data - np.mean(data)) / std
print(normalized_data)

The solution is to check the standard deviation before you normalize. If np.std(data) is zero, it means all your data points are identical, and the calculation isn't possible. The code handles this by returning an array of zeros with np.zeros_like(data). Otherwise, it safely proceeds with the z-score formula. This check is crucial when you're working with datasets where some features might be constant across all your samples.

Handling missing values before normalization

Normalization functions can't handle missing data, often represented as np.nan. Any calculation involving a np.nan value—like finding the minimum or maximum—will also result in np.nan, which poisons the entire output. The following code demonstrates this exact problem.

import numpy as np

# Data with missing values
data = np.array([2, 5, np.nan, 12, 18])
# This will result in NaN for all values
normalized_data = (data - np.min(data)) / (np.max(data) - np.min(data))
print(normalized_data)

Since functions like np.min(data) and np.max(data) return nan when a missing value is present, the entire formula breaks down. The calculation becomes a division by nan, which poisons the whole array. The following snippet shows how to handle this.

import numpy as np

# Data with missing values
data = np.array([2, 5, np.nan, 12, 18])
# Fill missing values with mean before normalizing
mean_value = np.nanmean(data)
clean_data = np.nan_to_num(data, nan=mean_value)
normalized_data = (clean_data - np.min(clean_data)) / (np.max(clean_data) - np.min(clean_data))
print(normalized_data)

To fix this, you must handle missing values before normalizing. The code demonstrates a common strategy: imputation. It uses np.nanmean() to calculate the average while ignoring np.nan values, then fills the gaps with that average using np.nan_to_num(). With a complete dataset, the normalization formula can run without errors. This is a crucial preprocessing step for any real-world data, which often contains missing entries that would otherwise break your calculations.
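
If losing a few rows is acceptable, the other strategy mentioned above, simply dropping the missing entries, is a one-liner with a boolean mask:

```python
import numpy as np

data = np.array([2, 5, np.nan, 12, 18])

# Keep only the entries that are not NaN
clean = data[~np.isnan(data)]
normalized = (clean - clean.min()) / (clean.max() - clean.min())
print(normalized)  # [0.     0.1875 0.625  1.    ]
```

Dropping changes your sample size while imputation preserves it, so the right choice depends on how much data you can afford to lose.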

Real-world applications

These techniques and debugging skills are essential for real-world tasks, from normalizing image data for CNNs to analyzing financial time series.

Normalizing image data for CNN models

In computer vision, normalizing pixel values to a common range, like [0, 1], is a standard preprocessing step that helps neural networks train more effectively.

import numpy as np
from PIL import Image

# Simulating a grayscale image (like MNIST digit)
image_array = np.random.randint(0, 256, size=(28, 28))

# Normalize pixel values to [0,1] range for neural networks
normalized_image = image_array / 255.0

print(f"Original image shape: {image_array.shape}")
print(f"Original value range: [{np.min(image_array)}, {np.max(image_array)}]")
print(f"Normalized value range: [{np.min(normalized_image):.1f}, {np.max(normalized_image):.1f}]")

This snippet first simulates a grayscale image with a NumPy array where pixel values range from 0 to 255. The key transformation is image_array / 255.0, which performs element-wise division to scale every pixel value into a new range between 0.0 and 1.0.

  • The code uses np.random.randint() to create a 28x28 array, a size you'll often see in datasets like MNIST.
  • Dividing by the float 255.0 is intentional—it ensures the result is a floating-point number, which is necessary for the scaled decimal values.
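
Beyond dividing by 255, many training pipelines also standardize each color channel separately. A sketch of per-channel standardization on a simulated RGB batch (the batch shape here is illustrative):

```python
import numpy as np

# Simulated batch of 4 RGB images, 28x28 pixels, scaled to [0, 1]
images = np.random.randint(0, 256, size=(4, 28, 28, 3)) / 255.0

# Compute the mean and std over every pixel of each channel separately
channel_mean = images.mean(axis=(0, 1, 2))  # shape (3,)
channel_std = images.std(axis=(0, 1, 2))    # shape (3,)

# Broadcasting applies each channel's statistics to that channel only
standardized = (images - channel_mean) / channel_std
print(standardized.mean(axis=(0, 1, 2)))  # each entry is ~0
```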

Normalizing financial data for time series analysis

When working with financial time series, such as stock prices, normalization is essential for comparing different assets on a common scale.

import numpy as np
import pandas as pd

# Sample stock price data (simulated)
dates = pd.date_range(start='2022-01-01', periods=10, freq='D')
stock_prices = pd.Series([150.5, 152.3, 151.1, 153.7, 158.2,
                          157.3, 155.6, 160.1, 162.5, 159.8], index=dates)

# Apply min-max normalization for comparing multiple stocks
normalized_prices = (stock_prices - stock_prices.min()) / (stock_prices.max() - stock_prices.min())

print("Original stock prices:")
print(stock_prices.head(3))
print("\nNormalized stock prices (0-1 scale):")
print(normalized_prices.head(3))

This code creates a simulated time series of stock prices using a pandas Series. It then applies min-max normalization to rescale these prices into a common [0, 1] range, which is useful for comparing assets with different price levels.

  • The code uses the built-in .min() and .max() methods to find the lowest and highest prices in the dataset.
  • It then applies the formula to each price, effectively mapping the lowest price to 0 and the highest to 1.

This transformation allows you to analyze the relative movement of different stocks on an equal footing, regardless of their actual dollar values. When working with extensive financial data, you'll also need techniques for handling large datasets in Python.
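
For example, normalizing each column of a DataFrame independently puts two assets with very different price levels on the same [0, 1] scale. The prices below are made up for illustration:

```python
import pandas as pd

prices = pd.DataFrame({
    "stock_a": [150.5, 152.3, 158.2, 160.1, 162.5],  # ~$150 range
    "stock_b": [12.1, 12.0, 12.8, 13.4, 13.2],       # ~$13 range
})

# Column-wise min-max normalization: pandas aligns the stats by column label
normalized = (prices - prices.min()) / (prices.max() - prices.min())
print(normalized)  # both columns now run from 0.0 to 1.0
```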

Get started with Replit

Now, turn these techniques into a real tool with Replit Agent. Describe what you want, like “a data normalization utility that returns min-max and z-score scaled results” or “a dashboard that normalizes different metrics to a common scale.”

Replit Agent will write the code, test for errors, and deploy your application. Start building with Replit.

Build your first app today

Describe what you want to build, and Replit Agent writes the code, handles the infrastructure, and ships it live. Go from idea to real product, all in your browser.
