How to plot a histogram in Python

Learn how to plot histograms in Python using various methods. Discover tips, real-world applications, and how to debug common errors.

Published on: Thu, Feb 12, 2026
Updated on: Mon, Apr 13, 2026
The Replit Team

A histogram is a powerful tool for data visualization in Python. It provides a clear picture of your data's distribution, so you can quickly spot patterns, outliers, and frequency.

In this article, you'll learn several techniques to create histograms. You'll also get practical tips for customization, explore real-world applications, and receive clear advice to debug common issues.

Basic histogram with matplotlib

import matplotlib.pyplot as plt
import numpy as np

data = np.random.randn(1000) # Generate random data
plt.hist(data, bins=30)
plt.title('Simple Histogram')
plt.show()

Output: [No textual output - displays a histogram of normally distributed data with 30 bins]

The core of this example is the plt.hist() function, which takes your data and visualizes its distribution. For this demonstration, np.random.randn(1000) generates a sample dataset of 1000 random points drawn from a standard normal distribution.

The key parameter here is bins=30. This tells Matplotlib to:

  • Divide the entire range of your data into 30 equal intervals, or "bins."
  • Count how many data points fall into each bin.
  • Draw a bar for each bin with a height proportional to its count.

Choosing the right number of bins is crucial—it can significantly change how you interpret the data's shape.
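To get a feel for what the bin count does, it can help to compute the bins without drawing anything. The sketch below assumes NumPy's np.histogram() and np.histogram_bin_edges(), which the example above does not use, and a fixed seed so the run is repeatable. The "auto" rule is only one of NumPy's built-in estimators, not a universally correct choice.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable
data = rng.standard_normal(1000)

# bins=30 means 30 equal-width intervals, described by 31 edges
counts, edges = np.histogram(data, bins=30)
widths = np.diff(edges)

print(len(counts), len(edges))         # 30 31
print(np.allclose(widths, widths[0]))  # True: every bin has the same width
print(counts.sum())                    # 1000: every point lands in a bin

# Let NumPy suggest a bin count instead of guessing
auto_edges = np.histogram_bin_edges(data, bins="auto")
print("suggested number of bins:", len(auto_edges) - 1)
```

Trying a few values, such as 5, 30, and the suggested count, on the same data is a quick way to see whether a feature of the plot is real or just a product of the binning.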

Customizing and enhancing histograms

Fine-tuning your histogram's appearance and using other libraries like NumPy and pandas gives you even greater control over how your data's story is told.

Customizing histogram appearance

import matplotlib.pyplot as plt
import numpy as np

data = np.random.randn(1000)
plt.hist(data, bins=20, color='skyblue', edgecolor='black', alpha=0.7)
plt.title('Customized Histogram')
plt.xlabel('Value')
plt.ylabel('Frequency')

Output: Text(0, 0.5, 'Frequency')

You can easily improve your histogram’s readability by adding a few parameters. This example refines the plot by making the bars a distinct color and adding labels for clarity.

  • The color='skyblue' and edgecolor='black' arguments set the fill and border colors for the bars.
  • alpha=0.7 adjusts the bar's transparency. This is especially useful when you need to overlay multiple histograms.

Finally, plt.xlabel() and plt.ylabel() add descriptive labels to the x and y axes, making your chart much easier to understand at a glance.
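Customization does not have to apply to every bar at once. plt.hist() returns the bin counts, the bin edges, and the drawn bar objects, so individual bars can be restyled after the call. The following is a minimal sketch of that idea; the 'salmon' highlight color and the fixed seed are illustrative choices, not part of the original example.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=1)  # fixed seed, illustrative only
data = rng.standard_normal(1000)

# plt.hist returns the bin counts, the bin edges, and the bar objects
counts, edges, bars = plt.hist(data, bins=20, color='skyblue',
                               edgecolor='black', alpha=0.7)

# Highlight the most populated bin in a contrasting color
tallest = int(np.argmax(counts))
bars[tallest].set_facecolor('salmon')

plt.title('Customized Histogram')
plt.xlabel('Value')
plt.ylabel('Frequency')
```

Call plt.show() afterwards to display the figure in an interactive session.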

Using numpy for histogram calculations

import numpy as np
import matplotlib.pyplot as plt

data = np.random.normal(170, 10, 250) # Height data (mean=170, std=10)
counts, bins = np.histogram(data, bins=15)
plt.stairs(counts, bins)
plt.xlabel('Height (cm)')
plt.ylabel('Count')

Output: Text(0, 0.5, 'Count')

For more control, you can separate the histogram calculation from the plotting. NumPy's np.histogram() function does the heavy lifting by processing your data without creating a visual. It returns two arrays:

  • counts: The number of data points that fall into each bin.
  • bins: The edges that define each interval.

You can then pass these arrays to a plotting function like plt.stairs() to create a step-style histogram. This method gives you direct access to the underlying data, which is great for custom visualizations or further analysis.
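As an illustration of the "further analysis" this separation allows, the sketch below works directly with the two returned arrays. The variable names centers and peak are hypothetical, and the seed is fixed only to make the run repeatable.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
data = rng.normal(170, 10, 250)  # simulated heights in cm

counts, edges = np.histogram(data, bins=15)

# Midpoint of each bin, handy for labels or line plots
centers = (edges[:-1] + edges[1:]) / 2

# Locate the most common height range
peak = int(np.argmax(counts))
print(f"Most common range: {edges[peak]:.1f} to {edges[peak + 1]:.1f} cm "
      f"({counts[peak]} of {counts.sum()} values)")
```

Note that there is always one more edge than there are bins, which is why the midpoints are built from edges[:-1] and edges[1:].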

Creating histograms with pandas

import pandas as pd
import numpy as np

df = pd.DataFrame({'values': np.random.normal(0, 1, 1000)})
hist = df.hist(column='values', bins=25, grid=False, figsize=(8, 6))

Output: array([[<Axes: title={'center': 'values'}>]], dtype=object)

If your data is already in a pandas DataFrame, you can plot a histogram directly using the .hist() method. This is a convenient shortcut that leverages pandas' built-in plotting capabilities, which are based on Matplotlib. It streamlines the process by letting you specify plotting details directly in the method call.

  • column='values' selects the specific DataFrame column to visualize.
  • bins=25 divides the data into 25 intervals.
  • grid=False removes the default background grid for a cleaner look.
  • figsize=(8, 6) sets the dimensions of the plot in inches.
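If you also want the numbers behind the plot, one option is to bin the column with pd.cut() and count the results. This is a sketch using a different pandas function than the example above, so small differences at the bin boundaries are possible: pd.cut() closes intervals on the right by default.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=3)  # fixed seed, illustrative only
df = pd.DataFrame({'values': rng.normal(0, 1, 1000)})

# Assign each value to one of 25 equal-width intervals
intervals = pd.cut(df['values'], bins=25)

# Count values per interval, keeping the intervals in numeric order
bin_counts = intervals.value_counts(sort=False)

print(bin_counts.head())
print("total:", bin_counts.sum())
```

The result is a Series indexed by interval, which is convenient for exporting or for checking a suspicious-looking bar.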

Advanced histogram techniques

Once you're comfortable with the basics, you can elevate your plots with advanced libraries for statistical depth, interactivity, and side-by-side data comparisons.

Statistical visualization with seaborn

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

data = np.random.normal(0, 1, 1000)
sns.histplot(data, kde=True, stat="density", linewidth=0)
plt.title('Histogram with Kernel Density Estimate')

Output: Text(0.5, 1.0, 'Histogram with Kernel Density Estimate')

Seaborn is a library built on Matplotlib that excels at statistical visualization. The sns.histplot() function creates a histogram with added statistical features, giving you a deeper look into your data's distribution.

  • Setting kde=True overlays a Kernel Density Estimate (KDE). This smooth line helps you visualize the underlying probability distribution of your data.
  • The stat="density" argument normalizes the bar heights to show probability density instead of raw counts. This makes the histogram directly comparable to the KDE curve.
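To make the idea of density normalization concrete, the sketch below reproduces it with NumPy rather than seaborn. It illustrates the arithmetic only and makes no claim about seaborn's internal implementation: each count is divided by the total count times the bin width, so the bar areas sum to 1.

```python
import numpy as np

rng = np.random.default_rng(seed=4)  # fixed seed, illustrative only
data = rng.normal(0, 1, 1000)

counts, edges = np.histogram(data, bins=20)
widths = np.diff(edges)

# Density = count / (total count * bin width)
density = counts / (counts.sum() * widths)

# NumPy's density=True option applies the same normalization
np_density, _ = np.histogram(data, bins=20, density=True)

print(np.allclose(density, np_density))          # True
print(np.isclose((density * widths).sum(), 1))   # True: bar areas sum to 1
```

Because a probability density curve also integrates to 1, bars scaled this way sit on the same vertical scale as a KDE line.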

Interactive histograms with plotly

import plotly.express as px
import numpy as np

data = np.random.normal(0, 1, 1000)
fig = px.histogram(data, nbins=30, marginal="box",
                   title="Interactive Histogram with Box Plot")
fig.show()

Output: [No textual output - displays an interactive histogram with a box plot]

Plotly Express lets you create fully interactive charts with minimal code. Unlike static plots, the output from px.histogram() allows you to hover over bars for exact counts, zoom in on specific regions, and pan across the data. This makes data exploration much more dynamic. You can also save the resulting plot for future use.

  • The marginal="box" argument is a standout feature. It adds a compact box plot above your histogram, giving you an instant statistical summary of the data's quartiles, median, and outliers.
  • nbins=30 sets the number of intervals for the data, similar to the bins parameter in other libraries.

Multiple histograms for comparison

import matplotlib.pyplot as plt
import numpy as np

data1 = np.random.normal(0, 1, 1000)
data2 = np.random.normal(3, 1.5, 1000)
plt.hist([data1, data2], bins=30, alpha=0.7, label=['Distribution 1', 'Distribution 2'])
plt.legend()
plt.title('Comparing Multiple Distributions')

Output: Text(0.5, 1.0, 'Comparing Multiple Distributions')

You can plot multiple datasets on a single set of axes to easily compare their distributions. When you pass a list of datasets, like [data1, data2], directly to plt.hist(), Matplotlib bins them with the same edges and, with the default bar style, draws each dataset's bars side by side within every bin.

  • The label parameter assigns a name to each dataset, which is then displayed by calling plt.legend().
  • The alpha setting makes the bars semi-transparent. It matters most when bars are drawn on top of each other, for example when you call plt.hist() once per dataset.

This technique is perfect for quickly spotting differences in the center, spread, and shape between two or more groups of data.
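To draw the datasets on top of each other rather than next to each other, one option is to call plt.hist() once per dataset while sharing a single set of bin edges. The sketch below assumes np.histogram_bin_edges() for computing those edges. The shared_edges name and the alpha value of 0.5 are illustrative choices.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=5)  # fixed seed, illustrative only
data1 = rng.normal(0, 1, 1000)
data2 = rng.normal(3, 1.5, 1000)

# Compute one set of edges spanning both datasets so the bars line up
shared_edges = np.histogram_bin_edges(np.concatenate([data1, data2]), bins=30)

counts1, edges1, _ = plt.hist(data1, bins=shared_edges, alpha=0.5,
                              label='Distribution 1')
counts2, edges2, _ = plt.hist(data2, bins=shared_edges, alpha=0.5,
                              label='Distribution 2')

plt.legend()
plt.title('Overlaid Distributions with Shared Bins')
```

Without shared edges, each call would pick bins from its own data range, and bars of different widths would make the two shapes harder to compare.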

Move faster with Replit

Replit is an AI-powered development platform where you can go from learning techniques to building applications instantly. It comes with all Python dependencies pre-installed, so you can skip the setup and focus on coding.

Instead of manually piecing together functions like plt.hist(), you can use Agent 4 to build a complete application from a description. Describe the tool you want to create, and the Agent handles the code, databases, and deployment.

  • A dashboard that visualizes website traffic distribution, helping you identify peak hours.
  • An A/B test analysis tool that plots histograms of conversion rates for two different user groups to compare their performance.
  • A data quality checker that generates distribution plots for incoming data to spot outliers or unexpected patterns.

Simply describe your app, and Replit will write the code, test it, and fix issues automatically, all within your browser.

Common errors and challenges

Even with the best tools, you can run into a few common roadblocks when plotting histograms in Python; here’s how to fix them.

Fixing incorrect data types for plt.hist()

The plt.hist() function is designed to work with numerical data. If you pass it a list of strings or other non-numeric types, older Matplotlib versions raise a TypeError, while newer versions may instead treat the strings as category labels and plot them in order of first appearance rather than by numeric value. Either way, the result isn't the numeric histogram you wanted. The fix is straightforward: check your data types before plotting and convert the values to numbers, such as integers or floats.

Resolving overlapping histograms with proper alpha settings

When you overlay multiple histograms to compare distributions, one plot can easily hide another if they aren't transparent. This makes it impossible to see where the datasets overlap or differ. You can solve this by setting the alpha parameter to a value between 0 and 1. A setting like alpha=0.7 makes the bars semi-transparent, so you can clearly see both distributions at once.

Troubleshooting missing histogram when using incorrect bin values

Sometimes your plot might appear empty, or the histogram doesn't show up at all. This often happens when the bins you've manually defined don't actually cover the range of your data. If your data points fall outside your specified bin edges, they won't be counted or displayed. The easiest solution is to let Matplotlib handle the bins automatically or carefully verify that your custom bin range encompasses all your data points.

Fixing incorrect data types for plt.hist()

A common pitfall is feeding plt.hist() non-numeric data, like a list of strings. The function needs numbers to calculate bin ranges and frequencies. Depending on your Matplotlib version, string input either raises a TypeError or is treated as a set of category labels, which produces a misleading plot. See what happens in this example.

import matplotlib.pyplot as plt

# Dataset where every value is a string
data = ['1', '2', '3', '4', '5', '3', '2', '1', '2', '3']
plt.hist(data)
plt.title('Histogram of String Data')
plt.show()

The data list contains strings because each number is wrapped in quotes. Since plt.hist() can't do arithmetic on text, it either fails with a TypeError or, in newer Matplotlib versions, treats each distinct string as a category instead of a number. The corrected code below shows how to properly format the data before plotting.

import matplotlib.pyplot as plt

# Convert string data to numeric
data = ['1', '2', '3', '4', '5', '3', '2', '1', '2', '3']
numeric_data = [int(x) for x in data]
plt.hist(numeric_data)
plt.title('Histogram of Numeric Data')
plt.show()

The fix is to convert each string to a number before plotting. The corrected code uses a list comprehension, [int(x) for x in data], to build a new list of integers. This allows plt.hist() to perform its calculations correctly.

This is a common issue when you're working with data imported from external sources like CSV files, where numbers are often read in as text. Always verify your data types before passing them to a plotting function.
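Imported data is often messier than a clean list of digit strings, and int(x) fails on the first blank or placeholder value. The sketch below shows one way to handle that. The to_numbers helper is hypothetical, written for this illustration, and it simply sets aside anything float() cannot parse.

```python
def to_numbers(raw_values):
    """Convert strings to floats, collecting entries that are not numeric."""
    numbers = []
    skipped = []
    for value in raw_values:
        try:
            numbers.append(float(value))
        except (TypeError, ValueError):
            skipped.append(value)
    return numbers, skipped


raw = ['1', '2.5', ' 3 ', 'n/a', '', '4']
numbers, skipped = to_numbers(raw)

print(numbers)  # [1.0, 2.5, 3.0, 4.0]
print(skipped)  # ['n/a', '']
```

The numbers list can then be passed to plt.hist(), and the skipped list is worth inspecting before you decide the rejected values are safe to ignore.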

Resolving overlapping histograms with proper alpha settings

When you plot multiple histograms to compare them, one can easily hide the other if they're not transparent. This makes it impossible to see where the distributions overlap. The code below shows what happens when you forget to set the alpha parameter.

import matplotlib.pyplot as plt
import numpy as np

data1 = np.random.normal(0, 1, 1000)
data2 = np.random.normal(2, 1, 1000)

plt.hist(data1, bins=30, color='blue')
plt.hist(data2, bins=30, color='red')
plt.title('Two Distributions')
plt.show()

The second plt.hist() call draws opaque red bars directly over the blue ones, hiding the first dataset wherever the two distributions overlap and making a visual comparison unreliable. The corrected code below shows how to fix this.

import matplotlib.pyplot as plt
import numpy as np

data1 = np.random.normal(0, 1, 1000)
data2 = np.random.normal(2, 1, 1000)

plt.hist(data1, bins=30, color='blue', alpha=0.5)
plt.hist(data2, bins=30, color='red', alpha=0.5)
plt.title('Two Distributions')
plt.show()

The fix is simple: add the alpha parameter to both plt.hist() calls. Setting alpha=0.5 makes the bars semi-transparent, so you can see where the two distributions overlap instead of letting one histogram hide the other. Set it whenever you overlay filled plots.
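Transparency is not the only way to keep both datasets visible. As an alternative sketch, Matplotlib's histtype='step' option draws only the outline of each histogram, so there is no fill to hide anything. The labels here are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=6)  # fixed seed, illustrative only
data1 = rng.normal(0, 1, 1000)
data2 = rng.normal(2, 1, 1000)

# Outlines only: nothing is filled, so neither dataset can cover the other
counts1, _, _ = plt.hist(data1, bins=30, color='blue', histtype='step',
                         label='data1')
counts2, _, _ = plt.hist(data2, bins=30, color='red', histtype='step',
                         label='data2')

plt.legend()
plt.title('Two Distributions (step outlines)')
```

Outlines tend to stay readable with three or more datasets, where stacked semi-transparent fills can blend into muddy colors.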

Troubleshooting missing histogram when using incorrect bin values

It's a common snag: you run your code, but the histogram is nowhere to be found. This often happens when the bins parameter is set to an invalid value, like zero, leaving Matplotlib unable to group your data. The code below shows this in action.

import matplotlib.pyplot as plt
import numpy as np

data = np.random.normal(0, 1, 100)
plt.hist(data, bins=0) # Invalid bin value
plt.title('Histogram with Invalid Bins')
plt.show()

The code fails because bins=0 instructs Matplotlib to create zero intervals. With no bins available to group the data, the function cannot calculate frequencies or draw bars, so the call raises a ValueError instead of producing a plot. The corrected code below shows how to fix this.

import matplotlib.pyplot as plt
import numpy as np

data = np.random.normal(0, 1, 100)
plt.hist(data, bins=10) # Using a valid bin value
plt.title('Histogram with Valid Bins')
plt.show()

The fix works by providing a valid number of intervals, like bins=10. The original code failed because bins=0 is an impossible instruction: you can't sort data into zero containers, so Matplotlib can't draw anything. A related problem occurs when custom bin edges don't cover your data's range. In that case no error is raised, but the plot comes out empty. Make sure bins is a positive integer, or that your custom edges span your dataset.
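The case of custom edges that miss the data is easy to check numerically before plotting. The sketch below uses np.histogram() with hypothetical edge values to show the symptom, then one possible remedy: building the edges from the data's own minimum and maximum.

```python
import numpy as np

rng = np.random.default_rng(seed=7)  # fixed seed, illustrative only
data = rng.normal(170, 10, 100)      # values near 170

# Edges from 0 to 10 miss the data entirely, so every count is zero
bad_edges = np.linspace(0, 10, 11)
bad_counts, _ = np.histogram(data, bins=bad_edges)
print(bad_counts.sum())   # 0

# Building the edges from the data's own range captures every value
good_edges = np.linspace(data.min(), data.max(), 11)
good_counts, _ = np.histogram(data, bins=good_edges)
print(good_counts.sum())  # 100
```

Comparing the sum of the counts with the length of the data is a quick test for whether any points fell outside the bins.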

Real-world applications

With the common errors out of the way, you can now apply these techniques to practical data analysis challenges.

Analyzing exam scores with plt.hist()

Using plt.hist(), you can visualize the distribution of exam scores to quickly identify performance patterns, such as whether students are clustered into distinct groups.

import matplotlib.pyplot as plt
import numpy as np

# Simulated exam scores with two performance groups
scores = np.concatenate([np.random.normal(65, 10, 30), np.random.normal(85, 8, 70)])

plt.hist(scores, bins=10, color='green', alpha=0.7)
plt.axvline(scores.mean(), color='red', linestyle='dashed')
plt.title('Distribution of Exam Scores')
plt.xlabel('Score')
plt.ylabel('Number of Students')

This example creates a more complex dataset by merging two arrays of simulated scores using np.concatenate. The histogram then visualizes the combined data. The most important new feature is the plt.axvline() function, which adds a vertical line to the plot.

  • This line is positioned at the overall average score, calculated with scores.mean().
  • It's styled as a dashed red line using the linestyle parameter, making it easy to compare the mean to the rest of the data distribution.
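For a report, it can be useful to see the same distribution as numbers in fixed grade bands. The sketch below assumes a 0 to 100 scale and ten-point bands, neither of which is stated in the example above. Because simulated normal values can stray outside that scale, it clamps them first with np.clip().

```python
import numpy as np

rng = np.random.default_rng(seed=8)  # fixed seed, illustrative only
scores = np.concatenate([rng.normal(65, 10, 30), rng.normal(85, 8, 70)])

# Simulated values can stray outside a 0-100 scale, so clamp them first
scores = np.clip(scores, 0, 100)

# Ten-point bands: 0-10, 10-20, ..., 90-100
edges = np.arange(0, 101, 10)
counts, _ = np.histogram(scores, bins=edges)

# Print a small text histogram, one row per band
for low, high, count in zip(edges[:-1].tolist(), edges[1:].tolist(),
                            counts.tolist()):
    print(f"{low:3d}-{high:<3d} {count:3d} {'#' * count}")

print(f"mean: {scores.mean():.1f}, median: {np.median(scores):.1f}")
```

Printing the median next to the mean is a cheap check on skew: when a lower-scoring group pulls the mean down, the two values drift apart.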

Detecting outliers with the np.std() threshold

A histogram becomes a powerful tool for outlier detection when you add a statistical threshold based on the standard deviation, which you can calculate with np.std().

import matplotlib.pyplot as plt
import numpy as np

# Generate data with outliers
data = np.concatenate([np.random.normal(0, 1, 500), np.random.uniform(5, 10, 10)])

plt.hist(data, bins=30)
plt.title('Dataset with Outliers')
plt.axvline(np.mean(data) + 2*np.std(data), color='red', linestyle='dashed')

This code demonstrates how to visually identify outliers. It uses np.concatenate to merge two arrays: a large set of normally distributed data and a smaller group of distant points, which act as outliers. The histogram clearly shows the main data cluster and the few values far to the right.

The key feature is the vertical line added by plt.axvline(). This line marks a common statistical threshold for outliers.

  • It's positioned at two standard deviations above the mean, calculated with np.mean(data) + 2*np.std(data).
  • Any data points falling to the right of this line are flagged as potential outliers.
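The line shows where the cutoff is, but you usually also want the flagged values themselves. A minimal sketch using a boolean mask follows. The names threshold and flagged are illustrative, and the seed is fixed only for repeatability.

```python
import numpy as np

rng = np.random.default_rng(seed=9)
data = np.concatenate([rng.normal(0, 1, 500), rng.uniform(5, 10, 10)])

# Same threshold as the dashed line: two standard deviations above the mean
threshold = data.mean() + 2 * data.std()

# Boolean mask selects the values to the right of the line
flagged = data[data > threshold]

print(f"threshold: {threshold:.2f}")
print(f"{flagged.size} of {data.size} values flagged")
```

Keep in mind that extreme values inflate the mean and standard deviation used to compute the threshold, so treat this as a rough screen rather than a definitive test.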

Get started with Replit

Now, turn your knowledge into a working tool. Describe what you want to build to Replit Agent, like “a dashboard to visualize website traffic” or “an app to analyze exam score distributions.”

The Agent will write the code, test for errors, and deploy the app from your description. Start building with Replit.

Build your first app today

Describe what you want to build, and Replit Agent writes the code, handles the infrastructure, and ships it live. Go from idea to real product, all in your browser.
