How to plot a scatter plot in Python
Learn to create scatter plots in Python. This guide covers various methods, tips, real-world uses, and how to debug common errors.

Scatter plots are a powerful tool in Python to visualize the relationship between two variables. They help you identify patterns, trends, and correlations within your data through a simple graphical representation.
In this article, you will learn several techniques to create scatter plots. You'll get practical tips, see real-world applications, and receive advice to debug common issues you might face along the way.
Creating a basic scatter plot with matplotlib
import matplotlib.pyplot as plt
import numpy as np
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 4, 8, 2])
plt.scatter(x, y)
plt.show()
Output: [A scatter plot with 5 points]
This example uses matplotlib.pyplot, Python's go-to library for creating static and interactive visualizations. The code first defines two numpy arrays, x and y, which serve as the coordinates for the points on the graph. Each corresponding element from the arrays pairs up to define a single point's location, following similar principles to creating arrays in Python.
The plotting itself is a two-step process. The plt.scatter(x, y) function constructs the plot in memory by mapping your data points. It won't display anything on its own, though. You need to explicitly call plt.show() to render the final visual.
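The same two-step idea can also be written with explicit Figure and Axes objects. The sketch below is one possible variant, not the only way to do it; the variable name points and the axis labels are illustrative choices that do not come from the example above.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 4, 8, 2])

# Create the figure and axes explicitly instead of relying on pyplot's implicit state.
fig, ax = plt.subplots()

# scatter() only builds the plot in memory; it returns a PathCollection artist.
points = ax.scatter(x, y)
ax.set_xlabel("x")
ax.set_ylabel("y")

# Nothing is drawn on screen until show() is called.
plt.show()
```

Holding on to ax makes it easier to add labels, titles, or further series to a specific plot later on.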
Customizing your scatter plots
That basic plot gets the job done, but you can tell a much richer story with your data by customizing visual elements like color and size.
Changing colors and markers with scatter()
import matplotlib.pyplot as plt
import numpy as np
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 4, 8, 2])
plt.scatter(x, y, color='red', marker='s')
plt.title("Red Square Markers")
plt.show()
Output: [A scatter plot with red square markers]
You can easily tweak your plot’s appearance by passing more arguments to the plt.scatter() function. This lets you go beyond the default blue circles.
- The color parameter sets the color for all data points. In this case, it's set to 'red'.
- The marker parameter controls the shape. Using 's' turns the points into squares.
The code also introduces plt.title(), a simple way to add a descriptive title above your graph for better context.
Adjusting size and transparency
import matplotlib.pyplot as plt
import numpy as np
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 4, 8, 2])
plt.scatter(x, y, s=100, alpha=0.5, color='blue')
plt.grid(True)
plt.show()
Output: [A scatter plot with larger semi-transparent blue dots and grid lines]
You can further refine your plot's appearance by adjusting the size and transparency of the markers. This is particularly useful for making dense data easier to read.
- The s parameter controls the size of each point. Here, s=100 makes them noticeably larger.
- The alpha parameter sets the transparency level. An alpha of 0.5 makes the points semi-transparent, which helps reveal overlapping data.
- Finally, plt.grid(True) adds a grid to the plot, improving readability.
Creating multi-series scatter plots with legends
import matplotlib.pyplot as plt
import numpy as np
x1, x2 = np.array([1, 2, 3, 4]), np.array([2, 3, 4, 5])
y1, y2 = np.array([1, 4, 3, 6]), np.array([5, 3, 6, 2])
plt.scatter(x1, y1, color='green', label='Group 1')
plt.scatter(x2, y2, color='purple', label='Group 2')
plt.legend()
plt.show()
Output: [A scatter plot with two different colored groups of points and a legend]
You can easily compare different datasets on the same graph by calling plt.scatter() multiple times. Each call adds a new series of points to your plot, allowing you to visualize distinct groups of data together. For more complex comparisons, consider making subplots in Python.
- The label parameter within each plt.scatter() call assigns a name to its corresponding data series.
- After defining all your series, you simply call plt.legend() to display a legend that connects each label to its color.
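When there are more than a couple of series, repeating the scatter() call by hand gets tedious. A minimal sketch of one alternative is to loop over a dictionary of groups; the groups dictionary and its layout are hypothetical and only reuse the values from the example above.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical groups: each entry maps a label to (x values, y values, color).
groups = {
    "Group 1": (np.array([1, 2, 3, 4]), np.array([1, 4, 3, 6]), "green"),
    "Group 2": (np.array([2, 3, 4, 5]), np.array([5, 3, 6, 2]), "purple"),
}

fig, ax = plt.subplots()
for label, (xs, ys, color) in groups.items():
    # Each call adds one labeled series to the same axes.
    ax.scatter(xs, ys, color=color, label=label)

# The legend picks up the label given to each series.
legend = ax.legend()
plt.show()
```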
Advanced scatter plot techniques
Beyond basic customization, you can elevate your visualizations by plotting from pandas DataFrames, extending into three dimensions, or adding interactivity with libraries like plotly.
Using pandas DataFrames for scatter plots
import pandas as pd
import matplotlib.pyplot as plt
data = pd.DataFrame({
'x': [1, 2, 3, 4, 5],
'y': [5, 7, 4, 8, 2],
'size': [20, 50, 100, 80, 30]
})
plt.scatter(data['x'], data['y'], s=data['size'])
plt.show()
Output: [A scatter plot with varying sized dots based on the 'size' column]
Plotting directly from a pandas DataFrame is a common and efficient workflow when building data apps. Instead of using separate arrays, you can pass DataFrame columns directly into the plt.scatter() function. This approach builds on the fundamentals of creating DataFrames in Python. This example also encodes a third variable into the visualization, adding another layer of information.
- The x- and y-coordinates are pulled from the data['x'] and data['y'] columns.
- The s parameter is mapped to the data['size'] column, so each marker's area (measured in points squared) scales with its value. This effectively visualizes three variables on a 2D plot.
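pandas also offers its own plotting accessor, which wraps matplotlib. The sketch below assumes the same DataFrame as above and uses DataFrame.plot.scatter, which takes column names for x and y and returns a matplotlib Axes; the title text is an illustrative addition.

```python
import pandas as pd
import matplotlib.pyplot as plt

data = pd.DataFrame({
    'x': [1, 2, 3, 4, 5],
    'y': [5, 7, 4, 8, 2],
    'size': [20, 50, 100, 80, 30]
})

# x and y are given as column names; marker sizes are passed as a Series.
ax = data.plot.scatter(x='x', y='y', s=data['size'])
ax.set_title("Marker area from the 'size' column")
plt.show()
```

Because the return value is a regular Axes object, the usual matplotlib methods for titles, labels, and grids still apply.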
Creating 3D scatter plots with Axes3D
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
x = np.random.rand(20)
y = np.random.rand(20)
z = np.random.rand(20)
ax.scatter(x, y, z)
plt.show()
Output: [A 3D scatter plot with randomly positioned points in space]
To visualize data in three dimensions, you create a subplot and explicitly set its projection to '3d'. This step gives you a 3D axes object instead of the standard 2D one. The Axes3D import registers the 3D projection; older matplotlib versions require it, while recent versions register the projection automatically.
- The line ax = fig.add_subplot(111, projection='3d') creates the 3D axes.
- You then call ax.scatter(), passing three arguments, one for each of the x, y, and z coordinates.
This allows you to plot points in a three-dimensional coordinate system, offering a deeper perspective on your data's structure.
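A 3D plot is usually easier to read with labeled axes and some depth cue. The sketch below is one way to add both, assuming a recent matplotlib version in which the '3d' projection is available without importing Axes3D; the seed and the choice to color points by their z value are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded so the plot is reproducible
x, y, z = rng.random((3, 20))        # three rows of 20 random values

fig = plt.figure()
ax = fig.add_subplot(projection='3d')

# Color each point by its z value and label all three axes.
points = ax.scatter(x, y, z, c=z, cmap='viridis')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
fig.colorbar(points, ax=ax, label='z')
plt.show()
```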
Creating interactive scatter plots with plotly
import plotly.express as px
import numpy as np
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 4, 8, 2])
fig = px.scatter(x=x, y=y, hover_name=["A", "B", "C", "D", "E"])
fig.show()
Output: [An interactive scatter plot where hovering over points shows labels]
For interactive plots, plotly is a fantastic alternative to matplotlib. The plotly.express library, often imported as px, simplifies the process. You use px.scatter() to create the plot, which returns a figure object. Then, fig.show() renders the interactive visualization, which you can pan, zoom, and inspect.
This approach adds a new layer of user engagement.
- The hover_name parameter is a key feature. It assigns a specific label to each data point.
- When you move your cursor over a point on the graph, its corresponding label appears in a tooltip.
Move faster with Replit
Learning individual techniques is one thing, but building a complete application is another. Replit is an AI-powered development platform where Python dependencies come pre-installed, so you can skip setup and start coding instantly.
Instead of piecing together techniques, you can use Agent 4 to build a complete application from a simple description. Describe the app you want to build, and the Agent will take it from idea to working product. For example, you could build:
- An interactive dashboard that visualizes website traffic, mapping user sessions to geographic locations on a scatter plot.
- A stock comparison tool that plots the performance of multiple companies on a single graph, using different colors and markers for each.
- A 3D data visualizer for scientific research that plots complex datasets, allowing you to rotate and explore relationships between three variables.
Simply describe your app, and Replit will write the code, test it, and fix issues automatically, all within your browser.
Common errors and challenges
Even with the best tools, you might run into issues like mismatched array sizes, confusing color maps, or overlapping data points.
Debugging mismatched array dimensions in scatter() plots
One of the most common errors you'll encounter is a ValueError telling you that your x and y arrays must be the same size. This happens when the number of x-coordinates doesn't match the number of y-coordinates. Since scatter() needs to pair each x with a corresponding y to plot a point, any mismatch will halt the process. To fix this, double-check the length of your input arrays or DataFrame columns before plotting.
Fixing color mapping issues with the c parameter
Customizing colors can sometimes produce unexpected results, especially when using the c parameter. While you can pass a single color name, passing a sequence of numbers is where things get tricky. The scatter() function doesn't interpret these numbers as direct colors but instead maps them through a colormap. If your colors aren't appearing as you intended, ensure the array you pass to c has the same length as your data and that you understand how its values will be translated into the final plot's color scheme.
Resolving overlapping points with jitter
When you have a dense dataset, many points can plot on top of each other, hiding the true distribution of your data. A simple solution is to add "jitter"—a small amount of random noise—to your data's positions. This spreads the points out slightly, making it easier to see clusters and density. While matplotlib doesn't have a built-in jitter option for scatter(), you can easily add it by generating small random numbers with a library like numpy and adding them to your coordinate arrays before plotting.
Debugging mismatched array dimensions in scatter() plots
A frequent stumbling block when using plt.scatter() is the ValueError that arises from mismatched array dimensions. This error occurs because every x-coordinate needs a corresponding y-coordinate to plot a point, and the function will fail if the arrays differ in length. The code below shows what happens when you try to plot arrays of unequal size.
import matplotlib.pyplot as plt
import numpy as np
# Arrays with different lengths
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 4, 8])
plt.scatter(x, y)
plt.show()
Here, the x array has five elements, but the y array has just four. Since matplotlib can't pair them up, it raises an error. The following code demonstrates the fix.
import matplotlib.pyplot as plt
import numpy as np
# Arrays with matching lengths
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 4, 8, 2])
plt.scatter(x, y)
plt.show()
The fix is straightforward: ensure your x and y arrays have an equal number of elements. In the corrected example, the y array is updated to include a fifth value, 2, so it matches the length of the x array. This allows plt.scatter() to pair each coordinate successfully. This error often pops up when you're cleaning or filtering data, so it's a good practice to check array lengths before you plot.
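Since the mismatch often appears after filtering, one way to avoid it is to build a single boolean mask and apply it to both arrays. The sketch below assumes hypothetical data in which y contains a missing value; the names mask, x_clean, and y_clean are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([5.0, 7.0, np.nan, 8.0, 2.0])  # hypothetical data with a missing value

# Build one boolean mask and apply it to both arrays so they stay aligned.
mask = ~np.isnan(y)
x_clean, y_clean = x[mask], y[mask]

# Fail early with a clear message if the lengths ever diverge.
if len(x_clean) != len(y_clean):
    raise ValueError(f"x has {len(x_clean)} points but y has {len(y_clean)}")

plt.scatter(x_clean, y_clean)
plt.show()
```

Filtering only one of the two arrays is what typically produces the ValueError, so sharing the mask keeps every x paired with its y.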
Fixing color mapping issues with the c parameter
When you pass a sequence of numbers to the c parameter, you might expect them to correspond to specific colors. However, matplotlib maps these values to a colormap, which can be confusing if you're not prepared. See what happens in the code below.
import matplotlib.pyplot as plt
import numpy as np
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 4, 8, 2])
values = [10, 20, 30, 40, 50]
plt.scatter(x, y, c=values)
plt.show()
The values array doesn't assign distinct colors. Instead, matplotlib maps these numbers along a continuous color gradient, and without a reference the reader can't tell which value each color represents. The following code demonstrates how to make this mapping explicit and understandable.
import matplotlib.pyplot as plt
import numpy as np
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 4, 8, 2])
values = [10, 20, 30, 40, 50]
plt.scatter(x, y, c=values, cmap='viridis')
plt.colorbar(label='Values')
plt.show()
The solution makes the color mapping clear by adding two things. First, the cmap='viridis' argument explicitly tells matplotlib to use the "viridis" colormap, a popular choice for its readability. Second, plt.colorbar() adds a legend that shows exactly how the numerical values correspond to the colors on the plot. This removes any guesswork and makes your visualization much easier to interpret, especially when you're encoding a third variable as color.
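If you do want specific colors rather than a gradient, c also accepts one color per point. The sketch below assumes hypothetical category labels and a hand-picked palette dictionary; neither comes from the example above.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 4, 8, 2])
categories = ["low", "low", "mid", "high", "high"]  # hypothetical labels

# Map each category to a named color, then pass one color per point to c.
palette = {"low": "tab:blue", "mid": "tab:orange", "high": "tab:red"}
colors = [palette[cat] for cat in categories]

plt.scatter(x, y, c=colors)
plt.show()
```

Because the list holds color names instead of numbers, matplotlib uses them directly and no colormap is involved.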
Resolving overlapping points with jitter
When your dataset contains duplicate or very close data points, they can stack on top of each other in a scatter plot. This "overplotting" hides the true density of your data, making it look like you have fewer points than you actually do. The code below shows exactly what this looks like.
import matplotlib.pyplot as plt
import numpy as np
# Data with duplicate points
x = np.array([1, 1, 2, 2, 3, 3])
y = np.array([1, 1, 2, 2, 3, 3])
plt.scatter(x, y)
plt.title("Overlapping Points")
plt.show()
The plot appears to have only three points because duplicate coordinate pairs are drawn directly on top of each other, hiding the true data density. The following code demonstrates a common fix for this visual overlap.
import matplotlib.pyplot as plt
import numpy as np
# Adding jitter to reveal overlapping points
x = np.array([1, 1, 2, 2, 3, 3])
y = np.array([1, 1, 2, 2, 3, 3])
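# Draw separate noise for x and y, centered on zero, so points spread in all directions
x_jitter = 0.1 * (np.random.rand(len(x)) - 0.5)
y_jitter = 0.1 * (np.random.rand(len(y)) - 0.5)
plt.scatter(x + x_jitter, y + y_jitter)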
plt.title("Points with Jitter")
plt.show()
The fix is to add a small amount of random noise, known as "jitter," to your data. By generating small random offsets with np.random.rand() and adding them to your x and y coordinates, you slightly shift each point's position. This spreads the points out, revealing the true density of your data. Keep the offsets small relative to the spacing of your values so the plot still reflects the original positions. This technique is especially useful when working with datasets that contain discrete or rounded values, where overplotting is a common issue.
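If you apply jitter in several places, a small helper keeps it consistent. The sketch below is one possible version: add_jitter and its scale parameter are hypothetical names, and it uses NumPy's seeded Generator so the same jitter can be reproduced between runs.

```python
import matplotlib.pyplot as plt
import numpy as np

def add_jitter(values, scale=0.05, rng=None):
    """Return a copy of values shifted by uniform noise in [-scale, scale]."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)
    return values + rng.uniform(-scale, scale, size=values.shape)

x = np.array([1, 1, 2, 2, 3, 3])
y = np.array([1, 1, 2, 2, 3, 3])

rng = np.random.default_rng(seed=42)  # seeded so the jitter is reproducible
x_j = add_jitter(x, rng=rng)
y_j = add_jitter(y, rng=rng)  # separate draw, so x and y shift independently

plt.scatter(x_j, y_j, alpha=0.7)
plt.title("Points with Independent Jitter")
plt.show()
```

Drawing separate noise for each axis spreads duplicates in every direction instead of only along the diagonal.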
Real-world applications
With the mechanics of creating and debugging plots covered, you can now use them to explore real-world data and model outputs.
Visualizing correlation in real-world data with iris dataset
The classic iris dataset offers a straightforward way to visualize the relationship between two variables by plotting sepal length against sepal width to check for correlation. This complements statistical methods for finding correlation in Python.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
iris = load_iris()
plt.scatter(iris.data[:, 0], iris.data[:, 1])
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.show()
This code snippet demonstrates how to plot data directly from a dataset included with the scikit-learn library. It starts by calling load_iris() to fetch the dataset object, which contains both the flower measurements and their classifications.
The core of the plotting logic lies in how the data is selected for the axes:
- iris.data[:, 0] uses array slicing to grab all rows from the first column (sepal length) for the x-axis.
- iris.data[:, 1] does the same for the second column (sepal width) to use for the y-axis.
Finally, plt.xlabel() and plt.ylabel() add descriptive labels, a crucial step for making any chart readable.
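To connect the picture to a number, you can compute the correlation coefficient for the same two columns and color the points by species. The sketch below is one way to do that, assuming scikit-learn's iris object as loaded above; np.corrcoef and the per-species loop are additions for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
sepal_length = iris.data[:, 0]
sepal_width = iris.data[:, 1]

# Pearson correlation coefficient between the two plotted columns.
r = np.corrcoef(sepal_length, sepal_width)[0, 1]

fig, ax = plt.subplots()
# Plot each species as its own series so the legend can name it.
for class_index, species in enumerate(iris.target_names):
    mask = iris.target == class_index
    ax.scatter(sepal_length[mask], sepal_width[mask], label=species)

ax.set_xlabel('Sepal Length (cm)')
ax.set_ylabel('Sepal Width (cm)')
ax.set_title(f"Pearson r = {r:.2f}")
ax.legend()
plt.show()
```

Splitting by species matters here because a relationship that looks weak across the whole dataset can look different within each group.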
Applying scatter plots to KMeans clustering results
Scatter plots are also essential for visualizing the output of machine learning algorithms, such as the clusters identified by a KMeans model.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
X = np.random.rand(100, 2)
# Set n_init explicitly; its default changed across scikit-learn versions
kmeans = KMeans(n_clusters=3, n_init=10).fit(X)
plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], marker='x', s=100, c='red')
plt.show()
This code visualizes the output of a KMeans clustering model, building on concepts from k-means clustering in Python. It first generates random data, then fits the model to group that data into three distinct clusters. The plot itself is built in two main stages.
- The first plt.scatter() call displays the data points. The c=kmeans.labels_ argument is key here; it colors each point based on the cluster it was assigned to by the algorithm.
- A second plt.scatter() call then overlays the cluster centers onto the plot, marking them with large red 'x's for easy identification.
This combination clearly shows how the algorithm has grouped the data.
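Uniform random data has no real cluster structure, so the grouping can look arbitrary. The sketch below swaps in synthetic, well-separated data from scikit-learn's make_blobs as an assumption for illustration, and fixes random_state so repeated runs produce the same plot.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with three well-separated groups (an assumption for illustration).
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.6, random_state=0)

# n_init and random_state are set explicitly so repeated runs give the same clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

fig, ax = plt.subplots()
# Points colored by assigned cluster, with the centers drawn on top.
ax.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, cmap='viridis')
ax.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
           marker='x', s=100, c='red', label='Cluster centers')
ax.legend()
plt.show()
```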
Get started with Replit
Now, turn your knowledge into a real application. Describe what you want to build to Replit Agent, like “a tool to plot housing prices vs. square footage” or “a dashboard visualizing user engagement data”.
The Agent writes the code, tests for errors, and deploys your app. You just provide the instructions. Start building with Replit.
Describe what you want to build, and Replit Agent writes the code, handles the infrastructure, and ships it live. Go from idea to real product, all in your browser.



