How to find the accuracy of a model in Python
Learn how to find the accuracy of a model in Python. Explore different methods, tips, real-world applications, and common error debugging.

Finding a model's accuracy in Python is a crucial step in machine learning. It helps you evaluate performance and validate results before you deploy your model to production.
In this article, we'll cover key techniques to measure accuracy. You'll get practical tips, see real-world applications, and learn how to debug common issues to improve your model's reliability.
Using accuracy_score from scikit-learn
from sklearn.metrics import accuracy_score
y_true = [0, 1, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1]
accuracy = accuracy_score(y_true, y_pred)
print(f"Model accuracy: {accuracy:.2f}")
--OUTPUT--
Model accuracy: 0.80
The accuracy_score function is a straightforward tool for validation. It operates by comparing two key lists:
- y_true: This list contains the ground truth, or the correct labels for your data.
- y_pred: This list holds the labels your model predicted.
The function calculates the proportion of correct predictions. Here, the model got 4 out of 5 right, which is why the output is 0.80, or 80% accuracy.
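Under the hood, accuracy is nothing more than the fraction of positions where the two lists match. A quick NumPy sketch confirms this agrees with accuracy_score:

```python
import numpy as np
from sklearn.metrics import accuracy_score

y_true = [0, 1, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1]

# Accuracy is the mean of the elementwise matches between the two lists
manual = np.mean(np.array(y_true) == np.array(y_pred))
print(manual)  # 0.8
print(accuracy_score(y_true, y_pred) == manual)  # True
```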
Basic accuracy measurement techniques
Although accuracy_score offers a solid starting point, you'll often need more detailed methods to truly understand your model's performance.
Calculating accuracy from a confusion matrix
from sklearn.metrics import confusion_matrix
import numpy as np
y_true = [0, 1, 0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]
cm = confusion_matrix(y_true, y_pred)
accuracy = np.sum(np.diag(cm)) / np.sum(cm)
print(f"Confusion matrix:\n{cm}")
print(f"Accuracy: {accuracy:.2f}")
--OUTPUT--
Confusion matrix:
[[3 1]
[1 3]]
Accuracy: 0.75
A confusion matrix offers a more detailed view than a simple accuracy score. The confusion_matrix function creates a grid that visualizes correct and incorrect predictions. To calculate accuracy from this matrix, you divide the number of correct predictions by the total number of predictions.
- The diagonal elements of the matrix, found with np.diag(cm), represent all correct predictions.
- Summing these and dividing by the total number of elements, calculated with np.sum(cm), gives you the overall accuracy, which here is 75%.
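For binary problems, there is a handy shortcut: ravel() flattens the matrix into its four cells, which you can unpack by name. A small sketch using the same labels as above:

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]

# For a binary problem, ravel() unpacks the matrix as tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tn + tp) / (tn + fp + fn + tp)
print(tn, fp, fn, tp)               # 3 1 1 3
print(f"Accuracy: {accuracy:.2f}")  # Accuracy: 0.75
```

Naming the cells this way makes follow-up metrics like precision (tp / (tp + fp)) easy to compute by hand.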
Using cross-validation to measure accuracy
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=100, random_state=42)
model = RandomForestClassifier(random_state=42)
cv_scores = cross_val_score(model, X, y, cv=5)
print(f"Cross-validation accuracy scores: {cv_scores}")
print(f"Mean accuracy: {cv_scores.mean():.2f}")
--OUTPUT--
Cross-validation accuracy scores: [0.85 0.9 0.85 0.9 0.95]
Mean accuracy: 0.89
Cross-validation provides a more reliable accuracy estimate than a single train-test split. The cross_val_score function automates this process for you. In the example, cv=5 tells the function to split the dataset into five sections, or "folds."
- The model trains on four folds and is tested on the fifth one.
- This process repeats five times, ensuring each fold gets a turn as the test set.
This results in five distinct accuracy scores. By calculating their average with cv_scores.mean(), you get a more stable and trustworthy measure of how your model will perform on new data.
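To see the mechanics, here is a hand-rolled sketch of the same loop using KFold. Note one caveat: for classifiers, passing cv=5 to cross_val_score actually uses stratified folds by default, so the exact numbers from this plain-KFold version can differ slightly.

```python
from sklearn.model_selection import KFold
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=100, random_state=42)
fold_scores = []
# Each iteration trains on four folds and tests on the held-out fifth
for train_idx, test_idx in KFold(n_splits=5).split(X):
    model = RandomForestClassifier(random_state=42)
    model.fit(X[train_idx], y[train_idx])
    fold_scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))
print(fold_scores)
```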
Using classification_report for detailed metrics
from sklearn.metrics import classification_report
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = SVC().fit(X_train, y_train)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
--OUTPUT--
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        16
           1       0.93      0.93      0.93        14
           2       0.93      0.93      0.93        15

    accuracy                           0.96        45
   macro avg       0.95      0.96      0.95        45
weighted avg       0.96      0.96      0.96        45
The classification_report function provides a comprehensive performance summary. It’s particularly useful for seeing how your model handles each class individually—something a simple accuracy score can’t show you. The report breaks down performance using several key metrics:
- precision: Of all the times the model predicted a certain class, how often was it correct?
- recall: Of all the actual instances of a class, how many did the model successfully identify?
- f1-score: A single metric that balances both precision and recall.
This detailed view helps you diagnose if your model is biased toward or against specific classes.
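If you want these numbers programmatically rather than as printed text, classification_report accepts output_dict=True and returns the same report as a nested dict, which is convenient for logging or automated threshold checks. A short sketch on the same iris split:

```python
from sklearn.metrics import classification_report
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
y_pred = SVC().fit(X_train, y_train).predict(X_test)

# output_dict=True returns the report as a nested dict instead of a string
report = classification_report(y_test, y_pred, output_dict=True)
print(report["accuracy"])
print(report["macro avg"]["f1-score"])
```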
Advanced accuracy evaluation methods
While the basic tools offer a solid baseline, you'll need more advanced methods to tackle complex scenarios like imbalanced datasets and custom requirements.
Handling imbalanced datasets with weighted accuracy
from sklearn.metrics import balanced_accuracy_score
from sklearn.metrics import accuracy_score
# Imbalanced dataset example: 8 samples of class 0, only 2 of class 1
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
standard_acc = accuracy_score(y_true, y_pred)
balanced_acc = balanced_accuracy_score(y_true, y_pred)
print(f"Standard accuracy: {standard_acc:.2f}")
print(f"Balanced accuracy: {balanced_acc:.2f}")
--OUTPUT--
Standard accuracy: 0.80
Balanced accuracy: 0.50
When your dataset is imbalanced, like this example with eight instances of class 0 and only two of class 1, standard accuracy can be deceptive. The model here predicts the majority class every time, yet its accuracy_score of 0.80 might seem acceptable. That number hides a critical weakness because it doesn't distinguish between the majority and minority classes.
This is where balanced_accuracy_score comes in. It provides a more honest evaluation by calculating the average of the recall scores for each class, giving them equal weight. The resulting score of 0.50 reveals that the model never identifies the underrepresented class, giving you a truer picture of its effectiveness.
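You can verify what balanced accuracy computes with per-class recalls. The sketch below uses hypothetical toy labels for a model that only ever predicts the majority class:

```python
from sklearn.metrics import balanced_accuracy_score, recall_score

# Hypothetical toy labels: the model only ever predicts the majority class
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Balanced accuracy is the unweighted mean of the per-class recalls
per_class_recall = recall_score(y_true, y_pred, average=None)
print(per_class_recall)                         # [1. 0.]
print(per_class_recall.mean())                  # 0.5
print(balanced_accuracy_score(y_true, y_pred))  # 0.5
```

Class 0 gets a perfect recall of 1.0 and class 1 gets 0.0, so the equal-weight average lands at 0.5 regardless of how many samples each class has.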
Using stratified K-fold cross-validation
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import load_breast_cancer
X, y = load_breast_cancer(return_X_y=True)
model = GradientBoostingClassifier(random_state=42)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring='accuracy')
print(f"Stratified CV scores: {scores}")
print(f"Mean accuracy: {scores.mean():.4f}")
--OUTPUT--
Stratified CV scores: [0.96491228 0.95614035 0.94736842 0.98245614 0.96491228]
Mean accuracy: 0.9632
When your dataset has an uneven class distribution, standard cross-validation can be misleading. StratifiedKFold is a smarter approach that ensures each data split maintains the original dataset's class proportions, giving you a more reliable evaluation.
- It works by preserving the percentage of samples for each class within every fold. This prevents a split from accidentally containing too many examples of one class, which would skew the results.
By passing StratifiedKFold to the cv parameter in cross_val_score, you ensure each of the five folds is a representative sample, leading to a more accurate performance estimate.
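You can check this guarantee directly by inspecting the class balance of each test fold. A short sketch on the same breast cancer dataset:

```python
from sklearn.model_selection import StratifiedKFold
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Every test fold should mirror the dataset's overall class balance
print(f"Overall positive rate: {y.mean():.3f}")
for i, (train_idx, test_idx) in enumerate(cv.split(X, y)):
    print(f"Fold {i}: test positive rate = {y[test_idx].mean():.3f}")
```

Each fold's positive rate comes out within a fraction of a percent of the dataset-wide rate, which is exactly the property that keeps the per-fold scores comparable.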
Creating custom accuracy metrics
from sklearn.metrics import make_scorer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
def custom_accuracy(y_true, y_pred):
    correct = sum(1 for true, pred in zip(y_true, y_pred) if true == pred)
    return correct / len(y_true)
custom_scorer = make_scorer(custom_accuracy)
X, y = make_classification(n_samples=200, random_state=42)
model = DecisionTreeClassifier(random_state=42)
scores = cross_val_score(model, X, y, cv=5, scoring=custom_scorer)
print(f"Custom accuracy scores: {scores}")
print(f"Average custom accuracy: {scores.mean():.2f}")
--OUTPUT--
Custom accuracy scores: [0.85 0.8 0.8 0.9 0.875]
Average custom accuracy: 0.85
Sometimes you need a performance metric that scikit-learn doesn't offer out of the box. You can create your own by defining a Python function, like custom_accuracy in the example, which takes the true and predicted labels and returns a score.
To integrate your function with scikit-learn's evaluation tools, you just need to wrap it.
- The make_scorer utility converts your custom function into a usable scorer object.
- You can then pass this new scorer to the scoring parameter in functions like cross_val_score to evaluate your model based on your unique logic.
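make_scorer also handles metrics where lower is better. As a sketch, here is a hypothetical error_rate metric wrapped with greater_is_better=False, which tells scikit-learn to negate the score so that higher still means better:

```python
from sklearn.metrics import make_scorer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

def error_rate(y_true, y_pred):
    """Fraction of misclassified samples (lower is better)."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

# greater_is_better=False marks this as a loss; scikit-learn negates it
# so cross_val_score still treats higher scores as better
error_scorer = make_scorer(error_rate, greater_is_better=False)

X, y = make_classification(n_samples=200, random_state=42)
scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y,
                         cv=5, scoring=error_scorer)
print(scores)  # negated error rates, all <= 0
```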
Move faster with Replit
Replit is an AI-powered development platform where you can skip setup and start coding instantly. All Python dependencies are pre-installed, so you can focus on building instead of managing environments.
While mastering individual techniques is important, Agent 4 helps you move from piecing together functions to building complete applications. Instead of just calculating accuracy, you can describe the app you want to build, and the Agent will take it from concept to a working product.
- A model performance dashboard that uses metrics from classification_report to visualize precision and recall for each class, helping you spot biases.
- A fraud detection evaluator that automatically compares standard accuracy with balanced_accuracy_score to ensure your model works well on imbalanced data.
- A model selection tool that runs cross_val_score on several algorithms and presents the mean accuracy scores in a simple table to help you choose the best one.
Simply describe your app, and Replit will write the code, test it, and fix issues automatically, all within your browser.
Common errors and challenges
Even with the right tools, you can run into a few common pitfalls when measuring model accuracy.
Fixing incorrect parameter order in accuracy_score
A simple but frequent mistake is mixing up the parameters in accuracy_score. The function expects the true labels first (y_true) followed by the predicted labels (y_pred). Because accuracy only counts matches, swapping them happens not to change the number, but the same slip silently corrupts asymmetric metrics such as precision and recall, so it's worth catching early.
Handling missing values before calculating accuracy
Most scikit-learn functions, including those for accuracy, can't handle missing data like NaN values. If you try to calculate accuracy on a dataset with gaps, you'll get a ValueError. Always clean your data by either removing or filling in missing values before you evaluate your model.
Avoiding accuracy misinterpretation with imbalanced data
Relying on a single accuracy score can be deceptive, especially with imbalanced data. Here’s how to avoid common misinterpretations:
- A high accuracy score might just mean your model is good at predicting the majority class, while failing on the minority one.
- Always check the class distribution in your dataset. If it's skewed, use metrics like balanced_accuracy_score or a classification_report for a more complete picture.
- These tools give you a class-by-class breakdown, revealing weaknesses that a simple accuracy score would otherwise hide.
Fixing incorrect parameter order in accuracy_score
One of the most frequent yet subtle errors is swapping the y_true and y_pred arguments in accuracy_score. Accuracy itself is symmetric, so this particular swap won't change the printed number, but the habit silently corrupts evaluations built on asymmetric metrics like precision and recall. The following code shows the mistake in action.
from sklearn.metrics import accuracy_score
y_true = [0, 1, 0, 1, 1]
y_pred = [0, 0, 1, 1, 0]
# Incorrect parameter order
accuracy = accuracy_score(y_pred, y_true)
print(f"Model accuracy: {accuracy:.2f}")
By passing y_pred first, the function treats your model's predictions as the ground truth. Here the score happens to come out the same because accuracy is symmetric, but with any asymmetric metric the result would be meaningless. See how to pass the parameters in the correct order.
from sklearn.metrics import accuracy_score
y_true = [0, 1, 0, 1, 1]
y_pred = [0, 0, 1, 1, 0]
# Correct parameter order: y_true first, then y_pred
accuracy = accuracy_score(y_true, y_pred)
print(f"Model accuracy: {accuracy:.2f}")
The fix is simple: always pass the true labels, y_true, as the first argument to accuracy_score, followed by your model's predictions, y_pred. Swapping them won't raise an error, and for plain accuracy the value even comes out the same, which is exactly why the mistake survives quick testing or refactoring. The moment you switch to precision, recall, or any other asymmetric metric, the swapped order silently produces the wrong number, so it's a good habit to double-check the parameter order every time you evaluate a model.
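A quick sketch makes the danger concrete: accuracy is unchanged by the swap, but precision with swapped arguments actually computes recall, a different number entirely.

```python
from sklearn.metrics import accuracy_score, precision_score

y_true = [0, 1, 0, 1, 1]
y_pred = [0, 0, 1, 1, 0]

# Accuracy happens to be symmetric, so the swap goes unnoticed here...
print(accuracy_score(y_true, y_pred))   # 0.4
print(accuracy_score(y_pred, y_true))   # 0.4
# ...but precision is not: with swapped arguments it computes recall instead
print(precision_score(y_true, y_pred))  # 0.5
print(precision_score(y_pred, y_true))  # 0.3333...
```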
Handling missing values before calculating accuracy
Your accuracy calculation will fail if your data contains missing values, often represented as NaN. Scikit-learn's metrics functions aren't designed to handle these gaps and will raise a ValueError. The following code demonstrates what happens when you try.
import numpy as np
from sklearn.metrics import accuracy_score
y_true = [0, 1, 0, 1, np.nan, 1]
y_pred = [0, 0, 0, 1, 1, 1]
# This will raise an error
accuracy = accuracy_score(y_true, y_pred)
print(f"Accuracy: {accuracy:.2f}")
The code fails because accuracy_score can't handle the np.nan value inside the y_true list. It's a common roadblock when your data isn't clean. The following example shows how to fix this before calculating accuracy.
import numpy as np
from sklearn.metrics import accuracy_score
y_true = [0, 1, 0, 1, np.nan, 1]
y_pred = [0, 0, 0, 1, 1, 1]
# Filter out NaN values
mask = ~np.isnan(y_true)
accuracy = accuracy_score(np.array(y_true)[mask], np.array(y_pred)[mask])
print(f"Accuracy: {accuracy:.2f}")
To fix this, you must remove the NaN values before evaluation. The solution creates a boolean mask using ~np.isnan(y_true) to identify all valid entries. This mask is then applied to both y_true and y_pred, ensuring you only compare the non-missing values. This keeps your data aligned and gives you a correct accuracy score. You'll often run into this issue when working with real-world datasets, which frequently contain missing information.
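If your labels live in a pandas DataFrame rather than raw arrays, dropna() achieves the same cleanup in one step while keeping both columns aligned. A sketch with the same toy labels:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score

# Putting both label columns in one DataFrame keeps them aligned
df = pd.DataFrame({"y_true": [0, 1, 0, 1, np.nan, 1],
                   "y_pred": [0, 0, 0, 1, 1, 1]})

# dropna() removes any row with a missing value in either column
clean = df.dropna()
accuracy = accuracy_score(clean["y_true"], clean["y_pred"])
print(f"Accuracy: {accuracy:.2f}")  # Accuracy: 0.80
```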
Avoiding accuracy misinterpretation with imbalanced data
A high accuracy score can be deceptive with imbalanced data. A model might achieve 90% accuracy by only predicting the majority class, while completely failing on the minority one. This creates a false sense of success, hiding critical performance issues.
The following code demonstrates this problem. It uses a DummyClassifier that always predicts the most frequent class, yet still achieves a high accuracy score.
from sklearn.metrics import accuracy_score
from sklearn.dummy import DummyClassifier
from sklearn.datasets import make_classification
# Create imbalanced dataset (90% class 0, 10% class 1)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
dummy = DummyClassifier(strategy='most_frequent').fit(X, y)
y_pred = dummy.predict(X)
print(f"Accuracy: {accuracy_score(y, y_pred):.2f}")
Because the DummyClassifier only predicts the most frequent class, it achieves a high accuracy score by default. This hides its complete failure on the minority class. The following code demonstrates how to get a more reliable assessment.
from sklearn.metrics import accuracy_score, balanced_accuracy_score
from sklearn.dummy import DummyClassifier
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
dummy = DummyClassifier(strategy='most_frequent').fit(X, y)
y_pred = dummy.predict(X)
print(f"Standard accuracy: {accuracy_score(y, y_pred):.2f}")
print(f"Balanced accuracy: {balanced_accuracy_score(y, y_pred):.2f}")
To get a true sense of performance, use balanced_accuracy_score. While standard accuracy is a deceptive 0.90, the balanced score is 0.50—no better than a coin flip. This is because it averages the recall for each class, giving them equal weight, which correctly shows that the model fails on the minority class. Always use it when your dataset is skewed, like in fraud detection, where missing rare cases is costly.
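To see exactly where the dummy model fails, you can break recall out per class. This sketch reuses the same imbalanced setup and shows perfect recall on the majority class and zero on the minority:

```python
from sklearn.dummy import DummyClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score

# Same imbalanced setup: roughly 90% class 0, 10% class 1
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
y_pred = DummyClassifier(strategy='most_frequent').fit(X, y).predict(X)

# Per-class recall exposes the failure that one accuracy number hides
print(recall_score(y, y_pred, average=None))  # [1. 0.]
```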
Real-world applications
With these potential errors in mind, you can apply accuracy evaluation techniques to solve real-world problems like customer churn and medical diagnosis.
Evaluating a customer churn prediction model with accuracy_score
In a business scenario like predicting customer churn, you can use accuracy_score to get a quick read on your model's performance.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import pandas as pd
# Simple customer churn dataset
data = {'usage_minutes': [105, 231, 19, 142, 325, 88, 302, 250],
'contract_length': [1, 24, 12, 1, 24, 1, 36, 12],
'churn': [1, 0, 0, 1, 0, 1, 0, 0]} # 1=churned
df = pd.DataFrame(data)
X = df[['usage_minutes', 'contract_length']]
y = df['churn']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"Churn prediction accuracy: {accuracy_score(y_test, y_pred):.2f}")
This example demonstrates a complete, small-scale machine learning workflow. It begins by creating a pandas DataFrame to hold customer data, using features like usage_minutes and contract_length to predict whether a customer will churn.
- The data is split using train_test_split, a crucial step that ensures the model is evaluated on information it hasn't seen during training.
- A RandomForestClassifier is then trained on the training portion of the data.
- Finally, the model makes predictions on the test set, and accuracy_score calculates the percentage of correct predictions.
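One caveat with a dataset this small: a 25% test split leaves only two test samples, so the accuracy can only be 0.0, 0.5, or 1.0. As a sketch, leave-one-out cross-validation gives every row a turn as the test set and produces a steadier read:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

data = {'usage_minutes': [105, 231, 19, 142, 325, 88, 302, 250],
        'contract_length': [1, 24, 12, 1, 24, 1, 36, 12],
        'churn': [1, 0, 0, 1, 0, 1, 0, 0]}
df = pd.DataFrame(data)
X, y = df[['usage_minutes', 'contract_length']], df['churn']

# Each of the 8 rows takes a turn as a one-sample test set
scores = cross_val_score(RandomForestClassifier(random_state=42), X, y,
                         cv=LeaveOneOut())
print(f"LOO mean accuracy: {scores.mean():.2f}")
```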
Comparing multiple models' accuracy for medical diagnosis
In high-stakes fields like medical diagnosis, it's critical to compare several models to ensure you select the most reliable one for the task.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, balanced_accuracy_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
models = {"Logistic Regression": LogisticRegression(max_iter=1000),
"Random Forest": RandomForestClassifier(random_state=42),
"SVM": SVC()}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(f"{name}: Accuracy={accuracy_score(y_test, y_pred):.4f}, "
          f"Balanced Accuracy={balanced_accuracy_score(y_test, y_pred):.4f}")
This code automates the comparison of three different classification algorithms. It first loads the breast cancer dataset and splits it for training and testing. Then, it iterates through a dictionary containing the models:
- LogisticRegression
- RandomForestClassifier
- SVC
For each model, the code trains it, makes predictions, and prints both the standard accuracy_score and the balanced_accuracy_score. This provides a direct, side-by-side evaluation, making it easy to identify which model performs best on the test data.
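For a higher-stakes comparison, you may prefer cross-validated means over a single split. This sketch ranks the same three candidates by mean 5-fold accuracy and picks the best:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
models = {"Logistic Regression": LogisticRegression(max_iter=1000),
          "Random Forest": RandomForestClassifier(random_state=42),
          "SVM": SVC()}

# Rank candidates by mean cross-validated accuracy instead of one split
means = {name: cross_val_score(model, X, y, cv=5).mean()
         for name, model in models.items()}
best = max(means, key=means.get)
print(f"Best model: {best} ({means[best]:.4f})")
```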
Get started with Replit
Turn what you've learned into a real tool with Replit Agent. Try prompts like: “Build a dashboard that compares models using cross_val_score” or “Create a tool that checks for imbalance with balanced_accuracy_score.”
The Agent writes the code, tests for errors, and helps you deploy your app. Start building with Replit.
Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.