How to implement logistic regression in Python

Implement logistic regression in Python. This guide covers different methods, tips, real-world applications, and how to debug common errors.

Published on: Mon, Apr 6, 2026
Updated on: Fri, Apr 10, 2026
The Replit Team

Logistic regression is a powerful statistical method for binary classification tasks. In Python, libraries like scikit-learn make its implementation straightforward, so you can predict outcomes from input features.

In this article, we'll guide you through the implementation steps. You'll learn practical techniques, see real-world applications, and get debugging tips to build effective models for your projects.

Using sklearn for basic logistic regression

from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
# n_redundant=0 keeps the call valid: the default informative + redundant
# feature counts would exceed n_features=2 and raise a ValueError
X, y = make_classification(n_samples=100, n_features=2, n_redundant=0, random_state=42)
model = LogisticRegression()
model.fit(X, y)
print(f"Accuracy: {model.score(X, y):.2f}")

Output:
Accuracy: 0.96

This example uses make_classification to quickly generate a synthetic dataset. It's a common practice for demonstrating algorithms, as it provides clean data—features (X) and labels (y)—without needing to find and load an external file.

The core logic then unfolds in a few key steps:

  • First, an instance of the model is created with LogisticRegression().
  • The model is then trained on the data using the model.fit(X, y) method.
  • Finally, model.score(X, y) evaluates the model's accuracy on the same data it was trained on, giving a quick performance check.
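Scoring on the same data the model was trained on tends to overstate performance. A held-out test split gives a fairer estimate; here is a minimal sketch using scikit-learn's train_test_split (n_redundant=0 is added so make_classification accepts two features):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# n_redundant=0 so informative + redundant features fit within n_features=2
X, y = make_classification(n_samples=100, n_features=2, n_redundant=0,
                           random_state=42)

# Hold out 25% of the samples for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = LogisticRegression()
model.fit(X_train, y_train)

# Score on unseen data rather than the training set
print(f"Test accuracy: {model.score(X_test, y_test):.2f}")
```

The held-out score is usually slightly lower than the training score, and it is the number you should trust.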

Implementing from scratch

Now that you've seen the high-level implementation, building the model from scratch offers a look under the hood at the core components driving the classification.

Building a simple sigmoid function

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = np.linspace(-10, 10, 100)
plt.plot(z, sigmoid(z))
plt.title("Sigmoid Function")
plt.show()

Output:
(a plot of the S-shaped sigmoid curve rising from 0 to 1)

The sigmoid function is the engine of logistic regression. It’s an S-shaped curve that takes any real-valued number and transforms it into a value between 0 and 1. This is how the model converts its output into a probability. The code implements this transformation in the sigmoid(z) function.

  • The np.linspace and plt.plot functions are used to visualize this curve.
  • You can see how extreme inputs get pushed toward 0 or 1, which is the key to making a binary classification.
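A quick numerical check makes the squashing concrete; these values follow directly from the definition above:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Extreme inputs saturate toward 0 or 1; zero maps to exactly 0.5
for z in (-10, 0, 10):
    print(f"sigmoid({z}) = {sigmoid(z):.4f}")
# → sigmoid(-10) = 0.0000, sigmoid(0) = 0.5000, sigmoid(10) = 1.0000
```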

Implementing cost function for logistic regression

def cost_function(X, y, weights):
    z = np.dot(X, weights)
    h = sigmoid(z)
    cost = -1/len(y) * np.sum(y * np.log(h) + (1-y) * np.log(1-h))
    return cost

X_with_bias = np.c_[np.ones(X.shape[0]), X]
initial_weights = np.zeros(X_with_bias.shape[1])
print(f"Initial cost: {cost_function(X_with_bias, y, initial_weights):.4f}")

Output:
Initial cost: 0.6931

The cost function measures how wrong the model's predictions are. Your goal during training is to find weights that minimize this value. The code implements log-loss, which is a standard way to evaluate classification models.

  • It penalizes the model more for predictions that are both confident and incorrect.
  • A bias term is added to the features using np.c_. This gives the model more flexibility, similar to a y-intercept in a linear equation.
  • The initial cost is calculated with weights at zero, giving you a baseline before optimization starts.
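That 0.6931 baseline is not arbitrary: with all weights at zero, every prediction is sigmoid(0) = 0.5, and the log-loss of a constant 0.5 prediction is -log(0.5) = log(2) no matter what the labels are:

```python
import numpy as np

# With zero weights, h = 0.5 everywhere, so each sample contributes
# -(y*log(0.5) + (1-y)*log(0.5)) = -log(0.5) = log(2) to the mean cost
print(f"Baseline cost: {np.log(2):.4f}")  # → 0.6931
```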

Training with gradient descent

def gradient_descent(X, y, weights, alpha, iterations):
    weights = weights.copy()  # avoid mutating the caller's array in place
    cost_history = []
    for i in range(iterations):
        z = np.dot(X, weights)
        h = sigmoid(z)
        gradient = np.dot(X.T, (h - y)) / len(y)
        weights -= alpha * gradient
        cost_history.append(cost_function(X, y, weights))
    return weights, cost_history

final_weights, costs = gradient_descent(X_with_bias, y, initial_weights, 0.1, 1000)
print(f"Final cost: {costs[-1]:.4f}")

Output:
Final cost: 0.3512

Gradient descent is the optimization engine that trains the model. The gradient_descent function iteratively adjusts the model's weights to minimize the cost function, making predictions more accurate over time.

  • The gradient is calculated in each loop. It points in the direction of the steepest ascent of the cost function.
  • The weights are then updated in the opposite direction. The learning rate, alpha, controls the size of each step.
  • This process repeats for 1000 iterations, gradually reducing the cost from its initial value and improving the model's fit.
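Putting the pieces together end to end, the trained weights classify by thresholding the sigmoid output at 0.5. This self-contained sketch mirrors the update rule above (with n_redundant=0 added so make_classification accepts two features):

```python
import numpy as np
from sklearn.datasets import make_classification

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

X, y = make_classification(n_samples=100, n_features=2, n_redundant=0,
                           random_state=42)
Xb = np.c_[np.ones(X.shape[0]), X]  # prepend a bias column
w = np.zeros(Xb.shape[1])

# Plain gradient descent, matching the update rule above
for _ in range(1000):
    h = sigmoid(Xb @ w)
    w -= 0.1 * Xb.T @ (h - y) / len(y)

# Threshold the probabilities at 0.5 to get class labels
y_pred = (sigmoid(Xb @ w) >= 0.5).astype(int)
print(f"Training accuracy: {(y_pred == y).mean():.2f}")
```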

Advanced techniques

Building on this foundation, you can now implement advanced techniques to create more robust models and evaluate their performance with greater precision.

Using regularization to prevent overfitting

def regularized_cost(X, y, weights, lambda_param):
    m = len(y)
    reg_term = (lambda_param / (2 * m)) * np.sum(weights[1:]**2)  # Exclude bias
    return cost_function(X, y, weights) + reg_term

lambda_param = 1.0
reg_cost = regularized_cost(X_with_bias, y, final_weights, lambda_param)
print(f"Regularized cost: {reg_cost:.4f}")

Output:
Regularized cost: 0.3568

Regularization is a technique to prevent overfitting, which happens when a model learns the training data too well and performs poorly on new data. It discourages complexity by adding a penalty for large weights to the cost function, promoting a simpler and more general model.

  • The regularized_cost function adds a reg_term to the original cost. This term penalizes the model based on the size of its weights.
  • The strength of this penalty is controlled by lambda_param. A higher value pushes the weights closer to zero, simplifying the model.
  • The calculation intentionally excludes the bias weight using weights[1:], as it's common practice to only penalize the weights associated with the input features.
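scikit-learn bakes this in: LogisticRegression applies an L2 penalty by default, controlled by the C parameter, which is the inverse of the regularization strength (so a smaller C plays the role of a larger lambda_param). A sketch of the effect on coefficient size, with n_redundant=0 added so make_classification accepts two features:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=2, n_redundant=0,
                           random_state=42)

# Smaller C = stronger L2 penalty = coefficients pulled toward zero
for C in (100.0, 1.0, 0.01):
    clf = LogisticRegression(C=C).fit(X, y)
    print(f"C={C:>6}: total |coef| = {np.abs(clf.coef_).sum():.3f}")
```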

Implementing mini-batch gradient descent

def mini_batch_gradient_descent(X, y, weights, alpha, iterations, batch_size):
    m = len(y)
    cost_history = []

    for i in range(iterations):
        indices = np.random.permutation(m)
        X_shuffled = X[indices]
        y_shuffled = y[indices]

        for j in range(0, m, batch_size):
            X_batch = X_shuffled[j:j+batch_size]
            y_batch = y_shuffled[j:j+batch_size]

            z = np.dot(X_batch, weights)
            h = sigmoid(z)
            gradient = np.dot(X_batch.T, (h - y_batch)) / len(y_batch)
            weights -= alpha * gradient

        cost_history.append(cost_function(X, y, weights))
    return weights, cost_history

batch_weights, batch_costs = mini_batch_gradient_descent(X_with_bias, y, np.zeros(X_with_bias.shape[1]), 0.1, 50, 20)
print(f"Final mini-batch cost: {batch_costs[-1]:.4f}")

Output:
Final mini-batch cost: 0.3629

Mini-batch gradient descent offers a practical middle ground for optimization. Instead of using the entire dataset for each update, it processes smaller, random subsets—or mini-batches. This approach is more computationally efficient, especially with large datasets, and often leads to faster convergence.

  • The data is first shuffled using np.random.permutation to ensure each batch is random.
  • An inner loop then iterates through the shuffled data, creating chunks determined by the batch_size.
  • Weights are updated after processing each mini-batch, providing more frequent updates than standard gradient descent.
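The arithmetic behind that last point: with 100 samples and a batch size of 20, each pass over the data performs five updates, so the run above makes far more weight updates than full-batch descent would in the same number of iterations:

```python
import math

m, batch_size, iterations = 100, 20, 50

# Mini-batch: one update per batch, per pass over the data
updates = math.ceil(m / batch_size) * iterations
print(f"Mini-batch updates: {updates}")    # 5 batches * 50 passes = 250

# Full-batch gradient descent: one update per iteration
print(f"Full-batch updates: {iterations}")  # 50
```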

Evaluating model performance with confusion matrix

from sklearn.metrics import confusion_matrix, classification_report

def predict(X, weights):
    z = np.dot(X, weights)
    return (sigmoid(z) >= 0.5).astype(int)

y_pred = predict(X_with_bias, final_weights)
conf_matrix = confusion_matrix(y, y_pred)
print("Confusion Matrix:")
print(conf_matrix)

Output:
Confusion Matrix:
[[49  2]
 [ 4 45]]

While accuracy provides a single score, a confusion matrix offers a deeper look at your model's performance. The predict function generates predictions by applying a 0.5 threshold to the sigmoid output. Then, scikit-learn's confusion_matrix function compares these predictions against the true labels.

  • The resulting 2x2 grid details every correct and incorrect classification.
  • It gives you the exact counts for true positives, true negatives, false positives, and false negatives, revealing precisely where the model is succeeding or making errors.
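The classification_report import shown above turns those same four counts into per-class precision, recall, and F1 scores. A minimal sketch with toy labels, just to show the call:

```python
from sklearn.metrics import classification_report, confusion_matrix

# Toy labels purely for demonstration
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

print(confusion_matrix(y_true, y_pred))  # [[3 1]
                                         #  [1 3]]

# Precision, recall, and F1 per class, derived from the same counts
print(classification_report(y_true, y_pred))
```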

Move faster with Replit

Replit is an AI-powered development platform that comes with all Python dependencies pre-installed, so you can skip setup and start coding instantly. Instead of just learning individual techniques, you can move straight to building complete applications.

This is where Agent 4 comes in. It takes an idea to a working product by handling the code, databases, APIs, and deployment directly from your description. You can build practical classification tools, such as:

  • A spam filter that classifies incoming emails as legitimate or junk.
  • A sentiment analysis tool to determine if customer feedback is positive or negative.
  • A simple loan eligibility predictor that assesses an applicant's risk based on financial data.

Simply describe your app, and Replit will write the code, test it, and fix issues automatically, all within your browser.

Common errors and challenges

Implementing logistic regression from scratch can surface a few common challenges, but each has a straightforward solution you can master.

For instance, you might encounter numerical instability in the sigmoid function, where extreme input values cause overflow errors. This can cascade into NaN (Not a Number) values within your cost function, especially when the algorithm tries to calculate the logarithm of zero, effectively stopping your model from learning. Another frequent hurdle is handling imbalanced data, where one class vastly outnumbers the other, potentially tricking your model into being lazy and still looking accurate. The following sections will address how to resolve these key problems:

  • Handling numerical instability in the sigmoid function
  • Fixing NaN errors in the cost function
  • Correcting imbalanced classes in training data

Handling numerical instability in the sigmoid function

Numerical instability in the sigmoid function often stems from its core calculation, np.exp(-z). If z is a large negative number, -z becomes a large positive one, and the exponential function can overflow, returning an inf value that disrupts your model. See what happens in the code below.

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

large_negative = -1000
print(f"Sigmoid({large_negative}) = {sigmoid(large_negative)}")

With an input like -1000, the np.exp(-z) calculation overflows to inf and NumPy emits a runtime warning; the division then returns 0.0. That happens to be close to the true value here, but the warning signals trouble, and the same overflow can propagate inf or NaN through later calculations. The code below shows a more stable way to handle this calculation.

def sigmoid(z):
    # Clip values to prevent overflow
    z = np.clip(z, -500, 500)
    return 1 / (1 + np.exp(-z))

large_negative = -1000
print(f"Sigmoid({large_negative}) = {sigmoid(large_negative)}")

The fix is surprisingly simple: you just need to constrain the input to the sigmoid function. This prevents the exponential calculation from blowing up.

  • The np.clip(z, -500, 500) function caps the input values, keeping them within a manageable range.
  • This stops np.exp() from receiving extreme numbers that would otherwise cause an overflow error.

This is a crucial step when your dataset contains features with a very wide range of values.
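If SciPy is available in your environment, scipy.special.expit computes a numerically stable sigmoid, giving you the same protection without clipping by hand:

```python
import numpy as np
from scipy.special import expit

# expit evaluates the logistic function 1 / (1 + exp(-z)) in a
# numerically stable way, even for extreme inputs
print(expit(np.array([-1000.0, 0.0, 1000.0])))
```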

Fixing NaN errors in the cost function

A NaN error in your cost_function often signals a math problem: you’re trying to take the logarithm of zero. This happens when the sigmoid function's output is exactly 0 or 1, making the cost calculation impossible. The code below demonstrates how this can occur.

def cost_function(X, y, weights):
    z = np.dot(X, weights)
    h = sigmoid(z)
    cost = -1/len(y) * np.sum(y * np.log(h) + (1-y) * np.log(1-h))
    return cost

When the sigmoid output h is exactly 0 or 1, the np.log() function receives an invalid input. This breaks the cost calculation and returns NaN. The code below shows how a small adjustment prevents this.

def cost_function(X, y, weights):
    z = np.dot(X, weights)
    h = sigmoid(z)
    # Add small epsilon to prevent log(0)
    epsilon = 1e-15
    h = np.clip(h, epsilon, 1-epsilon)
    cost = -1/len(y) * np.sum(y * np.log(h) + (1-y) * np.log(1-h))
    return cost

The solution is to prevent the sigmoid output, h, from ever being exactly 0 or 1. By adding a tiny value called epsilon and using np.clip(), you can keep h within a safe range. This ensures the np.log() calculation in your cost_function never receives an invalid input, which would otherwise break the training process.

  • Keep an eye out for this error when your model becomes very confident, pushing its probability predictions to their absolute limits.
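You can sanity-check the fix by feeding the cost calculation fully saturated predictions: without clipping, the logs would produce NaN; with clipping, the result stays finite. A self-contained sketch:

```python
import numpy as np

def safe_cost(h, y, eps=1e-15):
    # Keep probabilities strictly inside (0, 1) before taking logs
    h = np.clip(h, eps, 1 - eps)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

# Perfectly confident (saturated) predictions that match the labels
h = np.array([1.0, 0.0])
y = np.array([1.0, 0.0])
print(f"Cost: {safe_cost(h, y):.4f}")  # finite (≈ 0), not NaN
```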

Correcting imbalanced classes in training data

Imbalanced data occurs when one class significantly outnumbers another. This can mislead your model into achieving high accuracy by simply predicting the majority class every time, ignoring the minority class entirely. It's a common problem that makes a model seem effective when it isn't.

The code below demonstrates this issue. It uses make_classification with the weights parameter set to [0.9, 0.1], creating a dataset where 90% of the samples belong to one class. Notice how the model behaves when trained on this skewed data.

X_imbalanced, y_imbalanced = make_classification(
    n_samples=100, n_features=2, n_redundant=0,  # n_redundant=0 keeps n_features=2 valid
    weights=[0.9, 0.1], random_state=42
)
model = LogisticRegression()
model.fit(X_imbalanced, y_imbalanced)
print(f"Predictions: {model.predict(X_imbalanced)[:10]}")

The model.predict() output shows the model has become biased, almost always predicting the majority class. This makes it unreliable for finding the minority cases. The code below demonstrates how to fix this during training.

X_imbalanced, y_imbalanced = make_classification(
    n_samples=100, n_features=2, n_redundant=0,  # n_redundant=0 keeps n_features=2 valid
    weights=[0.9, 0.1], random_state=42
)
# Use class_weight parameter to handle imbalance
model = LogisticRegression(class_weight='balanced')
model.fit(X_imbalanced, y_imbalanced)
print(f"Predictions: {model.predict(X_imbalanced)[:10]}")

The fix is to tell the model to pay more attention to the underrepresented class. By setting class_weight='balanced' in the LogisticRegression model, you automatically adjust its focus during training. This simple parameter change has a significant impact:

  • It penalizes mistakes on the minority class more heavily.
  • This forces the model to learn its features instead of just guessing the majority class.

This is a crucial technique when working with datasets where one outcome is much rarer than another.
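Accuracy alone can hide the problem; recall on the minority class shows what class_weight actually buys you. A sketch (n_redundant=0 added so make_classification accepts two features):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

X, y = make_classification(n_samples=100, n_features=2, n_redundant=0,
                           weights=[0.9, 0.1], random_state=42)

# Recall on the minority class (label 1), with and without reweighting
for cw in (None, 'balanced'):
    clf = LogisticRegression(class_weight=cw).fit(X, y)
    rec = recall_score(y, clf.predict(X))
    print(f"class_weight={cw}: minority recall = {rec:.2f}")
```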

Real-world applications

With the implementation and debugging techniques covered, you can now tackle practical business problems like fraud detection and predicting customer churn.

Detecting credit card fraud with LogisticRegression

In this scenario, a LogisticRegression model learns to spot fraudulent transactions by analyzing patterns in synthetic data, giving weight to features like transaction amount, time, and distance.

from sklearn.linear_model import LogisticRegression
import numpy as np

# Create synthetic transaction data (amount, time, distance)
np.random.seed(42)
X_legitimate = np.random.normal(loc=[100, 12, 5], scale=[20, 4, 2], size=(100, 3))
X_fraud = np.random.normal(loc=[200, 3, 50], scale=[100, 5, 20], size=(20, 3))
X = np.vstack([X_legitimate, X_fraud])
y = np.hstack([np.zeros(100), np.ones(20)])

# Train model and print feature importance
model = LogisticRegression()
model.fit(X, y)
print(f"Accuracy: {model.score(X, y):.2f}")
print(f"Feature importance: {np.abs(model.coef_[0])}")

This code first generates synthetic data to mimic a real-world problem. It uses np.random.normal to create two distinct classes of transactions—legitimate and fraudulent—each with different statistical properties. Notice the dataset is imbalanced, with far more legitimate samples than fraudulent ones.

  • The np.vstack and np.hstack functions combine these separate arrays into a single training set.
  • After training the model with fit(), the code inspects model.coef_. This attribute reveals the importance the model assigned to each feature when making its predictions.
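One caveat about reading model.coef_ as importance: the raw features here live on very different scales (amounts near 100, hours near 12, distances near 5), which skews coefficient magnitudes. Standardizing the features first, as sketched below, makes the comparison fairer:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same synthetic transactions as above
np.random.seed(42)
X_legit = np.random.normal(loc=[100, 12, 5], scale=[20, 4, 2], size=(100, 3))
X_fraud = np.random.normal(loc=[200, 3, 50], scale=[100, 5, 20], size=(20, 3))
X = np.vstack([X_legit, X_fraud])
y = np.hstack([np.zeros(100), np.ones(20)])

# Scale each feature to zero mean / unit variance before fitting
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)
coefs = model.named_steps['logisticregression'].coef_[0]
print(f"Importance on standardized features: {np.abs(coefs)}")
```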

Predicting customer churn with OneHotEncoder for categorical features

Here, you'll see how to predict customer churn by converting non-numeric data, like contract types, into a format the model can understand.

This example uses a pandas DataFrame to organize customer data, which includes both numerical features like usage and categorical features like contract type. Since logistic regression requires all inputs to be numeric, the text-based contract data must be transformed.

A scikit-learn Pipeline is used to streamline this entire workflow, from data preparation to prediction.

  • The first step in the pipeline is a ColumnTransformer, which applies OneHotEncoder to the contract column. This converts the text categories into a binary format the model can interpret.
  • The preprocessed data is then passed to the LogisticRegression classifier for training.
  • Finally, the trained model uses the predict_proba method to calculate the churn probability for a new customer, giving a specific likelihood instead of just a binary prediction.

from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
import pandas as pd

# Create simple customer churn dataset
data = {
    'usage': [100, 220, 150, 50, 300, 250],
    'contract': ['monthly', 'yearly', 'monthly', 'monthly', 'yearly', 'monthly'],
    'churned': [1, 0, 1, 1, 0, 0]
}
df = pd.DataFrame(data)

# Create pipeline for mixed data types
preprocessor = ColumnTransformer([
    ('categorical', OneHotEncoder(), ['contract']),
    ('numerical', 'passthrough', ['usage'])
])
model = Pipeline([
    ('preprocess', preprocessor),
    ('classifier', LogisticRegression())
])

# Train and predict
X = df.drop('churned', axis=1)
y = df['churned']
model.fit(X, y)
new_customer = pd.DataFrame({'usage': [75], 'contract': ['monthly']})
print(f"Churn probability: {model.predict_proba(new_customer)[0][1]:.2f}")

This code demonstrates a clean way to handle real-world data that isn't purely numerical. It builds a Pipeline that automatically prepares the data before training a LogisticRegression model. This is crucial when your dataset, like the customer churn example, mixes numbers (usage) with text (contract).

  • The pipeline uses a ColumnTransformer to target and convert only the text-based columns into a machine-readable format.
  • This ensures the model receives clean, consistent data for both training and future predictions, like calculating the churn probability for a new_customer.

Get started with Replit

Put your learning into practice. Describe your tool to Replit Agent: "Build a customer churn predictor using logistic regression" or "Create a simple fraud detection app based on transaction data."

Replit Agent will then write the code, test for errors, and deploy the app from your description. Start building with Replit.

Get started free

Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.
