How to implement logistic regression in Python
Implement logistic regression in Python. This guide covers different methods, tips, real-world applications, and how to debug common errors.
Logistic regression is a powerful statistical method for binary classification tasks. In Python, libraries like scikit-learn make its implementation straightforward, so you can predict outcomes from input features.
In this article, we'll guide you through the implementation steps. You'll learn practical techniques, see real-world applications, and get debugging tips to build effective models for your projects.
Using sklearn for basic logistic regression
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=100, n_features=2, random_state=42)
model = LogisticRegression()
model.fit(X, y)
print(f"Accuracy: {model.score(X, y):.2f}")

Output:

Accuracy: 0.96
This example uses make_classification to quickly generate a synthetic dataset. It's a common practice for demonstrating algorithms, as it provides clean data—features (X) and labels (y)—without needing to find and load an external file.
The core logic then unfolds in a few key steps:
- First, an instance of the model is created with `LogisticRegression()`.
- The model is then trained on the data using the `model.fit(X, y)` method.
- Finally, `model.score(X, y)` evaluates the model's accuracy on the same data it was trained on, giving a quick performance check.
Implementing from scratch
Now that you've seen the high-level implementation, building the model from scratch offers a look under the hood at the core components driving the classification.
Building a simple sigmoid function
import numpy as np
import matplotlib.pyplot as plt
def sigmoid(z):
return 1 / (1 + np.exp(-z))
z = np.linspace(-10, 10, 100)
plt.plot(z, sigmoid(z))
plt.title("Sigmoid Function")
plt.show()
The sigmoid function is the engine of logistic regression. It’s an S-shaped curve that takes any real-valued number and transforms it into a value between 0 and 1. This is how the model converts its output into a probability. The code implements this transformation in the sigmoid(z) function.
- The `np.linspace` and `plt.plot` functions are used to visualize this curve.
- You can see how extreme inputs get pushed toward 0 or 1, which is the key to making a binary classification.
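A few spot checks make the squashing behavior concrete. These values follow directly from the formula:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

print(sigmoid(0))    # 0.5: the model is undecided
print(sigmoid(4))    # ~0.982: strongly leaning toward class 1
print(sigmoid(-4))   # ~0.018: strongly leaning toward class 0
```

Note how quickly the output saturates: by `z = ±4` the probability is already within 2% of its limit.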
Implementing cost function for logistic regression
def cost_function(X, y, weights):
z = np.dot(X, weights)
h = sigmoid(z)
cost = -1/len(y) * np.sum(y * np.log(h) + (1-y) * np.log(1-h))
return cost
X_with_bias = np.c_[np.ones(X.shape[0]), X]
initial_weights = np.zeros(X_with_bias.shape[1])
print(f"Initial cost: {cost_function(X_with_bias, y, initial_weights):.4f}")

Output:

Initial cost: 0.6931
The cost function measures how wrong the model's predictions are. Your goal during training is to find weights that minimize this value. The code implements log-loss, which is a standard way to evaluate classification models.
- It penalizes the model more for predictions that are both confident and incorrect.
- A bias term is added to the features using `np.c_`. This gives the model more flexibility, similar to a y-intercept in a linear equation.
- The initial cost is calculated with weights at zero, giving you a baseline before optimization starts.
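To see why log-loss punishes confident mistakes so heavily, you can evaluate the per-example cost directly. The predicted probabilities below are hypothetical, chosen just to show the shape of the penalty:

```python
import numpy as np

# For a single example with true label y = 1, the log-loss is -log(h).
# The more confidently wrong the prediction, the larger the cost.
for h in (0.9, 0.5, 0.1):
    print(f"predicted {h}, true label 1 -> cost {-np.log(h):.3f}")
# predicted 0.9 -> cost 0.105
# predicted 0.5 -> cost 0.693
# predicted 0.1 -> cost 2.303
```

The 0.6931 initial cost above is exactly `-log(0.5)`: with zero weights, the model predicts 0.5 for every sample.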
Training with gradient descent
def gradient_descent(X, y, weights, alpha, iterations):
cost_history = []
for i in range(iterations):
z = np.dot(X, weights)
h = sigmoid(z)
gradient = np.dot(X.T, (h - y)) / len(y)
weights -= alpha * gradient
cost_history.append(cost_function(X, y, weights))
return weights, cost_history
final_weights, costs = gradient_descent(X_with_bias, y, initial_weights, 0.1, 1000)
print(f"Final cost: {costs[-1]:.4f}")

Output:

Final cost: 0.3512
Gradient descent is the optimization engine that trains the model. The gradient_descent function iteratively adjusts the model's weights to minimize the cost function, making predictions more accurate over time.
- The `gradient` is calculated in each loop. It points in the direction of the steepest ascent of the cost function.
- The weights are then updated in the opposite direction. The learning rate, `alpha`, controls the size of each step.
- This process repeats for 1000 `iterations`, gradually reducing the cost from its initial value and improving the model's fit.
Advanced techniques
Building on this foundation, you can now implement advanced techniques to create more robust models and evaluate their performance with greater precision.
Using regularization to prevent overfitting
def regularized_cost(X, y, weights, lambda_param):
m = len(y)
reg_term = (lambda_param / (2 * m)) * np.sum(weights[1:]**2) # Exclude bias
return cost_function(X, y, weights) + reg_term
lambda_param = 1.0
reg_cost = regularized_cost(X_with_bias, y, final_weights, lambda_param)
print(f"Regularized cost: {reg_cost:.4f}")

Output:

Regularized cost: 0.3568
Regularization is a technique to prevent overfitting, which happens when a model learns the training data too well and performs poorly on new data. It discourages complexity by adding a penalty for large weights to the cost function, promoting a simpler and more general model.
- The `regularized_cost` function adds a `reg_term` to the original cost. This term penalizes the model based on the size of its weights.
- The strength of this penalty is controlled by `lambda_param`. A higher value pushes the weights closer to zero, simplifying the model.
- The calculation intentionally excludes the bias weight using `weights[1:]`, as it's common practice to only penalize the weights associated with the input features.
Implementing mini-batch gradient descent
def mini_batch_gradient_descent(X, y, weights, alpha, iterations, batch_size):
m = len(y)
cost_history = []
for i in range(iterations):
indices = np.random.permutation(m)
X_shuffled = X[indices]
y_shuffled = y[indices]
for j in range(0, m, batch_size):
X_batch = X_shuffled[j:j+batch_size]
y_batch = y_shuffled[j:j+batch_size]
z = np.dot(X_batch, weights)
h = sigmoid(z)
gradient = np.dot(X_batch.T, (h - y_batch)) / len(y_batch)
weights -= alpha * gradient
cost_history.append(cost_function(X, y, weights))
return weights, cost_history
batch_weights, batch_costs = mini_batch_gradient_descent(X_with_bias, y, np.zeros(X_with_bias.shape[1]), 0.1, 50, 20)
print(f"Final mini-batch cost: {batch_costs[-1]:.4f}")

Output:

Final mini-batch cost: 0.3629
Mini-batch gradient descent offers a practical middle ground for optimization. Instead of using the entire dataset for each update, it processes smaller, random subsets—or mini-batches. This approach is more computationally efficient, especially with large datasets, and often leads to faster convergence.
- The data is first shuffled using `np.random.permutation` to ensure each batch is random.
- An inner loop then iterates through the shuffled data, creating chunks determined by the `batch_size`.
- Weights are updated after processing each mini-batch, providing more frequent updates than standard gradient descent.
Evaluating model performance with confusion matrix
from sklearn.metrics import confusion_matrix, classification_report
def predict(X, weights):
z = np.dot(X, weights)
return (sigmoid(z) >= 0.5).astype(int)
y_pred = predict(X_with_bias, final_weights)
conf_matrix = confusion_matrix(y, y_pred)
print("Confusion Matrix:")
print(conf_matrix)

Output:

Confusion Matrix:
[[49  2]
 [ 4 45]]
While accuracy provides a single score, a confusion matrix offers a deeper look at your model's performance. The predict function generates predictions by applying a 0.5 threshold to the sigmoid output. Then, scikit-learn's confusion_matrix function compares these predictions against the true labels.
- The resulting 2x2 grid details every correct and incorrect classification.
- It gives you the exact counts for true positives, true negatives, false positives, and false negatives, revealing precisely where the model is succeeding or making errors.
Move faster with Replit
Replit is an AI-powered development platform that comes with all Python dependencies pre-installed, so you can skip setup and start coding instantly. Instead of just learning individual techniques, you can move straight to building complete applications.
This is where Agent 4 comes in. It takes an idea to a working product by handling the code, databases, APIs, and deployment directly from your description. You can build practical classification tools, such as:
- A spam filter that classifies incoming emails as legitimate or junk.
- A sentiment analysis tool to determine if customer feedback is positive or negative.
- A simple loan eligibility predictor that assesses an applicant's risk based on financial data.
Simply describe your app, and Replit will write the code, test it, and fix issues automatically, all within your browser.
Common errors and challenges
Implementing logistic regression from scratch can surface a few common challenges, but each has a straightforward solution you can master.
For instance, you might encounter numerical instability in the sigmoid function, where extreme input values cause overflow errors. This can cascade into NaN (Not a Number) values within your cost function, especially when the algorithm tries to calculate the logarithm of zero, effectively stopping your model from learning. Another frequent hurdle is handling imbalanced data, where one class vastly outnumbers the other, potentially tricking your model into being lazy and still looking accurate. The following sections will address how to resolve these key problems:
- Handling numerical instability in the `sigmoid` function
- Fixing `NaN` errors in the cost function
- Correcting imbalanced classes in training data
Handling numerical instability in the sigmoid function
Numerical instability in the sigmoid function often stems from its core calculation, np.exp(-z). If z is a large negative number, -z becomes a large positive one, and the exponential function can overflow, returning an inf value that disrupts your model. See what happens in the code below.
def sigmoid(z):
return 1 / (1 + np.exp(-z))
large_negative = -1000
print(f"Sigmoid({large_negative}) = {sigmoid(large_negative)}")
With an input like -1000, the np.exp(-z) calculation overflows and NumPy raises a RuntimeWarning as the denominator becomes infinite. The division still lands on 0.0 here, but those overflow warnings in the middle of training signal intermediate values you can't trust. The code below shows a more stable way to handle this calculation.
def sigmoid(z):
# Clip values to prevent overflow
z = np.clip(z, -500, 500)
return 1 / (1 + np.exp(-z))
large_negative = -1000
print(f"Sigmoid({large_negative}) = {sigmoid(large_negative)}")
The fix is surprisingly simple: you just need to constrain the input to the sigmoid function. This prevents the exponential calculation from blowing up.
- The `np.clip(z, -500, 500)` function caps the input values, keeping them within a manageable range.
- This stops `np.exp()` from receiving extreme numbers that would otherwise cause an overflow error.
This is a crucial step when your dataset contains features with a very wide range of values.
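If SciPy is available, you can sidestep the issue entirely: `scipy.special.expit` is a numerically stable implementation of the sigmoid function:

```python
from scipy.special import expit

# expit computes 1 / (1 + exp(-z)) without overflow warnings
print(expit(-1000))  # 0.0
print(expit(0))      # 0.5
print(expit(1000))   # 1.0
```

Using a library routine like this is often preferable to manual clipping, since the stability handling is tested and hidden from your code.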
Fixing NaN errors in the cost function
A NaN error in your cost_function often signals a math problem: you’re trying to take the logarithm of zero. This happens when the sigmoid function's output is exactly 0 or 1, making the cost calculation impossible. The code below demonstrates how this can occur.
def cost_function(X, y, weights):
z = np.dot(X, weights)
h = sigmoid(z)
cost = -1/len(y) * np.sum(y * np.log(h) + (1-y) * np.log(1-h))
return cost
When the sigmoid output h is exactly 0 or 1, the np.log() function receives an invalid input. This breaks the cost calculation and returns NaN. The code below shows how a small adjustment prevents this.
def cost_function(X, y, weights):
z = np.dot(X, weights)
h = sigmoid(z)
# Add small epsilon to prevent log(0)
epsilon = 1e-15
h = np.clip(h, epsilon, 1-epsilon)
cost = -1/len(y) * np.sum(y * np.log(h) + (1-y) * np.log(1-h))
return cost
The solution is to prevent the sigmoid output, h, from ever being exactly 0 or 1. By adding a tiny value called epsilon and using np.clip(), you can keep h within a safe range. This ensures the np.log() calculation in your cost_function never receives an invalid input, which would otherwise break the training process.
- Keep an eye out for this error when your model becomes very confident, pushing its probability predictions to their absolute limits.
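The clipping step can be verified in isolation. The raw probabilities below are hypothetical, chosen to include the dangerous exact 0 and 1 cases:

```python
import numpy as np

epsilon = 1e-15
h = np.array([0.0, 0.5, 1.0])            # raw sigmoid outputs, including exact 0 and 1
h_safe = np.clip(h, epsilon, 1 - epsilon)

# np.log(h) would produce -inf at h=0; the clipped version stays finite
print(np.log(h_safe))
```

All three logarithms come out finite, so the cost function can no longer produce `NaN` from this path.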
Correcting imbalanced classes in training data
Imbalanced data occurs when one class significantly outnumbers another. This can mislead your model into achieving high accuracy by simply predicting the majority class every time, ignoring the minority class entirely. It's a common problem that makes a model seem effective when it isn't.
The code below demonstrates this issue. It uses make_classification with the weights parameter set to [0.9, 0.1], creating a dataset where 90% of the samples belong to one class. Notice how the model behaves when trained on this skewed data.
X_imbalanced, y_imbalanced = make_classification(
n_samples=100, n_features=2, weights=[0.9, 0.1], random_state=42
)
model = LogisticRegression()
model.fit(X_imbalanced, y_imbalanced)
print(f"Predictions: {model.predict(X_imbalanced)[:10]}")
The model.predict() output shows the model has become biased, almost always predicting the majority class. This makes it unreliable for finding the minority cases. The code below demonstrates how to fix this during training.
X_imbalanced, y_imbalanced = make_classification(
n_samples=100, n_features=2, weights=[0.9, 0.1], random_state=42
)
# Use class_weight parameter to handle imbalance
model = LogisticRegression(class_weight='balanced')
model.fit(X_imbalanced, y_imbalanced)
print(f"Predictions: {model.predict(X_imbalanced)[:10]}")
The fix is to tell the model to pay more attention to the underrepresented class. By setting class_weight='balanced' in the LogisticRegression model, you automatically adjust its focus during training. This simple parameter change has a significant impact:
- It penalizes mistakes on the minority class more heavily.
- This forces the model to learn its features instead of just guessing the majority class.
This is a crucial technique when working with datasets where one outcome is much rarer than another.
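If you want to see exactly what `class_weight='balanced'` does, scikit-learn's `compute_class_weight` exposes the formula `n_samples / (n_classes * class_count)`. A sketch with a hypothetical 90/10 split:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels: 90 samples of class 0, 10 of class 1
y = np.array([0] * 90 + [1] * 10)

# 'balanced' weight = n_samples / (n_classes * class_count)
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)
print(weights)  # [0.5555... 5.0]: minority-class errors count 9x more
```

You can also pass an explicit dictionary like `class_weight={0: 1, 1: 5}` when you want to set the trade-off yourself.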
Real-world applications
With the implementation and debugging techniques covered, you can now tackle practical business problems like fraud detection and predicting customer churn.
Detecting credit card fraud with LogisticRegression
In this scenario, a LogisticRegression model learns to spot fraudulent transactions by analyzing patterns in synthetic data, giving weight to features like transaction amount, time, and distance.
from sklearn.linear_model import LogisticRegression
import numpy as np
# Create synthetic transaction data (amount, time, distance)
np.random.seed(42)
X_legitimate = np.random.normal(loc=[100, 12, 5], scale=[20, 4, 2], size=(100, 3))
X_fraud = np.random.normal(loc=[200, 3, 50], scale=[100, 5, 20], size=(20, 3))
X = np.vstack([X_legitimate, X_fraud])
y = np.hstack([np.zeros(100), np.ones(20)])
# Train model and print feature importance
model = LogisticRegression()
model.fit(X, y)
print(f"Accuracy: {model.score(X, y):.2f}")
print(f"Feature importance: {np.abs(model.coef_[0])}")
This code first generates synthetic data to mimic a real-world problem. It uses np.random.normal to create two distinct classes of transactions—legitimate and fraudulent—each with different statistical properties. Notice the dataset is imbalanced, with far more legitimate samples than fraudulent ones.
- The `np.vstack` and `np.hstack` functions combine these separate arrays into a single training set.
- After training the model with `fit()`, the code inspects `model.coef_`. This attribute reveals the importance the model assigned to each feature when making its predictions.
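The trained model can then score incoming transactions. This sketch rebuilds the same synthetic dataset, adds the `class_weight='balanced'` fix from the previous section, and scores one hypothetical transaction (the feature values are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

np.random.seed(42)
X_legitimate = np.random.normal(loc=[100, 12, 5], scale=[20, 4, 2], size=(100, 3))
X_fraud = np.random.normal(loc=[200, 3, 50], scale=[100, 5, 20], size=(20, 3))
X = np.vstack([X_legitimate, X_fraud])
y = np.hstack([np.zeros(100), np.ones(20)])

model = LogisticRegression(class_weight="balanced").fit(X, y)

# Hypothetical transaction: $500 amount, 2 a.m., 80 km from usual location
suspicious = [[500, 2, 80]]
print(f"Fraud probability: {model.predict_proba(suspicious)[0][1]:.2f}")
```

Because all three features sit far from the legitimate cluster, the model should flag this transaction with a high fraud probability.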
Predicting customer churn with OneHotEncoder for categorical features
Here, you'll see how to predict customer churn by converting non-numeric data, like contract types, into a format the model can understand.
This example uses a pandas DataFrame to organize customer data, which includes both numerical features like usage and categorical features like contract type. Since logistic regression requires all inputs to be numeric, the text-based contract data must be transformed.
A scikit-learn Pipeline is used to streamline this entire workflow, from data preparation to prediction.
- The first step in the pipeline is a `ColumnTransformer`, which applies `OneHotEncoder` to the `contract` column. This converts the text categories into a binary format the model can interpret.
- The preprocessed data is then passed to the `LogisticRegression` classifier for training.
- Finally, the trained `model` uses the `predict_proba` method to calculate the churn probability for a new customer, giving a specific likelihood instead of just a binary prediction.
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
import pandas as pd
# Create simple customer churn dataset
data = {
'usage': [100, 220, 150, 50, 300, 250],
'contract': ['monthly', 'yearly', 'monthly', 'monthly', 'yearly', 'monthly'],
'churned': [1, 0, 1, 1, 0, 0]
}
df = pd.DataFrame(data)
# Create pipeline for mixed data types
preprocessor = ColumnTransformer([
('categorical', OneHotEncoder(), ['contract']),
('numerical', 'passthrough', ['usage'])
])
model = Pipeline([
('preprocess', preprocessor),
('classifier', LogisticRegression())
])
# Train and predict
X = df.drop('churned', axis=1)
y = df['churned']
model.fit(X, y)
new_customer = pd.DataFrame({'usage': [75], 'contract': ['monthly']})
print(f"Churn probability: {model.predict_proba(new_customer)[0][1]:.2f}")
This code demonstrates a clean way to handle real-world data that isn't purely numerical. It builds a Pipeline that automatically prepares the data before training a LogisticRegression model. This is crucial when your dataset, like the customer churn example, mixes numbers (usage) with text (contract).
- The pipeline uses a `ColumnTransformer` to target and convert only the text-based columns into a machine-readable format.
- This ensures the model receives clean, consistent data for both training and future predictions, like calculating the churn probability for a `new_customer`.
Get started with Replit
Put your learning into practice. Describe your tool to Replit Agent: "Build a customer churn predictor using logistic regression" or "Create a simple fraud detection app based on transaction data."
Replit Agent will then write the code, test for errors, and deploy the app from your description. Start building with Replit.
Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.