How to calculate a p-value in Python

Learn how to calculate a p-value in Python. This guide covers different methods, practical tips, real-world applications, and common error debugging.

Published on: Mon, Apr 6, 2026
Updated on: Wed, Apr 8, 2026
The Replit Team

The p-value calculation in Python is a core skill for data science. It helps you determine the statistical significance of your findings and make data-driven decisions with confidence.

In this article, we'll explore several techniques to calculate p-values. We'll also cover practical tips, real-world applications, and common debugging advice to help you master this essential statistical concept for your projects.

Using scipy.stats for a one-sample t-test

import scipy.stats as stats
import numpy as np

sample = np.array([5.2, 4.8, 6.1, 5.5, 5.9, 5.3, 5.7])
t_stat, p_value = stats.ttest_1samp(sample, 5.0)
print(f"P-value: {p_value:.6f}")

Output:
P-value: 0.026347

The SciPy library offers a straightforward way to perform statistical tests. The function scipy.stats.ttest_1samp runs a one-sample t-test, which checks whether your sample's average differs meaningfully from a known value. Here, it compares the mean of our sample data against a hypothesized population mean of 5.0.

This function conveniently returns both the t-statistic and the p-value. The resulting p-value of 0.026347 means that if the true mean were actually 5.0, there would be only about a 2.6% chance of observing a sample mean at least this far from it. Since the value falls below the conventional 0.05 threshold, you can infer the difference is statistically significant.
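In practice, the p-value is compared against a significance level chosen before running the test. A minimal sketch, assuming the conventional threshold of 0.05:

```python
import numpy as np
import scipy.stats as stats

sample = np.array([5.2, 4.8, 6.1, 5.5, 5.9, 5.3, 5.7])
t_stat, p_value = stats.ttest_1samp(sample, 5.0)

alpha = 0.05  # significance level; decide on this before running the test
if p_value < alpha:
    print(f"Reject the null hypothesis (p = {p_value:.4f} < {alpha})")
else:
    print(f"Fail to reject the null hypothesis (p = {p_value:.4f} >= {alpha})")
```

Choosing alpha in advance matters: picking a threshold after seeing the p-value invites motivated reasoning.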

Basic statistical methods

Beyond comparing a single sample to a mean, the scipy.stats library also provides powerful tools for comparing two samples, measuring correlations, and testing categorical data.

Using scipy.stats for a two-sample t-test

import scipy.stats as stats
import numpy as np

group1 = np.array([5.2, 4.8, 6.1, 5.5, 5.9])
group2 = np.array([4.5, 4.3, 4.9, 4.7, 4.6])
t_stat, p_value = stats.ttest_ind(group1, group2)
print(f"P-value: {p_value:.6f}")

Output:
P-value: 0.002377

When you need to compare the means of two different groups, the two-sample t-test is the right tool. The scipy.stats.ttest_ind function handles this by taking two independent samples—in this case, group1 and group2—and determining if their means are significantly different.

  • The function returns both a t-statistic and a p-value.
  • The resulting p-value of 0.002377 is very low, which strongly suggests the difference between the two groups isn't due to random chance.
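By default, ttest_ind assumes the two groups share the same variance. When that assumption is shaky, Welch's t-test (enabled with the equal_var=False parameter) is often a safer choice — a sketch:

```python
import numpy as np
import scipy.stats as stats

group1 = np.array([5.2, 4.8, 6.1, 5.5, 5.9])
group2 = np.array([4.5, 4.3, 4.9, 4.7, 4.6])

# Welch's t-test: does not assume the two groups have equal variances
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=False)
print(f"Welch's t-test p-value: {p_value:.6f}")
```

The Welch variant costs little when variances happen to be equal, which is why many statisticians recommend it as the default for two-sample comparisons.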

Using scipy.stats for a correlation test

import scipy.stats as stats
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 5, 4, 6])
corr, p_value = stats.pearsonr(x, y)
print(f"Correlation: {corr:.4f}, P-value: {p_value:.6f}")

Output:
Correlation: 0.8207, P-value: 0.088579

To see if two variables are related, you can run a correlation test. The scipy.stats.pearsonr function is perfect for this, as it calculates the strength and direction of a linear relationship between two datasets, like x and y.

  • The function returns a correlation coefficient and a p-value.
  • Here, the correlation is 0.8207, suggesting a strong positive relationship. The p-value of 0.088579, however, is above the conventional 0.05 threshold, so the result isn't statistically significant and could plausibly be due to random chance.
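Pearson's test assumes a linear relationship. When the relationship may be monotonic but not linear, or the data are ordinal, Spearman's rank correlation (scipy.stats.spearmanr) is a common alternative — a sketch on the same data:

```python
import numpy as np
import scipy.stats as stats

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 5, 4, 6])

# Spearman's rho correlates the ranks of the data, so it
# captures any monotonic trend, not just a straight-line one
rho, p_value = stats.spearmanr(x, y)
print(f"Spearman correlation: {rho:.4f}, P-value: {p_value:.6f}")
```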

Using scipy.stats for a chi-square test

import scipy.stats as stats
import numpy as np

observed = np.array([[10, 15], [20, 25]])
chi2, p_value, dof, expected = stats.chi2_contingency(observed)
print(f"Chi-square statistic: {chi2:.4f}, P-value: {p_value:.6f}")

Output:
Chi-square statistic: 0.1389, P-value: 0.709422

When working with categorical data, you can use a chi-square test to see if two variables are related. The scipy.stats.chi2_contingency function analyzes a contingency table of your observed data to determine if any apparent association is statistically significant.

  • The function returns multiple values, including the chi-square statistic and the p-value.
  • The high p-value of 0.709422 indicates no evidence of a significant association between the variables in your dataset.
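The expected frequencies that chi2_contingency returns are worth inspecting, because the chi-square approximation becomes unreliable when expected counts are small (below 5 is a common rule of thumb):

```python
import numpy as np
import scipy.stats as stats

observed = np.array([[10, 15], [20, 25]])
chi2, p_value, dof, expected = stats.chi2_contingency(observed)

# The expected table shows the counts you would see if the
# two variables were perfectly independent
print("Expected frequencies:")
print(expected)
if (expected < 5).any():
    print("Warning: some expected counts are below 5; "
          "the chi-square approximation may be unreliable")
```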

Advanced statistical methods

While scipy.stats is powerful, you'll sometimes need more specialized methods for regression, non-parametric data, or even want to build the calculations from scratch.

Implementing p-value calculation from scratch

import numpy as np
from scipy import stats

def manual_p_value(t_stat, df):
    # Two-tailed p-value
    p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df))
    return p_value

t_statistic = 2.5
degrees_of_freedom = 10
print(f"P-value: {manual_p_value(t_statistic, degrees_of_freedom):.6f}")

Output:
P-value: 0.031236

For a deeper understanding, you can calculate the p-value manually. This approach gives you granular control over the statistical logic. The function manual_p_value uses the t-distribution's Cumulative Distribution Function (CDF) from SciPy to find the probability.

  • 1 - stats.t.cdf(...) calculates the probability of observing a result as extreme as your t-statistic in one tail of the distribution.
  • Multiplying by 2 converts this into a two-tailed p-value, which accounts for extreme outcomes in both positive and negative directions.
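As a sanity check, the manual calculation can be compared against SciPy's survival function, stats.t.sf, which computes 1 - cdf directly and is numerically more stable for very small p-values:

```python
import numpy as np
from scipy import stats

t_statistic = 2.5
degrees_of_freedom = 10

# Two equivalent ways to get a two-tailed p-value
p_via_cdf = 2 * (1 - stats.t.cdf(abs(t_statistic), degrees_of_freedom))
p_via_sf = 2 * stats.t.sf(abs(t_statistic), degrees_of_freedom)

print(f"Via CDF: {p_via_cdf:.6f}, via SF: {p_via_sf:.6f}")
```

For large t-statistics, 1 - cdf(...) can lose precision to floating-point rounding, so sf is the preferred form in production code.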

Using statsmodels for regression p-values

import numpy as np
import statsmodels.api as sm

X = np.random.rand(20, 2)
X = sm.add_constant(X) # Add intercept
y = 2 + 3 * X[:, 1] + 1.5 * X[:, 2] + np.random.normal(0, 1, 20)
model = sm.OLS(y, X).fit()
print(model.summary2().tables[1][['P>|t|']].head())

Output:
           P>|t|
const   0.043246
x1      0.000218
x2      0.005731

For more advanced analysis like regression, the statsmodels library is your go-to tool. The code fits an Ordinary Least Squares (OLS) model using sm.OLS(y, X).fit(), which is a standard way to check how well your predictor variables explain an outcome. The sm.add_constant(X) call adds an intercept column to the model. Note that because the data is randomly generated without a fixed seed, the exact p-values will vary between runs.

  • The model.summary2() method generates a comprehensive report, and the code extracts the p-values from the P>|t| column.
  • These p-values indicate whether each predictor variable has a statistically significant impact on the outcome.

Using permutation tests for non-parametric p-values

import numpy as np

def permutation_p_value(group1, group2, n_perm=1000):
    diff_obs = np.mean(group1) - np.mean(group2)
    combined = np.concatenate([group1, group2])
    count = 0

    for _ in range(n_perm):
        np.random.shuffle(combined)
        diff_perm = np.mean(combined[:len(group1)]) - np.mean(combined[len(group1):])
        if abs(diff_perm) >= abs(diff_obs):
            count += 1

    return count / n_perm

group1 = np.array([5.2, 4.8, 6.1, 5.5, 5.9])
group2 = np.array([4.5, 4.3, 4.9, 4.7, 4.6])
print(f"P-value: {permutation_p_value(group1, group2):.6f}")

Output:
P-value: 0.003000

Permutation tests offer a powerful way to find p-values without assuming your data follows a specific distribution, which makes them ideal when parametric assumptions like normality don't hold. The permutation_p_value function simulates what would happen if the group labels were meaningless.

  • First, it calculates the observed difference between the means of group1 and group2.
  • It then repeatedly shuffles the combined data, splits it into new random groups, and calculates a new difference.
  • The final p-value is the proportion of shuffles in which the random difference was at least as extreme as the one you actually observed. Because the shuffles are random, the exact value will vary slightly between runs.
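Recent SciPy versions (1.7+) ship a built-in scipy.stats.permutation_test that implements the same idea with more options; a sketch, assuming a sufficiently recent SciPy:

```python
import numpy as np
from scipy import stats

def mean_diff(x, y):
    # Test statistic: difference between the two group means
    return np.mean(x) - np.mean(y)

group1 = np.array([5.2, 4.8, 6.1, 5.5, 5.9])
group2 = np.array([4.5, 4.3, 4.9, 4.7, 4.6])

# permutation_type='independent' reshuffles observations between groups;
# with small samples SciPy enumerates every split and the test is exact
result = stats.permutation_test(
    (group1, group2), mean_diff,
    permutation_type='independent',
    n_resamples=10000,
    alternative='two-sided',
)
print(f"P-value: {result.pvalue:.6f}")
```

For these ten observations there are only 252 possible splits, so SciPy enumerates all of them rather than sampling, giving an exact rather than approximate p-value.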

Move faster with Replit

Replit is an AI-powered development platform that comes with all Python dependencies pre-installed, so you can skip setup and start coding instantly. Instead of just learning individual techniques, you can use Agent 4 to build complete applications from a simple description.

Rather than manually piecing together functions like ttest_ind or chi2_contingency, you can describe the tool you want to build and let the Agent handle the implementation:

  • An A/B testing dashboard that automatically runs a two-sample t-test to determine if a new website design significantly impacts conversion rates.
  • A market analysis tool that uses pearsonr to calculate the correlation and p-value between ad spend and sales revenue.
  • A customer feedback processor that applies a chi-square test to see if there's a statistically significant relationship between user region and feature requests.

Simply describe your app, and Replit will write the code, test it, and fix issues automatically, all within your browser.

Common errors and challenges

Calculating p-values in Python is straightforward, but a few common mistakes can easily lead to incorrect conclusions.

One of the most frequent issues is dealing with missing data, represented as NaN values. If your dataset contains even one NaN, most statistical functions in SciPy will return NaN for the p-value, leaving you with no usable result. You'll need to handle these missing values before running your test, for example by filtering them out to ensure the function only receives clean, numerical data.

Another common slip-up involves the direction of your test. By default, functions like scipy.stats.ttest_1samp perform a two-tailed test, which checks for any difference from the mean in either direction. If your hypothesis is more specific, like testing whether a sample mean is greater than a certain value, you need a one-sided test. Forgetting to set the alternative parameter to 'greater' or 'less' means you're not actually testing your specific hypothesis and might overlook a significant finding.

When using scipy.stats.pearsonr, it's easy to focus only on the p-value and forget about the correlation coefficient itself. The p-value tells you if the relationship is statistically significant, but it says nothing about its direction. A low p-value simply means the observed correlation is unlikely to be due to random chance. You must also check the sign of the correlation coefficient, the first value returned by the function, to understand if the relationship is positive or negative.

Handling NaN values in statistical tests

Missing data, represented as NaN (Not a Number), can silently break your statistical tests. Most functions aren't built to handle them and will return NaN instead of a p-value, effectively stopping your analysis. See what happens in the following code.

import scipy.stats as stats
import numpy as np

data = np.array([5.2, 4.8, 6.1, np.nan, 5.9])
t_stat, p_value = stats.ttest_1samp(data, 5.0)
print(f"P-value: {p_value}")

The np.nan value in the array makes it impossible for ttest_1samp to compute a p-value, resulting in nan. You must handle these values before the test. See how to adjust the code to get a valid result.

import scipy.stats as stats
import numpy as np

data = np.array([5.2, 4.8, 6.1, np.nan, 5.9])
clean_data = data[~np.isnan(data)]
t_stat, p_value = stats.ttest_1samp(clean_data, 5.0)
print(f"P-value: {p_value}")

The solution is to filter out NaN values before the test. The line clean_data = data[~np.isnan(data)] creates a new array containing only the valid numbers. It works by using np.isnan() to identify missing values and the ~ operator to select everything that isn't NaN.

With the cleaned data, ttest_1samp can now compute the p-value correctly. This is a vital data cleaning step you'll often need before any statistical analysis, especially with real-world datasets which are rarely perfect.
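As an alternative to filtering by hand, many SciPy test functions accept a nan_policy parameter; setting nan_policy='omit' tells the function to drop the missing values itself:

```python
import numpy as np
import scipy.stats as stats

data = np.array([5.2, 4.8, 6.1, np.nan, 5.9])

# nan_policy='omit' makes the test ignore NaN entries
# instead of propagating them into the result
t_stat, p_value = stats.ttest_1samp(data, 5.0, nan_policy='omit')
print(f"P-value: {p_value:.6f}")
```

Explicit filtering remains useful when you want to log or inspect how much data is missing before deciding the test is still meaningful.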

Forgetting to specify the correct tail for one-sided tests with ttest_1samp

When your hypothesis is directional—like testing if a sample mean is strictly greater than a value—using the default two-tailed p-value is a mistake. The ttest_1samp function defaults to this, potentially masking a significant result. The following code illustrates this common error.

import scipy.stats as stats
import numpy as np

sample = np.array([5.2, 4.8, 6.1, 5.5, 5.9, 5.3, 5.7])
t_stat, p_value = stats.ttest_1samp(sample, 5.0)
# Incorrect: using two-sided p-value for one-sided hypothesis
print(f"Sample greater than 5.0? P-value: {p_value:.6f}")

The code calculates the p-value for a difference in either direction, not just for the specific "greater than" hypothesis. This doesn't correctly test your question. See how a simple change to the function provides the right result.

import scipy.stats as stats
import numpy as np

sample = np.array([5.2, 4.8, 6.1, 5.5, 5.9, 5.3, 5.7])
t_stat, p_value = stats.ttest_1samp(sample, 5.0, alternative='greater')
print(f"Sample greater than 5.0? P-value: {p_value:.6f}")

The fix is to specify your hypothesis direction using the alternative parameter. By setting alternative='greater', you instruct ttest_1samp to run a one-sided test. This correctly calculates the p-value for the specific question of whether the sample mean is larger than the target value. Always use this parameter when your hypothesis isn't just about a difference, but the direction of that difference—like testing for improvement or decline.

Misinterpreting correlation coefficient sign with pearsonr

A low p-value from pearsonr signals a significant relationship, but it doesn't tell the whole story. Focusing only on the p-value and the absolute correlation value can cause you to misread the direction of the trend entirely.

The following code demonstrates how using abs(corr) can mask the true nature of the relationship, leading to a misleading conclusion.

import scipy.stats as stats
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 4, 3, 2, 1])
corr, p_value = stats.pearsonr(x, y)
print(f"Strong correlation found: {abs(corr):.4f}, p={p_value:.6f}")

The code's use of abs(corr) correctly identifies a strong relationship but masks its negative direction. This is misleading since it doesn't show that as one variable increases, the other decreases. See how to report the full picture.

import scipy.stats as stats
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 4, 3, 2, 1])
corr, p_value = stats.pearsonr(x, y)
print(f"Strong negative correlation: {corr:.4f}, p={p_value:.6f}")

The solution is to print the correlation coefficient directly, without using abs(). This correctly reports the strong negative correlation of -1.0000, showing that as one variable increases, the other decreases. Always check the sign of the coefficient from pearsonr, not just its magnitude. This ensures you understand the true direction of the relationship in your data, which is crucial for drawing accurate conclusions from your analysis.

Real-world applications

Now that you can navigate the code and its pitfalls, you can apply these skills to real-world scenarios like A/B testing and anomaly detection.

Analyzing A/B test results with stats.ttest_ind

With stats.ttest_ind, you can analyze A/B test results to determine if the performance difference between two versions is statistically significant or simply due to random chance.

import scipy.stats as stats
import numpy as np

# Website A/B test conversion rates (1=converted, 0=didn't convert)
version_a = np.array([0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0])
version_b = np.array([1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1])

# Test if Version B performs significantly better
t_stat, p_value = stats.ttest_ind(version_a, version_b)
print(f"P-value: {p_value:.6f}")

This code compares two sets of A/B test results, version_a and version_b, where 1 represents a conversion. The stats.ttest_ind function performs an independent t-test on these two samples to see if their average outcomes are meaningfully different.

  • The test effectively compares the conversion rates of both versions.
  • The resulting p-value helps you decide if one version's performance is truly better or if the observed difference is likely just random variation.
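Because conversion outcomes are binary, a test built for proportions is often a better fit than a t-test. One option is statsmodels' proportions_ztest — a sketch on the same data:

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

version_a = np.array([0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0])
version_b = np.array([1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1])

# Compare the two conversion rates directly as proportions:
# counts of successes and total observations per version
counts = np.array([version_a.sum(), version_b.sum()])
nobs = np.array([len(version_a), len(version_b)])
z_stat, p_value = proportions_ztest(counts, nobs)
print(f"Z-statistic: {z_stat:.4f}, P-value: {p_value:.6f}")
```

With samples this small, the z-test's normal approximation is rough; Fisher's exact test (scipy.stats.fisher_exact) is a common fallback.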

Detecting anomalies in sensor data using stats.zscore

The stats.zscore function helps you find outliers by calculating how many standard deviations each data point is from the mean, making it easy to flag unusual readings.

import scipy.stats as stats
import numpy as np

# Temperature readings from a sensor (Celsius)
current_temps = np.array([22.3, 22.5, 22.2, 25.6, 26.1, 22.1, 22.4, 22.3])

# Detect anomalies using z-scores (values > 2 std dev from mean)
z_scores = stats.zscore(current_temps)
anomalies = np.where(np.abs(z_scores) > 2)
print(f"Anomaly indices: {anomalies[0]}, values: {current_temps[anomalies]}")

This code puts the z-score concept into practice to find anomalies in temperature data. After stats.zscore converts each reading into a standardized score, the script identifies outliers.

  • The key step is using np.where to find the index of any value whose absolute z-score exceeds 2.
  • This threshold is a common way to flag data points that are significantly different from the rest of the group. The output then shows you the exact location and value of these anomalies.

Get started with Replit

Turn your knowledge into a real tool. Just tell Replit Agent what you need: "an A/B test significance calculator" or "a tool to find anomalies in sensor data from a CSV".

Replit Agent writes the code, debugs errors, and deploys your application directly from your browser. Start building with Replit.

Get started free

Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.
