P-value
Definition
The p-value is the probability of obtaining test results at least as extreme as the observed results, assuming the null hypothesis is true. A small p-value (conventionally ≤ 0.05) is taken as evidence against the null hypothesis and leads researchers to reject it. It is not the probability that the null hypothesis is true, and it does not measure the size of an effect.
Formula
P = P(T \geq t_{\text{obs}} \mid H_0) \quad \text{(one-tailed, right)}
P = 2 \times P(T \geq |t_{\text{obs}}| \mid H_0) \quad \text{(two-tailed, symmetric null distribution)}
In-Depth Explanation
The p-value is a fundamental concept in statistical hypothesis testing that measures how compatible the observed data are with a null hypothesis (H₀). Formally, it represents the probability of obtaining a test statistic at least as extreme as the one observed, given that the null hypothesis is actually true. In equation form, for a right-tailed test:
P-value = P(T ≥ t_obs | H₀ is true)
For a two-tailed test (most common), it becomes:
P-value = 2 × P(T ≥ |t_obs| | H₀ is true)
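where T is the test statistic (a random variable that follows its null distribution), t_obs is the value computed from the sample, and the doubling in the two-tailed formula assumes the null distribution is symmetric, as it is for z and t.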
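To make the definition concrete, the sketch below estimates a two-tailed p-value by brute force: it repeatedly simulates data with H₀ true and counts how often the test statistic is at least as extreme as the observed one, then compares that share with the closed form 2 × P(Z ≥ |z_obs|). It assumes a one-sample z-test on normally distributed data with known σ (μ₀ = 0, σ = 1, n = 10 are illustrative choices), and the function names are hypothetical, not from any library.

```python
import math
import random
from statistics import NormalDist, fmean


def simulated_two_tailed_p(z_obs, n=10, reps=50_000, seed=0):
    """Monte Carlo estimate of P(|Z| >= |z_obs| | H0) for a one-sample z-test.

    Assumption: under H0 the data are N(mu0=0, sigma=1) with sigma known.
    Each replicate draws a fresh sample of size n with H0 true and computes
    z = (mean - mu0) / (sigma / sqrt(n)).
    """
    rng = random.Random(seed)
    mu0, sigma = 0.0, 1.0
    extreme = 0
    for _ in range(reps):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
        if abs(z) >= abs(z_obs):  # "at least as extreme" in either tail
            extreme += 1
    return extreme / reps


def exact_two_tailed_p(z_obs):
    """Closed form 2 * P(Z >= |z_obs|) for a standard normal Z."""
    return 2.0 * NormalDist().cdf(-abs(z_obs))


if __name__ == "__main__":
    z_obs = 1.96
    print(f"simulated p = {simulated_two_tailed_p(z_obs):.4f}")
    print(f"exact p     = {exact_two_tailed_p(z_obs):.4f}")
```

For z_obs = 1.96 both numbers should land near 0.05; the simulated one differs slightly because it is a Monte Carlo estimate.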
Interpretation guidelines:
- p ≤ 0.05: Statistically significant at the 5% level → reject H₀ (common threshold in most scientific fields)
- p ≤ 0.01: Significant at the 1% level → stronger evidence against H₀
- p ≤ 0.001: Significant at the 0.1% level → very strong evidence against H₀ (a stricter threshold used in some high-stakes research)
- p > 0.05: Not statistically significant → fail to reject H₀ (but this does NOT prove H₀ is true)
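These guidelines reduce to comparing p with a significance level α that is chosen before the analysis. A minimal sketch, assuming the three conventional levels listed above; the helper names are hypothetical:

```python
# Conventional significance levels from the guidelines, strictest first.
LEVELS = (0.001, 0.01, 0.05)


def strongest_level(p):
    """Return the smallest conventional alpha with p <= alpha, or None if p > 0.05."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie in [0, 1]")
    for alpha in LEVELS:
        if p <= alpha:
            return alpha
    return None


def decision(p, alpha=0.05):
    """Reject H0 when p <= alpha; otherwise fail to reject (this does not show H0 is true)."""
    return "reject H0" if p <= alpha else "fail to reject H0"


if __name__ == "__main__":
    for p in (0.0004, 0.017, 0.2):
        print(p, strongest_level(p), decision(p))
```

Note that the same p = 0.017 is "reject H0" at α = 0.05 but "fail to reject H0" at α = 0.01, which is why α should be fixed in advance.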
Key points to understand correctly:
1. The p-value is NOT the probability that the null hypothesis is true. It only tells us about the data under the assumption that H₀ is true.
2. A small p-value does NOT mean the effect is large or practically important — it only indicates incompatibility with H₀.
3. A large p-value does NOT prove the null hypothesis — it simply means the data do not provide strong evidence against it.
4. P-values depend heavily on sample size: very large samples can produce tiny p-values even for trivial effects, while small samples may fail to detect real effects (low statistical power).
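Point 4 can be checked numerically. The sketch below holds a deliberately trivial standardized effect fixed (0.05 standard deviations, a hypothetical value) and assumes a one-sample z-test with known σ, so the statistic is z = effect × √n. Only the sample size changes.

```python
import math
from statistics import NormalDist


def two_tailed_p_for(effect_sd, n):
    """Two-tailed p-value of a one-sample z-test.

    Assumes sigma is known and the observed mean sits effect_sd standard
    deviations away from mu0, so z = effect_sd * sqrt(n).
    """
    z = effect_sd * math.sqrt(n)
    return 2.0 * NormalDist().cdf(-abs(z))


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(f"n = {n:>6}: p = {two_tailed_p_for(0.05, n):.3g}")
```

The effect is identical in every row, yet p moves from clearly non-significant at n = 100 to far below 0.001 at n = 10,000; this is why an effect size should accompany the p-value.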
One-tailed vs two-tailed tests:
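- One-tailed: Used when the research hypothesis is directional and the direction is fixed before seeing the data (e.g., "new drug is better than placebo" → only the "better than" direction is tested).
- Two-tailed: Tests for a difference in either direction (more conservative: for a symmetric null distribution its p-value is twice the one-tailed p-value in the direction of the observed effect; most common in exploratory research).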
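The choice of tail decides which part of the null distribution is counted. A minimal sketch for a z statistic, assuming a standard normal null distribution; the `alternative` labels mirror the naming SciPy uses, but the function itself is illustrative:

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # null distribution of the z statistic (assumption)


def p_value(z_obs, alternative="two-sided"):
    """p-value of a z statistic under H0: Z ~ N(0, 1).

    alternative: "greater" (right-tailed), "less" (left-tailed) or "two-sided".
    """
    if alternative == "greater":
        return _STD_NORMAL.cdf(-z_obs)  # P(Z >= z_obs)
    if alternative == "less":
        return _STD_NORMAL.cdf(z_obs)  # P(Z <= z_obs)
    if alternative == "two-sided":
        return 2.0 * _STD_NORMAL.cdf(-abs(z_obs))  # 2 * P(Z >= |z_obs|)
    raise ValueError(f"unknown alternative: {alternative!r}")


if __name__ == "__main__":
    for alt in ("greater", "two-sided", "less"):
        print(f"{alt:>9}: p = {p_value(1.8, alt):.4f}")
```

For z = 1.8 the right-tailed p is roughly 0.036 (significant at 0.05), the two-tailed p roughly 0.072 (not significant), and a left-tailed test, pointed the wrong way, gives roughly 0.96.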
Common distributions used to calculate p-values:
- Normal (z-test) for large samples or a known population variance
- t-distribution (t-test) for smaller samples or unknown population variance
- Chi-square for goodness-of-fit and independence tests
- F-distribution for ANOVA and variance comparisons
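Each distribution supplies the tail probability in the same way. As one self-contained example, the chi-square upper tail has an exact closed form when the degrees of freedom k are even: P(X ≥ x) = e^(−x/2) Σ_{i=0}^{k/2−1} (x/2)^i / i!. The sketch below implements only that case (odd k would need the incomplete gamma function, which the Python standard library does not provide); the function name is hypothetical. Chi-square tests use the right tail only.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for X ~ chi-square(df), df even.

    Uses the exact identity P(X >= x) = exp(-x/2) * sum_{i < df/2} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even degrees of freedom")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! without large factorials
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    # Hypothetical chi-square statistic of 7.4 on 2 degrees of freedom.
    print(f"p = {chi2_sf_even_df(7.4, 2):.4f}")
```

For that hypothetical statistic the result is e^(−3.7) ≈ 0.0247. As a sanity check, the familiar 5% critical values 5.991 (k = 2) and 9.488 (k = 4) both return about 0.05.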
Practical example:
Suppose you run a clinical trial testing whether a new drug reduces blood pressure more than placebo. You obtain a t-statistic of 2.45 with p = 0.017 (two-tailed). This means that if there were truly no difference (H₀ true), the probability of a t-value at least this extreme in either direction, arising purely by chance, would be only 1.7%. Since 0.017 < 0.05, you reject H₀ and conclude there is statistically significant evidence that the drug has an effect.
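The reported p-value can be reproduced from the t-statistic once the degrees of freedom are known. The example does not state them, so the sketch below assumes a hypothetical df = 60. It integrates the Student t density numerically with Simpson's rule (a standard-library stand-in for a statistics package) and uses symmetry to double the upper tail; the function names are illustrative.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_two_tailed_p(t_obs, df, steps=2000):
    """2 * P(T >= |t_obs| | H0), via Simpson's rule over [0, |t_obs|].

    For a >= 0, symmetry gives P(T >= a) = 0.5 - P(0 <= T <= a),
    so the two-tailed p-value is 1 - 2 * P(0 <= T <= a).
    """
    a = abs(t_obs)
    if a == 0:
        return 1.0
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = a / steps
    s = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = s * h / 3  # approximates P(0 <= T <= a)
    return max(0.0, 1.0 - 2.0 * central)


if __name__ == "__main__":
    print(f"p = {t_two_tailed_p(2.45, df=60):.4f}")  # df = 60 is an assumption
```

With df = 60 the result comes out close to the 0.017 quoted above; another df would give a somewhat different value, approaching the normal-based 0.014 as df grows. Because of the subtraction, this sketch loses relative precision for very small p-values.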
Always report the exact p-value (e.g., p = 0.023) rather than just "p < 0.05" when space allows — it gives more information to readers and reviewers.
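A small formatting helper keeps reports consistent. The sketch assumes one common convention: report the exact value to three decimals and switch to "p < 0.001" only below that floor (style guides differ in details such as the leading zero); the function name is hypothetical.

```python
def format_p(p, decimals=3):
    """Format a p-value as 'p = 0.023', or 'p < 0.001' below the reporting floor."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    floor = 10 ** -decimals
    if p < floor:
        return f"p < {floor:.{decimals}f}"
    return f"p = {p:.{decimals}f}"


if __name__ == "__main__":
    for p in (0.0234, 0.017, 0.0004):
        print(format_p(p))
```

The values print as "p = 0.023", "p = 0.017" and "p < 0.001".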
Misuse warning: Over-reliance on p < 0.05 has been heavily criticized and is often cited as a contributor to the replication crisis in science. Modern best practices recommend reporting effect sizes (Cohen’s d, odds ratios), confidence intervals, and practical significance alongside p-values.
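As an illustration of reporting an effect size and interval next to the p-value, the sketch below computes Cohen's d for two independent groups with the pooled standard deviation, plus an approximate confidence interval for the difference in means. The interval uses a normal quantile, a large-sample assumption; for small groups a t quantile with n₁ + n₂ − 2 degrees of freedom should replace it. Function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, variance


def pooled_sd(a, b):
    """Pooled standard deviation of two independent samples."""
    n1, n2 = len(a), len(b)
    return math.sqrt(((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2))


def cohens_d(a, b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    return (fmean(a) - fmean(b)) / pooled_sd(a, b)


def mean_diff_ci(a, b, level=0.95):
    """Approximate CI for mean(a) - mean(b) (normal quantile, large-sample assumption)."""
    se = pooled_sd(a, b) * math.sqrt(1 / len(a) + 1 / len(b))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = fmean(a) - fmean(b)
    return diff - z * se, diff + z * se


if __name__ == "__main__":
    # Made-up numbers chosen only so the arithmetic is easy to check.
    treatment = [1, 2, 3, 4, 5]
    control = [3, 4, 5, 6, 7]
    print(f"d = {cohens_d(treatment, control):.2f}")
    lo, hi = mean_diff_ci(treatment, control)
    print(f"95% CI for the difference: ({lo:.2f}, {hi:.2f})")
```

For these toy groups d is about −1.26 and the interval is roughly (−3.96, −0.04); with only five values per group the t quantile would give a noticeably wider interval.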