The Hypergeometric Distribution Calculator computes the probability mass function (PMF), mean, variance, and standard deviation for the hypergeometric distribution. This distribution models the number of successes in a sample drawn without replacement from a finite population, making it fundamentally different from the binomial distribution, which assumes sampling with replacement (or from an infinite population).
The hypergeometric distribution arises whenever you draw a sample from a finite collection containing two types of items (successes and failures) without putting items back. Classic examples include drawing cards from a deck (how many aces in a 5-card hand?), quality control inspection (how many defectives in a sample from a finite lot?), ecological capture-recapture studies (how many tagged fish in a recaptured sample?), and committee selection (how many women on a randomly chosen committee from a mixed pool?).
The distribution is characterized by three parameters: N (total population size), K (number of success states in the population), and n (number of draws). The random variable X counts the number of observed successes in the draw, and k denotes a particular value of X. The key difference from the binomial is that each draw changes the composition of the remaining population, so successive draws are not independent. As a result, the variance of the hypergeometric is smaller than the corresponding binomial variance by a factor of (N-n)/(N-1), called the finite population correction.
This calculator uses Stirling's approximation for the log-factorial computation, which allows it to handle population sizes up to 1000 while maintaining good accuracy. When the factorial arguments are small (small N, K, n, or k), the approximation is less precise but still provides useful estimates. The exact PMF is a ratio involving three binomial coefficients, C(K,k) * C(N-K, n-k) / C(N, n), each of which can be astronomically large; working with logarithms of factorials keeps the intermediate values manageable.
Understanding the hypergeometric distribution is essential in quality assurance (acceptance sampling plans), genetics (Fisher's exact test for independence), ecology (population estimation via capture-recapture), card game probability, lotteries, and any scenario involving finite-population sampling without replacement.
In industrial quality control, acceptance sampling plans use the hypergeometric distribution to decide whether a batch of products meets quality standards based on a sample inspection. Sampling standards such as Military Standard 105E and its civilian equivalent ANSI/ASQ Z1.4 define plans of this kind, and the hypergeometric is the exact model when the lot is finite. In genomics, the hypergeometric test is widely used for gene set enrichment analysis, determining whether a set of genes of interest is overrepresented in a particular biological pathway. In lottery mathematics, the hypergeometric distribution gives the exact probability of matching k numbers from a drawn set, which is fundamental to prize structure design and expected value calculations.
The hypergeometric PMF gives the probability of observing exactly k successes when drawing n items without replacement from a population of N items containing K successes:
$$P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}$$
This formula counts: (ways to choose k successes from K) × (ways to choose n-k failures from N-K) / (total ways to choose n from N). To avoid numerical overflow, we compute in log-space using Stirling's approximation:
$$\ln(m!) \approx m \ln(m) - m + \frac{1}{2}\ln(2\pi m)$$
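The log-space computation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the calculator's actual source: the function names are hypothetical, and the exact value from `math.comb` is included only as a reference for comparison.

```python
import math

def ln_factorial_stirling(m: int) -> float:
    """Stirling's approximation to ln(m!); ln(0!) = 0 is handled explicitly."""
    if m == 0:
        return 0.0
    return m * math.log(m) - m + 0.5 * math.log(2 * math.pi * m)

def ln_choose(a: int, b: int) -> float:
    """Approximate ln C(a, b) from three approximate log-factorials."""
    return (ln_factorial_stirling(a)
            - ln_factorial_stirling(b)
            - ln_factorial_stirling(a - b))

def hypergeom_pmf_stirling(N: int, K: int, n: int, k: int) -> float:
    """Approximate P(X = k); returns 0 when k is outside the feasible range."""
    if k < max(0, n + K - N) or k > min(n, K):
        return 0.0
    log_p = ln_choose(K, k) + ln_choose(N - K, n - k) - ln_choose(N, n)
    return math.exp(log_p)

# Illustrative parameters (not taken from the page): N=1000, K=300, n=100, k=30.
approx = hypergeom_pmf_stirling(1000, 300, 100, 30)
exact = math.comb(300, 30) * math.comb(700, 70) / math.comb(1000, 100)
print(f"Stirling: {approx:.6f}  exact: {exact:.6f}")
```

Summing and subtracting logarithms avoids ever forming the huge binomial coefficients; only the final, modest log-probability is exponentiated.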
The key statistics are:
$$E[X] = \frac{nK}{N}, \quad \text{Var}(X) = n \cdot \frac{K}{N} \cdot \frac{N-K}{N} \cdot \frac{N-n}{N-1}$$
The last factor (N-n)/(N-1) is the finite population correction, which reduces the variance compared to the binomial.
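As a quick illustration, the mean, variance, and standard deviation can be computed directly from these formulas. The sketch below uses a hypothetical helper name and assumes N > 1 so that the correction factor is defined.

```python
import math

def hypergeom_stats(N: int, K: int, n: int) -> tuple[float, float, float]:
    """Return (mean, variance, standard deviation); assumes N > 1."""
    p = K / N                      # population success fraction
    mean = n * p
    fpc = (N - n) / (N - 1)        # finite population correction
    variance = n * p * (1 - p) * fpc
    return mean, variance, math.sqrt(variance)

# Aces in a 5-card hand: N = 52 cards, K = 4 aces, n = 5 draws.
mean, variance, sd = hypergeom_stats(52, 4, 5)
print(f"mean={mean:.4f}  variance={variance:.4f}  sd={sd:.4f}")
```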
The PMF P(X = k) is the probability of drawing exactly k success items from the population. The mean nK/N is the expected number of successes: the sample size times the population success fraction K/N. The variance is less than the binomial variance np(1-p), with p = K/N, because of the finite population correction factor. Note: the calculator returns PMF = 0 for impossible combinations (e.g., k > K, k > n, or n-k > N-K). Stirling's approximation is highly accurate for large values but may have small errors for very small factorials (0!, 1!, 2!).
With N = 52 cards, K = 4 aces, and n = 5 cards drawn, the probability of getting exactly k = 2 aces in a 5-card poker hand is about 3.99%. On average, a 5-card hand contains 0.385 aces. The exact probability is C(4,2)*C(48,3)/C(52,5).
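The card example can be checked with exact integer binomial coefficients. This is a small sketch using Python's standard `math.comb`; the helper name `hypergeom_pmf` is an illustrative choice.

```python
from math import comb

def hypergeom_pmf(N: int, K: int, n: int, k: int) -> float:
    """Exact P(X = k) from binomial coefficients; 0 for infeasible k."""
    if k < max(0, n + K - N) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# 52 cards, 4 aces, 5 cards drawn, exactly 2 aces.
p_two_aces = hypergeom_pmf(52, 4, 5, 2)
expected_aces = 5 * 4 / 52
print(f"P(2 aces) = {p_two_aces:.4%}")        # 3.9930%
print(f"expected aces = {expected_aces:.3f}")  # 0.385
```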
From a lot of N = 100 items with K = 8 defective, drawing n = 10 and finding exactly k = 1 defective has an exact probability of about 40.2%, computed as C(8,1)*C(92,9)/C(100,10). The expected number of defectives in the sample is 0.8.
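For acceptance sampling, the quantity of interest is often the chance that a sample shows at most some number of defectives. The sketch below computes the full distribution for this lot and a cumulative acceptance probability; the acceptance number `c = 1` is a hypothetical choice for illustration, and the function names are assumptions.

```python
from math import comb

def hypergeom_pmf(N: int, K: int, n: int, k: int) -> float:
    """Exact P(X = k); 0 for infeasible k."""
    if k < max(0, n + K - N) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def acceptance_probability(N: int, K: int, n: int, c: int) -> float:
    """P(X <= c): chance that a sample of n shows at most c defectives."""
    return sum(hypergeom_pmf(N, K, n, k) for k in range(c + 1))

N, K, n = 100, 8, 10
pmf_values = [hypergeom_pmf(N, K, n, k) for k in range(n + 1)]
p_accept = acceptance_probability(N, K, n, c=1)  # c = 1 is hypothetical

for k, p in enumerate(pmf_values):
    print(f"P(X = {k}) = {p:.4f}")
print(f"P(X <= 1) = {p_accept:.4f}")
```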
Use the hypergeometric when sampling without replacement from a finite population. The binomial assumes either replacement or an infinite population. As a rule of thumb, if the sample size n is less than about 5% of the population N (some references allow up to 10%), the binomial is a good approximation. When n/N is larger than that, the hypergeometric is noticeably more accurate because the finite population correction becomes significant.
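The rule of thumb can be seen numerically by holding the sample size and success fraction fixed while the population grows. This sketch uses illustrative values (an 8% success fraction, n = 10, k = 1) and hypothetical helper names.

```python
from math import comb

def hypergeom_pmf(N: int, K: int, n: int, k: int) -> float:
    """Exact hypergeometric P(X = k); 0 for infeasible k."""
    if k < max(0, n + K - N) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(n: int, p: float, k: int) -> float:
    """Binomial P(X = k) with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, k = 10, 1
gaps = []
for N in (100, 1000, 10000):
    K = N * 8 // 100                      # keep the success fraction at 8%
    h = hypergeom_pmf(N, K, n, k)
    b = binom_pmf(n, K / N, k)
    gaps.append(abs(h - b))
    print(f"n/N={n / N:.3%}  hypergeometric={h:.5f}  binomial={b:.5f}")
```

As n/N shrinks, the two probabilities move closer together, which is why the binomial is acceptable for small sampling fractions.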
The factor (N-n)/(N-1) reduces the hypergeometric variance compared to the binomial. When the sample is a large fraction of the population, each draw significantly changes the remaining composition, reducing variability. When N is much larger than n, this factor approaches 1 and the hypergeometric converges to the binomial.
Fisher's exact test uses the hypergeometric distribution to test for association in 2×2 contingency tables. It computes the exact probability of observing the given table (or more extreme) under the null hypothesis of no association. Unlike chi-squared tests, it is valid for small sample sizes and is the gold standard for testing independence in small samples.
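A one-sided version of the test can be sketched as a hypergeometric tail sum. For a 2×2 table [[a, b], [c, d]] with fixed margins, the top-left cell follows a hypergeometric distribution with N = a+b+c+d, K = a+c, and n = a+b. The table values below are hypothetical illustration data, and the function names are assumptions rather than any library's API.

```python
from math import comb

def hypergeom_pmf(N: int, K: int, n: int, k: int) -> float:
    """Exact P(X = k); 0 for infeasible k."""
    if k < max(0, n + K - N) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """P(top-left cell >= a) under the null, with all margins held fixed."""
    N = a + b + c + d        # grand total
    K = a + c                # first-column total
    n = a + b                # first-row total
    upper = min(n, K)
    return sum(hypergeom_pmf(N, K, n, k) for k in range(a, upper + 1))

# Hypothetical 2x2 table: [[3, 1], [1, 3]]
p_value = fisher_one_sided(3, 1, 1, 3)
print(f"one-sided p-value = {p_value:.4f}")
```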
Stirling's approximation ln(m!) ≈ m·ln(m) - m + 0.5·ln(2πm) is very accurate for large m: it underestimates ln(m!) by less than 1/(12m), so the implied relative error in m! is roughly 1/(12m). For m = 10, that error is about 0.8%. For m = 50, it is about 0.17%. For m = 1 and m = 2 the approximation is less precise (about 8% and 4% respectively), and the calculator handles the m = 0 case (ln(0!) = 0) explicitly, since the formula is undefined there.
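These error figures can be checked against Python's `math.lgamma`, which returns ln(m!) when called with m + 1. The sketch below is illustrative; the helper name is an assumption.

```python
import math

def ln_factorial_stirling(m: int) -> float:
    """Stirling's approximation to ln(m!) for m >= 1."""
    return m * math.log(m) - m + 0.5 * math.log(2 * math.pi * m)

gaps = {}
for m in (1, 2, 10, 50):
    exact = math.lgamma(m + 1)              # ln(m!)
    gap = exact - ln_factorial_stirling(m)  # how far Stirling falls short
    gaps[m] = gap
    rel_err = 1 - math.exp(-gap)            # relative error in m! itself
    print(f"m={m:3d}  gap={gap:.6f}  bound 1/(12m)={1 / (12 * m):.6f}  "
          f"relative error in m!={rel_err:.2%}")
```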
The parameters must satisfy: 0 ≤ K ≤ N (success states cannot exceed population), 1 ≤ n ≤ N (cannot draw more than population), and max(0, n+K-N) ≤ k ≤ min(n, K) (observed successes must be feasible). The calculator returns PMF = 0 for infeasible combinations.
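These constraints translate directly into a small validation step. The following is a sketch with hypothetical function names: invalid N, K, or n raise an error, while an infeasible k simply falls outside the returned support, matching the PMF = 0 behavior described above.

```python
def support(N: int, K: int, n: int) -> tuple[int, int]:
    """Return the (smallest, largest) feasible k, or raise on bad parameters."""
    if not 0 <= K <= N:
        raise ValueError("K must satisfy 0 <= K <= N")
    if not 1 <= n <= N:
        raise ValueError("n must satisfy 1 <= n <= N")
    return max(0, n + K - N), min(n, K)

def is_feasible(N: int, K: int, n: int, k: int) -> bool:
    """True when k lies in the support; otherwise the PMF is 0."""
    lo, hi = support(N, K, n)
    return lo <= k <= hi

print(support(52, 4, 5))         # (0, 4)
print(support(10, 8, 5))         # (3, 5): at least 3 successes are forced
print(is_feasible(52, 4, 5, 5))  # False: only 4 aces exist
```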
In capture-recapture studies, N is the unknown total population, K is the number of tagged animals from the first capture, n is the second capture sample size, and k is the number of tagged animals recaptured. The hypergeometric distribution models k, and the maximum likelihood estimate of N is approximately nK/k. This is the Lincoln-Petersen method for population estimation.
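The estimate can be illustrated with a short sketch. The capture counts are hypothetical, and the brute-force scan over candidate N values (with an arbitrary upper limit of 2000) is included only to show where the hypergeometric likelihood peaks relative to nK/k.

```python
from math import comb

def hypergeom_pmf(N: int, K: int, n: int, k: int) -> float:
    """Exact P(X = k); 0 for infeasible k."""
    if k < max(0, n + K - N) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def lincoln_petersen(K: int, n: int, k: int) -> float:
    """Point estimate nK/k of the population size; requires k > 0."""
    if k <= 0:
        raise ValueError("at least one tagged animal must be recaptured")
    return n * K / k

# Hypothetical study: 50 tagged, 40 caught in the second sample, 7 of them tagged.
K, n, k = 50, 40, 7
estimate = lincoln_petersen(K, n, k)

# Scan integer population sizes for the one that maximizes the likelihood.
candidates = range(K + n - k, 2000)
mle = max(candidates, key=lambda N: hypergeom_pmf(N, K, n, k))
print(f"nK/k = {estimate:.2f}, likelihood-maximizing integer N = {mle}")
```

In this example the likelihood peaks at the integer just below nK/k, consistent with the "approximately nK/k" statement above.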
Roboculator Team
The Roboculator Team explains calculations, planning tools, and practical formulas in clear language for real-life situations.