Cohen's Kappa (κ) is one of the most widely used statistics for measuring inter-rater agreement between two raters who classify items into mutually exclusive categories. Introduced by Jacob Cohen in 1960, it improves on simple percent agreement by correcting for the agreement expected by chance alone, which makes it a more informative measure of agreement than the raw percentage.
This calculator accepts a standard 2×2 contingency table — the four cells representing all possible combinations of two raters' binary decisions — and computes Kappa along with its standard error and qualitative interpretation.
Cohen's Kappa is derived from a 2×2 contingency table structured as follows:
| | Rater 2: Yes | Rater 2: No |
|---|---|---|
| Rater 1: Yes | a | b |
| Rater 1: No | c | d |
The total number of observations is:
$$n = a + b + c + d$$
The observed proportion of agreement (Pₒ) counts the cases where both raters agree:
$$P_o = \frac{a + d}{n}$$
The expected proportion of agreement by chance (Pₑ) uses marginal probabilities:
$$P_e = \frac{(a+b)(a+c)}{n^2} + \frac{(c+d)(b+d)}{n^2}$$
Cohen's Kappa is then:
$$\kappa = \frac{P_o - P_e}{1 - P_e}$$
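The formulas above translate directly into a few lines of code. The following is a minimal sketch, not the calculator's actual implementation; the function name `cohen_kappa_2x2` and the cell counts in the usage line are hypothetical and chosen only for illustration.

```python
def cohen_kappa_2x2(a, b, c, d):
    """Return (p_o, p_e, kappa) for a 2x2 table with cells a, b, c, d."""
    n = a + b + c + d
    if n == 0:
        raise ValueError("the table must contain at least one observation")
    # Observed agreement: both raters say yes (a) or both say no (d).
    p_o = (a + d) / n
    # Chance agreement from the row and column totals.
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    if p_e == 1:
        raise ValueError("kappa is undefined when expected agreement equals 1")
    kappa = (p_o - p_e) / (1 - p_e)
    return p_o, p_e, kappa


# Hypothetical counts (not taken from the calculator).
p_o, p_e, kappa = cohen_kappa_2x2(40, 10, 5, 45)
print(f"P_o={p_o:.3f}  P_e={p_e:.3f}  kappa={kappa:.3f}")
```

With these illustrative counts the row totals are 50 and 50 and the column totals are 45 and 55, so observed agreement is 0.85 and chance agreement is 0.50.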
The standard error, useful for confidence intervals and hypothesis testing, is approximated as:
$$SE = \sqrt{\frac{P_o(1 - P_o)}{n(1 - P_e)^2}}$$
A 95% confidence interval can be constructed as κ ± 1.96 × SE. If the interval excludes zero, the agreement differs from chance at approximately the 5% significance level.
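As a rough illustration of the standard error and interval described above, the sketch below applies the approximate SE formula as stated. The function name `kappa_se_ci` is hypothetical, and the inputs are example values rather than output from the calculator.

```python
import math


def kappa_se_ci(p_o, p_e, n, z=1.96):
    """Approximate SE of kappa and a z-based confidence interval."""
    if n <= 0:
        raise ValueError("n must be positive")
    if p_e >= 1:
        raise ValueError("p_e must be below 1")
    kappa = (p_o - p_e) / (1 - p_e)
    # Approximation from the text: sqrt(P_o(1 - P_o) / (n (1 - P_e)^2)).
    se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    return kappa, se, (kappa - z * se, kappa + z * se)


# Example values: 85% observed agreement, 50% chance agreement, n = 100.
kappa, se, (low, high) = kappa_se_ci(0.85, 0.50, 100)
print(f"kappa={kappa:.2f}  SE={se:.4f}  95% CI=({low:.3f}, {high:.3f})")
```

The default `z=1.96` gives the 95% interval; a different multiplier can be passed for other confidence levels.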
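Cohen's Kappa ranges from −1 to +1. The interpretation follows the widely cited Landis-Koch benchmark scale:
| Kappa (κ) | Strength of agreement |
|---|---|
| Below 0.00 | Poor |
| 0.00 to 0.20 | Slight |
| 0.21 to 0.40 | Fair |
| 0.41 to 0.60 | Moderate |
| 0.61 to 0.80 | Substantial |
| 0.81 to 1.00 | Almost perfect |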
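The benchmark scale can be expressed as a simple lookup. This is a sketch that assumes the commonly quoted Landis and Koch (1977) cut-offs; the function name `interpret_kappa` is hypothetical, and treating each upper bound as inclusive is a choice made here, since the published bands leave small gaps between 0.20 and 0.21 and so on.

```python
def interpret_kappa(kappa):
    """Map a kappa value to its Landis-Koch label (assumed cut-offs)."""
    if not -1 <= kappa <= 1:
        raise ValueError("kappa must lie between -1 and +1")
    if kappa < 0:
        return "poor"
    # Upper bounds are treated as inclusive (an assumption of this sketch).
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


print(interpret_kappa(0.70))
```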
Kappa is sensitive to prevalence and rater bias. When the prevalence of one category is very high (most items are 'yes' or most are 'no'), chance agreement Pₑ is also high, so kappa may appear lower than expected even with high percent agreement. Always examine both Pₒ and κ together for a complete picture.
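The prevalence effect is easiest to see by comparing two tables with the same percent agreement. The sketch below uses two hypothetical tables (invented for illustration) that both have 90% observed agreement; `kappa_2x2` is a hypothetical helper that applies the formulas given earlier.

```python
def kappa_2x2(a, b, c, d):
    """Return (p_o, p_e, kappa) for a 2x2 table."""
    n = a + b + c + d
    p_o = (a + d) / n
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return p_o, p_e, (p_o - p_e) / (1 - p_e)


# Hypothetical table with balanced categories: half 'yes', half 'no'.
balanced = kappa_2x2(45, 5, 5, 45)
# Hypothetical table where 'yes' dominates: each rater says 'yes' 90% of the time.
skewed = kappa_2x2(85, 5, 5, 5)

for name, (p_o, p_e, kappa) in [("balanced", balanced), ("skewed", skewed)]:
    print(f"{name:9s} P_o={p_o:.2f}  P_e={p_e:.2f}  kappa={kappa:.2f}")
```

Both tables agree on 90 of 100 items, but the skewed table has much higher chance agreement, which pulls kappa down.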
Two radiologists screen 100 chest X-rays for abnormalities. With 85% observed agreement against 50% expected by chance, κ = 0.70, indicating substantial agreement.
Two reviewers classify 100 documents as relevant or not. High agreement in the 'yes' category but lower in 'no' yields κ ≈ 0.69, still substantial agreement.
Simple percent agreement does not account for the possibility that some agreement occurs purely by chance. For example, if two raters each say 'yes' 90% of the time, they would agree about 82% of the time even by random assignment. Cohen's Kappa subtracts out this chance agreement, providing a more meaningful measure of true concordance.
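The 82% figure follows from multiplying the raters' marginal rates. A minimal sketch, assuming the two raters decide independently; the function name `chance_agreement` is hypothetical.

```python
def chance_agreement(p_yes_1, p_yes_2):
    """Probability that two independent binary raters agree by chance."""
    both_yes = p_yes_1 * p_yes_2
    both_no = (1 - p_yes_1) * (1 - p_yes_2)
    return both_yes + both_no


# Each rater says 'yes' 90% of the time: 0.9 * 0.9 + 0.1 * 0.1.
print(f"{chance_agreement(0.9, 0.9):.2f}")
```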
Cohen's Kappa is not limited to binary ratings. While this calculator uses a 2×2 table, the general form extends to any number of categories using a k×k contingency table. For ordinal categories, weighted Kappa (linear or quadratic weights) is often preferred because it accounts for the degree of disagreement.
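The k×k generalization replaces the four cells with the diagonal of the table and its row and column totals. The following is a sketch under that standard definition; the function name `cohen_kappa` and the 3×3 counts are hypothetical.

```python
def cohen_kappa(table):
    """Unweighted Cohen's kappa for a square k x k table of counts."""
    k = len(table)
    if k < 2 or any(len(row) != k for row in table):
        raise ValueError("table must be square with at least 2 categories")
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("the table must contain at least one observation")
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Observed agreement is the share of items on the diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n
    # Chance agreement sums the products of matching marginals.
    p_e = sum(r * c for r, c in zip(row_tot, col_tot)) / n**2
    if p_e == 1:
        raise ValueError("kappa is undefined when expected agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical 3-category ratings (rows: rater 1, columns: rater 2).
ratings = [
    [20, 5, 0],
    [3, 30, 2],
    [1, 4, 35],
]
print(f"kappa = {cohen_kappa(ratings):.3f}")
```

For a 2×2 table this reduces to the binary formula used by the calculator.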
A negative kappa indicates that the two raters agree less than would be expected by chance. This usually signals a systematic pattern of disagreement — for instance, when rater 1 tends to say 'yes' exactly when rater 2 says 'no'. Negative values warrant investigation of the rating criteria and rater training.
Cohen's Kappa is designed for exactly two raters evaluating the same set of subjects. Fleiss' Kappa generalizes the concept to multiple raters (three or more) and allows different raters for different subjects. If you have more than two raters, use Fleiss' Kappa instead.
Weighted Kappa is used when categories are ordinal (e.g., 'mild', 'moderate', 'severe'). It assigns partial credit for near-agreements rather than treating all disagreements equally. Linear weights penalize disagreements in proportion to their distance, while quadratic weights penalize in proportion to the squared distance, so ratings that are far apart count much more heavily than adjacent ones.
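One common way to write weighted Kappa is κ_w = 1 − (Σ wᵢⱼ Oᵢⱼ) / (Σ wᵢⱼ Eᵢⱼ), where wᵢⱼ are disagreement weights, Oᵢⱼ are observed cell proportions and Eᵢⱼ are the proportions expected from the marginals. The sketch below assumes that formulation with weights |i − j| / (k − 1) for linear and the square of that for quadratic; the function name `weighted_kappa` and the counts are hypothetical.

```python
def weighted_kappa(table, weights="linear"):
    """Weighted kappa for a square table of counts with ordered categories."""
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    k = len(table)
    if k < 2 or any(len(row) != k for row in table):
        raise ValueError("table must be square with at least 2 categories")
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("the table must contain at least one observation")
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    power = 1 if weights == "linear" else 2
    observed = 0.0
    expected = 0.0
    for i in range(k):
        for j in range(k):
            # Disagreement weight: 0 on the diagonal, 1 at maximum distance.
            w = (abs(i - j) / (k - 1)) ** power
            observed += w * table[i][j] / n
            expected += w * row_tot[i] * col_tot[j] / n**2
    if expected == 0:
        raise ValueError("weighted kappa is undefined for this table")
    return 1 - observed / expected


# Hypothetical ordinal ratings: mild, moderate, severe.
ratings = [
    [20, 5, 0],
    [3, 30, 2],
    [1, 4, 35],
]
print(f"linear:    {weighted_kappa(ratings, 'linear'):.3f}")
print(f"quadratic: {weighted_kappa(ratings, 'quadratic'):.3f}")
```

With only two categories every disagreement has the same weight, so both weighting schemes give the unweighted value.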
A minimum of 50 subjects is commonly recommended, though 100+ provides more stable estimates. The required sample size depends on the expected kappa, the desired precision (confidence interval width), and the prevalence of each category. Power analysis for kappa is available in specialized software.
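For a rough sense of how precision scales with sample size, the approximate standard error formula given earlier can be solved for n at a target confidence interval half-width. This is a back-of-the-envelope sketch, not a formal power analysis; the function name `approx_n_for_half_width` and the planning values are hypothetical.

```python
import math


def approx_n_for_half_width(p_o, p_e, half_width, z=1.96):
    """Smallest n whose approximate CI half-width (z * SE) meets the target."""
    if half_width <= 0:
        raise ValueError("half_width must be positive")
    if p_e >= 1:
        raise ValueError("p_e must be below 1")
    # Rearranged from half_width = z * sqrt(P_o(1 - P_o) / (n (1 - P_e)^2)).
    n = z**2 * p_o * (1 - p_o) / ((1 - p_e) ** 2 * half_width**2)
    return math.ceil(n)


# Planning values (assumed): P_o = 0.85, P_e = 0.50, target half-width 0.10.
print(approx_n_for_half_width(0.85, 0.50, 0.10))
```

Halving the target half-width roughly quadruples the required sample size, because n grows with the inverse square of the half-width.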
Roboculator Team
The Roboculator Team explains calculations, planning tools, and practical formulas in clear language for real-life situations.