The Intraclass Correlation Coefficient (ICC) Calculator measures the reliability of ratings or measurements made by multiple raters or instruments on the same set of subjects. Unlike Pearson's correlation (which measures association between two specific raters), the ICC assesses consistency among any number of raters and accounts for both the correlation and the agreement in absolute values. It is the standard measure of inter-rater reliability in medicine, psychology, and quality assessment.
Enter the between-subjects mean square (MSb), within-subjects mean square (MSw), and the number of raters from a one-way random effects ANOVA to compute both the single-measures ICC (reliability of a single rater) and the average-measures ICC (reliability of the mean of k raters). This calculator implements the ICC(1,1) and ICC(1,k) forms from the Shrout and Fleiss classification.
The ICC is derived from a one-way random effects ANOVA model. The single-measures ICC, denoted ICC(1,1), is:
$$ICC(1,1) = \frac{MS_B - MS_W}{MS_B + (k-1) \cdot MS_W}$$
The average-measures ICC, denoted ICC(1,k), is the Spearman-Brown step-up of ICC(1,1) to k raters, which simplifies to:
$$ICC(1,k) = \frac{MS_B - MS_W}{MS_B}$$
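Where MS_B is the between-subjects mean square, MS_W is the within-subjects mean square, and k is the number of raters per subject.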
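The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation; the function name `icc_one_way` and its argument names are assumptions made for this example.

```python
from typing import Tuple


def icc_one_way(ms_between: float, ms_within: float, k: int) -> Tuple[float, float]:
    """Return (ICC(1,1), ICC(1,k)) from one-way random effects ANOVA mean squares."""
    if k < 2:
        raise ValueError("k must be at least 2 raters")
    if ms_between <= 0 or ms_within < 0:
        raise ValueError("mean squares must be non-negative, with ms_between > 0")
    diff = ms_between - ms_within
    single = diff / (ms_between + (k - 1) * ms_within)  # ICC(1,1)
    average = diff / ms_between                         # ICC(1,k)
    return single, average


# Illustrative inputs: MSb = 25, MSw = 5, k = 3 raters.
single, average = icc_one_way(25.0, 5.0, 3)
print(f"ICC(1,1) = {single:.3f}, ICC(1,k) = {average:.3f}")
```

The function deliberately allows negative results (MSw greater than MSb) rather than clipping them, so the caller can decide how to report them.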
The ICC partitions the total variance into between-subject variance (σ²b) and within-subject (error) variance (σ²w):
$$\sigma_b^2 = \frac{MS_B - MS_W}{k}$$
$$\sigma_w^2 = MS_W$$
$$ICC = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_w^2}$$
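This shows the ICC as the proportion of total variance that is due to true differences between subjects. When raters agree perfectly (MSW → 0), ICC → 1. When between-subject differences are no larger than rater disagreement (MSB ≈ MSW), ICC → 0. The estimate becomes negative when MSW exceeds MSB, which makes the estimated σ²b negative. This signals that ratings of the same subject differ more than ratings of different subjects, and it usually points to a problem with the rating process or a very homogeneous sample.
The F statistic, F = MSB/MSW, tests the null hypothesis that the population ICC is 0. With n subjects and k raters it has n − 1 and n(k − 1) degrees of freedom. A significant F indicates only that the ICC is greater than zero, not that reliability is high, so read it alongside the ICC value itself.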
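The variance-component view can be checked numerically. The sketch below is illustrative only (the function names are hypothetical). It estimates σ²b and σ²w from the mean squares and confirms that their ratio reproduces the ICC(1,1) formula, including a case where the between-subject component is estimated as negative.

```python
def variance_components(ms_between: float, ms_within: float, k: int):
    """Estimate (sigma_b^2, sigma_w^2) from one-way ANOVA mean squares."""
    var_between = (ms_between - ms_within) / k  # can be negative in small samples
    var_within = ms_within
    return var_between, var_within


def icc_from_components(var_between: float, var_within: float) -> float:
    """Share of total variance attributed to true subject differences."""
    return var_between / (var_between + var_within)


# MSb = 25, MSw = 5, k = 3: same result as (25 - 5) / (25 + 2 * 5).
vb, vw = variance_components(25.0, 5.0, 3)
icc = icc_from_components(vb, vw)
print(f"sigma_b^2 = {vb:.3f}, sigma_w^2 = {vw:.3f}, ICC = {icc:.3f}")

# MSw > MSb (illustrative values): the estimated sigma_b^2 and the ICC go negative.
vb_neg, vw_neg = variance_components(4.0, 6.0, 3)
icc_neg = icc_from_components(vb_neg, vw_neg)
print(f"sigma_b^2 = {vb_neg:.3f}, ICC = {icc_neg:.3f}")
```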
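The table below follows the thresholds of Koo and Li (2016). Cicchetti (1994) proposes different cutoffs (below 0.40 poor, 0.40 to 0.59 fair, 0.60 to 0.74 good, 0.75 and above excellent), so state which guideline you are using when reporting an interpretation.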
| ICC Range | Interpretation |
|---|---|
| < 0.50 | Poor reliability |
| 0.50 - 0.74 | Moderate reliability |
| 0.75 - 0.89 | Good reliability |
| ≥ 0.90 | Excellent reliability |
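As a small illustration, the table can be encoded as a lookup. The function name `interpret_icc` is hypothetical. The sketch also assumes that values falling between the printed bands (for example 0.745) belong to the lower band, which the table itself does not specify.

```python
def interpret_icc(icc: float) -> str:
    """Map an ICC value to the reliability label used in the table above."""
    # Half-open bands: [.., 0.50), [0.50, 0.75), [0.75, 0.90), [0.90, ..]
    if icc < 0.50:
        return "Poor reliability"
    if icc < 0.75:
        return "Moderate reliability"
    if icc < 0.90:
        return "Good reliability"
    return "Excellent reliability"


for value in (0.571, 0.80, 0.938):
    print(value, "->", interpret_icc(value))
```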
The choice between single and average measures depends on how the measurement will be used. If a clinical scale will always be scored by one rater in practice, report the single-measures ICC. If the protocol requires averaging multiple raters, report the average-measures ICC.
Three radiologists rate 20 images. ANOVA yields MSb = 25, MSw = 5. ICC(1,1) = (25-5)/(25+2×5) = 20/35 = 0.571 (moderate for a single rater). ICC(1,k) = (25-5)/25 = 0.80 (good for the average of 3 raters). F = 25/5 = 5.0 on 19 and 40 degrees of freedom, so the ICC is significantly greater than zero.
Four clinicians assess pain in 15 patients. MSb = 48, MSw = 3. ICC(1,1) = (48-3)/(48+3×3) = 45/57 = 0.789 (good for a single clinician). ICC(1,k) = (48-3)/48 = 0.938 (excellent for the 4-rater average). F = 48/3 = 16.0 on 14 and 45 degrees of freedom, so the ICC is significantly greater than zero.
The ICC is a reliability coefficient that quantifies the degree of agreement among multiple raters or measurements on the same set of subjects. Unlike Pearson's r (which only measures the linear association between two specific raters), the ICC can handle any number of raters and, depending on the form chosen, measures either consistency (relative ranking) or absolute agreement (same numerical values). It is calculated from an ANOVA framework, partitioning total variability into between-subject and within-subject components.
ICC(1,1) estimates the reliability of a single rater's measurement. It answers: how reliable is one randomly selected rater? ICC(1,k) estimates the reliability of the average of k raters. It answers: how reliable is the mean score across all raters? For k > 1 and 0 < ICC(1,1) < 1, ICC(1,k) is higher than ICC(1,1) because averaging multiple ratings reduces random error; the two coincide at 0 and at 1. They are related by the Spearman-Brown formula: ICC(1,k) = k × ICC(1,1) / (1 + (k-1) × ICC(1,1)).
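The Spearman-Brown relationship can be verified numerically. The sketch below uses a hypothetical helper name and checks that stepping ICC(1,1) up to k raters gives the same value as (MSb − MSw)/MSb for mean squares of 25 and 5 with 3 raters.

```python
def spearman_brown(single_icc: float, k: int) -> float:
    """Step a single-measures ICC up to the reliability of a k-rater average."""
    denominator = 1 + (k - 1) * single_icc
    if denominator == 0:
        # Happens only at the lower bound ICC(1,1) = -1 / (k - 1), i.e. MSb = 0.
        raise ValueError("average-measures ICC is undefined for this input")
    return k * single_icc / denominator


ms_between, ms_within, k = 25.0, 5.0, 3
single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
direct_average = (ms_between - ms_within) / ms_between
stepped_up = spearman_brown(single, k)
print(f"direct = {direct_average:.4f}, Spearman-Brown = {stepped_up:.4f}")
```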
Run a one-way random effects ANOVA where subjects are the groups and rater scores are the observations within each group. The ANOVA table will provide: MSBetween (the mean square for the subject factor, reflecting between-subject variability) and MSWithin (the residual mean square, reflecting rater disagreement). Most statistical software (SPSS: Analyze → Scale → Reliability; R: icc() from the irr package; SAS: PROC MIXED) can compute ICCs directly.
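If no statistics package is at hand, the two mean squares can be computed directly from a subjects-by-raters table. The following is a minimal pure-Python sketch, assuming a complete design in which every subject has exactly k ratings. The function name and the tiny data set are invented for illustration.

```python
from typing import Sequence, Tuple


def one_way_mean_squares(ratings: Sequence[Sequence[float]]) -> Tuple[float, float, int]:
    """Return (MS_between, MS_within, k) for rows = subjects, columns = ratings."""
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2:
        raise ValueError("need at least 2 subjects and 2 ratings per subject")
    if any(len(row) != k for row in ratings):
        raise ValueError("every subject must have the same number of ratings")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    subject_means = [sum(row) / k for row in ratings]

    # Between-subjects sum of squares, on n - 1 degrees of freedom.
    ss_between = k * sum((m - grand_mean) ** 2 for m in subject_means)
    # Within-subjects sum of squares, on n * (k - 1) degrees of freedom.
    ss_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, subject_means) for x in row
    )
    return ss_between / (n - 1), ss_within / (n * (k - 1)), k


# Made-up ratings: 3 subjects, 3 ratings each.
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ms_b, ms_w, k = one_way_mean_squares(data)
icc_single = (ms_b - ms_w) / (ms_b + (k - 1) * ms_w)
print(f"MSb = {ms_b}, MSw = {ms_w}, ICC(1,1) = {icc_single:.3f}")
```

For real analyses, a dedicated routine that also reports confidence intervals is the safer choice. This sketch only shows where the calculator's two inputs come from.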
Yes, the ICC estimate can be negative. This occurs when the within-subject mean square exceeds the between-subject mean square (MSw > MSb), meaning raters disagree more than expected by chance. It can happen when: (1) raters interpret the scale in opposite ways; (2) there is a systematic bias between raters; (3) the sample is very homogeneous (little true between-subject variation). A negative ICC is usually treated as zero in practice, and the rating process should be examined for problems.
Guidelines suggest a minimum of 30 subjects for stable ICC estimates, though more is better. For the number of raters, k ≥ 3 is recommended for the one-way model. The precision of the ICC estimate depends on both: with 30 subjects and 3 raters, the 95% confidence interval for an ICC of 0.70 spans roughly 0.50 to 0.85. With 50 subjects and 4 raters, it narrows to approximately 0.58 to 0.80. Confidence intervals for ICC should always be reported alongside the point estimate.
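One common way to obtain such an interval for ICC(1,1) is the exact F-based interval given by Shrout and Fleiss (1979), sketched here. The symbols are introduced for this illustration: n is the number of subjects, F_0 = MS_B / MS_W is the observed F ratio, and F_{1-α/2}(ν_1, ν_2) is the upper quantile of the F distribution with ν_1 and ν_2 degrees of freedom.

$$F_L = \frac{F_0}{F_{1-\alpha/2}\big(n-1,\; n(k-1)\big)}, \qquad F_U = F_0 \cdot F_{1-\alpha/2}\big(n(k-1),\; n-1\big)$$
$$\frac{F_L - 1}{F_L + (k-1)} \;\le\; ICC(1,1) \;\le\; \frac{F_U - 1}{F_U + (k-1)}$$

Whether the figures quoted above were produced with this method is not stated, so treat the formula as one reasonable option rather than the source of those numbers.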
Shrout and Fleiss (1979) defined 6 forms based on the study design; McGraw and Wong (1996) expanded to 10. The key decisions are: (1) Model -- one-way random (raters vary randomly across subjects) vs. two-way random (same raters rate all subjects) vs. two-way mixed (raters are fixed); (2) Type -- single measures vs. average measures; (3) Definition -- consistency (relative agreement) vs. absolute agreement. This calculator implements ICC(1,1) and ICC(1,k) -- the one-way random model, appropriate when each subject is rated by a different random set of raters.
Roboculator Team
The Roboculator Team explains calculations, planning tools, and practical formulas in clear language for real-life situations.