The Jukes-Cantor Distance calculator estimates the evolutionary distance between two nucleotide sequences. It corrects for multiple substitutions at the same site by applying the Jukes-Cantor model, which assumes all nucleotide substitutions occur at equal rates. This correction is essential because observed differences underestimate the true number of substitutions that have occurred over evolutionary time.
By providing the number of different sites and the total alignment length, you can obtain both the raw p-distance and the corrected Jukes-Cantor distance. This metric is widely used in phylogenetic analyses and molecular evolution studies to construct distance-based trees and estimate divergence.
First, the proportion of different sites (p-distance) is calculated:
p = Different Sites / Total Sites
The Jukes-Cantor correction then accounts for multiple substitutions:
d = -3/4 × ln(1 - 4/3 × p)
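The size of the correction grows as p increases, reflecting the higher probability that sites have changed more than once. The formula is defined only for p < 0.75: as p approaches 0.75 the corrected distance grows without bound, and at p ≥ 0.75 the argument of the logarithm is zero or negative. A p-distance of 0.75 corresponds to saturation, where the sequences are no more similar than random sequences.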
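The two formulas above can be sketched in Python as follows. This is a minimal illustration, not the calculator's own implementation: the function name `jukes_cantor_distance`, its parameter names, and the choice to raise `ValueError` at saturation are all assumptions made for this example.

```python
import math
from typing import Tuple


def jukes_cantor_distance(different_sites: int, total_sites: int) -> Tuple[float, float]:
    """Return (p_distance, jc_distance) for one pairwise comparison.

    Hypothetical helper for illustration; names and error handling are assumptions.
    """
    if total_sites <= 0:
        raise ValueError("total_sites must be positive")
    if not 0 <= different_sites <= total_sites:
        raise ValueError("different_sites must be between 0 and total_sites")
    if different_sites == 0:
        return 0.0, 0.0  # identical sequences: no correction needed

    # p = Different Sites / Total Sites
    p = different_sites / total_sites

    # The logarithm needs 1 - 4p/3 > 0, which means p must stay below 0.75.
    if p >= 0.75:
        raise ValueError("p >= 0.75: Jukes-Cantor distance is undefined (saturation)")

    # d = -3/4 * ln(1 - 4/3 * p)
    d = -0.75 * math.log(1.0 - (4.0 / 3.0) * p)
    return p, d


p, d = jukes_cantor_distance(30, 300)
print(f"p = {p:.4f}, d = {d:.6f}")  # p = 0.1000, d = 0.107326
```

Raising an error at saturation is one possible design; returning infinity or a sentinel value would be equally reasonable, depending on how the result is consumed downstream.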
With 30 out of 300 sites differing (p = 0.1), the Jukes-Cantor corrected distance is 0.1073, slightly higher than the raw p-distance due to correction for multiple hits.
At 30% observed differences (p = 0.3), the corrected distance rises to 0.383, showing a substantial correction for hidden substitutions at higher divergence.
The p-distance is the observed proportion of sites that differ between two sequences. The Jukes-Cantor distance corrects for multiple substitutions at the same site, giving a more accurate estimate of the true evolutionary distance. At low divergence the two values are nearly identical, but the gap between them widens substantially as divergence increases.
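To make the widening gap concrete, the short sketch below prints the corrected distance and the ratio d/p for a handful of p values. The helper name `jc_distance` and the chosen p values are illustrative assumptions; the formula is the one given earlier on this page.

```python
import math


def jc_distance(p: float) -> float:
    """Jukes-Cantor distance for a p-distance; assumes 0 < p < 0.75."""
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# Illustrative p values chosen for this example only.
for p in (0.01, 0.05, 0.10, 0.30, 0.50, 0.70):
    d = jc_distance(p)
    # d/p shows how much larger the corrected distance is than the raw one.
    print(f"p = {p:.2f}  d = {d:.4f}  d/p = {d / p:.3f}")
```

The ratio d/p stays close to 1 for small p and climbs steadily as p grows, which is the behavior described above.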
The model breaks down as p approaches 0.75: the corrected distance grows without bound, and for p ≥ 0.75 it is undefined, because at that point sequences are no more similar than random nucleotide strings. Additionally, the model assumes equal substitution rates among all nucleotides, which may not hold for real sequences where transitions and transversions occur at different rates.
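The behavior near the 0.75 limit can be explored with a small sketch. Here the hypothetical helper `jc_distance_or_inf` returns infinity once p reaches 0.75 instead of failing; that convention is an assumption for this example, not something the calculator is known to do.

```python
import math


def jc_distance_or_inf(p: float) -> float:
    """Jukes-Cantor distance, reporting saturation (p >= 0.75) as infinity.

    Returning math.inf is an illustrative convention, not a standard.
    """
    if p < 0.0:
        raise ValueError("p must be non-negative")
    if p >= 0.75:
        return math.inf
    if p == 0.0:
        return 0.0
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# Illustrative values approaching and passing the saturation point.
for p in (0.70, 0.74, 0.749, 0.75, 0.80):
    print(f"p = {p:.3f}  d = {jc_distance_or_inf(p)}")
```

In practice, p values this close to 0.75 suggest that the sequences are too divergent for a reliable Jukes-Cantor estimate.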
Without correction, the observed number of differences underestimates the true evolutionary distance because some sites have mutated multiple times. This underestimation biases branch lengths in phylogenetic trees and can lead to inaccurate topology. The correction provides a more linear relationship between distance and time.
Roboculator Team
The Roboculator Team explains calculations, planning tools, and practical formulas in clear language for real-life situations.