- Suppose 100 items are rated by 3 annotators on a 1–5 scale.
- In 95 items, every annotator gives a 5 (perfect agreement, zero disagreement).
- In the remaining 5 items, two annotators give a 5 and one gives a 4. For each of these items, the average pairwise disagreement is about 0.667: the three annotator pairs have absolute differences (5,5)=0, (5,4)=1, and (5,4)=1, so (0+1+1)/3 ≈ 0.667.

The overall observed disagreement (Do) is computed as:
    Do = (95 items * 0 + 5 items * 0.667) / 100 ≈ 0.03335
(Other weighting conventions can change this value, but the key point is that the observed disagreement is low.)
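
The observed disagreement can be checked directly from the ratings. The following is a minimal sketch, assuming absolute difference as the distance between two ratings and an unweighted mean over items; the names `ratings` and `item_disagreement` are illustrative only. Without intermediate rounding the result is exactly 1/30; the 0.03335 above comes from rounding 0.667 before multiplying.

```python
from itertools import combinations

# 95 items rated (5, 5, 5) and 5 items rated (5, 5, 4), as in the example.
ratings = [(5, 5, 5)] * 95 + [(5, 5, 4)] * 5


def item_disagreement(item):
    """Mean absolute difference over all unordered annotator pairs of one item."""
    pairs = list(combinations(item, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


# Average the per-item disagreement over all 100 items.
observed = sum(item_disagreement(item) for item in ratings) / len(ratings)
print(round(observed, 4))  # 0.0333
```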

However, because the distribution is highly skewed, with most ratings being 5, the probability of drawing a 5, P(5), is very high:
    P(5) = (95×3 + 5×2) / (100×3) = 295/300 ≈ 0.9833
and correspondingly P(4) is very low:
    P(4) = (5×1) / (100×3) = 5/300 ≈ 0.01667

When computing the expected disagreement (De), only pairs of differing ratings contribute, because a pair of identical ratings has distance 0:
    De = Permutations({4,5}) × P(5) × P(4) × Distance(4,5)
Since there are 2 ordered pairs ((4,5) and (5,4)) and the difference between 4 and 5 is 1:
    De = 2 × 0.9833 × 0.01667 × 1 ≈ 0.0328
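
The rating probabilities and the expected disagreement can be reproduced with exact fractions. This is a sketch under the same simplification the text uses: two ratings are drawn independently (with replacement) from the pooled distribution. Krippendorff's standard formulation draws pairs without replacement, which changes the denominator slightly. The variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction

ratings = [(5, 5, 5)] * 95 + [(5, 5, 4)] * 5

# Pool all 300 ratings and turn the counts into probabilities.
counts = Counter(value for item in ratings for value in item)
total = sum(counts.values())
probs = {value: Fraction(n, total) for value, n in counts.items()}

# Sum over ordered pairs (a, b); identical ratings contribute 0.
expected = sum(probs[a] * probs[b] * abs(a - b) for a in probs for b in probs)

print(probs[5], probs[4])          # 59/60 1/60
print(round(float(expected), 4))   # 0.0328
```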

Krippendorff’s alpha is then:
    α = 1 - (Do/De) ≈ 1 - (0.03335/0.0328) ≈ -0.017

This example shows that even when raw agreement is very high (nearly all ratings are 5), a skewed distribution drives the expected disagreement very low. Even a small observed disagreement is then large relative to the chance disagreement, which yields a low or even slightly negative alpha.
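
To illustrate the effect of skew, the sketch below runs the same calculation end to end on the example data and on a hypothetical comparison set. In the comparison set, the 95 unanimous items are spread evenly over the values 1 to 5 while the 5 disagreeing items stay unchanged, so the observed disagreement is identical. The function `alpha_sketch` follows the simplified formula used here (absolute-difference distance, pairs drawn with replacement) and is not a full implementation of Krippendorff's alpha.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def alpha_sketch(ratings):
    """Simplified alpha as in the text: 1 - Do/De with absolute-difference distance."""
    # Observed disagreement: mean pairwise distance per item, averaged over items.
    per_item = []
    for item in ratings:
        pairs = list(combinations(item, 2))
        per_item.append(Fraction(sum(abs(a - b) for a, b in pairs), len(pairs)))
    observed = sum(per_item) / len(ratings)

    # Expected disagreement: distance between two ratings drawn independently
    # (with replacement) from the pooled rating distribution.
    counts = Counter(value for item in ratings for value in item)
    total = sum(counts.values())
    expected = sum(
        Fraction(counts[a] * counts[b], total * total) * abs(a - b)
        for a in counts
        for b in counts
    )
    return 1 - observed / expected


disagreeing = [(5, 5, 4)] * 5
skewed = [(5, 5, 5)] * 95 + disagreeing
# Hypothetical comparison data: 19 unanimous items for each value 1..5.
balanced = [(v, v, v) for v in range(1, 6) for _ in range(19)] + disagreeing

print(round(float(alpha_sketch(skewed)), 3))    # -0.017
print(round(float(alpha_sketch(balanced)), 3))  # 0.979
```

Both data sets have the same observed disagreement; only the expected disagreement differs, which is what moves alpha from slightly negative to close to 1.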