Dialogue Evaluation
Reliability of subjective judgments
- "Assessing agreement on classification tasks: the kappa statistic". J. Carletta. Computational Linguistics. 1996
- "K coefficient measures agreement among a set of coders making category judgments, correcting for expected chance agreement."
- K = (P(A) - P(E)) / (1 - P(E)), where P(A) is the proportion of times the coders agree and P(E) is the proportion of times they would be expected to agree by chance.
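
The formula above can be tried on a small example. The following is a minimal sketch, not Carletta's procedure: it assumes exactly two coders and estimates P(E) from each coder's own label distribution (the Cohen-style estimate), which is only one of several ways to compute chance agreement. The function name `kappa` and the sample labels are hypothetical.

```python
from collections import Counter


def kappa(labels_a, labels_b):
    """K = (P(A) - P(E)) / (1 - P(E)) for two coders labelling the same items.

    Assumption: P(E) is estimated from each coder's own label distribution
    (Cohen-style). Other estimates of chance agreement exist.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty item list")
    n = len(labels_a)

    # P(A): proportion of items on which the two coders chose the same category.
    p_a = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # P(E): chance that both coders pick the same category independently,
    # summed over categories. Counter returns 0 for categories a coder never used.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if p_e == 1.0:
        raise ValueError("K is undefined when P(E) = 1")
    return (p_a - p_e) / (1 - p_e)


# Hypothetical judgments: two coders, ten items, two categories.
coder_a = ["y", "y", "y", "y", "y", "n", "n", "n", "n", "n"]
coder_b = ["y", "y", "y", "y", "n", "n", "n", "n", "n", "y"]

# Here P(A) = 0.8 and P(E) = 0.5, so K = 0.3 / 0.5.
print(round(kappa(coder_a, coder_b), 2))  # prints 0.6
```

The coders agree on 80% of the items, but half of that agreement would be expected by chance, so K is noticeably lower than the raw agreement rate.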