# Inter-Rater Reliability Measures in R

## Inter-Rater Reliability Analyses: Quick R Codes

This chapter provides quick start R code to compute the main statistical measures for analyzing inter-rater reliability (or agreement). These include:

• Cohen’s Kappa: can be used for two nominal or two ordinal variables. It credits only strict (exact) agreement between the two raters, so it is most appropriate for nominal variables.
• Weighted Kappa: should be considered for two ordinal variables only. It gives credit to partial agreement.
• Light’s Kappa: the average of Cohen’s Kappa computed over all pairs of raters, when there are more than two raters.
• Fleiss’ Kappa: for two or more raters and categorical ratings (nominal or ordinal).
• Intraclass correlation coefficient (ICC): for continuous or ordinal data.


#### Related Book

Inter-Rater Reliability Essentials: Practical Guide in R

## R packages

There are many R packages and functions for inter-rater agreement analyses, including:

| Measure | R function [package] |
|:---|:---|
| Cohen’s kappa | Kappa() [vcd], kappa2() [irr] |
| Weighted kappa | Kappa() [vcd], kappa2() [irr] |
| Light’s kappa | kappam.light() [irr] |
| Fleiss’ kappa | kappam.fleiss() [irr] |
| ICC | icc() [irr], ICC() [psych] |

## Prerequisites

In the next sections, we’ll use only functions from the irr package. Make sure it is installed and loaded:

# install.packages("irr")
library(irr)

## Examples data

• psychiatric diagnoses data provided by 6 raters [irr package]. A total of 30 patients were enrolled and classified by each of the raters into 5 nominal categories (Fleiss and others 1971): 1. Depression, 2. Personality Disorder, 3. Schizophrenia, 4. Neurosis, 5. Other.
• anxiety data [irr package], which contains the anxiety ratings of 20 subjects, rated by 3 raters on an ordinal scale. Values range from 1 (not anxious at all) to 6 (extremely anxious).

Inspect the data:

# Diagnoses data
data("diagnoses", package = "irr")
head(diagnoses[, 1:3])
##                    rater1                  rater2                  rater3
## 1             4. Neurosis             4. Neurosis             4. Neurosis
## 2 2. Personality Disorder 2. Personality Disorder 2. Personality Disorder
## 3 2. Personality Disorder        3. Schizophrenia        3. Schizophrenia
## 4                5. Other                5. Other                5. Other
## 5 2. Personality Disorder 2. Personality Disorder 2. Personality Disorder
## 6           1. Depression           1. Depression        3. Schizophrenia
# Anxiety data
data("anxiety", package = "irr")
head(anxiety, 4)
##   rater1 rater2 rater3
## 1      3      3      2
## 2      3      6      1
## 3      3      4      4
## 4      4      6      4

## Cohen’s Kappa: two raters

Cohen’s kappa corresponds to the unweighted kappa. It can be used for two nominal or two ordinal categorical variables:

kappa2(diagnoses[, c("rater1", "rater2")], weight = "unweighted")
##  Cohen's Kappa for 2 Raters (Weights: unweighted)
##
##  Subjects = 30
##    Raters = 2
##     Kappa = 0.651
##
##         z = 7
##   p-value = 2.63e-12
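To see what kappa2() computes, Cohen’s kappa can be reproduced by hand in a few lines of base R: kappa = (po - pe) / (1 - pe), where po is the observed agreement and pe is the agreement expected by chance from the marginal totals. The ratings below are hypothetical, made up purely for illustration:

```r
# Cohen's kappa by hand on hypothetical ratings from two raters
r1 <- c("yes", "yes", "no", "no", "yes", "no")
r2 <- c("yes", "no",  "no", "no", "yes", "yes")
tab <- table(r1, r2)                                  # confusion matrix
po  <- sum(diag(tab)) / sum(tab)                      # observed agreement
pe  <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2  # chance agreement
kappa <- (po - pe) / (1 - pe)
kappa   # 1/3, i.e. about 0.33
```

Here po = 4/6 and pe = 0.5, so the raters agree only modestly more than chance would predict.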

## Weighted kappa: ordinal scales

Weighted kappa should be considered only when ratings are on an ordinal scale, as in the following example.

kappa2(anxiety[, c("rater1", "rater2")], weight = "equal")
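The weight = "equal" option corresponds to linear weights: a disagreement of |i - j| categories is penalized proportionally to |i - j| / (k - 1). A minimal base-R sketch of linearly weighted kappa, using hypothetical ordinal ratings (not the anxiety data):

```r
# Linearly weighted kappa by hand on hypothetical ordinal ratings (scale 1-4)
r1 <- c(1, 2, 3, 4, 2, 3)
r2 <- c(1, 3, 3, 4, 1, 2)
k  <- 4
tab <- table(factor(r1, levels = 1:k), factor(r2, levels = 1:k))
p   <- tab / sum(tab)                        # observed proportions
e   <- outer(rowSums(p), colSums(p))         # chance-expected proportions
d   <- abs(outer(1:k, 1:k, "-")) / (k - 1)   # linear disagreement weights
kw  <- 1 - sum(d * p) / sum(d * e)
kw   # 4/7, i.e. about 0.57
```

With squared instead of linear weights (weight = "squared" in kappa2()), large disagreements would be penalized more heavily.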

## Light’s kappa: multiple raters

It returns the average Cohen’s kappa computed over all pairs of raters:

kappam.light(diagnoses[, 1:3])
##  Light's Kappa for m Raters
##
##  Subjects = 30
##    Raters = 3
##     Kappa = 0.555
##
##         z = NaN
##   p-value = NaN
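Since Light’s kappa is just the mean of the pairwise Cohen’s kappas, it is easy to sketch in base R. The three-rater data below are hypothetical, invented for illustration:

```r
# Light's kappa = average of pairwise Cohen's kappas (hypothetical data)
cohen_kappa <- function(a, b) {
  lev <- sort(unique(c(a, b)))                # shared category levels
  tab <- table(factor(a, levels = lev), factor(b, levels = lev))
  po  <- sum(diag(tab)) / sum(tab)
  pe  <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2
  (po - pe) / (1 - pe)
}
ratings <- data.frame(
  r1 = c("a", "b", "b", "a", "c"),
  r2 = c("a", "b", "c", "a", "c"),
  r3 = c("a", "b", "b", "a", "b")
)
pairs <- combn(ncol(ratings), 2)              # rater pairs: (1,2), (1,3), (2,3)
light <- mean(apply(pairs, 2, function(p)
  cohen_kappa(ratings[[p[1]]], ratings[[p[2]]])))
light   # about 0.61
```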

## Fleiss’ kappa: multiple raters

Fleiss’ kappa can be used for two or more raters; the raters are not assumed to be the same for all subjects.

kappam.fleiss(diagnoses[, 1:3])
##  Fleiss' Kappa for m Raters
##
##  Subjects = 30
##    Raters = 3
##     Kappa = 0.534
##
##         z = 9.89
##   p-value = 0
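Fleiss’ kappa only needs the count of raters assigning each subject to each category, which is why the raters need not be the same across subjects. A base-R sketch on a small hypothetical matrix (rows = subjects, columns = raters):

```r
# Fleiss' kappa by hand (hypothetical data; rows = subjects, cols = raters)
fleiss_kappa <- function(m) {
  cats   <- sort(unique(as.vector(m)))
  # counts[i, j]: number of raters who assigned subject i to category j
  counts <- t(apply(m, 1, function(r) table(factor(r, levels = cats))))
  n   <- ncol(m)                                    # raters per subject
  p_j <- colSums(counts) / sum(counts)              # category proportions
  P_i <- (rowSums(counts^2) - n) / (n * (n - 1))    # per-subject agreement
  (mean(P_i) - sum(p_j^2)) / (1 - sum(p_j^2))
}
m  <- rbind(c(1, 1, 2), c(2, 2, 2), c(1, 2, 3), c(3, 3, 3))
fk <- fleiss_kappa(m)
fk   # 17/47, i.e. about 0.36
```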

## Intraclass correlation coefficients: continuous scales

Read more in Chapter @ref(intraclass-correlation-coefficient):

icc(
  anxiety, model = "twoway",
  type = "agreement", unit = "single"
)
##  Single Score Intraclass Correlation
##
##    Model: twoway
##    Type : agreement
##
##    Subjects = 20
##      Raters = 3
##    ICC(A,1) = 0.198
##
##  F-Test, H0: r0 = 0 ; H1: r0 > 0
##  F(19,39.7) = 1.83 , p = 0.0543
##
##  95%-Confidence Interval for ICC Population Values:
##   -0.039 < ICC < 0.494
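The ICC(A,1) reported above comes from the two-way ANOVA mean squares: ICC(A,1) = (MSR - MSE) / (MSR + (k - 1) MSE + k (MSC - MSE) / n), where MSR, MSC, and MSE are the subject, rater, and residual mean squares. A base-R sketch on small hypothetical ratings (not the anxiety data):

```r
# ICC(A,1) from two-way ANOVA mean squares (hypothetical ratings)
x <- rbind(c(1, 2, 2), c(3, 3, 4), c(4, 5, 5), c(2, 2, 3), c(5, 6, 6))
n <- nrow(x); k <- ncol(x); grand <- mean(x)
SSR <- k * sum((rowMeans(x) - grand)^2)    # between-subject sum of squares
SSC <- n * sum((colMeans(x) - grand)^2)    # between-rater sum of squares
SSE <- sum((x - grand)^2) - SSR - SSC      # residual sum of squares
MSR <- SSR / (n - 1)
MSC <- SSC / (k - 1)
MSE <- SSE / ((n - 1) * (k - 1))
icc_a1 <- (MSR - MSE) / (MSR + (k - 1) * MSE + (k / n) * (MSC - MSE))
icc_a1   # 8/9, i.e. about 0.89
```

These hypothetical raters are much more consistent than those in the anxiety data, hence the far higher ICC.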

## Summary

This article describes how to compute the main inter-rater agreement measures using the irr package.