T-Test Essentials: Definition, Formula and Calculation

Paired T-Test

The paired t-test is used to compare the means of two related groups of samples. Put into another words, it’s used in a situation where you have two values (i.e., pair of values) for the same samples.

It is also referred as:

  • dependent t-test,
  • dependent samples t-test,
  • repeated measures t-test,
  • related samples t-test,
  • paired two sample t-test,
  • paired sample t-test and
  • t-test for dependent means.

For example, you might want to compare the average weight of 20 mice before and after treatment. The data contain 20 sets of values before treatment and 20 sets of values after treatment from measuring twice the weight of the same mice. In such situations, paired t-test can be used to compare the mean weights before and after treatment.

The procedure of the paired t-test analysis is as follow:

  1. Calculate the difference (\(d\)) between each pair of value
  2. Compute the mean (\(m\)) and the standard deviation (\(s\)) of \(d\)
  3. Compare the average difference to 0. If there is any significant difference between the two pairs of samples, then the mean of d (\(m\)) is expected to be far from 0.

Paired t-test can be used only when the difference \(d\) is normally distributed. This can be checked using the Shapiro-Wilk test.

In this chapter, you will learn the paired t-test formula, as well as, how to:

  • Compute the paired t-test in R. The pipe-friendly function t_test() [rstatix package] will be used.
  • Check the paired t-test assumptions
  • Calculate and report the paired t-test effect size using the Cohen’s d. The d statistic redefines the difference in means as the number of standard deviations that separates those means. T-test conventional effect sizes, proposed by Cohen, are: 0.2 (small effect), 0.5 (moderate effect) and 0.8 (large effect) (Cohen 1998).


Related Book

Practical Statistics in R II - Comparing Groups: Numerical Variables


Make sure you have installed the following R packages:

  • tidyverse for data manipulation and visualization
  • ggpubr for creating easily publication ready plots
  • rstatix provides pipe-friendly R functions for easy statistical analyses.
  • datarium: contains required data sets for this chapter.

Start by loading the following required packages:


Research questions

Typical research questions are:

  1. whether the mean difference (\(m\)) is equal to 0?
  2. whether the mean difference (\(m\)) is less than 0?
  3. whether the mean difference (\(m\)) is greather than 0?

statistical hypotheses

In statistics, we can define the corresponding null hypothesis (\(H_0\)) as follow:

  1. \(H_0: m = 0\)
  2. \(H_0: m \leq 0\)
  3. \(H_0: m \geq 0\)

The corresponding alternative hypotheses (\(H_a\)) are as follow:

  1. \(H_a: m \ne 0\) (different)
  2. \(H_a: m > 0\) (greater)
  3. \(H_a: m < 0\) (less)

Note that:

  • Hypotheses 1) are called two-tailed tests
  • Hypotheses 2) and 3) are called one-tailed tests


The paired t-test statistics value can be calculated using the following formula:

t = \frac{m}{s/\sqrt{n}}


  • m is the mean differences
  • n is the sample size (i.e., size of d).
  • s is the standard deviation of d

We can compute the p-value corresponding to the absolute value of the t-test statistics (|t|) for the degrees of freedom (df): \(df = n - 1\).

If the p-value is inferior or equal to 0.05, we can conclude that the difference between the two paired samples are significantly different.

Demo data

Here, we’ll use a demo dataset mice2 [datarium package], which contains the weight of 10 mice before and after the treatment.

# Wide format
data("mice2", package = "datarium")
head(mice2, 3)
##   id before after
## 1  1    187   430
## 2  2    194   404
## 3  3    232   406
# Transform into long data: 
# gather the before and after values in the same column
mice2.long <- mice2 %>%
  gather(key = "group", value = "weight", before, after)
head(mice2.long, 3)
##   id  group weight
## 1  1 before    187
## 2  2 before    194
## 3  3 before    232

Summary statistics

Compute some summary statistics (mean and sd) by groups:

mice2.long %>%
  group_by(group) %>%
  get_summary_stats(weight, type = "mean_sd")
## # A tibble: 2 x 5
##   group  variable     n  mean    sd
##   <chr>  <chr>    <dbl> <dbl> <dbl>
## 1 after  weight      10  400.  30.1
## 2 before weight      10  201.  20.0


bxp <- ggpaired(mice2.long, x = "group", y = "weight", 
         order = c("before", "after"),
         ylab = "Weight", xlab = "Groups")

Assumptions and preleminary tests

The paired samples t-test assume the following characteristics about the data:

  • the two groups are paired. In our example, this is the case since the data have been collected from measuring twice the weight of the same mice.
  • No significant outliers in the difference between the two related groups
  • Normality. the difference of pairs follow a normal distribution.

In this section, we’ll perform some preliminary tests to check whether these assumptions are met.

First, start by computing the difference between groups:

mice2 <- mice2 %>% mutate(differences = before - after)
head(mice2, 3)
##   id before after differences
## 1  1    187   430        -242
## 2  2    194   404        -210
## 3  3    232   406        -174

Identify outliers

Outliers can be easily identified using boxplot methods, implemented in the R function identify_outliers() [rstatix package].

mice2 %>% identify_outliers(differences)
## [1] id          before      after       differences is.outlier  is.extreme 
## <0 rows> (or 0-length row.names)

There were no extreme outliers.

Note that, in the situation where you have extreme outliers, this can be due to: 1) data entry errors, measurement errors or unusual values.

You can include the outlier in the analysis anyway if you do not believe the result will be substantially affected. This can be evaluated by comparing the result of the t-test with and without the outlier.

It’s also possible to keep the outliers in the data and perform Wilcoxon test or robust t-test using the WRS2 package.

Check normality assumption

The normality assumption can be checked by computing the Shapiro-Wilk test for each group. If the data is normally distributed, the p-value should be greater than 0.05.

mice2 %>% shapiro_test(differences) 
## # A tibble: 1 x 3
##   variable    statistic     p
##   <chr>           <dbl> <dbl>
## 1 differences     0.968 0.867

From the output, the two p-values are greater than the significance level 0.05 indicating that the distribution of the data are not significantly different from the normal distribution. In other words, we can assume the normality.

You can also create QQ plots for each group. QQ plot draws the correlation between a given data and the normal distribution.

ggqqplot(mice2, "differences")

All the points fall approximately along the (45-degree) reference line, for each group. So we can assume normality of the data.

Note that, if your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.

In the situation where the data are not normally distributed, it’s recommended to use the non parametric Wilcoxon test.


We want to know, if there is any significant difference in the mean weights after treatment?

We’ll use the pipe-friendly t_test() function [rstatix package], a wrapper around the R base function t.test().

stat.test <- mice2.long  %>% 
  t_test(weight ~ group, paired = TRUE) %>%
## # A tibble: 1 x 9
##   .y.    group1 group2    n1    n2 statistic    df             p p.signif
##   <chr>  <chr>  <chr>  <int> <int>     <dbl> <dbl>         <dbl> <chr>   
## 1 weight after  before    10    10      25.5     9 0.00000000104 ****

The results above show the following components:

  • .y.: the y variable used in the test.
  • group1,group2: the compared groups in the pairwise tests.
  • statistic: Test statistic used to compute the p-value.
  • df: degrees of freedom.
  • p: p-value.

Note that, you can obtain a detailed result by specifying the option detailed = TRUE.

To compute one tailed paired t-test, you can specify the option alternative as follow.

  • if you want to test whether the average weight before treatment is less than the average weight after treatment, type this:
mice2.long  %>%
  t_test(weight ~ group, paired = TRUE, alternative = "less")
  • Or, if you want to test whether the average weight before treatment is greater than the average weight after treatment, type this
mice2.long  %>%
  t_test(weight ~ group, paired = TRUE, alternative = "greater")

Effect size

The effect size for a paired-samples t-test can be calculated by dividing the mean difference by the standard deviation of the difference, as shown below.

Cohen’s d formula:

d = \frac{mean_D}{SD_D}

Where D is the differences of the paired samples values.


mice2.long  %>% cohens_d(weight ~ group, paired = TRUE)
## # A tibble: 1 x 7
##   .y.    group1 group2 effsize    n1    n2 magnitude
## * <chr>  <chr>  <chr>    <dbl> <int> <int> <ord>    
## 1 weight after  before    8.08    10    10 large

There is a large effect size, Cohen’s d = 8.07.


We could report the result as follow: The average weight of mice was significantly increased after treatment, t(9) = 25.5, p < 0.0001, d = 8.07.

stat.test <- stat.test %>% add_xy_position(x = "group")
bxp + 
  stat_pvalue_manual(stat.test, tip.length = 0) +
  labs(subtitle = get_test_label(stat.test, detailed= TRUE))


This article describes the formula and the basics of the paired t-test or dependent t-test. Examples of R codes are provided to check the assumptions, computing the test and the effect size, interpreting and reporting the results.


Cohen, J. 1998. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates.

Welch T-Test (Prev Lesson)
(Next Lesson) Pairwise T-Test
Back to T-Test Essentials: Definition, Formula and Calculation

No Comments

Give a comment