{"id":10872,"date":"2019-11-29T08:00:05","date_gmt":"2019-11-29T06:00:05","guid":{"rendered":"https:\/\/www.datanovia.com\/en\/?post_type=dt_lessons&#038;p=10872"},"modified":"2019-11-29T08:29:10","modified_gmt":"2019-11-29T06:29:10","slug":"anova-in-r","status":"publish","type":"dt_lessons","link":"https:\/\/www.datanovia.com\/en\/lessons\/anova-in-r\/","title":{"rendered":"ANOVA in R"},"content":{"rendered":"<div id=\"rdoc\">\n<p>The <strong>ANOVA<\/strong> test (or <strong>Analysis of Variance<\/strong>) is used to compare the mean of multiple groups. The term ANOVA is a little misleading. Although the name of the technique refers to variances, the main goal of ANOVA is to investigate differences in means.<\/p>\n<p>This chapter describes the different types of ANOVA for <strong>comparing independent groups<\/strong>, including:<\/p>\n<ul>\n<li><strong>One-way ANOVA<\/strong>: an extension of the independent samples t-test for comparing the means in a situation where there are more than two groups. This is the simplest case of ANOVA test where the data are organized into several groups according to only one single grouping variable (also called factor variable). Other synonyms are: <em>1 way ANOVA<\/em>, <em>one-factor ANOVA<\/em> and <em>between-subject ANOVA<\/em>.<\/li>\n<li><strong>two-way ANOVA<\/strong> used to evaluate simultaneously the effect of two different grouping variables on a continuous outcome variable. Other synonyms are: <em>two factorial design<\/em>, <em>factorial anova<\/em> or <em>two-way between-subjects ANOVA<\/em>.<\/li>\n<li><strong>three-way ANOVA<\/strong> used to evaluate simultaneously the effect of three different grouping variables on a continuous outcome variable. 
Other synonyms are: <em>factorial ANOVA<\/em> or <em>three-way between-subjects ANOVA<\/em>.<\/li>\n<\/ul>\n<div class=\"block\">\n<p>Note that the independent grouping variables are also known as <strong>between-subjects factors<\/strong>.<\/p>\n<p>The main goal of two-way and three-way ANOVA is, respectively, to evaluate if there is a statistically significant interaction effect between two and three between-subjects factors in explaining a continuous outcome variable.<\/p>\n<\/div>\n<p>You will learn how to:<\/p>\n<ul>\n<li><strong>Compute and interpret the different types of ANOVA in R<\/strong> for comparing independent groups.<\/li>\n<li><strong>Check ANOVA test assumptions<\/strong><\/li>\n<li><strong>Perform post-hoc tests<\/strong>: multiple pairwise comparisons between groups to identify which groups are different<\/li>\n<li><strong>Visualize the data<\/strong> using box plots, and add ANOVA and pairwise comparison p-values to the plot<\/li>\n<\/ul>\n<p>Contents:<\/p>\n<div id=\"TOC\">\n<ul>\n<li><a href=\"#basics\">Basics<\/a><\/li>\n<li><a href=\"#assumptions\">Assumptions<\/a><\/li>\n<li><a href=\"#prerequisites\">Prerequisites<\/a><\/li>\n<li><a href=\"#one-way-independent-measures\">One-way ANOVA<\/a>\n<ul>\n<li><a href=\"#data-preparation\">Data preparation<\/a><\/li>\n<li><a href=\"#summary-statistics\">Summary statistics<\/a><\/li>\n<li><a href=\"#visualization\">Visualization<\/a><\/li>\n<li><a href=\"#check-assumptions\">Check assumptions<\/a><\/li>\n<li><a href=\"#computation\">Computation<\/a><\/li>\n<li><a href=\"#post-hoc-tests\">Post-hoc tests<\/a><\/li>\n<li><a href=\"#report\">Report<\/a><\/li>\n<li><a href=\"#relaxing-the-homogeneity-of-variance-assumption\">Relaxing the homogeneity of variance assumption<\/a><\/li>\n<\/ul>\n<\/li>\n<li><a href=\"#two-way-independent-anova\">Two-way ANOVA<\/a>\n<ul>\n<li><a href=\"#data-preparation-1\">Data preparation<\/a><\/li>\n<li><a href=\"#summary-statistics-1\">Summary statistics<\/a><\/li>\n<li><a
href=\"#visualization-1\">Visualization<\/a><\/li>\n<li><a href=\"#check-assumptions-1\">Check assumptions<\/a><\/li>\n<li><a href=\"#computation-1\">Computation<\/a><\/li>\n<li><a href=\"#post-hoct-tests\">Post-hoct tests<\/a><\/li>\n<li><a href=\"#report-1\">Report<\/a><\/li>\n<\/ul>\n<\/li>\n<li><a href=\"#three-way-independent-anova\">Three-Way ANOVA<\/a>\n<ul>\n<li><a href=\"#data-preparation-2\">Data preparation<\/a><\/li>\n<li><a href=\"#summary-statistics-2\">Summary statistics<\/a><\/li>\n<li><a href=\"#visualization-2\">Visualization<\/a><\/li>\n<li><a href=\"#check-assumptions-2\">Check assumptions<\/a><\/li>\n<li><a href=\"#computation-2\">Computation<\/a><\/li>\n<li><a href=\"#post-hoc-tests-1\">Post-hoc tests<\/a><\/li>\n<\/ul>\n<\/li>\n<li><a href=\"#summary\">Summary<\/a><\/li>\n<\/ul>\n<\/div>\n<div class='dt-sc-hr-invisible-medium  '><\/div>\n<div class='dt-sc-ico-content type1'><div class='custom-icon' ><a href='https:\/\/www.datanovia.com\/en\/product\/practical-statistics-in-r-for-comparing-groups-numerical-variables\/' target='_blank'><span class='fa fa-book'><\/span><\/a><\/div><h4><a href='https:\/\/www.datanovia.com\/en\/product\/practical-statistics-in-r-for-comparing-groups-numerical-variables\/' target='_blank'> Related Book <\/a><\/h4>Practical Statistics in R II - Comparing Groups: Numerical Variables<\/div>\n<div class='dt-sc-hr-invisible-medium  '><\/div>\n<div id=\"basics\" class=\"section level2\">\n<h2>Basics<\/h2>\n<p>Assume that we have 3 groups to compare, as illustrated in the image below. The dashed line indicates the group mean. 
The figure shows the variation between the means of the groups (panel A) and the variation within each group (panel B), also known as <strong>residual variance<\/strong>.<\/p>\n<p>The idea behind the ANOVA test is very simple: if the average variation between groups is large enough compared to the average variation within groups, then you could conclude that at least one group mean is not equal to the others.<\/p>\n<p>Thus, it\u2019s possible to evaluate whether the differences between the group means are significant by comparing the two variance estimates. This is why the method is called <strong>analysis of variance<\/strong> even though the main goal is to compare the group means.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/r-statistics-2-comparing-groups-means\/images\/one-way-anova-basics.png\" alt=\"one-way anova basics\" \/><\/p>\n<p>Briefly, the mathematical procedure behind the ANOVA test is as follows:<\/p>\n<ol style=\"list-style-type: decimal;\">\n<li>Compute the <strong>within-group variance<\/strong>, also known as <strong>residual variance<\/strong>. 
This tells us how different each participant is from their own group mean (see figure, panel B).<\/li>\n<li>Compute the <strong>variance between group means<\/strong> (see figure, panel A).<\/li>\n<li>Produce the F-statistic as the ratio of <code>variance.between.groups\/variance.within.groups<\/code>.<\/li>\n<\/ol>\n<div class=\"success\">\n<p>Note that a low F value (F &lt; 1) indicates that there is no significant difference between the means of the samples being compared.<\/p>\n<p>However, a high ratio (F &gt; 1) implies that the variation among group means is large compared to the variation of the individual observations within each group.<\/p>\n<\/div>\n<\/div>\n<div id=\"assumptions\" class=\"section level2\">\n<h2>Assumptions<\/h2>\n<p>The ANOVA test makes the following assumptions about the data:<\/p>\n<ul>\n<li><strong>Independence of the observations<\/strong>. Each subject should belong to only one group. There is no relationship between the observations in each group. Having repeated measures for the same participants is not allowed.<\/li>\n<li><strong>No significant outliers<\/strong> in any cell of the design.<\/li>\n<li><strong>Normality<\/strong>. The data for each design cell should be approximately normally distributed.<\/li>\n<li><strong>Homogeneity of variances<\/strong>. The variance of the outcome variable should be equal in every cell of the design.<\/li>\n<\/ul>\n<p>Before computing the ANOVA test, you need to perform some preliminary tests to check if the assumptions are met.<\/p>\n<div class=\"warning\">\n<p>Note that if the above assumptions are not met, there is a non-parametric alternative (<em>Kruskal-Wallis test<\/em>) to the one-way ANOVA.<\/p>\n<p>Unfortunately, there are no non-parametric alternatives to the two-way and the three-way ANOVA. 
Thus, in the situation where the assumptions are not met, you could consider running the two-way\/three-way ANOVA on the transformed and non-transformed data to see if there are any meaningful differences.<\/p>\n<p>If both tests lead you to the same conclusions, you might choose not to transform the outcome variable and carry on with the two-way\/three-way ANOVA on the original data.<\/p>\n<p>It\u2019s also possible to perform a robust ANOVA test using the <strong>WRS2<\/strong> R package.<\/p>\n<p>No matter your choice, you should report what you did in your results.<\/p>\n<\/div>\n<\/div>\n<div id=\"prerequisites\" class=\"section level2\">\n<h2>Prerequisites<\/h2>\n<p>Make sure you have the following R packages:<\/p>\n<ul>\n<li><code>tidyverse<\/code> for data manipulation and visualization<\/li>\n<li><code>ggpubr<\/code> for easily creating publication-ready plots<\/li>\n<li><code>rstatix<\/code> for pipe-friendly R functions for easy statistical analyses<\/li>\n<li><code>datarium<\/code> for the data sets required in this chapter<\/li>\n<\/ul>\n<p>Load the required R packages:<\/p>\n<pre class=\"r\"><code>library(tidyverse)\r\nlibrary(ggpubr)\r\nlibrary(rstatix)<\/code><\/pre>\n<p>Key R function: <code>anova_test()<\/code> [rstatix package], a wrapper around the function <code>car::Anova()<\/code>.<\/p>\n<\/div>\n<div id=\"one-way-independent-measures\" class=\"section level2\">\n<h2>One-way ANOVA<\/h2>\n<div id=\"data-preparation\" class=\"section level3\">\n<h3>Data preparation<\/h3>\n<p>Here, we\u2019ll use the built-in R data set named <code>PlantGrowth<\/code>. 
It contains the weight of plants obtained under a control and two different treatment conditions.<\/p>\n<p>Load and inspect the data by using the function <code>sample_n_by()<\/code> to display one random row by groups:<\/p>\n<pre class=\"r\"><code>data(\"PlantGrowth\")\r\nset.seed(1234)\r\nPlantGrowth %&gt;% sample_n_by(group, size = 1)<\/code><\/pre>\n<pre><code>## # A tibble: 3 x 2\r\n##   weight group\r\n##    &lt;dbl&gt; &lt;fct&gt;\r\n## 1   5.58 ctrl \r\n## 2   6.03 trt1 \r\n## 3   4.92 trt2<\/code><\/pre>\n<p>Show the levels of the grouping variable:<\/p>\n<pre class=\"r\"><code>levels(PlantGrowth$group)<\/code><\/pre>\n<pre><code>## [1] \"ctrl\" \"trt1\" \"trt2\"<\/code><\/pre>\n<p>If the levels are not automatically in the correct order, re-order them as follows:<\/p>\n<pre class=\"r\"><code>PlantGrowth &lt;- PlantGrowth %&gt;%\r\n  reorder_levels(group, order = c(\"ctrl\", \"trt1\", \"trt2\"))<\/code><\/pre>\n<div class=\"block\">\n<p>The one-way ANOVA can be used to determine whether the mean plant growth is significantly different between the three conditions.<\/p>\n<\/div>\n<\/div>\n<div id=\"summary-statistics\" class=\"section level3\">\n<h3>Summary statistics<\/h3>\n<p>Compute some summary statistics (count, mean and sd) of the variable <code>weight<\/code> organized by groups:<\/p>\n<pre class=\"r\"><code>PlantGrowth %&gt;%\r\n  group_by(group) %&gt;%\r\n  get_summary_stats(weight, type = \"mean_sd\")<\/code><\/pre>\n<pre><code>## # A tibble: 3 x 5\r\n##   group variable     n  mean    sd\r\n##   &lt;fct&gt; &lt;chr&gt;    &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;\r\n## 1 ctrl  weight      10  5.03 0.583\r\n## 2 trt1  weight      10  4.66 0.794\r\n## 3 trt2  weight      10  5.53 0.443<\/code><\/pre>\n<\/div>\n<div id=\"visualization\" class=\"section level3\">\n<h3>Visualization<\/h3>\n<p>Create a box plot of <code>weight<\/code> by <code>group<\/code>:<\/p>\n<pre class=\"r\"><code>ggboxplot(PlantGrowth, x = \"group\", y = 
\"weight\")<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/r-statistics-2-comparing-groups-means\/figures\/045-anova-analysis-of-variance-one-way-boxplots-1.png\" width=\"288\" \/><\/p>\n<\/div>\n<div id=\"check-assumptions\" class=\"section level3\">\n<h3>Check assumptions<\/h3>\n<div id=\"outliers\" class=\"section level4\">\n<h4>Outliers<\/h4>\n<p>Outliers can be easily identified using box plot methods, implemented in the R function <code>identify_outliers()<\/code> [rstatix package].<\/p>\n<pre class=\"r\"><code>PlantGrowth %&gt;% \r\n  group_by(group) %&gt;%\r\n  identify_outliers(weight)<\/code><\/pre>\n<pre><code>## # A tibble: 2 x 4\r\n##   group weight is.outlier is.extreme\r\n##   &lt;fct&gt;  &lt;dbl&gt; &lt;lgl&gt;      &lt;lgl&gt;     \r\n## 1 trt1    5.87 TRUE       FALSE     \r\n## 2 trt1    6.03 TRUE       FALSE<\/code><\/pre>\n<div class=\"success\">\n<p>There were no extreme outliers.<\/p>\n<\/div>\n<div class=\"warning\">\n<p>Note that, in the situation where you have extreme outliers, this can be due to: 1) data entry errors, measurement errors or unusual values.<\/p>\n<p>Yo can include the outlier in the analysis anyway if you do not believe the result will be substantially affected. This can be evaluated by comparing the result of the ANOVA test with and without the outlier.<\/p>\n<p>It\u2019s also possible to keep the outliers in the data and perform robust ANOVA test using the WRS2 package.<\/p>\n<\/div>\n<\/div>\n<div id=\"normality-assumption\" class=\"section level4\">\n<h4>Normality assumption<\/h4>\n<p>The normality assumption can be checked by using one of the following two approaches:<\/p>\n<ol style=\"list-style-type: decimal;\">\n<li><strong>Analyzing the ANOVA model residuals<\/strong> to check the normality for all groups together. 
This approach is easier and it\u2019s very handy when you have many groups or if there are few data points per group.<\/li>\n<li><strong>Check normality for each group separately<\/strong>. This approach might be used when you have only a few groups and many data points per group.<\/li>\n<\/ol>\n<p>In this section, we\u2019ll show you how to proceed for both options 1 and 2.<\/p>\n<p><strong>Check normality assumption by analyzing the model residuals<\/strong>. QQ plot and Shapiro-Wilk test of normality are used. A QQ plot draws the quantiles of the observed data against the quantiles of the normal distribution.<\/p>\n<pre class=\"r\"><code># Build the linear model\r\nmodel  &lt;- lm(weight ~ group, data = PlantGrowth)\r\n# Create a QQ plot of residuals\r\nggqqplot(residuals(model))<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/r-statistics-2-comparing-groups-means\/figures\/045-anova-analysis-of-variance-one-way-qq-plot-residuals-1.png\" width=\"384\" \/><\/p>\n<pre class=\"r\"><code># Compute Shapiro-Wilk test of normality\r\nshapiro_test(residuals(model))<\/code><\/pre>\n<pre><code>## # A tibble: 1 x 3\r\n##   variable         statistic p.value\r\n##   &lt;chr&gt;                &lt;dbl&gt;   &lt;dbl&gt;\r\n## 1 residuals(model)     0.966   0.438<\/code><\/pre>\n<div class=\"success\">\n<p>In the QQ plot, as all the points fall approximately along the reference line, we can assume normality. This conclusion is supported by the Shapiro-Wilk test. The p-value is not significant (p = 0.44), so we can assume normality.<\/p>\n<\/div>\n<p><strong>Check normality assumption by groups<\/strong>. Compute the Shapiro-Wilk test for each group level. 
If the data is normally distributed, the p-value should be greater than 0.05.<\/p>\n<pre class=\"r\"><code>PlantGrowth %&gt;%\r\n  group_by(group) %&gt;%\r\n  shapiro_test(weight)<\/code><\/pre>\n<pre><code>## # A tibble: 3 x 4\r\n##   group variable statistic     p\r\n##   &lt;fct&gt; &lt;chr&gt;        &lt;dbl&gt; &lt;dbl&gt;\r\n## 1 ctrl  weight       0.957 0.747\r\n## 2 trt1  weight       0.930 0.452\r\n## 3 trt2  weight       0.941 0.564<\/code><\/pre>\n<div class=\"success\">\n<p>The weights were normally distributed (p &gt; 0.05) for each group, as assessed by Shapiro-Wilk\u2019s test of normality.<\/p>\n<\/div>\n<p>Note that if your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.<\/p>\n<p>Create QQ plots for each group level:<\/p>\n<pre class=\"r\"><code>ggqqplot(PlantGrowth, \"weight\", facet.by = \"group\")<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/r-statistics-2-comparing-groups-means\/figures\/045-anova-analysis-of-variance-one-way-qq-plot-1.png\" width=\"480\" \/><\/p>\n<div class=\"success\">\n<p>All the points fall approximately along the reference line, for each group. 
So we can assume normality of the data.<\/p>\n<\/div>\n<div class=\"warning\">\n<p>If you have doubt about the normality of the data, you can use the <em>Kruskal-Wallis test<\/em>, which is the non-parametric alternative to the one-way ANOVA test.<\/p>\n<\/div>\n<\/div>\n<div id=\"homogneity-of-variance-assumption\" class=\"section level4\">\n<h4>Homogeneity of variance assumption<\/h4>\n<ol style=\"list-style-type: decimal;\">\n<li>The <em>residuals versus fits plot<\/em> can be used to check the homogeneity of variances.<\/li>\n<\/ol>\n<pre class=\"r\"><code>plot(model, 1)<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/r-statistics-2-comparing-groups-means\/figures\/045-anova-analysis-of-variance-one-way-residuals-vs-fits-1.png\" width=\"384\" \/><\/p>\n<div class=\"success\">\n<p>In the plot above, there is no evident relationship between residuals and fitted values (the mean of each group), which is good. So, we can assume the homogeneity of variances.<\/p>\n<\/div>\n<ol style=\"list-style-type: decimal;\" start=\"2\">\n<li>It\u2019s also possible to use <em>Levene\u2019s test<\/em> to check the <em>homogeneity of variances<\/em>:<\/li>\n<\/ol>\n<pre class=\"r\"><code>PlantGrowth %&gt;% levene_test(weight ~ group)<\/code><\/pre>\n<pre><code>## # A tibble: 1 x 4\r\n##     df1   df2 statistic     p\r\n##   &lt;int&gt; &lt;int&gt;     &lt;dbl&gt; &lt;dbl&gt;\r\n## 1     2    27      1.12 0.341<\/code><\/pre>\n<div class=\"success\">\n<p>From the output above, we can see that the p-value is &gt; 0.05, which is not significant. This means that there is no significant difference between variances across groups. 
Therefore, we can assume the homogeneity of variances in the different treatment groups.<\/p>\n<\/div>\n<div class=\"warning\">\n<p>In a situation where the homogeneity of variance assumption is not met, you can compute the Welch one-way ANOVA test using the function <em>welch_anova_test()<\/em> [rstatix package]. This test does not require the assumption of equal variances.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"computation\" class=\"section level3\">\n<h3>Computation<\/h3>\n<pre class=\"r\"><code>res.aov &lt;- PlantGrowth %&gt;% anova_test(weight ~ group)\r\nres.aov<\/code><\/pre>\n<pre><code>## ANOVA Table (type II tests)\r\n## \r\n##   Effect DFn DFd    F     p p&lt;.05   ges\r\n## 1  group   2  27 4.85 0.016     * 0.264<\/code><\/pre>\n<p>In the table above, the column <code>ges<\/code> corresponds to the generalized eta squared (effect size). It measures the proportion of the variability in the outcome variable (here, plant <code>weight<\/code>) that can be explained in terms of the predictor (here, treatment <code>group<\/code>). 
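<\/p>\n<p>As a quick sanity check (a sketch added here, not part of the original tutorial code), note that for a simple one-way between-subjects design the generalized eta squared reduces to the ratio of the between-groups sum of squares to the total sum of squares, which you can recompute by hand from the fitted model:<\/p>\n<pre class=\"r\"><code># Recompute ges from the sums of squares (one-way between-subjects design)\r\nss &lt;- anova(lm(weight ~ group, data = PlantGrowth))$`Sum Sq`\r\nss[1] \/ sum(ss)  # close to 0.264, matching the ges column above<\/code><\/pre>\n<p>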
An effect size of 0.26 (26%) means that 26% of the change in the <code>weight<\/code> can be accounted for by the treatment conditions.<\/p>\n<div class=\"success\">\n<p>From the above ANOVA table, it can be seen that there are significant differences between groups (p = 0.016), which are highlighted with \u201c*\u201d, F(2, 27) = 4.85, p = 0.016, eta2[g] = 0.26.<\/p>\n<\/div>\n<p>where,<\/p>\n<ul>\n<li><code>F<\/code> indicates that we are comparing to an F-distribution (F-test); <code>(2, 27)<\/code> indicates the degrees of freedom in the numerator (DFn) and the denominator (DFd), respectively; <code>4.85<\/code> indicates the obtained F-statistic value<\/li>\n<li><code>p<\/code> specifies the p-value<\/li>\n<li><code>ges<\/code> is the generalized effect size (amount of variability due to the factor)<\/li>\n<\/ul>\n<\/div>\n<div id=\"post-hoc-tests\" class=\"section level3\">\n<h3>Post-hoc tests<\/h3>\n<p>A significant one-way ANOVA is generally followed up by Tukey post-hoc tests to perform multiple pairwise comparisons between groups. 
Key R function: <code>tukey_hsd()<\/code> [rstatix].<\/p>\n<pre class=\"r\"><code># Pairwise comparisons\r\npwc &lt;- PlantGrowth %&gt;% tukey_hsd(weight ~ group)\r\npwc<\/code><\/pre>\n<pre><code>## # A tibble: 3 x 8\r\n##   term  group1 group2 estimate conf.low conf.high p.adj p.adj.signif\r\n## * &lt;chr&gt; &lt;chr&gt;  &lt;chr&gt;     &lt;dbl&gt;    &lt;dbl&gt;     &lt;dbl&gt; &lt;dbl&gt; &lt;chr&gt;       \r\n## 1 group ctrl   trt1     -0.371   -1.06      0.320 0.391 ns          \r\n## 2 group ctrl   trt2      0.494   -0.197     1.19  0.198 ns          \r\n## 3 group trt1   trt2      0.865    0.174     1.56  0.012 *<\/code><\/pre>\n<p>The output contains the following columns:<\/p>\n<ul>\n<li><code>estimate<\/code>: estimate of the difference between means of the two groups<\/li>\n<li><code>conf.low<\/code>, <code>conf.high<\/code>: the lower and the upper end point of the confidence interval at 95% (default)<\/li>\n<li><code>p.adj<\/code>: p-value after adjustment for the multiple comparisons.<\/li>\n<\/ul>\n<div class=\"success\">\n<p>It can be seen from the output that only the difference between trt2 and trt1 is significant (adjusted p-value = 0.012).<\/p>\n<\/div>\n<\/div>\n<div id=\"report\" class=\"section level3\">\n<h3>Report<\/h3>\n<p>We could report the results of the one-way ANOVA as follows:<\/p>\n<p>A one-way ANOVA was performed to evaluate if the plant growth was different for the 3 different treatment groups: ctrl (n = 10), trt1 (n = 10) and trt2 (n = 10).<\/p>\n<p>Data is presented as mean +\/- standard deviation. Plant growth was statistically significantly different between different treatment groups, F(2, 27) = 4.85, p = 0.016, generalized eta squared = 0.26.<\/p>\n<p>Plant growth decreased in the trt1 group (4.66 +\/- 0.79) compared to the ctrl group (5.03 +\/- 0.58). 
It increased in the trt2 group (5.53 +\/- 0.44) compared to the trt1 and ctrl groups.<\/p>\n<p>Tukey post-hoc analyses revealed that the increase from trt1 to trt2 (0.87, 95% CI (0.17 to 1.56)) was statistically significant (p = 0.012), but no other group differences were statistically significant.<\/p>\n<pre class=\"r\"><code># Visualization: box plots with p-values\r\npwc &lt;- pwc %&gt;% add_xy_position(x = \"group\")\r\nggboxplot(PlantGrowth, x = \"group\", y = \"weight\") +\r\n  stat_pvalue_manual(pwc, hide.ns = TRUE) +\r\n  labs(\r\n    subtitle = get_test_label(res.aov, detailed = TRUE),\r\n    caption = get_pwc_label(pwc)\r\n    )<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/r-statistics-2-comparing-groups-means\/figures\/045-anova-analysis-of-variance-one-way-anova-boxplots-with-p-values-1.png\" width=\"480\" \/><\/p>\n<\/div>\n<div id=\"relaxing-the-homogeneity-of-variance-assumption\" class=\"section level3\">\n<h3>Relaxing the homogeneity of variance assumption<\/h3>\n<p>The classical one-way ANOVA test requires an assumption of equal variances for all groups. 
In our example, the homogeneity of variance assumption turned out to be fine: the Levene test is not significant.<\/p>\n<div class=\"block\">\n<p>How do we proceed in a situation where the homogeneity of variance assumption is violated?<\/p>\n<\/div>\n<ul>\n<li>The <strong>Welch one-way test<\/strong> is an alternative to the standard one-way ANOVA in the situation where the homogeneity of variance can\u2019t be assumed (i.e., <em>Levene test<\/em> is significant).<\/li>\n<li>In this case, the <strong>Games-Howell<\/strong> post-hoc test or <strong>pairwise t-tests<\/strong> (with no assumption of equal variances) can be used to compare all possible combinations of group differences.<\/li>\n<\/ul>\n<pre class=\"r\"><code># Welch one-way ANOVA test\r\nres.aov2 &lt;- PlantGrowth %&gt;% welch_anova_test(weight ~ group)\r\n# Pairwise comparisons (Games-Howell)\r\npwc2 &lt;- PlantGrowth %&gt;% games_howell_test(weight ~ group)\r\n# Visualization: box plots with p-values\r\npwc2 &lt;- pwc2 %&gt;% add_xy_position(x = \"group\", step.increase = 1)\r\nggboxplot(PlantGrowth, x = \"group\", y = \"weight\") +\r\n  stat_pvalue_manual(pwc2, hide.ns = TRUE) +\r\n  labs(\r\n    subtitle = get_test_label(res.aov2, detailed = TRUE),\r\n    caption = get_pwc_label(pwc2)\r\n    )<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/r-statistics-2-comparing-groups-means\/figures\/045-anova-analysis-of-variance-welch-one-way-anova-boxplots-with-p-values-1.png\" width=\"480\" \/><\/p>\n<p>You can also perform pairwise comparisons using pairwise t-tests with no assumption of equal variances:<\/p>\n<pre class=\"r\"><code>pwc3 &lt;- PlantGrowth %&gt;% \r\n  pairwise_t_test(\r\n    weight ~ group, pool.sd = FALSE,\r\n    p.adjust.method = \"bonferroni\"\r\n    )\r\npwc3<\/code><\/pre>\n<\/div>\n<\/div>\n<div id=\"two-way-independent-anova\" class=\"section level2\">\n<h2>Two-way ANOVA<\/h2>\n<div 
id=\"data-preparation-1\" class=\"section level3\">\n<h3>Data preparation<\/h3>\n<p>We\u2019ll use the <code>jobsatisfaction<\/code> dataset [datarium package], which contains the job satisfaction score organized by gender and education levels.<\/p>\n<p>In this study, a research wants to evaluate if there is a significant two-way interaction between <code>gender<\/code> and <code>education_level<\/code> on explaining the job satisfaction score. An interaction effect occurs when the effect of one independent variable on an outcome variable depends on the level of the other independent variables. If an interaction effect does not exist, main effects could be reported.<\/p>\n<p>Load the data and inspect one random row by groups:<\/p>\n<pre class=\"r\"><code>set.seed(123)\r\ndata(\"jobsatisfaction\", package = \"datarium\")\r\njobsatisfaction %&gt;% sample_n_by(gender, education_level, size = 1)<\/code><\/pre>\n<pre><code>## # A tibble: 6 x 4\r\n##   id    gender education_level score\r\n##   &lt;fct&gt; &lt;fct&gt;  &lt;fct&gt;           &lt;dbl&gt;\r\n## 1 3     male   school           5.07\r\n## 2 17    male   college          6.3 \r\n## 3 23    male   university      10   \r\n## 4 37    female school           5.51\r\n## 5 48    female college          5.65\r\n## 6 49    female university       8.26<\/code><\/pre>\n<div class=\"block\">\n<p>In this example, the effect of \u201ceducation_level\u201d is our <strong>focal variable<\/strong>, that is our primary concern. 
It is thought that the effect of \u201ceducation_level\u201d will depend on one other factor, \u201cgender\u201d, which is called a <strong>moderator variable<\/strong>.<\/p>\n<\/div>\n<\/div>\n<div id=\"summary-statistics-1\" class=\"section level3\">\n<h3>Summary statistics<\/h3>\n<p>Compute the mean and the SD (standard deviation) of the <code>score<\/code> by groups:<\/p>\n<pre class=\"r\"><code>jobsatisfaction %&gt;%\r\n  group_by(gender, education_level) %&gt;%\r\n  get_summary_stats(score, type = \"mean_sd\")<\/code><\/pre>\n<pre><code>## # A tibble: 6 x 6\r\n##   gender education_level variable     n  mean    sd\r\n##   &lt;fct&gt;  &lt;fct&gt;           &lt;chr&gt;    &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;\r\n## 1 male   school          score        9  5.43 0.364\r\n## 2 male   college         score        9  6.22 0.34 \r\n## 3 male   university      score       10  9.29 0.445\r\n## 4 female school          score       10  5.74 0.474\r\n## 5 female college         score       10  6.46 0.475\r\n## 6 female university      score       10  8.41 0.938<\/code><\/pre>\n<\/div>\n<div id=\"visualization-1\" class=\"section level3\">\n<h3>Visualization<\/h3>\n<p>Create a box plot of the score by gender levels, colored by education levels:<\/p>\n<pre class=\"r\"><code>bxp &lt;- ggboxplot(\r\n  jobsatisfaction, x = \"gender\", y = \"score\",\r\n  color = \"education_level\", palette = \"jco\"\r\n  )\r\nbxp<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/r-statistics-2-comparing-groups-means\/figures\/045-anova-analysis-of-variance-two-way-boxplots-1.png\" width=\"432\" \/><\/p>\n<\/div>\n<div id=\"check-assumptions-1\" class=\"section level3\">\n<h3>Check assumptions<\/h3>\n<div id=\"outliers-1\" class=\"section level4\">\n<h4>Outliers<\/h4>\n<p>Identify outliers in each cell of the design:<\/p>\n<pre class=\"r\"><code>jobsatisfaction %&gt;%\r\n  group_by(gender, education_level) %&gt;%\r\n  
identify_outliers(score)<\/code><\/pre>\n<div class=\"success\">\n<p>There were no extreme outliers.<\/p>\n<\/div>\n<\/div>\n<div id=\"normality-assumption-1\" class=\"section level4\">\n<h4>Normality assumption<\/h4>\n<p><strong>Check normality assumption by analyzing the model residuals<\/strong>. QQ plot and Shapiro-Wilk test of normality are used.<\/p>\n<pre class=\"r\"><code># Build the linear model\r\nmodel  &lt;- lm(score ~ gender*education_level,\r\n             data = jobsatisfaction)\r\n# Create a QQ plot of residuals\r\nggqqplot(residuals(model))<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/r-statistics-2-comparing-groups-means\/figures\/045-anova-analysis-of-variance-two-way-qq-plot-residuals-1.png\" width=\"384\" \/><\/p>\n<pre class=\"r\"><code># Compute Shapiro-Wilk test of normality\r\nshapiro_test(residuals(model))<\/code><\/pre>\n<pre><code>## # A tibble: 1 x 3\r\n##   variable         statistic p.value\r\n##   &lt;chr&gt;                &lt;dbl&gt;   &lt;dbl&gt;\r\n## 1 residuals(model)     0.968   0.127<\/code><\/pre>\n<div class=\"success\">\n<p>In the QQ plot, as all the points fall approximately along the reference line, we can assume normality. This conclusion is supported by the Shapiro-Wilk test. The p-value is not significant (p = 0.13), so we can assume normality.<\/p>\n<\/div>\n<p><strong>Check normality assumption by groups<\/strong>. 
Compute the Shapiro-Wilk test for each combination of factor levels:<\/p>\n<pre class=\"r\"><code>jobsatisfaction %&gt;%\r\n  group_by(gender, education_level) %&gt;%\r\n  shapiro_test(score)<\/code><\/pre>\n<pre><code>## # A tibble: 6 x 5\r\n##   gender education_level variable statistic     p\r\n##   &lt;fct&gt;  &lt;fct&gt;           &lt;chr&gt;        &lt;dbl&gt; &lt;dbl&gt;\r\n## 1 male   school          score        0.980 0.966\r\n## 2 male   college         score        0.958 0.779\r\n## 3 male   university      score        0.916 0.323\r\n## 4 female school          score        0.963 0.819\r\n## 5 female college         score        0.963 0.819\r\n## 6 female university      score        0.950 0.674<\/code><\/pre>\n<div class=\"success\">\n<p>The scores were normally distributed (p &gt; 0.05) for each cell, as assessed by Shapiro-Wilk\u2019s test of normality.<\/p>\n<\/div>\n<p>Create QQ plots for each cell of the design:<\/p>\n<pre class=\"r\"><code>ggqqplot(jobsatisfaction, \"score\", ggtheme = theme_bw()) +\r\n  facet_grid(gender ~ education_level)<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/r-statistics-2-comparing-groups-means\/figures\/045-anova-analysis-of-variance-two-way-qq-plot-1.png\" width=\"576\" \/><\/p>\n<div class=\"success\">\n<p>All the points fall approximately along the reference line, for each cell. 
So we can assume normality of the data.<\/p>\n<\/div>\n<\/div>\n<div id=\"homogneity-of-variance-assumption-1\" class=\"section level4\">\n<h4>Homogeneity of variance assumption<\/h4>\n<p>This can be checked using Levene\u2019s test:<\/p>\n<pre class=\"r\"><code>jobsatisfaction %&gt;% levene_test(score ~ gender*education_level)<\/code><\/pre>\n<pre><code>## # A tibble: 1 x 4\r\n##     df1   df2 statistic      p\r\n##   &lt;int&gt; &lt;int&gt;     &lt;dbl&gt;  &lt;dbl&gt;\r\n## 1     5    52      2.20 0.0686<\/code><\/pre>\n<div class=\"success\">\n<p>Levene\u2019s test is not significant (p &gt; 0.05). Therefore, we can assume the homogeneity of variances in the different groups.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"computation-1\" class=\"section level3\">\n<h3>Computation<\/h3>\n<p>In the R code below, the asterisk represents the interaction effect and the main effect of each variable (and all lower-order interactions).<\/p>\n<pre class=\"r\"><code>res.aov &lt;- jobsatisfaction %&gt;% anova_test(score ~ gender * education_level)\r\nres.aov<\/code><\/pre>\n<pre><code>## ANOVA Table (type II tests)\r\n## \r\n##                   Effect DFn DFd       F        p p&lt;.05   ges\r\n## 1                 gender   1  52   0.745 3.92e-01       0.014\r\n## 2        education_level   2  52 187.892 1.60e-24     * 0.878\r\n## 3 gender:education_level   2  52   7.338 2.00e-03     * 0.220<\/code><\/pre>\n<div class=\"success\">\n<p>There was a statistically significant interaction between gender and level of education for job satisfaction score, <em>F(2, 52) = 7.34, p = 0.002<\/em>.<\/p>\n<\/div>\n<\/div>\n<div id=\"post-hoct-tests\" class=\"section level3\">\n<h3>Post-hoc tests<\/h3>\n<p>A <strong>significant two-way interaction<\/strong> indicates that the impact that one factor (e.g., education_level) has on the outcome variable (e.g., job satisfaction score) depends on the level of the other factor (e.g., gender) (and vice versa). 
So, you can decompose a significant two-way interaction into:<\/p>\n<ul>\n<li><strong>Simple main effect<\/strong>: run a one-way model of the first variable at each level of the second variable,<\/li>\n<li><strong>Simple pairwise comparisons<\/strong>: if the simple main effect is significant, run multiple pairwise comparisons to determine which groups are different.<\/li>\n<\/ul>\n<p>For a <strong>non-significant two-way interaction<\/strong>, you need to determine whether you have any statistically significant <strong>main effects<\/strong> in the ANOVA output. A significant main effect can be followed up by pairwise comparisons between groups.<\/p>\n<div id=\"procedure-for-significant-two-way-interaction\" class=\"section level4\">\n<h4>Procedure for significant two-way interaction<\/h4>\n<div id=\"compute-simple-main-effects\" class=\"section level5\">\n<h5>Compute simple main effects<\/h5>\n<p>In our example, you could therefore investigate the effect of <code>education_level<\/code> at every level of <code>gender<\/code>, or investigate the effect of <code>gender<\/code> at every level of the variable <code>education_level<\/code>.<\/p>\n<p>Here, we\u2019ll run a one-way ANOVA of <code>education_level<\/code> at each level of <code>gender<\/code>.<\/p>\n<div class=\"warning\">\n<p>Note that if you have met the assumptions of the two-way ANOVA (e.g., homogeneity of variances), it is better to use the overall error term (from the two-way ANOVA) as input in the one-way ANOVA model. This will make it easier to detect any statistically significant differences if they exist (Keppel &amp; Wickens, 2004; Maxwell &amp; Delaney, 2004).<\/p>\n<p>If the homogeneity of variances assumption is not met, you might consider running separate one-way ANOVAs with separate error terms.<\/p>\n<\/div>\n<p>In the R code below, we\u2019ll group the data by gender and analyze the <strong>simple main effects<\/strong> of education level on job satisfaction score. 
The argument <code>error<\/code> is used to specify the ANOVA model from which the pooled error sum of squares and degrees of freedom are to be calculated.<\/p>\n<pre class=\"r\"><code># Group the data by gender and fit the ANOVA\r\nmodel &lt;- lm(score ~ gender * education_level, data = jobsatisfaction)\r\njobsatisfaction %&gt;%\r\n  group_by(gender) %&gt;%\r\n  anova_test(score ~ education_level, error = model)<\/code><\/pre>\n<pre><code>## # A tibble: 2 x 8\r\n##   gender Effect            DFn   DFd     F        p `p&lt;.05`   ges\r\n##   &lt;fct&gt;  &lt;chr&gt;           &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;    &lt;dbl&gt; &lt;chr&gt;   &lt;dbl&gt;\r\n## 1 male   education_level     2    52 132.  3.92e-21 *       0.836\r\n## 2 female education_level     2    52  62.8 1.35e-14 *       0.707<\/code><\/pre>\n<div class=\"success\">\n<p>The simple main effect of \u201ceducation_level\u201d on job satisfaction score was statistically significant for both males and females (p &lt; 0.0001).<\/p>\n<p>In other words, there is a statistically significant difference in mean job satisfaction score between <strong>males<\/strong> educated to either school, college or university level, F(2, 52) = 132, p &lt; 0.0001. The same conclusion holds true for <strong>females<\/strong>, F(2, 52) = 62.8, p &lt; 0.0001.<\/p>\n<\/div>\n<div class=\"warning\">\n<p>Note that statistical significance of the simple main effect analyses was accepted at a Bonferroni-adjusted alpha level of 0.025. This corresponds to the current level you declare statistical significance at (i.e., p &lt; 0.05) divided by the number of simple main effects you are computing (i.e., 2).<\/p>\n<\/div>\n<\/div>\n<div id=\"compute-pairwise-comparisons\" class=\"section level5\">\n<h5>Compute pairwise comparisons<\/h5>\n<p>A statistically significant simple main effect can be followed up by <strong>multiple pairwise comparisons<\/strong> to determine which group means are different. 
We\u2019ll now perform multiple pairwise comparisons between the different <code>education_level<\/code> groups by <code>gender<\/code>.<\/p>\n<p>You can run and interpret all possible pairwise comparisons using a Bonferroni adjustment. This can be easily done using the function <code>emmeans_test()<\/code> [rstatix package], a wrapper around the <code>emmeans<\/code> package, which needs to be installed. Emmeans stands for <strong>estimated marginal means<\/strong> (aka least square means or adjusted means).<\/p>\n<p><strong>Compare the score of the different education levels<\/strong> by <code>gender<\/code> levels:<\/p>\n<pre class=\"r\"><code># pairwise comparisons\r\nlibrary(emmeans)\r\npwc &lt;- jobsatisfaction %&gt;% \r\n  group_by(gender) %&gt;%\r\n  emmeans_test(score ~ education_level, p.adjust.method = \"bonferroni\") \r\npwc<\/code><\/pre>\n<pre><code>## # A tibble: 6 x 9\r\n##   gender .y.   group1  group2        df statistic        p    p.adj p.adj.signif\r\n## * &lt;fct&gt;  &lt;chr&gt; &lt;chr&gt;   &lt;chr&gt;      &lt;dbl&gt;     &lt;dbl&gt;    &lt;dbl&gt;    &lt;dbl&gt; &lt;chr&gt;       \r\n## 1 male   score school  college       52     -3.07 3.37e- 3 1.01e- 2 *           \r\n## 2 male   score school  university    52    -15.3  6.87e-21 2.06e-20 ****        \r\n## 3 male   score college university    52    -12.1  8.42e-17 2.53e-16 ****        \r\n## 4 female score school  college       52     -2.94 4.95e- 3 1.49e- 2 *           \r\n## 5 female score school  university    52    -10.8  6.07e-15 1.82e-14 ****        \r\n## 6 female score college university    52     -7.90 1.84e-10 5.52e-10 ****<\/code><\/pre>\n<div class=\"success\">\n<p>There was a significant difference of job satisfaction score between all groups for both males and females (p &lt; 0.05).<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"procedure-for-non-significant-two-way-interaction\" class=\"section level4\">\n<h4>Procedure for non-significant two-way interaction<\/h4>\n<div 
id=\"inspect-main-effects\" class=\"section level5\">\n<h5>Inspect main effects<\/h5>\n<p>If the two-way interaction is not statistically significant, you need to consult the main effect for each of the two variables (gender and education_level) in the ANOVA output.<\/p>\n<pre class=\"r\"><code>res.aov<\/code><\/pre>\n<pre><code>## ANOVA Table (type II tests)\r\n## \r\n##                   Effect DFn DFd       F        p p&lt;.05   ges\r\n## 1                 gender   1  52   0.745 3.92e-01       0.014\r\n## 2        education_level   2  52 187.892 1.60e-24     * 0.878\r\n## 3 gender:education_level   2  52   7.338 2.00e-03     * 0.220<\/code><\/pre>\n<div class=\"success\">\n<p>In our example, there was a statistically significant main effects of education_level (F(2, 52) = 187.89, p &lt; 0.0001) on the job satisfaction score. However, the main effect of gender was not significant, F (1, 52) = 0.74, p = 0.39.<\/p>\n<\/div>\n<\/div>\n<div id=\"compute-pairwise-comparisons-1\" class=\"section level5\">\n<h5>Compute pairwise comparisons<\/h5>\n<p>Perform pairwise comparisons between education level groups to determine which groups are significantly different. Bonferroni adjustment is applied. This analysis can be done using simply the R base function <code>pairwise_t_test()<\/code> or using the function <code>emmeans_test()<\/code>.<\/p>\n<ul>\n<li>Pairwise t-test:<\/li>\n<\/ul>\n<pre class=\"r\"><code>jobsatisfaction %&gt;%\r\n  pairwise_t_test(\r\n    score ~ education_level, \r\n    p.adjust.method = \"bonferroni\"\r\n    )<\/code><\/pre>\n<div class=\"success\">\n<p>All pairwise differences were statistically significant (p &lt; 0.05).<\/p>\n<\/div>\n<ul>\n<li>Pairwise comparisons using Emmeans test. You need to specify the overall model, from which the overall degrees of freedom are to be calculated. 
This will make it easier to detect any statistically significant differences if they exist.<\/li>\n<\/ul>\n<pre class=\"r\"><code>model &lt;- lm(score ~ gender * education_level, data = jobsatisfaction)\r\njobsatisfaction %&gt;% \r\n  emmeans_test(\r\n    score ~ education_level, p.adjust.method = \"bonferroni\",\r\n    model = model\r\n    )<\/code><\/pre>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"report-1\" class=\"section level3\">\n<h3>Report<\/h3>\n<p>A two-way ANOVA was conducted to examine the effects of gender and education level on job satisfaction score.<\/p>\n<p>Residual analysis was performed to test for the assumptions of the two-way ANOVA. Outliers were assessed by the box plot method, normality was assessed using Shapiro-Wilk\u2019s normality test and homogeneity of variances was assessed by Levene\u2019s test.<\/p>\n<p>There were no extreme outliers, residuals were normally distributed (p &gt; 0.05) and there was homogeneity of variances (p &gt; 0.05).<\/p>\n<p>There was a statistically significant interaction between gender and education level on job satisfaction score, <code>F(2, 52) = 7.33, p = 0.0016, eta2[g] = 0.22<\/code>.<\/p>\n<p>Consequently, an analysis of simple main effects for education level was performed with statistical significance receiving a Bonferroni adjustment. There was a statistically significant difference in mean \u201cjob satisfaction\u201d scores for both males (<code>F(2, 52) = 132, p &lt; 0.0001<\/code>) and females (<code>F(2, 52) = 62.8, p &lt; 0.0001<\/code>) educated to either school, college or university level.<\/p>\n<p>All pairwise comparisons were analyzed between the different <code>education_level<\/code> groups organized by <code>gender<\/code>. 
There was a significant difference in job satisfaction score between all groups for both males and females (p &lt; 0.05).<\/p>\n<pre class=\"r\"><code># Visualization: box plots with p-values\r\npwc &lt;- pwc %&gt;% add_xy_position(x = \"gender\")\r\nbxp +\r\n  stat_pvalue_manual(pwc) +\r\n  labs(\r\n    subtitle = get_test_label(res.aov, detailed = TRUE),\r\n    caption = get_pwc_label(pwc)\r\n    )<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/r-statistics-2-comparing-groups-means\/figures\/045-anova-analysis-of-variance-two-way-anova-box-plots-with-pvalues-1.png\" width=\"480\" \/><\/p>\n<\/div>\n<\/div>\n<div id=\"three-way-independent-anova\" class=\"section level2\">\n<h2>Three-Way ANOVA<\/h2>\n<p>The <strong>three-way ANOVA<\/strong> is an extension of the two-way ANOVA for assessing whether there is an interaction effect between three independent categorical variables on a continuous outcome variable.<\/p>\n<div id=\"data-preparation-2\" class=\"section level3\">\n<h3>Data preparation<\/h3>\n<p>We\u2019ll use the <code>headache<\/code> dataset [datarium package], which contains the measures of migraine headache episode pain score in 72 participants treated with three different treatments. The participants include 36 males and 36 females. 
Males and females were further subdivided into whether they were at low or high risk of migraine.<\/p>\n<p>We want to understand how the independent variables (type of treatment, risk of migraine and gender) interact to predict the pain score.<\/p>\n<p>Load the data and inspect one random row per group combination:<\/p>\n<pre class=\"r\"><code>set.seed(123)\r\ndata(\"headache\", package = \"datarium\")\r\nheadache %&gt;% sample_n_by(gender, risk, treatment, size = 1)<\/code><\/pre>\n<pre><code>## # A tibble: 12 x 5\r\n##      id gender risk  treatment pain_score\r\n##   &lt;int&gt; &lt;fct&gt;  &lt;fct&gt; &lt;fct&gt;          &lt;dbl&gt;\r\n## 1    20 male   high  X              100  \r\n## 2    29 male   high  Y               91.2\r\n## 3    33 male   high  Z               81.3\r\n## 4     6 male   low   X               73.1\r\n## 5    12 male   low   Y               67.9\r\n## 6    13 male   low   Z               75.0\r\n## # \u2026 with 6 more rows<\/code><\/pre>\n<div class=\"block\">\n<p>In this example, the effect of the treatment types is our <strong>focal variable<\/strong>, that is, our primary concern. 
It is thought that the effect of treatments will depend on two other factors, \u201cgender\u201d and \u201crisk\u201d level of migraine, which are called <strong>moderator variables<\/strong>.<\/p>\n<\/div>\n<\/div>\n<div id=\"summary-statistics-2\" class=\"section level3\">\n<h3>Summary statistics<\/h3>\n<p>Compute the mean and the standard deviation (SD) of <code>pain_score<\/code> by groups:<\/p>\n<pre class=\"r\"><code>headache %&gt;%\r\n  group_by(gender, risk, treatment) %&gt;%\r\n  get_summary_stats(pain_score, type = \"mean_sd\")<\/code><\/pre>\n<pre><code>## # A tibble: 12 x 7\r\n##   gender risk  treatment variable       n  mean    sd\r\n##   &lt;fct&gt;  &lt;fct&gt; &lt;fct&gt;     &lt;chr&gt;      &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;\r\n## 1 male   high  X         pain_score     6  92.7  5.12\r\n## 2 male   high  Y         pain_score     6  82.3  5.00\r\n## 3 male   high  Z         pain_score     6  79.7  4.05\r\n## 4 male   low   X         pain_score     6  76.1  3.86\r\n## 5 male   low   Y         pain_score     6  73.1  4.76\r\n## 6 male   low   Z         pain_score     6  74.5  4.89\r\n## # \u2026 with 6 more rows<\/code><\/pre>\n<\/div>\n<div id=\"visualization-2\" class=\"section level3\">\n<h3>Visualization<\/h3>\n<p>Create a box plot of <code>pain_score<\/code> by <code>treatment<\/code>, color lines by risk groups and facet the plot by gender:<\/p>\n<pre class=\"r\"><code>bxp &lt;- ggboxplot(\r\n  headache, x = \"treatment\", y = \"pain_score\", \r\n  color = \"risk\", palette = \"jco\", facet.by = \"gender\"\r\n  )\r\nbxp<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/r-statistics-2-comparing-groups-means\/figures\/045-anova-analysis-of-variance-three-way-independent-1.png\" width=\"480\" \/><\/p>\n<\/div>\n<div id=\"check-assumptions-2\" class=\"section level3\">\n<h3>Check assumptions<\/h3>\n<div id=\"outliers-2\" class=\"section level4\">\n<h4>Outliers<\/h4>\n<p>Identify 
outliers by groups:<\/p>\n<pre class=\"r\"><code>headache %&gt;%\r\n  group_by(gender, risk, treatment) %&gt;%\r\n  identify_outliers(pain_score)<\/code><\/pre>\n<pre><code>## # A tibble: 4 x 7\r\n##   gender risk  treatment    id pain_score is.outlier is.extreme\r\n##   &lt;fct&gt;  &lt;fct&gt; &lt;fct&gt;     &lt;int&gt;      &lt;dbl&gt; &lt;lgl&gt;      &lt;lgl&gt;     \r\n## 1 female high  X            57       68.4 TRUE       TRUE      \r\n## 2 female high  Y            62       73.1 TRUE       FALSE     \r\n## 3 female high  Z            67       75.0 TRUE       FALSE     \r\n## 4 female high  Z            71       87.1 TRUE       FALSE<\/code><\/pre>\n<div class=\"success\">\n<p>It can be seen that the data contain one extreme outlier (id = 57, a female at high risk of migraine taking drug X).<\/p>\n<\/div>\n<div class=\"warning\">\n<p>Outliers can be due to: 1) data entry errors, 2) measurement errors or 3) unusual values.<\/p>\n<p>You can include the outlier in the analysis anyway if you do not believe the result will be substantially affected. This can be evaluated by comparing the result of the ANOVA test with and without the outlier.<\/p>\n<p>It\u2019s also possible to keep the outliers in the data and perform a robust ANOVA test using the WRS2 package.<\/p>\n<\/div>\n<\/div>\n<div id=\"normality-assumption-2\" class=\"section level4\">\n<h4>Normality assumption<\/h4>\n<p><strong>Check normality assumption by analyzing the model residuals<\/strong>. 
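<\/p>\n<p>As an aside to the note on outliers above: a robust alternative that keeps the outlier in the data is an ANOVA on trimmed means from the WRS2 package. A minimal sketch, assuming WRS2 is installed and provides the three-way function <code>t3way()<\/code>:<\/p>\n<pre class=\"r\"><code># Robust three-way ANOVA on trimmed means (WRS2),\r\n# less sensitive to outliers than the classical ANOVA\r\nlibrary(WRS2)\r\nt3way(pain_score ~ gender*risk*treatment, data = headache)<\/code><\/pre>\n<p>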
QQ plot and Shapiro-Wilk test of normality are used.<\/p>\n<pre class=\"r\"><code>model  &lt;- lm(pain_score ~ gender*risk*treatment, data = headache)\r\n# Create a QQ plot of residuals\r\nggqqplot(residuals(model))\r\n# Compute Shapiro-Wilk test of normality\r\nshapiro_test(residuals(model))<\/code><\/pre>\n<pre><code>## # A tibble: 1 x 3\r\n##   variable         statistic p.value\r\n##   &lt;chr&gt;                &lt;dbl&gt;   &lt;dbl&gt;\r\n## 1 residuals(model)     0.982   0.398<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/r-statistics-2-comparing-groups-means\/figures\/045-anova-analysis-of-variance-qq-plot-residuals-1.png\" width=\"384\" \/><\/p>\n<div class=\"success\">\n<p>In the QQ plot, as all the points fall approximately along the reference line, we can assume normality. This conclusion is supported by the Shapiro-Wilk test. The p-value is not significant (p = 0.4), so we can assume normality.<\/p>\n<\/div>\n<p><strong>Check normality assumption by groups<\/strong>. 
Computing the Shapiro-Wilk test for each combination of factor levels.<\/p>\n<pre class=\"r\"><code>headache %&gt;%\r\n  group_by(gender, risk, treatment) %&gt;%\r\n  shapiro_test(pain_score)<\/code><\/pre>\n<pre><code>## # A tibble: 12 x 6\r\n##   gender risk  treatment variable   statistic     p\r\n##   &lt;fct&gt;  &lt;fct&gt; &lt;fct&gt;     &lt;chr&gt;          &lt;dbl&gt; &lt;dbl&gt;\r\n## 1 male   high  X         pain_score     0.958 0.808\r\n## 2 male   high  Y         pain_score     0.902 0.384\r\n## 3 male   high  Z         pain_score     0.955 0.784\r\n## 4 male   low   X         pain_score     0.982 0.962\r\n## 5 male   low   Y         pain_score     0.920 0.507\r\n## 6 male   low   Z         pain_score     0.924 0.535\r\n## # \u2026 with 6 more rows<\/code><\/pre>\n<div class=\"success\">\n<p>The pain scores were normally distributed (p &gt; 0.05) except for one group (female at high risk of migraine taking drug X, p = 0.0086), as assessed by Shapiro-Wilk\u2019s test of normality.<\/p>\n<\/div>\n<p>Create a QQ plot for each cell of the design:<\/p>\n<pre class=\"r\"><code>ggqqplot(headache, \"pain_score\", ggtheme = theme_bw()) +\r\n  facet_grid(gender + risk ~ treatment, labeller = \"label_both\")<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/r-statistics-2-comparing-groups-means\/figures\/045-anova-analysis-of-variance-qq-plot-1.png\" width=\"576\" \/><\/p>\n<div class=\"success\">\n<p>All the points fall approximately along the reference line, except for one group (female at high risk of migraine taking drug X), where we already identified an extreme outlier.<\/p>\n<\/div>\n<\/div>\n<div id=\"homogneity-of-variance-assumption-2\" class=\"section level4\">\n<h4>Homogeneity of variance assumption<\/h4>\n<p>This can be checked using Levene\u2019s test:<\/p>\n<pre class=\"r\"><code>headache %&gt;% levene_test(pain_score ~ gender*risk*treatment)<\/code><\/pre>\n<pre><code>## # A tibble: 1 x 4\r\n##     df1   df2 statistic     p\r\n##   &lt;int&gt; &lt;int&gt;     &lt;dbl&gt; &lt;dbl&gt;\r\n## 1    11    60     0.179 0.998<\/code><\/pre>\n<div class=\"success\">\n<p>Levene\u2019s test is not significant (p &gt; 0.05). Therefore, we can assume homogeneity of variances across the different groups.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"computation-2\" class=\"section level3\">\n<h3>Computation<\/h3>\n<pre class=\"r\"><code>res.aov &lt;- headache %&gt;% anova_test(pain_score ~ gender*risk*treatment)\r\nres.aov<\/code><\/pre>\n<pre><code>## ANOVA Table (type II tests)\r\n## \r\n##                  Effect DFn DFd      F        p p&lt;.05   ges\r\n## 1                gender   1  60 16.196 1.63e-04     * 0.213\r\n## 2                  risk   1  60 92.699 8.80e-14     * 0.607\r\n## 3             treatment   2  60  7.318 1.00e-03     * 0.196\r\n## 4           gender:risk   1  60  0.141 7.08e-01       0.002\r\n## 5      gender:treatment   2  60  3.338 4.20e-02     * 0.100\r\n## 6        risk:treatment   2  60  0.713 4.94e-01       0.023\r\n## 7 gender:risk:treatment   2  60  7.406 1.00e-03     * 0.198<\/code><\/pre>\n<div class=\"success\">\n<p>There was a statistically significant three-way interaction between gender, risk and treatment, <em>F(2, 60) = 7.41, p = 0.001<\/em>.<\/p>\n<\/div>\n<\/div>\n<div id=\"post-hoc-tests-1\" class=\"section level3\">\n<h3>Post-hoc tests<\/h3>\n<p><strong>If there is a significant three-way interaction effect<\/strong>, you can decompose it into:<\/p>\n<ul>\n<li><strong>Simple two-way interaction<\/strong>: run the two-way interaction at each level of the third variable,<\/li>\n<li><strong>Simple simple main effect<\/strong>: run a one-way model at each level of the second variable, and<\/li>\n<li><strong>Simple simple pairwise comparisons<\/strong>: run pairwise or other post-hoc comparisons if necessary.<\/li>\n<\/ul>\n<p><strong>If you do not have a statistically significant three-way interaction<\/strong>, you need to determine 
whether you have any statistically significant two-way interaction from the ANOVA output. You can follow up a significant two-way interaction by simple main effects analyses and pairwise comparisons between groups if necessary.<\/p>\n<p>In this section we\u2019ll describe the procedure for a significant three-way interaction.<\/p>\n<div id=\"compute-simple-two-way-interactions\" class=\"section level4\">\n<h4>Compute simple two-way interactions<\/h4>\n<p>You are free to decide which two variables will form the simple two-way interactions and which variable will act as the third (moderator) variable. In our example, we want to evaluate the effect of the <code>risk*treatment<\/code> interaction on <code>pain_score<\/code> at each level of gender.<\/p>\n<div class=\"warning\">\n<p>Note that when doing the two-way interaction analysis, it\u2019s better to use the overall error term (or residuals) from the three-way ANOVA result, obtained previously using the whole dataset. This is particularly recommended when the homogeneity of variance assumption is met (Keppel &amp; Wickens, 2004).<\/p>\n<p>The use of group-specific error terms is \u201csafer\u201d against violations of the assumptions. The pooled error term has greater power \u2013 particularly with small sample sizes \u2013 but is susceptible to problems if there are any violations of the assumptions.<\/p>\n<\/div>\n<p>In the R code below, we\u2019ll group the data by gender and fit the <code>treatment*risk<\/code> two-way interaction. 
The argument <code>error<\/code> is used to specify the three-way ANOVA model from which the pooled error sum of squares and degrees of freedom are to be calculated.<\/p>\n<pre class=\"r\"><code># Group the data by gender and \r\n# fit simple two-way interaction \r\nmodel  &lt;- lm(pain_score ~ gender*risk*treatment, data = headache)\r\nheadache %&gt;%\r\n  group_by(gender) %&gt;%\r\n  anova_test(pain_score ~ risk*treatment, error = model)<\/code><\/pre>\n<pre><code>## # A tibble: 6 x 8\r\n##   gender Effect           DFn   DFd      F             p `p&lt;.05`   ges\r\n##   &lt;fct&gt;  &lt;chr&gt;          &lt;dbl&gt; &lt;dbl&gt;  &lt;dbl&gt;         &lt;dbl&gt; &lt;chr&gt;   &lt;dbl&gt;\r\n## 1 male   risk               1    60 50.0   0.00000000187 *       0.455\r\n## 2 male   treatment          2    60 10.2   0.000157      *       0.253\r\n## 3 male   risk:treatment     2    60  5.25  0.008         *       0.149\r\n## 4 female risk               1    60 42.8   0.0000000150  *       0.416\r\n## 5 female treatment          2    60  0.482 0.62          \"\"      0.016\r\n## 6 female risk:treatment     2    60  2.87  0.065         \"\"      0.087<\/code><\/pre>\n<div class=\"success\">\n<p>There was a statistically significant simple two-way interaction between risk and treatment (<strong>risk:treatment<\/strong>) for males, F(2, 60) = 5.25, p = 0.008, but not for females, F(2, 60) = 2.87, p = 0.065.<\/p>\n<p>For males, this result suggests that the effect of treatment on \u201cpain_score\u201d depends on one\u2019s \u201crisk\u201d of migraine. In other words, the risk moderates the effect of the type of treatment on pain_score.<\/p>\n<\/div>\n<div class=\"warning\">\n<p>Note that, statistical significance of a simple two-way interaction was accepted at a Bonferroni-adjusted alpha level of 0.025. 
This corresponds to the current level you declare statistical significance at (i.e., p &lt; 0.05) divided by the number of simple two-way interactions you are computing (i.e., 2).<\/p>\n<\/div>\n<\/div>\n<div id=\"compute-simple-simple-main-effects\" class=\"section level4\">\n<h4>Compute simple simple main effects<\/h4>\n<p>A statistically significant simple two-way interaction can be followed up with <strong>simple simple main effects<\/strong>. In our example, you could therefore investigate the effect of <code>treatment<\/code> on <code>pain_score<\/code> at every level of <code>risk<\/code> or investigate the effect of <code>risk<\/code> at every level of <code>treatment<\/code>.<\/p>\n<div class=\"warning\">\n<p>You will only need to do this for the simple two-way interaction for \u201cmales\u201d as this was the only simple two-way interaction that was statistically significant. The error term again comes from the three-way ANOVA.<\/p>\n<\/div>\n<p>Group the data by <code>gender<\/code> and <code>risk<\/code> and analyze the <strong>simple simple main effects<\/strong> of treatment on pain_score:<\/p>\n<pre class=\"r\"><code># Group the data by gender and risk, and fit the ANOVA\r\ntreatment.effect &lt;- headache %&gt;%\r\n  group_by(gender, risk) %&gt;%\r\n  anova_test(pain_score ~ treatment, error = model)\r\ntreatment.effect %&gt;% filter(gender == \"male\")<\/code><\/pre>\n<pre><code>## # A tibble: 2 x 9\r\n##   gender risk  Effect      DFn   DFd     F         p `p&lt;.05`   ges\r\n##   &lt;fct&gt;  &lt;fct&gt; &lt;chr&gt;     &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;     &lt;dbl&gt; &lt;chr&gt;   &lt;dbl&gt;\r\n## 1 male   high  treatment     2    60 14.8  0.0000061 *       0.33 \r\n## 2 male   low   treatment     2    60  0.66 0.521     \"\"      0.022<\/code><\/pre>\n<div class=\"warning\">\n<p>In the table above, we only need the results for the simple simple main effects of treatment for: (1) \u201cmales\u201d at \u201clow\u201d risk; and (2) 
\u201cmales\u201d at \u201chigh\u201d risk.<\/p>\n<p>Statistical significance was accepted at a Bonferroni-adjusted alpha level of 0.025, that is 0.05 divided by the number of simple simple main effects you are computing (i.e., 2).<\/p>\n<\/div>\n<div class=\"success\">\n<p>There was a statistically significant simple simple main effect of treatment for males at high risk of migraine, F(2, 60) = 14.8, p &lt; 0.0001, but not for males at low risk of migraine, F(2, 60) = 0.66, p = 0.521.<\/p>\n<p>This analysis indicates that the type of treatment taken has a statistically significant effect on pain_score in males who are at high risk.<\/p>\n<p>In other words, the mean pain_score in the treatment X, Y and Z groups was statistically significantly different for males who are at high risk, but not for males at low risk.<\/p>\n<\/div>\n<\/div>\n<div id=\"compute-simple-simple-comparisons\" class=\"section level4\">\n<h4>Compute simple simple comparisons<\/h4>\n<p>A statistically significant simple simple main effect can be followed up by <strong>multiple pairwise comparisons<\/strong> to determine which group means are different. This can be easily done using the function <code>emmeans_test()<\/code> [rstatix package] described in the previous section.<\/p>\n<p><strong>Compare the different treatments<\/strong> by <code>gender<\/code> and <code>risk<\/code> variables:<\/p>\n<pre class=\"r\"><code># Pairwise comparisons\r\nlibrary(emmeans)\r\npwc &lt;- headache %&gt;%\r\n  group_by(gender, risk) %&gt;%\r\n  emmeans_test(pain_score ~ treatment, p.adjust.method = \"bonferroni\") %&gt;%\r\n  select(-df, -statistic, -p) # Remove details\r\n# Show comparison results for male at high risk\r\npwc %&gt;% filter(gender == \"male\", risk == \"high\")<\/code><\/pre>\n<pre><code>## # A tibble: 3 x 7\r\n##   gender risk  .y.        group1 group2      p.adj p.adj.signif\r\n##   &lt;fct&gt;  &lt;fct&gt; &lt;chr&gt;      &lt;chr&gt;  &lt;chr&gt;       &lt;dbl&gt; &lt;chr&gt;       \r\n## 1 male   high  pain_score X      Y      0.000386   ***         \r\n## 2 male   high  pain_score X      Z      0.00000942 ****        \r\n## 3 male   high  pain_score Y      Z      0.897      ns<\/code><\/pre>\n<pre class=\"r\"><code># Estimated marginal means (i.e. adjusted means) \r\n# with 95% confidence interval\r\nget_emmeans(pwc) %&gt;% filter(gender == \"male\", risk == \"high\")<\/code><\/pre>\n<pre><code>## # A tibble: 3 x 9\r\n##   gender risk  treatment emmean    se    df conf.low conf.high method      \r\n##   &lt;fct&gt;  &lt;fct&gt; &lt;fct&gt;      &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;    &lt;dbl&gt;     &lt;dbl&gt; &lt;chr&gt;       \r\n## 1 male   high  X           92.7  1.80    60     89.1      96.3 Emmeans test\r\n## 2 male   high  Y           82.3  1.80    60     78.7      85.9 Emmeans test\r\n## 3 male   high  Z           79.7  1.80    60     76.1      83.3 Emmeans test<\/code><\/pre>\n<p>In the pairwise comparisons table above, we are interested only in the simple simple comparisons for males at a high risk of a migraine headache. In our example, there are three possible combinations of group differences.<\/p>\n<div class=\"success\">\n<p>For males at high risk, there was a statistically significant mean difference between treatment X and treatment Y of 10.4 (p.adj &lt; 0.001), and between treatment X and treatment Z of 13.1 (p.adj &lt; 0.0001).<\/p>\n<p>However, the difference between treatment Y and treatment Z (2.66) was not statistically significant, p.adj = 0.897.<\/p>\n<\/div>\n<\/div>\n<div id=\"report-2\" class=\"section level4\">\n<h4>Report<\/h4>\n<p>A three-way ANOVA was conducted to determine the effects of gender, risk and treatment on migraine headache episode <code>pain_score<\/code>.<\/p>\n<p>Residual analysis was performed to test for the assumptions of the three-way ANOVA. 
Normality was assessed using Shapiro-Wilk\u2019s normality test and homogeneity of variances was assessed by Levene\u2019s test.<\/p>\n<p>Residuals were normally distributed (p &gt; 0.05) and there was homogeneity of variances (p &gt; 0.05).<\/p>\n<p>There was a statistically significant three-way interaction between gender, risk and treatment, <code>F(2, 60) = 7.41, p = 0.001<\/code>.<\/p>\n<p>Statistical significance was accepted at the p &lt; 0.025 level for simple two-way interactions and simple simple main effects. There was a statistically significant simple two-way interaction between risk and treatment for males, F(2, 60) = 5.2, p = 0.008, but not for females, F(2, 60) = 2.8, p = 0.065.<\/p>\n<p>There was a statistically significant simple simple main effect of treatment for males at high risk of migraine, F(2, 60) = 14.8, p &lt; 0.0001, but not for males at low risk of migraine, F(2, 60) = 0.66, p = 0.521.<\/p>\n<p>All simple simple pairwise comparisons between the different treatment groups were run for males at high risk of migraine with a Bonferroni adjustment applied.<\/p>\n<p>There was a statistically significant mean difference between treatment X and treatment Y. 
However, the difference between treatment Y and treatment Z was not statistically significant.<\/p>\n<pre class=\"r\"><code># Visualization: box plots with p-values\r\npwc &lt;- pwc %&gt;% add_xy_position(x = \"treatment\")\r\npwc.filtered &lt;- pwc %&gt;% filter(gender == \"male\", risk == \"high\")\r\nbxp +\r\n  stat_pvalue_manual(\r\n    pwc.filtered, color = \"risk\", linetype = \"risk\", hide.ns = TRUE,\r\n    tip.length = 0, step.increase = 0.1, step.group.by = \"gender\"\r\n  ) +\r\n  labs(\r\n    subtitle = get_test_label(res.aov, detailed = TRUE),\r\n    caption = get_pwc_label(pwc)\r\n    )<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/www.datanovia.com\/en\/wp-content\/uploads\/dn-tutorials\/r-statistics-2-comparing-groups-means\/figures\/045-anova-analysis-of-variance-three-way-anova-box-plots-with-p-values-1.png\" width=\"576\" \/><\/p>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"summary\" class=\"section level2\">\n<h2>Summary<\/h2>\n<p>This article describes how to compute and interpret ANOVA in R. We also explain the assumptions made by ANOVA tests and provide practical examples of R code to check whether the test assumptions are met.<\/p>\n<\/div>\n<\/div>\n<p><!--end rdoc--><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The ANOVA test (or Analysis of Variance) is used to compare the mean of multiple groups. This chapter describes the different types of ANOVA for comparing independent groups, including: <\/p>\n<p>1) One-way ANOVA: an extension of the independent samples t-test for comparing the means in a situation where there are more than two groups.<br \/>\n2) two-way ANOVA used to evaluate simultaneously the effect of two different grouping variables on a continuous outcome variable.<br \/>\n3) three-way ANOVA used to evaluate simultaneously the effect of three different grouping variables on a continuous outcome variable. 
<\/p>\n","protected":false},"author":1,"featured_media":9034,"parent":0,"menu_order":0,"comment_status":"open","ping_status":"closed","template":"","class_list":["post-10872","dt_lessons","type-dt_lessons","status-publish","has-post-thumbnail","hentry"],"multi-rating":{"mr_rating_results":[]},"_links":{"self":[{"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/dt_lessons\/10872","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/dt_lessons"}],"about":[{"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/types\/dt_lessons"}],"author":[{"embeddable":true,"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/comments?post=10872"}],"version-history":[{"count":1,"href":"https:\/\/www.datanovia.com
\/en\/wp-json\/wp\/v2\/dt_lessons\/10872\/revisions"}],"predecessor-version":[{"id":10874,"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/dt_lessons\/10872\/revisions\/10874"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/media\/9034"}],"wp:attachment":[{"href":"https:\/\/www.datanovia.com\/en\/wp-json\/wp\/v2\/media?parent=10872"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}