Independent Samples t-Test Calculator | Compare Two Group Means Online

Key Takeaways: Independent Samples t-Test

Tip

Purpose: Compare means between two unrelated/independent groups
When to use: For continuous data when comparing two separate groups
Assumptions: Independence, normality of distributions, homogeneity of variances (for Student’s t-test)
Variations: Student’s t-test (equal variances) and Welch’s t-test (unequal variances)
Null hypothesis: The two population means are equal (\(H_0: \mu_1 = \mu_2\))
Interpretation: If p < 0.05 (the conventional 5% significance level), the difference between the group means is statistically significant
Recommended default: Welch’s t-test (more robust when variances differ)

What is the Independent Samples t-Test?

The independent samples t-test (also called the two-sample t-test) is a statistical method for comparing the means of two unrelated groups to determine whether they differ significantly. It is one of the most commonly used statistical tests in research, particularly in psychology, medicine, and education.

Tip

When to use the independent samples t-test:

When comparing means between two separate/unrelated groups
When your data is measured on a continuous scale
When your samples are drawn from normally distributed populations
When you need to determine if observed differences are statistically significant

This online calculator allows you to quickly perform an independent samples t-test, check its assumptions, and visualize your data with clear explanations of the results.
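
For readers who want to reproduce the calculator's main result outside the browser, here is a minimal sketch in base R. It uses the same values that the "Use example data" link loads; the variable names `control`, `treatment`, and `welch` are illustrative and not part of the app.

```r
# Example data (same values as the calculator's built-in example)
control   <- c(5.2, 6.1, 5.8, 5.5, 5.9, 6.2, 5.7, 6.0, 5.6, 5.8)
treatment <- c(7.1, 7.5, 6.9, 7.2, 7.0, 7.3, 6.8, 7.4, 7.1, 6.9)

# Welch's t-test: var.equal = FALSE is the default in stats::t.test
welch <- t.test(control, treatment,
                alternative = "two.sided",
                var.equal = FALSE,
                conf.level = 0.95)

# Prints the t statistic, degrees of freedom, p-value,
# confidence interval for the difference, and both group means
print(welch)
```

Setting `var.equal = TRUE` in the same call gives Student's t-test instead.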

#| '!! shinylive warning !!': |
#|   shinylive does not work in self-contained HTML documents.
#|   Please set `embed-resources: false` in your metadata.
#| standalone: true
#| viewerHeight: 1400

library(shiny)
library(bslib)
library(ggplot2)
library(bsicons)
library(vroom)
library(shinyjs)
library(Formula)
#library(car) # For Levene's test
f_levene_test <- function(y, group, center = median, ...) {
  if (!is.numeric(y)) 
    stop(deparse(substitute(y)), " is not a numeric variable")
  
  # Convert group to factor if needed
  if (!is.factor(group)) {
    group <- as.factor(group)
  }
  valid <- complete.cases(y, group)
  meds <- tapply(y[valid], group[valid], center, ...)
  resp <- abs(y - meds[group])
  table <- anova(lm(resp ~ group))[, c(1, 4, 5)]
  rownames(table) <- c("group", " ")
  attr(table, "heading") <- paste("Levene's Test for Homogeneity of Variance (center = ", 
                                  deparse(substitute(center)), ")", sep="")
  
  return(table)
}

ui <- page_sidebar(
  title = "Independent Samples t-Test Calculator",
  useShinyjs(),  # Enable shinyjs for resetting inputs
  sidebar = sidebar(
    width = 400,

    card(
      card_header("Data Input"),
      accordion(
        accordion_panel(
          "Manual Input",
          layout_column_wrap(
            width = 1/2,
            style = css(grid_template_columns = "1fr 1fr"),
            textAreaInput("group_input", "Grouping variable [categorical, one value per row]", rows = 8,
                          placeholder = "Paste values here (only two levels)..."),
            textAreaInput("response_input", "Response variable [numeric, one value per row]", rows = 8,
                          placeholder = "Paste values here...")
          ),
          div(
            actionLink("use_example", "Use example data", style = "color:#0275d8;"),
            tags$span(bs_icon("file-earmark-text"), style = "margin-left: 5px; color: #0275d8;")
          )
        ),
        accordion_panel(
          "File Upload",
          fileInput("file_upload", "Upload CSV or TXT file:",
                   accept = c("text/csv", "text/plain", ".csv", ".txt")),
          checkboxInput("header", "File has header", TRUE),
          conditionalPanel(
            condition = "output.file_uploaded",
            div(
              layout_column_wrap(
                width = 1/2,
                style = css(grid_template_columns = "1fr 1fr"),
                selectInput("group_var", "Grouping variable:", choices = NULL),
                selectInput("response_var", "Response variable:", choices = NULL)
              ),
              actionButton("clear_file", "Clear File", class = "btn-danger btn-sm")
            )
          )
        ),
        id = "input_method",
        open = 1
      ),
      
      # Advanced Options accordion with t-test specific options
      accordion(
        accordion_panel(
          "Advanced Options",
          radioButtons("alternative", tags$strong("Alternative hypothesis:"),
                      choices = c("Two-sided" = "two.sided", 
                                 "Difference < 0" = "less",
                                 "Difference > 0" = "greater"),
                      selected = "two.sided"),
          radioButtons("var_equal", tags$strong("Equal variances?"),
                      choices = c("Yes (Student's t)" = "TRUE", 
                                 "No (Welch's t)" = "FALSE"),
                      selected = "FALSE"),
          numericInput("conf_level", tags$strong("Confidence level:"), 
                      value = 0.95, 
                      min = 0.5, 
                      max = 0.99, 
                      step = 0.01)
        ),
        open = FALSE
      ),
      
      actionButton("run_test", "Run Test", class = "btn btn-primary")
    ),

    hr(),

    card(
      card_header("Interpretation"),
      card_body(
        div(class = "alert alert-info",
          tags$ul(
            tags$li("The independent samples t-test compares means between two unrelated groups."),
            tags$li(tags$b("Null hypothesis:"), " Both group means are equal."),
            tags$li(tags$b("Alternative:"), " The means are not equal (or as specified)."),
            tags$li("If p-value < 0.05, there is a significant difference between the group means."),
            tags$li("Welch's t-test does not assume equal variances (recommended default)."),
            tags$li("Cohen's d effect size: 0.2 (small), 0.5 (medium), 0.8 (large)")
          )
        )
      )
    )
  ),

  layout_column_wrap(
    width = 1,

    card(
      card_header("Test Results"),
      card_body(
        navset_tab(
          nav_panel("Results", 
                    uiOutput("error_message"), 
                    verbatimTextOutput("test_results")),
          nav_panel("Assumptions", 
                    navset_tab(
                      nav_panel("Normality",
                                plotOutput("qq_plot"),
                                verbatimTextOutput("shapiro_test"),
                                div(class = "alert alert-info mt-3",
                                   "If p < 0.05 in the Shapiro-Wilk test, your data significantly deviates from normality. Consider using non-parametric tests.")),
                      nav_panel("Homogeneity of Variance",
                                verbatimTextOutput("levene_test"),
                                div(class = "alert alert-info mt-3",
                                   "If p < 0.05 in Levene's test, the groups have significantly different variances. Use Welch's t-test instead of Student's t-test."))
                    )
                   ),
          nav_panel("Explanation", div(style = "font-size: 0.9rem;",
            p("The independent samples t-test compares the means of two independent groups:"),
            tags$ul(
              tags$li("It assumes both samples are drawn from normally distributed populations."),
              tags$li("Student's t-test assumes equal variances between groups. Welch's t-test does not."),
              tags$li("The test compares the observed difference in means to what would be expected by chance.")
            ),
            p("Statistical References:"),
            tags$ul(
              tags$li("Student (1908). The probable error of a mean. Biometrika, 6(1), 1-25."),
              tags$li("Welch, B. L. (1947). The generalization of \"Student's\" problem when several different population variances are involved. Biometrika, 34(1/2), 28-35.")
            )
          ))
        )
      )
    ),

    card(
      card_header("Visual Assessment"),
      card_body(
        navset_tab(
          nav_panel("Mean Plot",
            navset_tab(
              nav_panel("Plot", plotOutput("meanplot")),
              nav_panel("Explanation", div(style = "font-size: 0.9rem;",
                p("The mean plot shows the mean of each group with confidence intervals:"),
                tags$ul(
                  tags$li("The dot represents the mean value of each group."),
                  tags$li("Error bars show the confidence interval for each mean at the selected confidence level (95% by default)."),
                  tags$li("Non-overlapping error bars typically indicate a significant difference; overlapping bars do not rule one out.")
                )
              ))
            )
          ),
          nav_panel("Boxplot",
            navset_tab(
              nav_panel("Plot", plotOutput("boxplot")),
              nav_panel("Explanation", div(style = "font-size: 0.9rem;",
                p("The boxplot shows the distribution of each group:"),
                tags$ul(
                  tags$li("The box represents the interquartile range (IQR) with the median shown as a line."),
                  tags$li("Jittered points show the individual observations."),
                  tags$li("Whiskers extend to the smallest and largest values within 1.5 times the IQR of the box edges."),
                  tags$li("Points outside the whiskers are potential outliers.")
                )
              ))
            )
          ),
          nav_panel("Density Plot",
            navset_tab(
              nav_panel("Plot", plotOutput("densityplot")),
              nav_panel("Explanation", div(style = "font-size: 0.9rem;",
                p("The density plot shows the distribution of each group:"),
                tags$ul(
                  tags$li("The shape shows the probability distribution of values in each group."),
                  tags$li("The vertical dashed lines show the mean of each group."),
                  tags$li("The spread indicates the variance within each group."),
                  tags$li("The distance between the vertical lines is the raw difference in means (the unstandardized effect size).")
                )
              ))
            )
          )
        )
      )
    )
  )
)

server <- function(input, output, session) {
  # Example data
  example_group <- "Control\nControl\nControl\nControl\nControl\nControl\nControl\nControl\nControl\nControl\nTreatment\nTreatment\nTreatment\nTreatment\nTreatment\nTreatment\nTreatment\nTreatment\nTreatment\nTreatment"
  example_response <- "5.2\n6.1\n5.8\n5.5\n5.9\n6.2\n5.7\n6.0\n5.6\n5.8\n7.1\n7.5\n6.9\n7.2\n7.0\n7.3\n6.8\n7.4\n7.1\n6.9"

  # Track input method
  input_method <- reactiveVal("manual")
  
  # Function to clear file inputs
  clear_file_inputs <- function() {
    # choices = NULL leaves the current choices untouched; character(0) clears them
    updateSelectInput(session, "group_var", choices = character(0))
    updateSelectInput(session, "response_var", choices = character(0))
    reset("file_upload")
  }
  
  # Function to clear text inputs
  clear_text_inputs <- function() {
    updateTextAreaInput(session, "group_input", value = "")
    updateTextAreaInput(session, "response_input", value = "")
  }

  # When example data is used, clear file inputs and set text inputs
  observeEvent(input$use_example, {
    input_method("manual")
    clear_file_inputs()
    updateTextAreaInput(session, "group_input", value = example_group)
    updateTextAreaInput(session, "response_input", value = example_response)
  })

  # When file is uploaded, clear text inputs and set file method
  observeEvent(input$file_upload, {
    if (!is.null(input$file_upload)) {
      input_method("file")
      clear_text_inputs()
      
      # Add a loading indicator
      showNotification("Processing file...", type = "message", id = "fileLoading")
    }
  })

  # When clear file button is clicked, clear file and set manual method
  observeEvent(input$clear_file, {
    input_method("manual")
    clear_file_inputs()
  })
  
  # When text inputs change, clear file inputs if they have content
  observeEvent(input$group_input, {
    if (!is.null(input$group_input) && nchar(input$group_input) > 0) {
      input_method("manual")
      clear_file_inputs()
    }
  }, ignoreInit = TRUE)
  
  observeEvent(input$response_input, {
    if (!is.null(input$response_input) && nchar(input$response_input) > 0) {
      input_method("manual")
      clear_file_inputs()
    }
  }, ignoreInit = TRUE)

  file_data <- reactive({
    req(input$file_upload)
    tryCatch({
      data <- vroom::vroom(input$file_upload$datapath, delim = NULL, col_names = input$header, show_col_types = FALSE)
      removeNotification("fileLoading")
      return(data)
    }, error = function(e) {
      removeNotification("fileLoading")
      showNotification(paste("File read error:", e$message), type = "error")
      NULL
    })
  })

  observe({
    df <- file_data()
    if (!is.null(df)) {
      # Get variable types
      var_types <- sapply(df, function(x) {
        if(is.numeric(x)) return("numeric")
        else return("categorical")
      })
      
      # Identify categorical and numeric variables
      cat_vars <- names(df)[var_types == "categorical"]
      num_vars <- names(df)[var_types == "numeric"]
      
      # Fallback: add any non-numeric column with at most 5 distinct values that is not yet listed
      # (currently a no-op, since every non-numeric column is already in cat_vars)
      for(col in names(df)) {
        if(!col %in% cat_vars && !is.numeric(df[[col]])) {
          unique_vals <- unique(na.omit(df[[col]]))
          if(length(unique_vals) <= 5) {  # Allow up to 5 levels for grouping
            cat_vars <- c(cat_vars, col)
          }
        }
      }
      
      # Update select inputs
      updateSelectInput(session, "group_var", choices = cat_vars)
      updateSelectInput(session, "response_var", choices = num_vars)
    }
  })

  output$file_uploaded <- reactive({
    !is.null(input$file_upload)
  })
  outputOptions(output, "file_uploaded", suspendWhenHidden = FALSE)

  # Function to parse text input for numeric values
  parse_numeric_input <- function(text) {
    if (is.null(text) || text == "") return(NULL)
    input_lines <- strsplit(text, "\\r?\\n")[[1]]
    input_lines <- input_lines[input_lines != ""]
    numeric_values <- suppressWarnings(as.numeric(input_lines))
    return(numeric_values)
  }
  
  # Function to parse text input for categorical/grouping values
  parse_group_input <- function(text) {
    if (is.null(text) || text == "") return(NULL)
    input_lines <- strsplit(text, "\\r?\\n")[[1]]
    input_lines <- input_lines[input_lines != ""]
    return(input_lines)
  }

  # Create a data frame with the manual input
  manual_data <- reactive({
    grp <- parse_group_input(input$group_input)
    resp <- parse_numeric_input(input$response_input)
    
    if (is.null(grp) || is.null(resp)) return(NULL)
    
    # If lengths differ, truncate both to the shorter length
    # (seq_len() is safe when min_length is 0, unlike 1:min_length)
    min_length <- min(length(grp), length(resp))
    grp <- grp[seq_len(min_length)]
    resp <- resp[seq_len(min_length)]
    
    # Remove any NA values in the numeric response
    valid_idx <- !is.na(resp)
    if(sum(valid_idx) == 0) return(NULL)
    
    data.frame(
      group = grp[valid_idx],
      response = resp[valid_idx]
    )
  })
  
  # Get the data from either manual input or file upload
  analysis_data <- reactive({
    if(input_method() == "file" && !is.null(file_data()) && 
       !is.null(input$group_var) && !is.null(input$response_var)) {
      df <- file_data()
      result <- data.frame(
        group = df[[input$group_var]],
        response = df[[input$response_var]]
      ) |> na.omit()
      return(result)
    } else {
      return(manual_data())
    }
  })
  
  # Validate the data for analysis
  validate_data <- reactive({
    data <- analysis_data()
    
    if(is.null(data) || nrow(data) == 0) {
      return("Error: Please provide valid input data.")
    }
    
    # Check if response values are numeric
    if(any(is.na(data$response))) {
      return("Error: Response values must be numeric.")
    }
    
    # Check that group variable has exactly two levels
    unique_groups <- unique(data$group)
    if(length(unique_groups) != 2) {
      return(paste("Error: Grouping variable must have exactly 2 levels. Found", length(unique_groups), "levels."))
    }
    
    # Check minimum sample size per group
    group_counts <- table(data$group)
    if(any(group_counts < 3)) {
      return("Error: Each group should have at least 3 observations for the t-test.")
    }
    
    # Check if all values in a group are identical
    group_values <- split(data$response, data$group)
    if(any(sapply(group_values, function(x) length(unique(x)) == 1))) {
      return("Error: One of your groups has identical values for all observations (zero variance), so the test was not run.")
    }
    
    return(NULL)
  })
  
  output$error_message <- renderUI({
    error <- validate_data()
    if(!is.null(error) && input$run_test > 0) {
      div(class = "alert alert-danger", error)
    }
  })
  
  # Extract values for each group
  group_values <- reactive({
    data <- analysis_data()
    if(is.null(data)) return(NULL)
    
    unique_groups <- unique(data$group)
    if(length(unique_groups) != 2) return(NULL)
    
    list(
      group1 = data$response[data$group == unique_groups[1]],
      group2 = data$response[data$group == unique_groups[2]],
      labels = unique_groups
    )
  })
  
  # Run the t-test
  test_result <- eventReactive(input$run_test, {
    showNotification("Calculating results...", type = "message", id = "calculating")
    
    error <- validate_data()
    if(!is.null(error)) {
      removeNotification("calculating")
      return(NULL)
    }
    
    values <- group_values()
    if(is.null(values)) {
      removeNotification("calculating")
      return(NULL)
    }
    
    # Parse var_equal as logical
    var_equal <- as.logical(input$var_equal)
    
    result <- t.test(
      values$group1, 
      values$group2, 
      paired = FALSE,
      alternative = input$alternative,
      var.equal = var_equal,
      conf.level = input$conf_level
    )
    
    # Add group labels to the result
    result$group_labels <- values$labels
    
    # Calculate Cohen's d effect size
    mean1 <- mean(values$group1)
    mean2 <- mean(values$group2)
    n1 <- length(values$group1)
    n2 <- length(values$group2)
    var1 <- var(values$group1)
    var2 <- var(values$group2)
    
    # Pooled standard deviation
    if (var_equal) {
      pooled_sd <- sqrt(((n1-1)*var1 + (n2-1)*var2) / (n1+n2-2))
    } else {
      pooled_sd <- sqrt((var1 + var2)/2)  # Using average SD for Welch's t
    }
    
    # Cohen's d
    d <- abs(mean1 - mean2) / pooled_sd
    result$cohens_d <- d
    
    # Descriptive statistics
    result$descriptives <- list(
      group1_mean = mean1,
      group1_sd = sqrt(var1),
      group1_n = n1,
      group2_mean = mean2,
      group2_sd = sqrt(var2),
      group2_n = n2
    )
    
    removeNotification("calculating")
    return(result)
  })
  
  # Run Shapiro-Wilk test to check for normality
  shapiro_result <- eventReactive(input$run_test, {
    values <- group_values()
    if(is.null(values)) return(NULL)
    
    list(
      group1 = shapiro.test(values$group1),
      group2 = shapiro.test(values$group2),
      labels = values$labels
    )
  })
  
  # Run Levene's test for homogeneity of variance
  levene_result <- eventReactive(input$run_test, {
    data <- analysis_data()
    if(is.null(data)) return(NULL)
    
    # Convert group to factor to ensure Levene's test works correctly
    data$group <- factor(data$group)
    
    # Run Levene's test
    tryCatch({
      test <- f_levene_test(data$response, data$group, center = "median")
      return(test)
    }, error = function(e) {
      return(NULL)
    })
  })
  
  # Output for Shapiro-Wilk test
  output$shapiro_test <- renderPrint({
    req(input$run_test > 0, !is.null(shapiro_result()))
    
    res <- shapiro_result()
    if(is.null(res)) return(NULL)
    
    cat("Shapiro-Wilk Normality Test Results:\n\n")
    cat(res$labels[1], "group:\n")
    cat("W =", round(res$group1$statistic, 4), ", p-value =", round(res$group1$p.value, 6), "\n")
    if(res$group1$p.value < 0.05) {
      cat("The data significantly deviates from normality.\n\n")
    } else {
      cat("The data appears to be normally distributed.\n\n")
    }
    
    cat(res$labels[2], "group:\n")
    cat("W =", round(res$group2$statistic, 4), ", p-value =", round(res$group2$p.value, 6), "\n")
    if(res$group2$p.value < 0.05) {
      cat("The data significantly deviates from normality.\n\n")
    } else {
      cat("The data appears to be normally distributed.\n\n")
    }
    
    if(res$group1$p.value < 0.05 || res$group2$p.value < 0.05) {
      cat("Since at least one group deviates from normality, you might consider a non-parametric alternative like the Wilcoxon rank-sum test.\n")
    } else {
      cat("Both groups appear normally distributed, which supports the use of the t-test.\n")
    }
  })
  
  # Output for Levene's test
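  output$levene_test <- renderPrint({
    # Only require that the test was requested, so the fallback message below is reachable
    req(input$run_test > 0)
    
    res <- levene_result()
    if(is.null(res)) {
      cat("Levene's test could not be performed. Check if your data meets the requirements.\n")
      return(invisible(NULL))  # invisible, so renderPrint does not also print "NULL"
    }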
    
    cat("Levene's Test for Homogeneity of Variance:\n\n")
    cat("F =", round(res$`F value`[1], 4), ", df =", paste(res$Df, collapse = ", "), ", p-value =", round(res$`Pr(>F)`[1], 6), "\n\n")
    
    if(res$`Pr(>F)`[1] < 0.05) {

      cat("The variances between groups are significantly different (heterogeneous).\n")
      cat("Use Welch's t-test (unequal variances) instead of Student's t-test.\n")
    } else {
      cat("The variances between groups are not significantly different (homogeneous).\n")
      cat("Student's t-test (equal variances) may be appropriate, but Welch's t-test is generally robust regardless.\n")
    }
  })
  
  # Output for the t-test
  output$test_results <- renderPrint({
    req(input$run_test > 0, !is.null(test_result()))
    res <- test_result()
    
    if(is.null(res)) {
      return(NULL)
    }
    
    # Format the main test results
    . <- NULL
    type <- if(grepl("Welch", res$method)) "Welch's t-test" else "Student's t-test"
    
    cat("INDEPENDENT SAMPLES T-TEST\n")
    cat("==========================\n\n")
    
    # Descriptive statistics
    cat("Group Statistics:\n")
    cat("-----------------\n")
    stats <- res$descriptives
    group_labels <- res$group_labels
    
    cat(sprintf("Group: %s\n", group_labels[1]))
    cat(sprintf("   n = %d, Mean = %.4f, SD = %.4f\n\n", stats$group1_n, stats$group1_mean, stats$group1_sd))
    
    cat(sprintf("Group: %s\n", group_labels[2]))
    cat(sprintf("   n = %d, Mean = %.4f, SD = %.4f\n\n", stats$group2_n, stats$group2_mean, stats$group2_sd))
    
    # Test statistics
    cat("Test Results:\n")
    cat("-------------\n")
    cat(sprintf("Test: %s (Two-Sample)\n", type))
    cat(sprintf("t = %.4f, df = %.2f, p-value = %.6f\n\n", res$statistic, res$parameter, res$p.value))
    
    # Effect size
    cat("Effect Size:\n")
    cat("-----------\n")
    cat(sprintf("Cohen's d = %.4f\n", res$cohens_d))
    effect_size <- if(res$cohens_d < 0.2) {
      "very small"
    } else if(res$cohens_d < 0.5) {
      "small"
    } else if(res$cohens_d < 0.8) {
      "medium"
    } else {
      "large"
    }
    cat(sprintf("Interpretation: %s effect\n\n", effect_size))
    
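    # Mean difference and confidence interval
    # Report the signed difference so it matches the sign of t and the interval from t.test()
    mean_diff <- stats$group1_mean - stats$group2_mean
    conf_level <- attr(res$conf.int, "conf.level")  # level actually used for this result
    cat("Mean Difference:\n")
    cat("----------------\n")
    cat(sprintf("Difference (%s - %s) = %.4f\n", group_labels[1], group_labels[2], mean_diff))
    cat(sprintf("%.1f%% Confidence Interval: [%.4f, %.4f]\n\n", conf_level * 100, res$conf.int[1], res$conf.int[2]))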
    
    # Conclusion
    cat("Conclusion:\n")
    cat("-----------\n")
    if(res$p.value < 0.05) {
      cat(sprintf("At the 5%% significance level, we reject the null hypothesis.\n"))
      cat(sprintf("There is a statistically significant difference between the group means.\n"))
    } else {
      cat(sprintf("At the 5%% significance level, we fail to reject the null hypothesis.\n"))
      cat(sprintf("There is not enough evidence to suggest a significant difference between the group means.\n"))
    }
  })
  
  # Normal Q-Q plots
  output$qq_plot <- renderPlot({
    req(input$run_test > 0, !is.null(group_values()))
    
    values <- group_values()
    if(is.null(values)) return(NULL)
    
    # Create Q-Q plots for both groups
    par(mfrow = c(1, 2))
    
    # First group
    qqnorm(values$group1, main = paste("Q-Q Plot for", values$labels[1]), 
           col = "blue", pch = 16)
    qqline(values$group1, col = "red", lwd = 2)
    
    # Second group
    qqnorm(values$group2, main = paste("Q-Q Plot for", values$labels[2]), 
           col = "blue", pch = 16)
    qqline(values$group2, col = "red", lwd = 2)
    
    par(mfrow = c(1, 1))
  })
  
  # Generate mean plot with confidence intervals
  output$meanplot <- renderPlot({
    req(input$run_test > 0, !is.null(test_result()))
    
    res <- test_result()
    if(is.null(res)) return(NULL)
    
    # Extract data for the plot
    stats <- res$descriptives
    group_labels <- res$group_labels
    
    # Create a data frame for plotting
    plot_data <- data.frame(
      Group = factor(c(group_labels[1], group_labels[2]), levels = group_labels),
      Mean = c(stats$group1_mean, stats$group2_mean),
      SE = c(stats$group1_sd / sqrt(stats$group1_n), stats$group2_sd / sqrt(stats$group2_n))
    )
    
    # Calculate confidence interval based on the t-distribution
    ci_factor <- qt(1 - (1 - input$conf_level) / 2, c(stats$group1_n - 1, stats$group2_n - 1))
    
    # Add CI lower and upper bounds
    plot_data$CI_lower <- plot_data$Mean - ci_factor * plot_data$SE
    plot_data$CI_upper <- plot_data$Mean + ci_factor * plot_data$SE
    
    # Create the plot
    ggplot(plot_data, aes(x = Group, y = Mean, color = Group)) +
      geom_point(size = 4) +
      geom_errorbar(aes(ymin = CI_lower, ymax = CI_upper), width = 0.2, linewidth = 1) +
      labs(y = "Mean with Confidence Interval", 
           title = "Group Means with Confidence Intervals",
           subtitle = paste0(input$conf_level * 100, "% Confidence Level")) +
      theme_minimal(base_size = 14) +
      theme(legend.position = "none", 
            plot.title = element_text(hjust = 0.5, face = "bold"),
            plot.subtitle = element_text(hjust = 0.5),
            axis.title.x = element_blank()) +
      scale_color_manual(values = c("#5dade2", "#ff7f0e"))
  })
  
  # Generate boxplot
  output$boxplot <- renderPlot({
    req(input$run_test > 0, !is.null(analysis_data()), !is.null(test_result()))
    
    data <- analysis_data()
    if(is.null(data)) return(NULL)
    result <- test_result()  # non-NULL here, so result$p.value is available for the subtitle
    
    # Create a boxplot
    ggplot(data, aes(x = group, y = response, fill = group)) +
      geom_boxplot(outlier.shape = 16, alpha = 0.7) +
      geom_jitter(width = 0.2, alpha = 0.5) +
      scale_fill_manual(values = c("#5dade2", "#ff7f0e")) +
      labs(y = "Value", 
           subtitle = paste("T-test: p =", format.pval(result$p.value, digits = 3)),
           title = "Comparison of Group Values") +
      theme_minimal(base_size = 14) +
      theme(legend.position = "none", 
            plot.subtitle = element_text(face = "italic"))
  })
  
  # Generate density plot
output$densityplot <- renderPlot({
  req(input$run_test > 0, !is.null(test_result()))
  
  res <- test_result()
  if(is.null(res)) return(NULL)
  
  values <- group_values()
  if(is.null(values)) return(NULL)
  
  # Calculate means
  mean1 <- res$descriptives$group1_mean
  mean2 <- res$descriptives$group2_mean
  
  # Find range for x-axis
  all_values <- c(values$group1, values$group2)
  min_val <- min(all_values)
  max_val <- max(all_values)
  range_val <- max_val - min_val
  x_min <- min_val - range_val * 0.1
  x_max <- max_val + range_val * 0.1
  
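  df <- data.frame(
    Value = all_values,
    # Fix the level order to order of appearance so fill colors match the mean lines
    Group = factor(rep(values$labels, c(length(values$group1), length(values$group2))),
                   levels = values$labels)
  )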
  
  # Create the density plot
  p <- ggplot(df, aes(x = Value, fill = Group, color = Group)) +
    geom_density(alpha = 0.5) +
    geom_vline(xintercept = c(mean1, mean2), 
               color = c("#5dade2", "#ff7f0e"), 
               linetype = "dashed", 
               linewidth = 1) +
    scale_fill_manual(values = c("#5dade2", "#ff7f0e")) +
    scale_color_manual(values = c("#2874a6", "#d35400")) +
    annotate("text", x = mean1, y = 0, 
             label = paste("Mean =", round(mean1, 2)), 
             hjust = -0.1, vjust = -1, 
             color = "#2874a6", fontface = "bold") +
    annotate("text", x = mean2, y = 0, 
             label = paste("Mean =", round(mean2, 2)), 
             hjust = -0.1, vjust = -2.5, 
             color = "#d35400", fontface = "bold") +
    coord_cartesian(xlim = c(x_min, x_max)) +
    labs(title = "Density Distribution by Group",
         subtitle = paste("Mean difference:", round(abs(mean2 - mean1), 2), 
                          "| Cohen's d =", round(res$cohens_d, 2)),
         x = "Value", 
         y = "Density") +
    theme_minimal(base_size = 14)
  
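  # If a confidence interval is available, add a shaded band.
  # Note: res$conf.int is the interval for the difference in means, so it is on the
  # difference scale and may fall outside the plotted range of raw values.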
  if(!is.null(res$conf.int)) {
    # Get max density value for scaling
    max_density <- max(ggplot_build(p)$data[[1]]$density)
    
    # Add confidence interval shading
    p <- p + annotate("rect", 
                    xmin = res$conf.int[1], 
                    xmax = res$conf.int[2], 
                    ymin = 0, 
                    ymax = max_density * 0.15,
                    alpha = 0.2,
                    fill = "darkred") +
           annotate("text", 
                    x = mean(res$conf.int), 
                    y = max_density * 0.17,
                    label = paste0(attr(res$conf.int, "conf.level") * 100, "% CI"), 
                    color = "darkred",
                    size = 3)
  }
  
  return(p)
})
  
}

# Run the application
shinyApp(ui = ui, server = server)

Types of t-Tests: Student’s vs. Welch’s

There are two main variations of the independent samples t-test:

Feature	Student’s t-Test	Welch’s t-Test
Assumption of equal variances	Required	Not required
When to use	When variances are similar between groups	When variances may differ between groups
Degrees of freedom	\(n_1 + n_2 - 2\)	Welch-Satterthwaite approximation (usually non-integer)
Robustness	Sensitive to unequal variances, especially with unequal group sizes	Robust to unequal variances
Recommended as default	No	Yes
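
Written out, the two test statistics and their degrees of freedom take the following standard forms. The symbols \(\bar{x}_i\) and \(s_i^2\) (the sample mean and sample variance of group \(i\)), \(s_p\) (the pooled standard deviation), and \(\nu\) (the degrees of freedom) are notation introduced here for illustration.

Student’s t-test:

\[
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},
\qquad
\nu = n_1 + n_2 - 2
\]

Welch’s t-test:

\[
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}
\]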

Welch’s t-test is generally recommended as the default choice because:

It does not assume equal variances between groups
It performs well even when sample sizes are unequal
It maintains good statistical power and Type I error control, losing little power relative to Student’s t-test even when the variances are in fact equal
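
To make the difference between the two variants concrete, the sketch below computes both t statistics and Welch's degrees of freedom by hand and compares them with `stats::t.test`. It reuses the calculator's example data; every variable name (`se1_sq`, `df_welch`, and so on) is illustrative rather than taken from the app.

```r
control   <- c(5.2, 6.1, 5.8, 5.5, 5.9, 6.2, 5.7, 6.0, 5.6, 5.8)
treatment <- c(7.1, 7.5, 6.9, 7.2, 7.0, 7.3, 6.8, 7.4, 7.1, 6.9)

n1 <- length(control)
n2 <- length(treatment)
v1 <- var(control)
v2 <- var(treatment)
mean_diff <- mean(control) - mean(treatment)

# Squared standard error of each group mean
se1_sq <- v1 / n1
se2_sq <- v2 / n2

# Welch: unpooled standard error and Welch-Satterthwaite degrees of freedom
t_welch  <- mean_diff / sqrt(se1_sq + se2_sq)
df_welch <- (se1_sq + se2_sq)^2 /
  (se1_sq^2 / (n1 - 1) + se2_sq^2 / (n2 - 1))

# Student: pooled variance and n1 + n2 - 2 degrees of freedom
pooled_var <- ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_student  <- mean_diff / sqrt(pooled_var * (1 / n1 + 1 / n2))
df_student <- n1 + n2 - 2

# Reference results from R's built-in implementation
ref_welch   <- t.test(control, treatment, var.equal = FALSE)
ref_student <- t.test(control, treatment, var.equal = TRUE)

cat("Welch:   t =", t_welch, " df =", df_welch, "\n")
cat("Student: t =", t_student, " df =", df_student, "\n")
```

Because both groups contain 10 observations here, the two t statistics coincide and only the degrees of freedom differ; with unequal group sizes and unequal variances the statistics themselves diverge as well.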

How the Independent Samples t-Test Works

The t-test evaluates the observed difference between group means relative to the variability within the groups:

// Visual style
//   - Blue rectangles (#b3deff) for process nodes
//   - Yellow diamond (#f8d56f) for the decision node
//   - Light gray rectangles (#f0f0f0) for outcome nodes
//   - Light green rectangles (#e2f0d9) for data input nodes

//| echo: false
tTestFlowchart = {
  // Canvas setup - wider to accommodate horizontal layout
  const width = 1300;
  const height = 500;
  const padding = 60;
  
  // Create SVG with explicit viewBox
  const svg = d3.create("svg")
    .attr("width", width)
    .attr("height", height)
    .attr("viewBox", [0, 0, width, height])
    .attr("style", "max-width: 100%; height: auto; font: 16px sans-serif;");
  
  // Add title at the bottom like in original
  svg.append("text")
    .attr("x", width / 2)
    .attr("y", height - 20)
    .attr("text-anchor", "middle")
    .attr("font-size", "20px")
    .attr("font-weight", "bold")
    .text("T-Test Procedure");
  
  // Define nodes in a left-to-right (horizontal) layout
  const nodes = [
    // Top branch - means
    {id: "A", label: "Group A Data", x: padding + 100, y: 120},
    {id: "B", label: "Group B Data", x: padding + 100, y: 300},
    {id: "C", label: "Calculate Mean A", x: padding + 280, y: 120},
    {id: "D", label: "Calculate Mean B", x: padding + 280, y: 200},
    {id: "E", label: "Find Mean\nDifference", x: padding + 460, y: 160},
    
    // Bottom branch - variances
    {id: "F", label: "Calculate\nVariance A", x: padding + 280, y: 260},
    {id: "G", label: "Calculate\nVariance B", x: padding + 280, y: 340},
    {id: "H", label: "Estimate\nStandard Error", x: padding + 460, y: 300},
    
    // Final common path
    {id: "I", label: "Calculate\nt-statistic", x: padding + 640, y: 230},
    {id: "J", label: "Determine\np-value", x: padding + 820, y: 230},
    {id: "K", label: "p-value < 0.05?", x: padding + 1000, y: 230, isDecision: true},
    {id: "L", label: "Reject null\nhypothesis", x: padding + 910, y: 350},
    {id: "M", label: "Retain null\nhypothesis", x: padding + 1090, y: 350}
  ];
  
  // Define edges - IMPORTANT: Draw order matters, so we'll draw paths in specific order
  const edges = [
    // Mean calculation path
    {source: "A", target: "C", label: "", order: 1},
    {source: "C", target: "E", label: "", order: 1},
    
    // Variance calculation path
    {source: "A", target: "F", label: "", order: 1},
    {source: "F", target: "H", label: "", order: 1},
    {source: "G", target: "H", label: "", order: 1},
    
    // Final common path
    {source: "E", target: "I", label: "", order: 1},
    {source: "H", target: "I", label: "", order: 1},
    {source: "I", target: "J", label: "", order: 1},
    {source: "J", target: "K", label: "", order: 1},
    {source: "K", target: "L", label: "Yes", order: 1},
    {source: "K", target: "M", label: "No", order: 1},
    
    // Draw these connections last to ensure they appear on top
    {source: "B", target: "D", label: "", order: 2},
    {source: "D", target: "E", label: "", order: 2},
    {source: "B", target: "G", label: "", order: 2}
  ];
  
  // Define arrow marker
  svg.append("defs").append("marker")
    .attr("id", "arrowhead")
    .attr("viewBox", "0 0 10 10")
    .attr("refX", 8)
    .attr("refY", 5)
    .attr("markerWidth", 8)
    .attr("markerHeight", 8)
    .attr("orient", "auto")
    .append("path")
    .attr("d", "M 0 0 L 10 5 L 0 10 z")
    .attr("fill", "#666");
  
  // Sort edges by order to control draw sequence
  edges.sort((a, b) => a.order - b.order);
  
  // Draw edges with improved path calculation for horizontal layout
  const edgeLines = svg.selectAll("path.edge")
    .data(edges)
    .join("path")
    .attr("class", d => `edge order-${d.order}`)
    .attr("d", d => {
      const source = nodes.find(n => n.id === d.source);
      const target = nodes.find(n => n.id === d.target);
      
      // Calculate connector points
      let sourceX, sourceY, targetX, targetY;
      let path = "";
      
      // Special case for decision diamond
      if (source.isDecision) {
        if (d.label === "Yes") {
          // Going down-left to L
          sourceX = source.x - 25;
          sourceY = source.y + 20;
          targetX = target.x;
          targetY = target.y - 25;
          path = `M${sourceX},${sourceY} L${sourceX},${(sourceY + targetY)/2} L${targetX},${(sourceY + targetY)/2} L${targetX},${targetY}`;
        } else if (d.label === "No") {
          // Going down-right to M
          sourceX = source.x + 25;
          sourceY = source.y + 20;
          targetX = target.x;
          targetY = target.y - 25;
          path = `M${sourceX},${sourceY} L${sourceX},${(sourceY + targetY)/2} L${targetX},${(sourceY + targetY)/2} L${targetX},${targetY}`;
        }
      }
      // Special case for B to D connection that needs to route around F
      else if (source.id === "B" && target.id === "D") {
        // Avoid overlapping with Calculate Variance A
        sourceX = source.x + 70;
        sourceY = source.y;
        targetX = target.x - 70;
        targetY = target.y;
        
        // Create path that goes around node F
        path = `M${sourceX},${sourceY} L${sourceX + 30},${sourceY} L${sourceX + 30},${sourceY - 40} L${targetX - 30},${sourceY - 40} L${targetX - 30},${targetY} L${targetX},${targetY}`;
      }
      // Special cases for diagonal flows
      else if ((source.id === "C" && target.id === "E") || 
               (source.id === "D" && target.id === "E")) {
        // Mean calculations to mean difference
        sourceX = source.x + 70;
        sourceY = source.y;
        targetX = target.x - 70;
        targetY = target.y;
        
        // Create curved paths
        const midX = (sourceX + targetX) / 2;
        path = `M${sourceX},${sourceY} C${midX},${sourceY} ${midX},${targetY} ${targetX},${targetY}`;
      }
      else if ((source.id === "F" && target.id === "H") || 
               (source.id === "G" && target.id === "H")) {
        // Variance calculations to standard error
        sourceX = source.x + 70;
        sourceY = source.y;
        targetX = target.x - 70;
        targetY = target.y;
        
        // Create curved paths
        const midX = (sourceX + targetX) / 2;
        path = `M${sourceX},${sourceY} C${midX},${sourceY} ${midX},${targetY} ${targetX},${targetY}`;
      }
      else if ((source.id === "E" && target.id === "I") || 
               (source.id === "H" && target.id === "I")) {
        // Mean difference and standard error to t-statistic
        sourceX = source.x + 70;
        sourceY = source.y;
        targetX = target.x - 70;
        targetY = target.y;
        
        // Create curved paths
        const midX = (sourceX + targetX) / 2;
        path = `M${sourceX},${sourceY} C${midX},${sourceY} ${midX},${targetY} ${targetX},${targetY}`;
      }
      else if (source.id === "A" && target.id === "F") {
        // Group A Data to Variance A (diagonal down)
        sourceX = source.x + 30;
        sourceY = source.y + 25;
        targetX = target.x - 70;
        targetY = target.y;
        
        // Create angled path
        path = `M${sourceX},${sourceY} L${(sourceX + targetX)/2},${(sourceY + targetY)/2} L${targetX},${targetY}`;
      }
      else if (source.id === "B" && target.id === "G") {
        // B to G connection
        sourceX = source.x + 30;
        sourceY = source.y + 25;
        targetX = target.x - 70;
        targetY = target.y;
        
        // Create angled path
        path = `M${sourceX},${sourceY} L${(sourceX + targetX)/2},${(sourceY + targetY)/2} L${targetX},${targetY}`;
      }
      else if (target.y > source.y + 30) {
        // General case: Vertical flow down
        sourceX = source.x;
        sourceY = source.y + 25;
        targetX = target.x;
        targetY = target.y - 25;
        path = `M${sourceX},${sourceY} L${sourceX},${(sourceY + targetY)/2} L${targetX},${(sourceY + targetY)/2} L${targetX},${targetY}`;
      }
      else if (target.y < source.y - 30) {
        // General case: Vertical flow up
        sourceX = source.x;
        sourceY = source.y - 25;
        targetX = target.x;
        targetY = target.y + 25;
        path = `M${sourceX},${sourceY} L${sourceX},${(sourceY + targetY)/2} L${targetX},${(sourceY + targetY)/2} L${targetX},${targetY}`;
      }
      else {
        // Horizontal flow (default)
        sourceX = source.x + 70;
        sourceY = source.y;
        targetX = target.x - 70;
        targetY = target.y;
        path = `M${sourceX},${sourceY} L${targetX},${targetY}`;
      }
      
      return path;
    })
    .attr("stroke", "#666")
    .attr("stroke-width", 2)
    .attr("fill", "none")
    .attr("marker-end", "url(#arrowhead)");
  
  // Add edge labels with better positioning
  svg.selectAll(".edgelabel")
    .data(edges.filter(d => d.label !== ""))
    .join("text")
    .attr("class", "edgelabel")
    .attr("text-anchor", "middle")
    .attr("dominant-baseline", "middle")
    .attr("x", d => {
      const source = nodes.find(n => n.id === d.source);
      const target = nodes.find(n => n.id === d.target);
      
      if (d.label === "Yes") {
        return (source.x + target.x) / 2 - 30;
      } else if (d.label === "No") {
        return (source.x + target.x) / 2 + 30;
      } else {
        return (source.x + target.x) / 2;
      }
    })
    .attr("y", d => {
      const source = nodes.find(n => n.id === d.source);
      const target = nodes.find(n => n.id === d.target);
      
      if (d.label === "Yes" || d.label === "No") {
        return (source.y + target.y) / 2 - 10;
      } else {
        return source.y - 10;
      }
    })
    .attr("font-size", "14px")
    .attr("font-weight", "bold")
    .attr("fill", d => d.label === "Yes" ? "#5a9bd5" : (d.label === "No" ? "#ff9052" : "#333"))
    .text(d => d.label);
  
  // Draw nodes with fixed box sizes - after drawing paths to ensure nodes appear on top
  const node = svg.selectAll(".node")
    .data(nodes)
    .join("g")
    .attr("class", "node")
    .attr("transform", d => `translate(${d.x},${d.y})`);
  
  // Add node shapes (rectangles or diamonds) with consistent sizing
  node.each(function(d) {
    const elem = d3.select(this);
    
    if (d.isDecision) {
      // Diamond for decision node
      elem.append("polygon")
        .attr("points", "0,-30 60,0 0,30 -60,0")
        .attr("fill", "#f8d56f")
        .attr("stroke", "#d4a82e")
        .attr("stroke-width", 2);
    } else {
      // Rectangle for regular node with fixed width
      const boxWidth = 140;
      elem.append("rect")
        .attr("x", -boxWidth/2)
        .attr("y", -25)
        .attr("width", boxWidth)
        .attr("height", 50)
        .attr("rx", 5)
        .attr("ry", 5)
        .attr("fill", d => {
          if (d.id === "L" || d.id === "M") return "#f0f0f0";
          if (d.id === "A" || d.id === "B") return "#e2f0d9"; // Light green for data inputs
          return "#b3deff";
        })
        .attr("stroke", d => {
          if (d.id === "L" || d.id === "M") return "#999";
          if (d.id === "A" || d.id === "B") return "#70ad47"; // Green border for data inputs
          return "#4a98e0";
        })
        .attr("stroke-width", 2);
    }
  });
  
  // Add node labels with better text wrapping
  node.append("text")
    .attr("text-anchor", "middle")
    .attr("dominant-baseline", "middle")
    .attr("font-size", "14px")
    .attr("font-weight", d => (d.id === "K" ? "bold" : "normal"))
    .attr("fill", "#333")
    .each(function(d) {
      const lines = d.label.split('\n');
      const elem = d3.select(this);
      
      if (lines.length === 1) {
        elem.text(d.label);
      } else {
        lines.forEach((line, i) => {
          const lineHeight = 16;
          const yOffset = (i - (lines.length - 1) / 2) * lineHeight;
          elem.append("tspan")
            .attr("x", 0)
            .attr("y", yOffset)
            .text(line);
        });
      }
    });
  
  // Add interactivity
  node.on("mouseover", function(event, d) {
      d3.select(this).select("rect, polygon")
        .transition()
        .duration(200)
        .attr("fill", d => {
          if (d.isDecision) return "#ffc107";
          if (d.id === "A" || d.id === "B") return "#b8e986"; // Brighter green on hover
          if (d.id === "L" || d.id === "M") return "#e6e6e6";
          return "#7fc9ff";
        });
    })
    .on("mouseout", function(event, d) {
      d3.select(this).select("rect, polygon")
        .transition()
        .duration(200)
        .attr("fill", d => {
          if (d.isDecision) return "#f8d56f";
          if (d.id === "L" || d.id === "M") return "#f0f0f0";
          if (d.id === "A" || d.id === "B") return "#e2f0d9";
          return "#b3deff";
        });
    });
  
  return svg.node();
}

Mathematical Procedure

Student’s t-Test (Equal Variances)

Calculate the means for each group: \(\bar{X}_1\) and \(\bar{X}_2\)
Calculate the standard deviations for each group: \(s_1\) and \(s_2\)
Calculate the pooled standard deviation:

\[s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}\]
Calculate the standard error of the difference between means:

\[SE = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\]
Calculate the t-statistic:

\[t = \frac{\bar{X}_1 - \bar{X}_2}{SE}\]
Determine degrees of freedom:

\[df = n_1 + n_2 - 2\]
Calculate p-value by comparing the t-statistic to the t-distribution with the calculated degrees of freedom
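
The steps above can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not the calculator's own code; the function name `student_t` is an assumption, and the data are the raw scores from Example 1 on this page.

```python
from math import sqrt
from statistics import mean, stdev


def student_t(x, y):
    """Student's t-test statistics for two independent samples (equal variances assumed)."""
    n1, n2 = len(x), len(y)
    s1, s2 = stdev(x), stdev(y)  # sample SDs (n - 1 denominator)
    df = n1 + n2 - 2
    # Pooled standard deviation
    sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    # Standard error of the difference between means
    se = sp * sqrt(1 / n1 + 1 / n2)
    t = (mean(x) - mean(y)) / se
    return {"t": t, "df": df, "pooled_sd": sp, "se": se}


# Raw data from Example 1
treatment = [86, 92, 78, 84, 88, 90, 95, 81, 89, 83]
control = [74, 77, 70, 82, 75, 68, 73, 79, 71, 69]

result = student_t(treatment, control)
print({name: round(value, 3) for name, value in result.items()})
```

The p-value is left out here because the standard library has no t-distribution; in practice a statistics package would supply it.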

Welch’s t-Test (Unequal Variances)

Calculate the means for each group: \(\bar{X}_1\) and \(\bar{X}_2\)
Calculate the standard deviations for each group: \(s_1\) and \(s_2\)
Calculate the standard error of the difference between means:

\[SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\]
Calculate the t-statistic:

\[t = \frac{\bar{X}_1 - \bar{X}_2}{SE}\]
Determine approximate degrees of freedom (Welch-Satterthwaite equation):

\[df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}\]
Calculate p-value by comparing the t-statistic to the t-distribution with the calculated degrees of freedom
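
As a rough sketch of the Welch procedure, the Python code below works from summary statistics and approximates the two-sided p-value by numerically integrating the t density with Simpson's rule, which also handles non-integer degrees of freedom. The helper names (`welch_t`, `t_pdf`, `two_sided_p`) are assumptions made for this illustration. A library routine such as SciPy's `scipy.stats.ttest_ind` with `equal_var=False` would normally be preferred.

```python
from math import exp, lgamma, pi, sqrt


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Welch-Satterthwaite df from summary statistics."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # squared standard error of each mean
    t = (m1 - m2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


def t_pdf(x, df):
    """Density of the t distribution; df may be non-integer."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2)
    return exp(log_const) / sqrt(df * pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on the density from 0 to |t|."""
    h = abs(t) / steps  # steps must be even for Simpson's rule
    area = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * area * h / 3


# Summary statistics from Example 2 on this page
t, df = welch_t(78.3, 8.7, 25, 72.1, 12.3, 25)
print(f"t = {t:.2f}, df = {df:.1f}, p = {two_sided_p(t, df):.3f}")
```

When both groups have the same size and the same variance, the Welch-Satterthwaite formula reduces to \(n_1 + n_2 - 2\), the Student’s degrees of freedom.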

Effect Size (Cohen’s d)

The effect size quantifies the magnitude of the difference between groups, independent of sample size:

\[d = \frac{|\bar{X}_1 - \bar{X}_2|}{s_{pooled}}\]

Where \(s_{pooled}\) is the pooled standard deviation, the same quantity written as \(s_p\) in the Student’s t-test procedure above.
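
A minimal Python sketch of this formula, working from summary statistics, might look as follows. The function name `cohens_d` and the numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from math import sqrt


def cohens_d(m1, s1, n1, m2, s2, n2):
    """Cohen's d using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return abs(m1 - m2) / sqrt(pooled_var)


# Hypothetical summary statistics: a 5-point difference with SDs of 10
d = cohens_d(105.0, 10.0, 30, 100.0, 10.0, 30)
print(f"d = {d:.2f}")  # d = 0.50
```

For Student’s t-test the two quantities are linked by \(|t| = d / \sqrt{1/n_1 + 1/n_2}\), which is why the same effect size can be statistically significant in a large study and non-significant in a small one.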

Assumptions of the Independent Samples t-Test

Independence: Observations in each group are independent (by research design)
Normality: Both samples come from normally distributed populations
- With large samples (n > 30 per group), the t-test is robust to normality violations due to the Central Limit Theorem
Homogeneity of variance (for Student’s t-test only): Both groups have similar variances
- Test using Levene’s test; if violated, use Welch’s t-test instead
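
For readers curious about what Levene’s test computes, here is a hedged Python sketch of the test statistic. It uses the median as the center (the Brown-Forsythe variant, which matches the default of the `f_levene_test` helper in the app code). The function name `levene_statistic` and the data are illustrative assumptions. The statistic W is compared against an F distribution with \(k - 1\) and \(N - k\) degrees of freedom; the p-value lookup is omitted here.

```python
from statistics import mean, median


def levene_statistic(*groups, center=median):
    """Levene-type W statistic; center=median gives the Brown-Forsythe variant."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each observation from its own group's center
    z = [[abs(v - center(g)) for v in g] for g in groups]
    z_means = [mean(zi) for zi in z]
    z_grand = mean([v for zi in z for v in zi])
    # One-way ANOVA on the absolute deviations
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical data: the second group is twice as spread out as the first
narrow = [1, 2, 3, 4, 5]
wide = [2, 4, 6, 8, 10]
print(f"W = {levene_statistic(narrow, wide):.2f}")  # W = 2.06
```

Two groups with the same spread give W = 0 regardless of where their means sit, because only the deviations from each group's own center enter the statistic.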

Statistical Power Considerations

Important

Statistical Power Note: The power of a t-test is influenced by:
- Sample size
- Effect size (magnitude of the difference)
- Significance level (α)
- Variability within groups

To achieve 80% power (standard convention) for detecting:
- Small effect (d = 0.2): Need approximately 394 participants per group
- Medium effect (d = 0.5): Need approximately 64 participants per group
- Large effect (d = 0.8): Need approximately 26 participants per group

These calculations assume α = 0.05 for a two-tailed test.
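
These sample sizes can be approximated with a short Python sketch. It uses the normal approximation plus a small correction term (\(z_{1-\alpha/2}^2 / 4\)) rather than the exact noncentral t calculation that dedicated power software performs, so treat it as an approximation. The function name `n_per_group` is an assumption for this illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-tailed independent t-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n_normal = 2 * (z_alpha + z_beta) ** 2 / d**2
    # Small correction for using t rather than z critical values
    return ceil(n_normal + z_alpha**2 / 4)


for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} per group")
# d = 0.2: about 394 per group
# d = 0.5: about 64 per group
# d = 0.8: about 26 per group
```

Because the required n scales with \(1/d^2\), halving the expected effect size roughly quadruples the number of participants needed.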

Example 1: Comparing Treatment vs. Control Group

A researcher wants to test if a new medication affects cognitive performance. They randomly assign 20 participants to either a treatment group or a control group.

Data:

Treatment Group	Control Group
86, 92, 78, 84, 88, 90, 95, 81, 89, 83	74, 77, 70, 82, 75, 68, 73, 79, 71, 69

Analysis Steps:

Check normality assumption:
- Shapiro-Wilk test: Treatment (p = 0.81), Control (p = 0.66)
- Both p-values > 0.05, so we can assume normality for both groups
Check homogeneity of variance:
- Levene’s test: p = 0.27
- p > 0.05, so we can assume equal variances
Choose appropriate test:
- Since equal variances can be assumed, Student’s t-test is appropriate
- For completeness, we’ll report both Student’s and Welch’s results
Perform t-test:
- Treatment mean = 86.6, SD = 5.2
- Control mean = 73.8, SD = 4.5
- Mean difference = 12.8
- Student’s t(18) = 5.86, p < 0.001
- Welch’s t(17.7) = 5.86, p < 0.001
- Cohen’s d = 2.62 (very large effect)
- 95% CI for difference: [8.2, 17.4]

Results:

t = 5.86, p < 0.001, d = 2.62
Mean treatment: 86.6, Mean control: 73.8
Interpretation: There is a statistically significant difference in cognitive performance between the treatment and control groups (p < 0.05), with the treatment group scoring higher. The effect size is very large (d > 0.8).

How to Report: “Participants who received the medication (M = 86.6, SD = 5.2) scored significantly higher on cognitive performance tests compared to those in the control group (M = 73.8, SD = 4.5), t(18) = 5.86, p < 0.001, d = 2.62, 95% CI [8.2, 17.4]. This represents a very large effect.”

Example 2: Comparing Two Teaching Methods

An educator wants to compare two teaching methods. They implement Method A in one class of 25 students and Method B in another class of 25 students, then administer the same test.

Data (summary statistics):
- Method A: n = 25, Mean = 78.3, SD = 8.7
- Method B: n = 25, Mean = 72.1, SD = 12.3

Results:
- Levene’s test: p = 0.04 (unequal variances)
- Welch’s t(43.2) = 2.06, p = 0.046, d = 0.58
- Interpretation: There is a statistically significant difference in test scores between the two teaching methods (p < 0.05), with Method A producing higher scores on average. The effect size is medium (d ≈ 0.6).

How to Report: “Students taught using Method A (M = 78.3, SD = 8.7) performed significantly better than those taught using Method B (M = 72.1, SD = 12.3), Welch’s t(43.2) = 2.06, p = 0.046, d = 0.58, 95% CI [0.1, 12.3]. This represents a medium-sized effect. Welch’s t-test was used due to unequal variances between the groups (Levene’s test p = 0.04).”

How to Report Independent Samples t-Test Results

When reporting the results of an independent samples t-test in academic papers or research reports, include the following elements:

"[Group 1] (M = [mean1], SD = [sd1]) [showed/did not show] significantly [higher/lower/different] 
[variable] compared to [Group 2] (M = [mean2], SD = [sd2]), [Student's/Welch's] t([df]) = [t-value], 
p = [p-value], d = [effect size], 95% CI [lower bound, upper bound]."

For example:

"The treatment group (M = 86.6, SD = 5.2) showed significantly higher cognitive performance 
compared to the control group (M = 73.8, SD = 4.5), t(18) = 5.86, p < 0.001, d = 2.62, 
95% CI [8.2, 17.4]."

Additional information to consider including:
- Which version of the t-test was used (Student’s or Welch’s)
- Results of assumption tests (normality, homogeneity of variance)
- Whether the test was one-tailed or two-tailed
- Sample sizes for each group
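
If you produce many such reports, the statistical part of the sentence can be assembled programmatically. The Python sketch below follows the template above. The function name `format_t_report`, its arguments, and the numbers are hypothetical, and the rounding choices (two decimals for t and d, three for p, one for the CI) are assumptions you may need to adapt to your style guide.

```python
def format_t_report(t, df, p, d, ci, welch=False):
    """Assemble the statistical part of a report sentence."""
    name = "Welch's t" if welch else "t"
    # Welch's df is usually non-integer, so keep one decimal place
    df_text = f"{df:.1f}" if welch else f"{df:.0f}"
    p_text = "p < 0.001" if p < 0.001 else f"p = {p:.3f}"
    lower, upper = ci
    return (f"{name}({df_text}) = {t:.2f}, {p_text}, d = {d:.2f}, "
            f"95% CI [{lower:.1f}, {upper:.1f}]")


# Hypothetical results, for illustration only
print(format_t_report(2.5, 28, 0.012, 0.75, (0.5, 5.1)))
# t(28) = 2.50, p = 0.012, d = 0.75, 95% CI [0.5, 5.1]
```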

APA Style Reporting

For APA style papers (7th edition), report the independent samples t-test results as follows:

We conducted an independent samples t-test to examine whether [variable] differed between [Group 1] 
and [Group 2]. Results indicated that [Group 1] (M = [mean1], SD = [sd1]) [showed/did not show] 
significantly [higher/lower] [variable] than [Group 2] (M = [mean2], SD = [sd2]), 
[Student's/Welch's] t([df]) = [t-value], p = [p-value], d = [effect size], 95% CI [lower, upper].

Reporting in Tables

When reporting multiple t-test results in a table, include these columns:
- Variables being compared
- Means and standard deviations for both groups
- t-value
- Degrees of freedom
- p-value
- Effect size (Cohen’s d)
- 95% confidence interval

Test Your Understanding

1. When should you use Welch’s t-test instead of Student’s t-test?
- A. When sample sizes are very large
- B. When both groups have equal variances
- C. When groups have unequal variances
- D. When data is not normally distributed
2. What does Cohen’s d measure in a t-test?
- A. The probability of making a Type I error
- B. The effect size (magnitude of the difference)
- C. The variance within groups
- D. The degrees of freedom
3. A researcher finds t(28) = 2.15, p = 0.04 when comparing two groups. What can they conclude?
- A. There is no significant difference between the groups
- B. There is a significant difference between the groups
- C. The test is invalid
- D. More data is needed
4. What is the appropriate sample size per group to detect a medium effect size (d = 0.5) with 80% power?
- A. Approximately 10
- B. Approximately 25
- C. Approximately 64
- D. Approximately 400
5. What happens to the degrees of freedom in Welch’s t-test compared to Student’s t-test?
- A. They are always higher
- B. They are always lower
- C. They depend on the sample variances and sizes
- D. They remain the same

Answers: 1-C, 2-B, 3-B, 4-C, 5-C

Common Questions About the t-Test

When should I use an independent samples t-test versus a paired t-test?

Use an independent samples t-test when comparing two separate, unrelated groups (e.g., treatment vs. control). Use a paired t-test when comparing two related measurements (e.g., before vs. after treatment on the same subjects).
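
To make the contrast concrete, the following Python sketch shows how a paired t-test reduces to a one-sample test on the within-subject differences, rather than comparing two group means. The function name `paired_t` and the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic: a one-sample t-test on the within-subject differences."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical scores for the same five subjects measured twice
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 16]

t, df = paired_t(before, after)
print(f"t({df}) = {t:.2f}")  # t(4) = 6.32
```

Note that the degrees of freedom are the number of pairs minus one, not \(n_1 + n_2 - 2\) as in the independent samples case.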

What if my data isn’t normally distributed?

If your sample size is large (n > 30 per group), the t-test is generally robust to violations of normality due to the Central Limit Theorem. For smaller samples with non-normal data, consider using a non-parametric alternative like the Mann-Whitney U test.
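
As a rough illustration of what the Mann-Whitney U test measures, this Python sketch computes the U statistics by direct pairwise comparison, using the raw scores from Example 1 on this page. The function name `mann_whitney_u` is an assumption, and the p-value step is omitted. SciPy's `scipy.stats.mannwhitneyu` provides a complete implementation.

```python
def mann_whitney_u(x, y):
    """U statistics for both groups by pairwise comparison (ties count 0.5)."""
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Raw data from Example 1
treatment = [86, 92, 78, 84, 88, 90, 95, 81, 89, 83]
control = [74, 77, 70, 82, 75, 68, 73, 79, 71, 69]

u_treatment, u_control = mann_whitney_u(treatment, control)
print(u_treatment, u_control)  # 97.0 3.0
```

Out of the 100 possible treatment-control pairs, the treatment score is higher in 97, so the test is based on the ordering of the observations rather than on their means.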

How do I report t-test results in research papers?

For a complete report, include: t-value, degrees of freedom, p-value, mean difference, 95% confidence interval, and effect size (Cohen’s d). For example: “The treatment group (M = 7.13, SD = 0.23) scored significantly higher than the control group (M = 5.78, SD = 0.31), t(18) = 10.82, p < .001, d = 4.84, 95% CI [1.08, 1.62].”

What sample size do I need for adequate statistical power?

The required sample size depends on the expected effect size and desired power level. As a rough guideline, to detect a medium effect (d = 0.5) with 80% power at α = 0.05, you need approximately 64 participants per group. For a large effect (d = 0.8), you need about 26 participants per group.

Why is Welch’s t-test often recommended over Student’s t-test?

Welch’s t-test doesn’t assume equal variances between groups, making it more robust when this assumption is violated. Research has shown that Welch’s t-test maintains good control of Type I error rates while providing adequate statistical power, even when variances are equal. Therefore, many statisticians recommend it as the default choice for independent samples comparisons.

Can I use a t-test if my groups have different sample sizes?

Yes, the t-test can handle unequal sample sizes. However, when sample sizes differ and variances are unequal (heteroscedasticity), Welch’s t-test is strongly recommended over Student’s t-test to maintain proper Type I error control.

Examples of When to Use the Independent Samples t-Test

Medical research: Comparing treatment outcomes between control and experimental groups
Educational research: Comparing test scores between two different teaching methods
Psychology: Comparing psychological measures between different demographic groups
Market research: Comparing consumer satisfaction scores between two product versions
Environmental science: Comparing pollution levels between two different locations
Business: Comparing employee performance between two different management styles
Sports science: Comparing physiological measures between athletes and non-athletes
Sociology: Comparing social attitudes between two different cultures or communities
Agriculture: Comparing crop yields between two different farming methods
Manufacturing: Comparing product quality metrics between two production processes

Step-by-Step Guide to the Independent Samples t-Test

1. Check Assumptions

Before interpreting t-test results, you should verify these assumptions:

Independence: Observations in each group are independent (by research design)
Normality: Both samples come from normally distributed populations
- Check using Shapiro-Wilk test and Q-Q plots in the “Assumptions” tab
- With large samples (n > 30 per group), the t-test is robust to normality violations
Homogeneity of variance: Both groups have similar variances
- Check using Levene’s test in the “Assumptions” tab
- If violated, use Welch’s t-test instead of Student’s t-test

2. Choose the Appropriate Test

If variances are equal (Levene’s test p ≥ 0.05), you can use Student’s t-test
If variances are unequal (Levene’s test p < 0.05), use Welch’s t-test
When in doubt, Welch’s t-test is generally recommended as the safer option

3. Interpret the Results

Check the p-value:
- If p < 0.05, there is a statistically significant difference between group means
- If p ≥ 0.05, there is not enough evidence to conclude the means differ
Examine the effect size (Cohen’s d):
- d ≈ 0.2: Small effect
- d ≈ 0.5: Medium effect
- d ≈ 0.8: Large effect
Look at the confidence interval:
- If it doesn’t include zero, the difference is statistically significant
- The width indicates precision of the estimated difference
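
The three checks above can be combined in a small Python helper. This is a sketch under stated assumptions: the function names are hypothetical, the inputs are made-up numbers, the critical value `t_crit` is taken from a t-table (2.048 for df = 28 at the 95% level), and treating Cohen's benchmarks as hard cut-offs (including the "negligible" label below 0.2) is a simplification.

```python
def confidence_interval(mean_diff, se, t_crit):
    """Confidence interval for the difference: estimate +/- t_crit * SE."""
    margin = t_crit * se
    return mean_diff - margin, mean_diff + margin


def effect_label(d):
    """Map |d| onto Cohen's benchmarks, used here as simple cut-offs."""
    d = abs(d)
    if d < 0.2:
        return "negligible"
    if d < 0.5:
        return "small"
    if d < 0.8:
        return "medium"
    return "large"


def interpret(p, d, ci, alpha=0.05):
    """Summarise the p-value, effect size, and confidence interval checks."""
    lower, upper = ci
    return {
        "significant": p < alpha,
        "effect": effect_label(d),
        "ci_excludes_zero": not (lower <= 0 <= upper),
    }


# Hypothetical result: difference of 4.0 with SE 1.5 and df = 28
ci = confidence_interval(4.0, 1.5, t_crit=2.048)
summary = interpret(p=0.012, d=0.97, ci=ci)
print(ci, summary)
```

For a two-tailed test at α = 0.05, the p-value check and the 95% confidence interval check always agree, so the interval mainly adds information about the size and precision of the difference.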

References

Student. (1908). The probable error of a mean. Biometrika, 6(1), 1-25.
Welch, B. L. (1947). The generalization of “Student’s” problem when several different population variances are involved. Biometrika, 34(1/2), 28-35.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum Associates.
Ruxton, G. D. (2006). The unequal variance t-test is an underused alternative to Student’s t-test and the Mann-Whitney U test. Behavioral Ecology, 17(4), 688-690.
Delacre, M., Lakens, D., & Leys, C. (2017). Why psychologists should by default use Welch’s t-test instead of Student’s t-test. International Review of Social Psychology, 30(1), 92-101.
Fagerland, M. W. (2012). t-tests, non-parametric tests, and large studies—a paradox of statistical practice? BMC Medical Research Methodology, 12(1), 78.

Reuse

CC BY-NC-SA 4.0

Citation

BibTeX citation:

@online{kassambara2025,
  author = {Kassambara, Alboukadel},
  title = {Independent {Samples} {t-Test} {Calculator} \textbar{}
    {Compare} {Two} {Group} {Means}},
  date = {2025-04-07},
  url = {https://www.datanovia.com/learn/tools/statistical-tests/independent-samples-t-test.html},
  langid = {en}
}

For attribution, please cite this work as:

Kassambara, Alboukadel. 2025. “Independent Samples t-Test Calculator | Compare Two Group Means.” April 7, 2025. https://www.datanovia.com/learn/tools/statistical-tests/independent-samples-t-test.html.