QQ Plot Analysis & Generator | Test Data Normality Visually

Check If Your Data Follows a Normal Distribution with Quantile-Quantile Plots

Free online QQ plot generator and analyzer to visually assess data normality. More intuitive than formal tests, with interactive visualizations and detailed interpretations of distribution patterns.

Published

April 12, 2025

Modified

April 16, 2025

Keywords

qq plot, quantile quantile plot, normality test visual, check data distribution, normal probability plot, qq plot generator, test normality qq plot, qq plot interpretation

Key Takeaways: QQ Plot Analysis

Tip
  • Purpose: Visually assess whether data follows a particular distribution (typically normal)
  • How it works: Compares the quantiles of your data to the quantiles of a theoretical distribution
  • Advantage over formal tests: Shows exactly where and how data deviates from the theoretical distribution
  • Interpretation: Points following the diagonal line suggest the distributions match
  • Common patterns: Identify skewness, heavy/light tails, and outliers from specific curve patterns
  • Complementary to: Formal tests like Shapiro-Wilk or Kolmogorov-Smirnov
  • Applications: Testing assumptions for parametric tests, data exploration, distribution fitting

What is a QQ Plot?

A Quantile-Quantile (QQ) plot is a powerful graphical technique for comparing two probability distributions by plotting their quantiles against each other. When testing for normality, it compares the quantiles of your data against the quantiles of a normal distribution. QQ plots are widely considered one of the most effective visual methods for assessing whether a dataset follows a particular distribution.

Tip

When to use QQ plots:

  • When checking if data follows a normal distribution (or other theoretical distribution)
  • When formal normality tests give borderline results
  • When you need to understand exactly how your data deviates from normality
  • Before applying statistical methods that assume a particular distribution
  • When analyzing residuals from regression or ANOVA models
  • When you need to choose an appropriate data transformation

This interactive tool allows you to quickly generate QQ plots, visualize your data distribution patterns, and receive detailed interpretations to guide your statistical analysis decisions.



#| '!! shinylive warning !!': |
#|   shinylive does not work in self-contained HTML documents.
#|   Please set `embed-resources: false` in your metadata.
#| standalone: true
#| viewerHeight: 1300

library(shiny)
library(bslib)
library(ggplot2)
library(bsicons)
library(vroom)
library(shinyjs)

ui <- page_sidebar(
  title = "QQ Plot Analysis: Assess Data Normality",
  useShinyjs(),  # Enable shinyjs for dynamic UI updates
  sidebar = sidebar(
    width = 400,
    
    card(
      card_header("Data Input"),
      accordion(
        accordion_panel(
          "Manual Input",
          textAreaInput("data_input", "Enter your data (one value per row):", rows = 8,
                      placeholder = "Paste values here..."),
          div(
            actionLink("use_example", "Use example data", style = "color:#0275d8;"),
            tags$span(bs_icon("file-earmark-text"), style = "margin-left: 5px; color: #0275d8;")
          )
        ),
        accordion_panel(
          "File Upload",
          fileInput("file_upload", "Upload CSV or TXT file:",
                   accept = c("text/csv", "text/plain", ".csv", ".txt")),
          checkboxInput("header", "File has header", TRUE),
          conditionalPanel(
            condition = "output.file_uploaded",
            div(
              selectInput("selected_var", "Select variable:", choices = NULL),
              actionButton("clear_file", "Clear File", class = "btn-danger btn-sm")
            )
          )
        ),
        id = "input_method",
        open = 1
      ),
      
      # Plot Settings accordion
      accordion(
        accordion_panel(
          "Plot Settings",
          
          card(
            card_header("Distribution Options:"),
            card_body(
              selectInput("dist_type", "Reference Distribution:",
                         choices = c("Normal" = "norm", 
                                    "t (Student's)" = "t",
                                    "Uniform" = "unif",
                                    "Log-Normal" = "lnorm",
                                    "Exponential" = "exp"),
                         selected = "norm"),
              conditionalPanel(
                condition = "input.dist_type == 't'",
                sliderInput("t_df", "Degrees of Freedom:", min = 1, max = 30, value = 5, step = 1)
              ),
              checkboxInput("standardize", "Standardize Data (Z-scores)", value = FALSE)
            )
          ),
          
          card(
            card_header("Visual Options:"),
            card_body(
              checkboxInput("conf_interval", "Show Confidence Bands", TRUE),
              sliderInput("conf_level", "Confidence Level:", min = 0.90, max = 0.99, value = 0.95, step = 0.01),
              checkboxInput("points_fill", "Color points by deviation", TRUE),
              checkboxInput("show_deviations", "Show Deviation Segments", FALSE),
              selectInput("theme_choice", "Plot Theme:",
                         choices = c("Minimal" = "minimal",
                                    "Classic" = "classic",
                                    "Light" = "light",
                                    "Dark" = "dark"),
                         selected = "minimal")
            )
          )
        ),
        open = FALSE
      ),
      
      actionButton("analyze", "Analyze Data", class = "btn btn-primary")
    ),

    hr(),

    card(
      card_header("Interpretation Guide"),
      card_body(
        div(class = "alert alert-info",
          tags$p("QQ (Quantile-Quantile) plots compare your data distribution to a theoretical distribution:"),
          tags$ul(
            tags$li(tags$b("Straight line:"), " Data follows the reference distribution."),
            tags$li(tags$b("S-shaped curve:"), " Data shows skewness."),
            tags$li(tags$b("Points above line at ends:"), " Data has heavy tails (more extreme values)."),
            tags$li(tags$b("Points below line at ends:"), " Data has light tails (fewer extreme values)."),
            tags$li(tags$b("Confidence bands:"), " Points falling outside these bands indicate significant deviations from the reference distribution.")
          )
        )
      )
    )
  ),

  layout_column_wrap(
    width = 1,

    card(
      card_header("QQ Plot Analysis"),
      card_body(
        navset_tab(
          nav_panel("QQ Plot", plotOutput("qqplot", height = "500px")),
          nav_panel("Histogram", plotOutput("histogram", height = "500px")),
          nav_panel("Deviations Analysis", plotOutput("deviation_plot", height = "500px"))
        )
      )
    ),

    card(
      card_header("Normality Assessment"),
      card_body(
        navset_tab(
          nav_panel("Results", uiOutput("error_message"), verbatimTextOutput("summary_results")),
          nav_panel("Interpretation", div(style = "font-size: 0.9rem;",
            uiOutput("interpretation"),
            uiOutput("deviation_pattern"),
            hr(),
            h5("Common Patterns in QQ Plots"),
            div(
              class = "row",
              div(class = "col-md-6",
                  h6("Normal Distribution:"),
                  p("Points follow the diagonal line with random scatter."),
                  h6("Skewed Right (Positive Skew):"),
                  p("Points curve above the line at right end (high values)."),
                  h6("Skewed Left (Negative Skew):"),
                  p("Points curve below the line at left end (low values).")
              ),
              div(class = "col-md-6",
                  h6("Heavy Tails (Leptokurtic):"),
                  p("S-shape with points below line in middle, above at ends."),
                  h6("Light Tails (Platykurtic):"),
                  p("Inverted S-shape with points above line in middle, below at ends."),
                  h6("Bimodal/Multimodal:"),
                  p("Step-like pattern with horizontal segments.")
              )
            )
          ))
        )
      )
    )
  )
)

server <- function(input, output, session) {
  # Example data - normally distributed with some outliers
  example_data <- "5.2\n5.5\n6.1\n5.8\n5.5\n5.9\n6.2\n5.7\n5.6\n6.0\n5.8\n5.5\n5.7\n6.3\n5.9\n5.8\n5.6\n5.7\n6.1\n5.9\n5.4\n5.8\n6.2\n5.7\n5.6\n5.8\n6.4\n5.9\n5.7\n5.5\n7.2\n4.3"

  # Track input method
  input_method <- reactiveVal("manual")
  
  # Function to clear file inputs
  clear_file_inputs <- function() {
    updateSelectInput(session, "selected_var", choices = NULL)
    reset("file_upload")
  }
  
  # Function to clear text inputs
  clear_text_inputs <- function() {
    updateTextAreaInput(session, "data_input", value = "")
  }

  # When example data is used, clear file inputs and set text inputs
  observeEvent(input$use_example, {
    input_method("manual")
    clear_file_inputs()
    updateTextAreaInput(session, "data_input", value = example_data)
  })

  # When file is uploaded, clear text inputs and set file method
  observeEvent(input$file_upload, {
    if (!is.null(input$file_upload)) {
      input_method("file")
      clear_text_inputs()
    }
  })

  # When clear file button is clicked, clear file and set manual method
  observeEvent(input$clear_file, {
    input_method("manual")
    clear_file_inputs()
  })
  
  # When text input changes, clear file inputs if it has content
  observeEvent(input$data_input, {
    if (!is.null(input$data_input) && nchar(input$data_input) > 0) {
      input_method("manual")
      clear_file_inputs()
    }
  }, ignoreInit = TRUE)

  # Process uploaded file
  file_data <- reactive({
    req(input$file_upload)
    tryCatch({
      vroom::vroom(input$file_upload$datapath, delim = NULL, col_names = input$header, show_col_types = FALSE)
    }, error = function(e) {
      showNotification(paste("File read error:", e$message), type = "error")
      NULL
    })
  })

  # Update variable selection dropdown with numeric columns from uploaded file
  observe({
    df <- file_data()
    if (!is.null(df)) {
      num_vars <- names(df)[sapply(df, is.numeric)]
      updateSelectInput(session, "selected_var", choices = num_vars)
    }
  })

  output$file_uploaded <- reactive({
    !is.null(input$file_upload)
  })
  outputOptions(output, "file_uploaded", suspendWhenHidden = FALSE)

  # Function to parse text input
  parse_text_input <- function(text) {
    if (is.null(text) || text == "") return(NULL)
    input_lines <- strsplit(text, "\\r?\\n")[[1]]
    input_lines <- input_lines[input_lines != ""]
    numeric_values <- suppressWarnings(as.numeric(input_lines))
    if (all(is.na(numeric_values))) return(NULL)
    return(na.omit(numeric_values))
  }

  # Get data values based on input method
  data_values <- reactive({
    if (input_method() == "file" && !is.null(file_data()) && !is.null(input$selected_var)) {
      df <- file_data()
      return(na.omit(df[[input$selected_var]]))
    } else {
      return(parse_text_input(input$data_input))
    }
  })

  # Validate input data
  validate_data <- reactive({
    values <- data_values()
    
    if (is.null(values)) {
      return("Error: Please check your input. Make sure all values are numeric.")
    }
    
    if (length(values) < 5) {
      return("Error: At least 5 values are recommended for meaningful QQ plot analysis.")
    }
    
    if (length(unique(values)) == 1) {
      return("Error: All values are identical. QQ plot analysis requires variation in the data.")
    }
    
    return(NULL)
  })

  # Display error messages
  output$error_message <- renderUI({
    error <- validate_data()
    if (!is.null(error) && input$analyze > 0) {
      if (startsWith(error, "Warning")) {
        div(class = "alert alert-warning", error)
      } else {
        div(class = "alert alert-danger", error)
      }
    }
  })

  # Prepare data for analysis
  processed_data <- reactive({
    req(input$analyze > 0)
    error <- validate_data()
    if (!is.null(error) && startsWith(error, "Error")) return(NULL)
    
    values <- data_values()
    
    # Standardize if selected
    if (input$standardize) {
      values <- scale(values)
    }
    
    values
  })

  # Generate QQ plot
  output$qqplot <- renderPlot({
    req(processed_data())
    
    values <- processed_data()
    
    # Set up base qq plot data
    n <- length(values)
    
    # Theoretical quantiles based on selected distribution
    p <- (1:n - 0.5) / n
    
    if (input$dist_type == "norm") {
      theoretical_quantiles <- qnorm(p)
      dist_name <- "Normal"
    } else if (input$dist_type == "t") {
      theoretical_quantiles <- qt(p, df = input$t_df)
      dist_name <- paste0("t (df = ", input$t_df, ")")
    } else if (input$dist_type == "unif") {
      theoretical_quantiles <- qunif(p)
      dist_name <- "Uniform"
    } else if (input$dist_type == "lnorm") {
      theoretical_quantiles <- qlnorm(p)
      dist_name <- "Log-Normal"
    } else if (input$dist_type == "exp") {
      theoretical_quantiles <- qexp(p)
      dist_name <- "Exponential"
    }
    
    # Sort the observed data
    ordered_values <- sort(values)
    
    # Calculate deviations from the theoretical line
    if (input$dist_type == "norm") {
      # For normal distribution, calculate line based on mean and SD
      slope <- sd(values)
      intercept <- mean(values)
      
      # Theoretical line values
      line_values <- intercept + slope * theoretical_quantiles
      
      # Deviations
      deviations <- ordered_values - line_values
    } else {
      # For other distributions, use the sorted values as observed quantiles
      # and calculate deviations directly
      deviations <- ordered_values - theoretical_quantiles
    }
    
    # Create data frame for plotting
    qq_data <- data.frame(
      theoretical = theoretical_quantiles,
      observed = ordered_values,
      deviation = deviations,
      deviation_color = ifelse(abs(deviations) > 1.5*sd(deviations), "Large", "Small")
    )
    
    # Calculate confidence bands (simulation-based for normal)
    if (input$conf_interval) {
      alpha <- 1 - input$conf_level
      
      # Number of simulations
      n_sim <- 1000
      
      # Initialize matrix to hold simulation results
      sorted_sims <- matrix(NA, nrow = n, ncol = n_sim)
      
      # Generate simulations based on selected distribution
      for (i in 1:n_sim) {
        if (input$dist_type == "norm") {
          sim_data <- rnorm(n, mean = mean(values), sd = sd(values))
        } else if (input$dist_type == "t") {
          sim_data <- rt(n, df = input$t_df)
          if (input$standardize == FALSE) {
            sim_data <- sim_data * sd(values) + mean(values)
          }
        } else if (input$dist_type == "unif") {
          sim_data <- runif(n, min = min(values), max = max(values))
        } else if (input$dist_type == "lnorm") {
          sim_data <- rlnorm(n)
          if (input$standardize == FALSE) {
            sim_data <- sim_data * sd(values) / sd(sim_data) + mean(values) - mean(sim_data) * sd(values) / sd(sim_data)
          }
        } else if (input$dist_type == "exp") {
          sim_data <- rexp(n, rate = 1/mean(values))
        }
        
        sorted_sims[, i] <- sort(sim_data)
      }
      
      # Calculate confidence bands
      lower_band <- apply(sorted_sims, 1, function(x) quantile(x, alpha/2))
      upper_band <- apply(sorted_sims, 1, function(x) quantile(x, 1 - alpha/2))
      
      # Add to data frame
      qq_data$lower <- lower_band
      qq_data$upper <- upper_band
    }
    
    # Create QQ plot
    p <- ggplot(qq_data, aes(x = theoretical, y = observed)) +
      labs(title = paste("QQ Plot against", dist_name, "Distribution"),
           subtitle = paste("Sample size =", n),
           x = "Theoretical Quantiles",
           y = "Sample Quantiles")
    
    # Add confidence bands if selected
    if (input$conf_interval) {
      p <- p + geom_ribbon(aes(ymin = lower, ymax = upper), fill = "gray80", alpha = 0.3)
    }
    
    # Add reference line for normal distribution
    if (input$dist_type == "norm") {
      p <- p + geom_abline(intercept = mean(values), slope = sd(values), 
                         color = "blue", linetype = "dashed", linewidth = 1)
    } else {
      p <- p + geom_line(aes(x = theoretical, y = theoretical), 
                       color = "blue", linetype = "dashed", linewidth = 1)
    }
    
    # Add points colored by deviation if selected
    if (input$points_fill) {
      p <- p + geom_point(aes(color = deviation_color), size = 3, alpha = 0.7) +
        scale_color_manual(values = c("Small" = "#2980b9", "Large" = "#c0392b"), 
                         name = "Deviation")
    } else {
      p <- p + geom_point(color = "#2980b9", size = 3, alpha = 0.7)
    }
    
    # Add deviation segments if selected
    if (input$show_deviations) {
      if (input$dist_type == "norm") {
        p <- p + geom_segment(aes(xend = theoretical, 
                               yend = mean(values) + sd(values) * theoretical,
                               color = deviation_color), 
                           alpha = 0.5, linewidth = 0.8)
      } else {
        p <- p + geom_segment(aes(xend = theoretical, 
                               yend = theoretical,
                               color = deviation_color), 
                           alpha = 0.5, linewidth = 0.8)
      }
    }
    
    # Apply selected theme
    if (input$theme_choice == "minimal") {
      p <- p + theme_minimal(base_size = 14)
    } else if (input$theme_choice == "classic") {
      p <- p + theme_classic(base_size = 14)
    } else if (input$theme_choice == "light") {
      p <- p + theme_light(base_size = 14)
    } else if (input$theme_choice == "dark") {
      p <- p + theme_dark(base_size = 14) +
        theme(
          plot.background = element_rect(fill = "gray10"),
          panel.background = element_rect(fill = "gray15"),
          panel.grid = element_line(color = "gray30")
        )
    }
    
    p
  })

  # Generate histogram
  output$histogram <- renderPlot({
    req(processed_data())
    
    values <- processed_data()
    
    # Create histogram with density
    p <- ggplot(data.frame(x = values), aes(x = x)) +
      geom_histogram(aes(y = after_stat(density)), bins = min(30, max(10, length(values)/3)), 
                     fill = "#5dade2", color = "#2874a6", alpha = 0.7) +
      geom_density(color = "#c0392b", linewidth = 1.2) +
      labs(title = "Distribution of Data",
           subtitle = paste("Sample size =", length(values)),
           x = "Value", y = "Density")
    
    # Add theoretical density function
    if (input$dist_type == "norm") {
      p <- p + stat_function(fun = dnorm, args = list(mean = mean(values), sd = sd(values)), 
                          color = "#2471a3", linewidth = 1.2, linetype = "dashed")
      
    } else if (input$dist_type == "t") {
      # Scale the t-distribution to match data
      scaled_dt <- function(x) {
        sd_val <- sd(values)
        mean_val <- mean(values)
        dt((x - mean_val) / sd_val, df = input$t_df) / sd_val
      }
      p <- p + stat_function(fun = scaled_dt, color = "#2471a3", linewidth = 1.2, linetype = "dashed")
      
    } else if (input$dist_type == "unif") {
      p <- p + stat_function(fun = dunif, args = list(min = min(values), max = max(values)), 
                          color = "#2471a3", linewidth = 1.2, linetype = "dashed")
      
    } else if (input$dist_type == "lnorm") {
      # Scale log-normal to match data
      log_mean <- mean(log(values[values > 0]))
      log_sd <- sd(log(values[values > 0]))
      p <- p + stat_function(fun = dlnorm, args = list(meanlog = log_mean, sdlog = log_sd), 
                          color = "#2471a3", linewidth = 1.2, linetype = "dashed")
      
    } else if (input$dist_type == "exp") {
      p <- p + stat_function(fun = dexp, args = list(rate = 1/mean(values)), 
                          color = "#2471a3", linewidth = 1.2, linetype = "dashed")
    }
    
    # Apply selected theme
    if (input$theme_choice == "minimal") {
      p <- p + theme_minimal(base_size = 14)
    } else if (input$theme_choice == "classic") {
      p <- p + theme_classic(base_size = 14)
    } else if (input$theme_choice == "light") {
      p <- p + theme_light(base_size = 14)
    } else if (input$theme_choice == "dark") {
      p <- p + theme_dark(base_size = 14) +
        theme(
          plot.background = element_rect(fill = "gray10"),
          panel.background = element_rect(fill = "gray15"),
          panel.grid = element_line(color = "gray30")
        )
    }
    
    p
  })

  # Generate deviation analysis plot
  output$deviation_plot <- renderPlot({
    req(processed_data())
    
    values <- processed_data()
    
    # Set up base qq plot data
    n <- length(values)
    
    # Theoretical quantiles based on selected distribution
    p <- (1:n - 0.5) / n
    
    if (input$dist_type == "norm") {
      theoretical_quantiles <- qnorm(p)
      dist_name <- "Normal"
    } else if (input$dist_type == "t") {
      theoretical_quantiles <- qt(p, df = input$t_df)
      dist_name <- paste0("t (df = ", input$t_df, ")")
    } else if (input$dist_type == "unif") {
      theoretical_quantiles <- qunif(p)
      dist_name <- "Uniform"
    } else if (input$dist_type == "lnorm") {
      theoretical_quantiles <- qlnorm(p)
      dist_name <- "Log-Normal"
    } else if (input$dist_type == "exp") {
      theoretical_quantiles <- qexp(p)
      dist_name <- "Exponential"
    }
    
    # Sort the observed data
    ordered_values <- sort(values)
    
    # Calculate deviations from the theoretical line
    if (input$dist_type == "norm") {
      # For normal distribution, calculate line based on mean and SD
      slope <- sd(values)
      intercept <- mean(values)
      
      # Theoretical line values
      line_values <- intercept + slope * theoretical_quantiles
      
      # Deviations
      deviations <- ordered_values - line_values
    } else {
      # For other distributions, use the sorted values as observed quantiles
      # and calculate deviations directly
      deviations <- ordered_values - theoretical_quantiles
    }
    
    # Create data frame for plotting
    deviation_data <- data.frame(
      index = 1:n,
      theoretical = theoretical_quantiles,
      observed = ordered_values,
      deviation = deviations,
      deviation_color = ifelse(abs(deviations) > 1.5*sd(deviations), "Large", "Small")
    )
    
    # Create deviation plot
    p <- ggplot(deviation_data, aes(x = theoretical, y = deviation)) +
      geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") +
      geom_point(aes(color = deviation_color), size = 3, alpha = 0.7) +
      geom_smooth(method = "loess", se = TRUE, color = "#8e44ad", fill = "#9b59b6", alpha = 0.2) +
      scale_color_manual(values = c("Small" = "#2980b9", "Large" = "#c0392b"), 
                       name = "Deviation Size") +
      labs(title = paste("Deviations from", dist_name, "Distribution"),
           subtitle = "Pattern of deviations reveals distribution characteristics",
           x = "Theoretical Quantiles",
           y = "Deviation (Sample - Theoretical)") 
    
    # Apply selected theme
    if (input$theme_choice == "minimal") {
      p <- p + theme_minimal(base_size = 14)
    } else if (input$theme_choice == "classic") {
      p <- p + theme_classic(base_size = 14)
    } else if (input$theme_choice == "light") {
      p <- p + theme_light(base_size = 14)
    } else if (input$theme_choice == "dark") {
      p <- p + theme_dark(base_size = 14) +
        theme(
          plot.background = element_rect(fill = "gray10"),
          panel.background = element_rect(fill = "gray15"),
          panel.grid = element_line(color = "gray30")
        )
    }
    
    p
  })

  # Calculate deviation statistics
  deviation_stats <- reactive({
    req(processed_data())
    
    values <- processed_data()
    
    # Set up base qq plot data
    n <- length(values)
    
    # Theoretical quantiles based on selected distribution
    p <- (1:n - 0.5) / n
    
    if (input$dist_type == "norm") {
      theoretical_quantiles <- qnorm(p)
    } else if (input$dist_type == "t") {
      theoretical_quantiles <- qt(p, df = input$t_df)
    } else if (input$dist_type == "unif") {
      theoretical_quantiles <- qunif(p)
    } else if (input$dist_type == "lnorm") {
      theoretical_quantiles <- qlnorm(p)
    } else if (input$dist_type == "exp") {
      theoretical_quantiles <- qexp(p)
    }
    
    # Sort the observed data
    ordered_values <- sort(values)
    
    # Calculate deviations from the theoretical line
    if (input$dist_type == "norm") {
      # For normal distribution, calculate line based on mean and SD
      slope <- sd(values)
      intercept <- mean(values)
      
      # Theoretical line values
      line_values <- intercept + slope * theoretical_quantiles
      
      # Deviations
      deviations <- ordered_values - line_values
    } else {
      # For other distributions, use the sorted values as observed quantiles
      # and calculate deviations directly
      deviations <- ordered_values - theoretical_quantiles
    }
    
    # Divide deviations into segments (lower tail, middle, upper tail)
    lower_third <- deviations[1:floor(n/3)]
    middle_third <- deviations[(floor(n/3)+1):ceiling(2*n/3)]
    upper_third <- deviations[(ceiling(2*n/3)+1):n]
    
    # Calculate statistics
    list(
      mean_deviation = mean(deviations),
      sd_deviation = sd(deviations),
      max_deviation = max(abs(deviations)),
      mean_lower = mean(lower_third),
      mean_middle = mean(middle_third),
      mean_upper = mean(upper_third),
      sd_lower = sd(lower_third),
      sd_middle = sd(middle_third),
      sd_upper = sd(upper_third)
    )
  })

  # Display summary results
  output$summary_results <- renderPrint({
    req(input$analyze > 0)
    error <- validate_data()
    if (!is.null(error) && startsWith(error, "Error")) return(NULL)
    
    values <- processed_data()
    stats <- deviation_stats()
    
    cat("QQ Plot Analysis Results:\n\n")
    
    cat("Data Summary:\n")
    cat("Sample size:", length(values), "\n")
    cat("Mean:", round(mean(values), 4), "\n")
    cat("Median:", round(median(values), 4), "\n")
    cat("Standard deviation:", round(sd(values), 4), "\n")
    
    # Calculate skewness and kurtosis if e1071 is available
    skew_val <- tryCatch({
      e1071::skewness(values)
    }, error = function(e) {
      NA
    })
    
    kurt_val <- tryCatch({
      e1071::kurtosis(values)
    }, error = function(e) {
      NA
    })
    
    if (!is.na(skew_val)) {
      cat("Skewness:", round(skew_val, 4), 
          ifelse(abs(skew_val) < 0.5, " (approximately symmetric)", 
                ifelse(skew_val > 0, " (right-skewed)", " (left-skewed)")), "\n")
    }
    
    if (!is.na(kurt_val)) {
      cat("Kurtosis:", round(kurt_val, 4), 
          ifelse(abs(kurt_val) < 0.5, " (approximately mesokurtic)", 
                ifelse(kurt_val > 0, " (leptokurtic - heavy tails)", " (platykurtic - light tails)")), "\n")
    }
    
    cat("\nDeviation Analysis:\n")
    cat("Mean deviation:", round(stats$mean_deviation, 4), "\n")
    cat("Std. deviation of deviations:", round(stats$sd_deviation, 4), "\n")
    cat("Maximum absolute deviation:", round(stats$max_deviation, 4), "\n\n")
    
    cat("Distribution Pattern:\n")
    cat("- Lower tail (mean deviation):", round(stats$mean_lower, 4), "\n")
    cat("- Middle section (mean deviation):", round(stats$mean_middle, 4), "\n")
    cat("- Upper tail (mean deviation):", round(stats$mean_upper, 4), "\n\n")
    
    # Normality tests if the selected distribution is normal
    if (input$dist_type == "norm") {
      cat("Formal Normality Tests:\n")
      
      # Shapiro-Wilk Test
      sw_test <- shapiro.test(values)
      cat("Shapiro-Wilk Test: W =", round(sw_test$statistic, 4), 
          ", p-value =", format.pval(sw_test$p.value, digits = 4), "\n")
      
      # Kolmogorov-Smirnov Test
      ks_test <- ks.test(values, "pnorm", mean = mean(values), sd = sd(values))
      cat("Kolmogorov-Smirnov Test: D =", round(ks_test$statistic, 4), 
          ", p-value =", format.pval(ks_test$p.value, digits = 4), "\n")
      
      cat("\nInterpretation of formal tests:\n")
      if (sw_test$p.value < 0.05 || ks_test$p.value < 0.05) {
        cat("At least one formal test indicates significant deviation from normality (p < 0.05).\n")
        cat("This supports the visual assessment from the QQ plot.\n")
      } else {
        cat("Formal tests do not indicate significant deviation from normality (p ≥ 0.05).\n")
        cat("This supports the visual assessment from the QQ plot if points generally follow the reference line.\n")
      }
    }
  })

  # Generate interpretation text
  output$interpretation <- renderUI({
    req(input$analyze > 0)
    error <- validate_data()
    if (!is.null(error) && startsWith(error, "Error")) return(NULL)
    
    stats <- deviation_stats()
    values <- processed_data()
    
    # Interpretation based on pattern of deviations
    interpretation_text <- div(
      h5("QQ Plot Assessment"),
      p("Based on the QQ plot analysis, the data shows the following characteristics:")
    )
    
    # Calculate measures to determine distribution characteristics
    tail_diff <- stats$mean_upper - stats$mean_lower
    middle_vs_tails <- stats$mean_middle - (stats$mean_lower + stats$mean_upper)/2
    
    # Determine distribution type based on deviation patterns
    dist_type <- ""
    
    if (abs(tail_diff) < 0.25 * stats$sd_deviation && 
        abs(middle_vs_tails) < 0.25 * stats$sd_deviation) {
      # Close to reference distribution
      if (input$dist_type == "norm") {
        dist_type <- "The data appears to approximately follow a normal distribution."
      } else {
        dist_type <- paste0("The data appears to approximately follow the selected ", 
                           input$dist_type, " distribution.")
      }
    } else if (tail_diff > 0.5 * stats$sd_deviation) {
      # Right skewed
      dist_type <- "The data appears to be right-skewed (positive skew)."
    } else if (tail_diff < -0.5 * stats$sd_deviation) {
      # Left skewed
      dist_type <- "The data appears to be left-skewed (negative skew)."
    } else if (middle_vs_tails > 0.5 * stats$sd_deviation) {
      # Light tails (platykurtic)
      dist_type <- "The data appears to have light tails (platykurtic) compared to the reference distribution."
    } else if (middle_vs_tails < -0.5 * stats$sd_deviation) {
      # Heavy tails (leptokurtic)
      dist_type <- "The data appears to have heavy tails (leptokurtic) compared to the reference distribution."
    } else {
      # No clear pattern
      dist_type <- "The data shows some deviations from the reference distribution, but without a clear pattern."
    }
    
    # Add overall deviation assessment
    if (stats$max_deviation > 2 * stats$sd_deviation) {
      overall_dev <- "There are notable deviations from the reference distribution, particularly in certain regions."
    } else {
      overall_dev <- "The overall fit to the reference distribution is reasonable, with only minor deviations."
    }
    
    # Complete interpretation
    interpretation_text <- tagList(
      interpretation_text,
      tags$ul(
        tags$li(tags$strong(dist_type)),
        tags$li(overall_dev)
      )
    )
    
    # Add advice based on the analysis
    advice <- div(
      h5("Recommendations"),
      p("Based on this analysis, consider the following:")
    )
    
    advice_items <- list()
    
    if (grepl("appears to approximately follow", dist_type)) {
      # Good fit to reference distribution
      if (input$dist_type == "norm") {
        advice_items <- c(advice_items, 
                         list(tags$li("Parametric statistical methods assuming normality are likely appropriate.")))
      } else {
        advice_items <- c(advice_items, 
                         list(tags$li("Statistical methods appropriate for the selected distribution can be used.")))
      }
    } else {
      # Poor fit to reference distribution
      if (input$dist_type == "norm") {
        advice_items <- c(advice_items, 
                         list(tags$li("Consider non-parametric statistical methods due to deviations from normality.")))
        
        # Suggest data transformations if appropriate
        if (grepl("right-skewed", dist_type)) {
          advice_items <- c(advice_items, 
                           list(tags$li("Try log or square root transformations to address right skew.")))
        } else if (grepl("left-skewed", dist_type)) {
          advice_items <- c(advice_items, 
                           list(tags$li("Try square or cube transformations to address left skew.")))
        } else if (grepl("heavy tails", dist_type)) {
          advice_items <- c(advice_items, 
                           list(tags$li("Consider robust statistical methods that are less sensitive to outliers.")))
        }
      } else {
        advice_items <- c(advice_items, 
                         list(tags$li("Try different theoretical distributions to find a better fit for your data.")))
      }
    }
    
    # Always suggest visual confirmation
    advice_items <- c(advice_items, 
                     list(tags$li("Always supplement this analysis with domain knowledge and other diagnostic plots.")))
    
    # Complete advice
    advice <- tagList(
      advice,
      tags$ul(advice_items)
    )
    
    # Return complete interpretation
    tagList(interpretation_text, advice)
  })

  # Generate deviation pattern interpretation
  output$deviation_pattern <- renderUI({
    req(input$analyze > 0)
    error <- validate_data()
    if (!is.null(error) && startsWith(error, "Error")) return(NULL)
    
    stats <- deviation_stats()
    
    # Create detailed deviation pattern analysis
    div(
      h5("Deviation Pattern Analysis"),
      p("The pattern of deviations in the QQ plot provides insight into how the data distribution differs from the reference distribution:"),
      tags$ul(
        tags$li(tags$b("Lower Tail:"), 
               ifelse(stats$mean_lower < -0.5 * stats$sd_deviation, 
                      " Points fall below the reference line, indicating fewer small values than expected.", 
                      ifelse(stats$mean_lower > 0.5 * stats$sd_deviation,
                             " Points fall above the reference line, indicating more small values than expected.",
                             " Points generally follow the reference line, indicating the lower tail fits the reference distribution."))),
        tags$li(tags$b("Middle Section:"), 
               ifelse(stats$mean_middle < -0.5 * stats$sd_deviation, 
                      " Points fall below the reference line, suggesting the center of the distribution is shifted left.", 
                      ifelse(stats$mean_middle > 0.5 * stats$sd_deviation,
                             " Points fall above the reference line, suggesting the center of the distribution is shifted right.",
                             " Points generally follow the reference line, indicating the middle fits the reference distribution."))),
        tags$li(tags$b("Upper Tail:"), 
               ifelse(stats$mean_upper < -0.5 * stats$sd_deviation, 
                      " Points fall below the reference line, indicating fewer large values than expected.", 
                      ifelse(stats$mean_upper > 0.5 * stats$sd_deviation,
                             " Points fall above the reference line, indicating more large values than expected.",
                             " Points generally follow the reference line, indicating the upper tail fits the reference distribution.")))
      )
    )
  })
}

shinyApp(ui = ui, server = server)

How QQ Plots Work

QQ plots work by comparing the quantiles (percentiles) of your observed data to the quantiles of a theoretical distribution:

Mathematical Procedure

  1. Sort your data in ascending order: \(x_{(1)} \leq x_{(2)} \leq ... \leq x_{(n)}\)

  2. Calculate empirical quantiles by assigning probabilities to each data point:

    • For \(n\) observations, the \(i\)-th ordered value corresponds approximately to the \((i - 0.5)/n\) quantile
    • So each data point \(x_{(i)}\) is plotted against the theoretical quantile corresponding to probability \((i - 0.5)/n\)
  3. Calculate theoretical quantiles for the reference distribution:

    • For normal distribution, use \(\Phi^{-1}((i - 0.5)/n)\) where \(\Phi^{-1}\) is the inverse of the standard normal CDF
    • For other distributions, use the appropriate quantile function
  4. Plot empirical quantiles against theoretical quantiles:

    • If the data follows the theoretical distribution, points will approximately follow a straight line
    • Deviations from the straight line indicate deviations from the theoretical distribution
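
To make the procedure concrete, here is a minimal base R sketch of the same steps for a normal reference distribution; the sample `x` is a placeholder you would replace with your own data.

# Hand-rolled normal QQ plot following steps 1-4 above
x <- c(5.2, 5.5, 6.1, 5.8, 5.5, 5.9, 6.2, 5.7, 5.6, 6.0, 5.8, 5.5)  # placeholder sample

n      <- length(x)
probs  <- (seq_len(n) - 0.5) / n      # step 2: plotting positions (i - 0.5)/n
emp_q  <- sort(x)                     # step 1: empirical quantiles (ordered data)
theo_q <- qnorm(probs)                # step 3: standard normal quantiles

plot(theo_q, emp_q,
     xlab = "Theoretical Quantiles", ylab = "Sample Quantiles",
     main = "Normal QQ plot (manual construction)")               # step 4
abline(a = mean(x), b = sd(x), col = "blue", lty = 2, lwd = 2)    # reference line fitted by mean/SD
# qqnorm(x); qqline(x)   # the equivalent built-in shortcut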

Interpreting QQ Plot Patterns

Different patterns in QQ plots reveal specific types of deviations from the theoretical distribution:

  • Points follow the diagonal line: Data follows the theoretical distribution
  • Curved pattern bending in one direction: Data has different skewness than the theoretical distribution
  • Points curve upward at the ends: Data has heavier tails (more extreme values)
  • Points curve downward at the ends: Data has lighter tails (fewer extreme values)
  • Points above the line at the left end, below at the right end: Data is left-skewed (negative skew)
  • Points below the line at the left end, above at the right end: Data is right-skewed (positive skew)
  • Step pattern: Data is discrete or has tied values
  • Isolated points at the ends: Data contains outliers

Confidence Bands

Confidence bands on a QQ plot provide a visual guide to assess whether deviations from the diagonal line are statistically significant:

  • Points falling within the bands could be consistent with random variation from the theoretical distribution
  • Points falling outside the bands suggest significant deviations from the theoretical distribution
  • Width of the bands depends on the sample size and chosen confidence level
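
As a rough illustration of how such bands can be computed, the sketch below mirrors the simulation approach used in the app above: repeatedly simulate samples from the fitted normal distribution, sort each one, and take pointwise quantiles of the sorted values. The sample `x` and the helper name `qq_band` are illustrative choices, not part of the app.

# Simulation-based pointwise confidence bands for a normal QQ plot
qq_band <- function(x, conf = 0.95, n_sim = 1000) {
  n     <- length(x)
  sims  <- replicate(n_sim, sort(rnorm(n, mean(x), sd(x))))  # n x n_sim matrix of sorted simulated samples
  alpha <- 1 - conf
  list(theoretical = qnorm((seq_len(n) - 0.5) / n),
       observed    = sort(x),
       lower       = apply(sims, 1, quantile, probs = alpha / 2),
       upper       = apply(sims, 1, quantile, probs = 1 - alpha / 2))
}

set.seed(42)
x     <- rnorm(40, mean = 100, sd = 15)   # placeholder data
bands <- qq_band(x)

plot(bands$theoretical, bands$observed, pch = 19,
     xlab = "Theoretical Quantiles", ylab = "Sample Quantiles")
lines(bands$theoretical, bands$lower, lty = 2)
lines(bands$theoretical, bands$upper, lty = 2)
abline(a = mean(x), b = sd(x), col = "blue", lwd = 2)

Points outside the dashed band would be the ones flagged as significant deviations at the chosen confidence level.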

QQ Plots vs. Formal Normality Tests

QQ plots complement formal normality tests like Shapiro-Wilk or Kolmogorov-Smirnov:

  • Output: a QQ plot gives a visual assessment; a formal test returns a p-value
  • Information: a QQ plot shows where and how the data deviate; a formal test gives a binary decision (normal / not normal)
  • Sample size sensitivity: QQ plots work for any sample size; formal tests may flag trivial deviations in large samples
  • Multiple distributions: QQ plots can compare against any distribution; formal tests are usually specific to one distribution
  • Learning curve: QQ plots take practice to interpret; a p-value has a simple interpretation
  • Transformation guidance: QQ plots show which transformation might help; formal tests don’t suggest transformations

For the most comprehensive assessment, use both QQ plots and formal tests together.
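
In practice, the combined workflow can be as short as a few lines of base R; the sample below is simulated purely as a placeholder.

set.seed(1)
x <- rnorm(50, mean = 10, sd = 2)   # replace with your own data

qqnorm(x)                           # visual assessment
qqline(x, col = "blue", lwd = 2)
shapiro.test(x)                     # formal test: p >= 0.05 gives no evidence against normality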

Example 1: Normally Distributed Data

A researcher collected measurements of adult heights (in cm) and wants to check if the data follows a normal distribution.

Data (sample of 30 height measurements in cm):

165, 172, 168, 175, 171, 163, 169, 170, 178, 167, 173, 180, 166, 174, 172, 169, 177, 168, 173, 171, 169, 175, 172, 170, 174, 168, 176, 171, 173, 170

Analysis:

  1. QQ Plot Assessment:
    • Points generally follow the diagonal line with minor random deviations
    • No systematic curvature is observed
    • All points fall within the 95% confidence bands
  2. Deviation Analysis:
    • Lower tail: Small deviations with no systematic pattern
    • Middle section: Points closely follow the diagonal line
    • Upper tail: Small deviations with no systematic pattern
  3. Formal Tests:
    • Shapiro-Wilk: W = 0.9827, p = 0.8833
    • Kolmogorov-Smirnov: D = 0.0948, p = 0.9367

Conclusion:

The height data appears to follow a normal distribution. Both the visual assessment (QQ plot) and formal tests support this conclusion. The minor deviations from the diagonal line are consistent with random sampling variation.

How to Report: “Adult height measurements were assessed for normality using a QQ plot and formal tests. The QQ plot showed points following the diagonal reference line with no systematic deviations, and formal tests did not reject normality (Shapiro-Wilk: W = 0.983, p = 0.883; Kolmogorov-Smirnov: D = 0.095, p = 0.937). This supports the use of parametric statistical methods assuming normality for analyzing these data.”
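
For readers who want to reproduce this example, the plot and formal tests can be obtained with base R as in the sketch below (the KS test with estimated parameters mirrors the app’s approach; ties in the heights trigger a harmless warning).

heights <- c(165, 172, 168, 175, 171, 163, 169, 170, 178, 167,
             173, 180, 166, 174, 172, 169, 177, 168, 173, 171,
             169, 175, 172, 170, 174, 168, 176, 171, 173, 170)

qqnorm(heights, main = "Normal QQ plot of adult heights (cm)")
qqline(heights, col = "blue", lwd = 2)

shapiro.test(heights)
ks.test(heights, "pnorm", mean = mean(heights), sd = sd(heights))  # parameters estimated from the data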

Example 2: Right-Skewed Data

A researcher collected response time data (in milliseconds) from a cognitive experiment and wants to check the distribution.

Data Summary:

  • Sample size: 40 response times
  • Mean: 342 ms
  • Median: 328 ms
  • Range: 203 to 687 ms

Analysis:

  1. QQ Plot Assessment:
    • Points follow an upward curve pattern (below the line on the left, above the line on the right)
    • Several points in the upper tail fall outside the confidence bands
    • The deviation pattern indicates right skew (positive skew)
  2. Deviation Analysis:
    • Lower tail: Points fall below the diagonal line, indicating fewer small values than expected
    • Middle section: Points cross the diagonal line from below to above
    • Upper tail: Points fall significantly above the diagonal line, indicating more large values than expected
  3. Formal Tests:
    • Shapiro-Wilk: W = 0.8651, p = 0.0012
    • Kolmogorov-Smirnov: D = 0.1862, p = 0.0012

Conclusion:

The response time data shows a clear right-skewed (positively skewed) distribution and does not follow a normal distribution. Both the QQ plot and formal tests strongly indicate non-normality.

How to Report: “Response time data was assessed for normality using a QQ plot and formal tests. The QQ plot revealed a clear right-skewed pattern with points falling below the diagonal line in the lower tail and above the line in the upper tail. Formal tests confirmed significant deviation from normality (Shapiro-Wilk: W = 0.865, p = 0.001). A log transformation is recommended before applying parametric statistical methods to this data.”
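
The raw response times are not listed here, so the sketch below uses simulated log-normal data with a similar center purely to illustrate the pattern and the effect of a log transformation; the numbers will not match the results reported above.

set.seed(123)
rt_ms <- rlnorm(40, meanlog = log(330), sdlog = 0.3)   # hypothetical right-skewed response times (ms)

op <- par(mfrow = c(1, 2))
qqnorm(rt_ms, main = "Raw times (right-skewed)"); qqline(rt_ms, col = "blue", lwd = 2)
qqnorm(log(rt_ms), main = "Log-transformed");     qqline(log(rt_ms), col = "blue", lwd = 2)
par(op)

shapiro.test(rt_ms)        # typically rejects normality for skewed data
shapiro.test(log(rt_ms))   # usually consistent with normality after the transform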

Common QQ Plot Patterns and What They Mean

Understanding the specific patterns in QQ plots helps identify the exact nature of your data distribution:

Normal Distribution

If your data follows a normal distribution, the QQ plot shows:

  • Points following the diagonal reference line
  • Minor random deviations equally distributed above and below the line
  • No systematic patterns or curvature
  • Most or all points falling within the confidence bands

Right-Skewed (Positive Skew)

A right-skewed distribution shows:

  • Points following a curve that starts below the diagonal line on the left
  • Points rising above the diagonal line on the right
  • The pattern resembles a curve that bends upward
  • Consider log or square root transformations

Left-Skewed (Negative Skew)

A left-skewed distribution shows:

  • Points following a curve that starts above the diagonal line on the left
  • Points falling below the diagonal line on the right
  • The pattern resembles a curve that bends downward
  • Consider square or cube transformations

Heavy Tails (Leptokurtic)

Data with heavier tails than the normal distribution shows:

  • Points following an S-shaped pattern
  • Middle points falling below the diagonal line
  • End points rising above the diagonal line at both ends
  • Indicates more extreme values than expected in a normal distribution

Light Tails (Platykurtic)

Data with lighter tails than the normal distribution shows:

  • Points following an inverted S-shaped pattern
  • Middle points rising above the diagonal line
  • End points falling below the diagonal line at both ends
  • Indicates fewer extreme values than expected in a normal distribution

Bimodal/Multimodal Distribution

Data with multiple modes often shows:

  • Step-like patterns in the QQ plot
  • Horizontal segments or abrupt changes in slope
  • May indicate mixed populations or grouped data
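
A quick way to train your eye on these patterns is to simulate each one and compare the resulting normal QQ plots side by side; the sketch below does this with arbitrary simulation settings.

set.seed(2025)
n <- 200
samples <- list(
  "Normal"                = rnorm(n),
  "Right-skewed"          = rlnorm(n, sdlog = 0.6),
  "Heavy tails (t, df=3)" = rt(n, df = 3),
  "Light tails (uniform)" = runif(n),
  "Bimodal"               = c(rnorm(n / 2, mean = -2), rnorm(n / 2, mean = 2))
)

op <- par(mfrow = c(2, 3))
for (nm in names(samples)) {
  qqnorm(samples[[nm]], main = nm, pch = 19, cex = 0.6)
  qqline(samples[[nm]], col = "blue", lwd = 2)
}
par(op)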

Choosing Data Transformations Based on QQ Plots

QQ plots can guide your choice of data transformation to achieve normality:

  • Right-skewed (positive skew): Log, square root, or reciprocal
  • Left-skewed (negative skew): Square, cube, or exponential
  • Heavy-tailed: Box-Cox or logit
  • Light-tailed: Arcsine or probit
  • Bimodal: Consider analyzing subgroups separately

A good transformation will produce a new QQ plot with points that better follow the diagonal line.
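
One practical way to compare candidate transformations is to draw a QQ plot for each and, as a rough numeric summary, look at the Shapiro-Wilk p-values; the data below are simulated right-skewed values used only for illustration.

set.seed(7)
x <- rlnorm(60, meanlog = 2, sdlog = 0.8)   # hypothetical right-skewed measurements

cands <- list(Original = x, Log = log(x), "Square root" = sqrt(x), Reciprocal = 1 / x)

op <- par(mfrow = c(2, 2))
for (nm in names(cands)) {
  qqnorm(cands[[nm]], main = nm); qqline(cands[[nm]], col = "blue", lwd = 2)
}
par(op)

sapply(cands, function(v) shapiro.test(v)$p.value)  # larger p-value = less evidence against normality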

Test Your Understanding

  1. What does a straight line pattern in a QQ plot indicate?
      A. The data is bimodal
      B. The data follows the reference distribution
      C. The data has outliers
      D. The data is highly skewed
  2. If points in a QQ plot curve upward at both ends (above the line), what does this suggest?
      A. The data has heavy tails (more extreme values than a normal distribution)
      B. The data has light tails (fewer extreme values than a normal distribution)
      C. The data is right-skewed
      D. The data is left-skewed
  3. What pattern would indicate right-skewed (positively skewed) data in a QQ plot?
      A. Points above the line at the left end, below at the right end
      B. Points below the line at the left end, above at the right end
      C. Points following the diagonal line perfectly
      D. Points forming a horizontal pattern
  4. What advantage do QQ plots have over formal normality tests like Shapiro-Wilk?
      A. They’re always more accurate
      B. They provide exact p-values
      C. They show where and how data deviates from normality
      D. They require smaller sample sizes
  5. If data follows a log-normal distribution, what pattern would you expect in a QQ plot against a normal distribution?
      A. A straight diagonal line
      B. An upward curve (suggesting positive skew)
      C. A downward curve (suggesting negative skew)
      D. A horizontal line

Answers: 1-B, 2-A, 3-B, 4-C, 5-B

Common Questions About QQ Plots

What is the difference between a QQ plot and a PP plot?

Both QQ (Quantile-Quantile) and PP (Probability-Probability) plots assess distributional fit, but they do so differently:

  • QQ plots compare the actual data quantiles against theoretical distribution quantiles. They’re better at showing deviations in the tails of distributions and are more widely used for normality assessment.

  • PP plots compare the empirical cumulative distribution function (CDF) to the theoretical CDF. They’re sometimes better for assessing the fit in the center of the distribution.

QQ plots are generally preferred for normality testing because they magnify deviations in the tails, which are often of particular interest.
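
The contrast is easiest to see side by side; the sketch below builds both plots for the same heavy-tailed sample using base R only (the sample itself is simulated for illustration).

set.seed(11)
x  <- rt(100, df = 3)                  # heavy-tailed sample
xs <- sort(x)
p  <- (seq_along(xs) - 0.5) / length(xs)

op <- par(mfrow = c(1, 2))
# QQ plot: theoretical vs sample quantiles (tail deviations are magnified)
plot(qnorm(p, mean(x), sd(x)), xs, main = "QQ plot",
     xlab = "Theoretical quantiles", ylab = "Sample quantiles")
abline(0, 1, col = "blue", lwd = 2)
# PP plot: theoretical vs empirical cumulative probabilities (center emphasized)
plot(pnorm(xs, mean(x), sd(x)), p, main = "PP plot",
     xlab = "Theoretical probability", ylab = "Empirical probability")
abline(0, 1, col = "blue", lwd = 2)
par(op)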

What sample size do I need for a QQ plot?

QQ plots can be used with almost any sample size, but their interpretability varies:

  • With very small samples (n < 10), patterns may be difficult to discern, and random variation can mislead interpretation
  • For moderate samples (10 ≤ n ≤ 50), QQ plots are highly effective and provide reliable visual assessment
  • For large samples (n > 50), even minor deviations from normality become visible, so you should focus on practical significance rather than perfect alignment

Unlike formal tests that may become overly sensitive with large samples, QQ plots remain useful as they show the magnitude and pattern of deviations.

What do the confidence bands in a QQ plot mean?

Confidence bands in QQ plots provide a visual guide to assess whether deviations from the diagonal line are statistically significant:

  • Points falling within the bands are consistent with random variation from the theoretical distribution
  • Points falling outside the bands suggest statistically significant deviations
  • The width of the bands depends on the sample size and chosen confidence level (typically 95%)

The bands are particularly useful for distinguishing between meaningful deviations and random sampling variation, especially with smaller sample sizes.

Should I rely on QQ plots or formal normality tests?

The best approach is to use both QQ plots and formal tests together:

  • QQ plots show the pattern, location, and magnitude of deviations from normality, helping you understand how your data deviates
  • Formal tests (like Shapiro-Wilk or Kolmogorov-Smirnov) provide objective criteria through p-values

When they agree, you can be more confident in your conclusion. When they disagree (particularly with large samples where formal tests may reject normality despite minor deviations), QQ plots help you assess practical significance.

Can QQ plots be used for distributions other than the normal?

Yes, QQ plots can compare your data to any theoretical distribution, not just the normal:

  • For lognormal distributions, exponential distributions, Weibull distributions, etc.
  • You can even create QQ plots to compare two empirical data samples to each other

When using non-normal reference distributions, the interpretation remains the same: points following a straight line indicate that your data follows the specified distribution.
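
For the two-sample case, base R already provides qqplot(); the two groups below are simulated placeholders.

set.seed(3)
group_a <- rnorm(80, mean = 50, sd = 10)
group_b <- rnorm(60, mean = 55, sd = 10)   # shifted by 5 units

qqplot(group_a, group_b,
       xlab = "Group A quantiles", ylab = "Group B quantiles",
       main = "Two-sample QQ plot")
abline(0, 1, col = "blue", lwd = 2)   # identity line: points on it suggest identical distributions

A parallel shift of the points away from the identity line suggests a location difference, while a changing slope suggests a difference in spread.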

How do I choose a data transformation based on a QQ plot?

The pattern of deviation in your QQ plot suggests which transformation might normalize your data:

  • Upward curve (right skew/positive skew): Try log, square root, or inverse transformations
  • Downward curve (left skew/negative skew): Try square, cube, or exponential transformations
  • S-curve with heavy tails: Try logit transformation or Box-Cox with λ < 1
  • Inverted S-curve with light tails: Try arcsine transformation or Box-Cox with λ > 1

After applying a transformation, create a new QQ plot to assess whether normality has improved. You may need to try several transformations to find the optimal one.

Examples of When to Use QQ Plots

  1. Before parametric tests: To verify normality assumptions for t-tests, ANOVA, or regression
  2. Regression diagnostics: To check if residuals are normally distributed (see the sketch after this list)
  3. Exploratory data analysis: To understand the shape and characteristics of your data distribution
  4. After data transformations: To evaluate if transformations successfully normalized your data
  5. Model comparison: To determine which theoretical distribution best fits your data
  6. Quality control: To detect changes in process distributions
  7. Financial analysis: To assess return distributions and risk models
  8. Environmental studies: To examine distribution of pollutant measurements
  9. Multivariate outlier detection: To identify multivariate outliers using Mahalanobis distances
  10. Meta-analysis: To examine the distribution of effect-size estimates (often alongside funnel plots used to assess publication bias)
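
For use case 2, checking residual normality takes only a few lines; the regression data below are simulated for illustration.

set.seed(9)
x   <- runif(100, 0, 10)
y   <- 3 + 2 * x + rnorm(100, sd = 1.5)   # hypothetical data meeting the normality assumption
fit <- lm(y ~ x)

qqnorm(resid(fit), main = "Normal QQ plot of residuals")
qqline(resid(fit), col = "blue", lwd = 2)
shapiro.test(resid(fit))                  # optional formal check on the residuals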


Reuse

Citation

BibTeX citation:
@online{kassambara2025,
  author = {Kassambara, Alboukadel},
  title = {QQ {Plot} {Analysis} \& {Generator} \textbar{} {Test} {Data}
    {Normality} {Visually}},
  date = {2025-04-12},
  url = {https://www.datanovia.com/apps/statfusion/analysis/inferential/goodness-fit/normality/qq-plot-analysis.html},
  langid = {en}
}
For attribution, please cite this work as:
Kassambara, Alboukadel. 2025. “QQ Plot Analysis & Generator | Test Data Normality Visually.” April 12, 2025. https://www.datanovia.com/apps/statfusion/analysis/inferential/goodness-fit/normality/qq-plot-analysis.html.