Kolmogorov-Smirnov Test Calculator | Check Data Normality Online

Key Takeaways: Kolmogorov-Smirnov Normality Test


Purpose: Test whether data follows a normal distribution or compare two distributions
When to use: For checking normality, especially with larger sample sizes
What it measures: Maximum vertical distance between empirical and theoretical cumulative distributions
Null hypothesis: The data follows a normal distribution (\(H_0\): data is normal)
Alternative hypothesis: The data does not follow a normal distribution (\(H_1\): data is not normal)
Interpretation: If p < 0.05, the data significantly deviates from normality
Advantages: Works well for large samples; can be used to compare any two continuous distributions
Visual component: Shows exactly where distributions differ the most

What is the Kolmogorov-Smirnov Test?

The Kolmogorov-Smirnov (K-S) test is a nonparametric statistical method that measures the maximum difference between an empirical distribution function and a theoretical distribution function. When used as a normality test, it compares your data with a normal distribution (here, one with the same mean and standard deviation as your sample) to determine whether your data could reasonably have been drawn from a normal distribution.
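The calculator below runs R's `ks.test()` from the base stats package. A minimal sketch of the same call is shown here. It assumes the calculator's built-in example data, and, like the calculator, plugs in the sample mean and SD as the normal parameters.

```r
# The calculator's example data (10 values)
x <- c(8.44, 7.16, 16.94, 9.59, 13.25, 12.94, 11, 5.61, 10.6, 12.81)

# One-sample K-S test against a normal distribution whose mean and SD
# are estimated from the same sample
res <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))

res$statistic  # D: largest vertical gap between the ECDF and the normal CDF
res$p.value    # unadjusted p-value (conservative, because mean/SD were estimated)
```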


When to use the Kolmogorov-Smirnov normality test:

When testing if data follows a normal distribution, especially with larger samples
When comparing two sample distributions to see if they differ
When you need to visualize exactly where distributions differ
As an alternative to the Shapiro-Wilk test, particularly with larger datasets
When checking assumptions for parametric statistical tests

This online calculator allows you to quickly perform a Kolmogorov-Smirnov normality test, visualize your data distribution, and interpret the results with confidence.

#| standalone: true
#| viewerHeight: 1300

library(shiny)
library(bslib)
library(ggplot2)
library(bsicons)
library(vroom)
library(shinyjs)

ui <- page_sidebar(
  title = "Kolmogorov-Smirnov Normality Test",
  useShinyjs(),  # Enable shinyjs for dynamic UI updates
  sidebar = sidebar(
    width = 400,
    
    card(
      card_header("Data Input"),
      accordion(
        accordion_panel(
          "Manual Input",
          textAreaInput("data_input", "Enter your data (one value per row):", rows = 8,
                      placeholder = "Paste values here..."),
          div(
            actionLink("use_example", "Use example data", style = "color:#0275d8;"),
            tags$span(bs_icon("file-earmark-text"), style = "margin-left: 5px; color: #0275d8;")
          )
        ),
        accordion_panel(
          "File Upload",
          fileInput("file_upload", "Upload CSV or TXT file:",
                   accept = c("text/csv", "text/plain", ".csv", ".txt")),
          checkboxInput("header", "File has header", TRUE),
          conditionalPanel(
            condition = "output.file_uploaded",
            div(
              selectInput("selected_var", "Select variable:", choices = NULL),
              actionButton("clear_file", "Clear File", class = "btn-danger btn-sm")
            )
          )
        ),
        id = "input_method",
        open = "Manual Input"
      ),
      
      # Advanced Options accordion
      accordion(
        accordion_panel(
          "Advanced Options",
          
          card(
            card_header("Significance Level:"),
            card_body(
              sliderInput("alpha", NULL, min = 0.01, max = 0.10, value = 0.05, step = 0.01)
            )
          ),
          
          card(
            card_header("Test Options:"),
            card_body(
              selectInput("alternative", "Alternative Hypothesis:",
                         choices = c("Two-sided" = "two.sided", 
                                    "Data CDF lies below the normal CDF" = "less",
                                    "Data CDF lies above the normal CDF" = "greater"),
                         selected = "two.sided"),
              checkboxInput("exact", "Use exact p-values when possible", TRUE)
            )
          ),
          
          card(
            card_header("Plot Options:"),
            card_body(
              checkboxInput("show_density", "Show density curve", TRUE),
              checkboxInput("show_normal", "Show normal curve", TRUE),
              sliderInput("bins", "Number of histogram bins:", min = 5, max = 50, value = 20)
            )
          )
        ),
        open = FALSE
      ),
      
      actionButton("run_test", "Run Test", class = "btn btn-primary")
    ),

    hr(),

    card(
      card_header("Interpretation"),
      card_body(
        div(class = "alert alert-info",
          tags$ul(
            tags$li(tags$b("Null hypothesis (H₀):"), " The data follows a normal distribution."),
            tags$li(tags$b("Alternative hypothesis (H₁):"), " The data does not follow a normal distribution."),
            tags$li("If the p-value ≥ the significance level α (0.05 by default), there is not enough evidence to reject normality."),
            tags$li("If the p-value < α, the data significantly deviates from a normal distribution."),
            tags$li("The Kolmogorov-Smirnov test compares your data's cumulative distribution to a theoretical normal distribution.")
          )
        )
      )
    )
  ),

  layout_column_wrap(
    width = 1,

    card(
      card_header("Test Results"),
      card_body(
        navset_tab(
          nav_panel("Results", uiOutput("error_message"), verbatimTextOutput("test_results")),
          nav_panel("Explanation", div(style = "font-size: 0.9rem;",
            p("The Kolmogorov-Smirnov test compares your data to a normal distribution:"),
            tags$ul(
              tags$li("It measures the maximum vertical distance (D statistic) between the empirical cumulative distribution function (ECDF) of your data and the cumulative distribution function (CDF) of a normal distribution."),
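              tags$li("It places no upper limit on sample size, whereas R's shapiro.test() accepts only 3 to 5000 observations."),
              tags$li("The test is less powerful than Shapiro-Wilk for small to medium samples, and its p-value is conservative when the mean and SD are estimated from the data."),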
              tags$li("Visual inspection (histograms, Q-Q plots, and ECDF plots) should always supplement this test.")
            )
          ))
        )
      )
    ),

    card(
      card_header("Visual Assessment"),
      card_body(
        navset_tab(
          nav_panel("Histogram",
            navset_tab(
              nav_panel("Plot", plotOutput("histogram")),
              nav_panel("Explanation", div(style = "font-size: 0.9rem;",
                p("The histogram helps visualize the shape of your data distribution:"),
                tags$ul(
                  tags$li("For normal data, the histogram should appear approximately bell-shaped and symmetric."),
                  tags$li("The red curve shows the kernel density estimate of your data."),
                  tags$li("The blue dashed curve shows a normal distribution with the same mean and standard deviation as your data."),
                  tags$li("Compare these curves to assess normality visually.")
                )
              ))
            )
          ),
          nav_panel("Q-Q Plot",
            navset_tab(
              nav_panel("Plot", plotOutput("qqplot")),
              nav_panel("Explanation", div(style = "font-size: 0.9rem;",
                p("The Q-Q (Quantile-Quantile) plot compares your data's quantiles against theoretical quantiles from a normal distribution:"),
                tags$ul(
                  tags$li("If points closely follow the diagonal reference line, the data is approximately normal."),
                  tags$li("Systematic deviations from the line indicate non-normality."),
                  tags$li("Curves at the ends suggest heavy or light tails."),
                  tags$li("S-shaped patterns indicate skewness.")
                )
              ))
            )
          ),
          nav_panel("ECDF Plot",
            navset_tab(
              nav_panel("Plot", plotOutput("ecdf_plot")),
              nav_panel("Explanation", div(style = "font-size: 0.9rem;",
                p("The ECDF (Empirical Cumulative Distribution Function) plot is central to the Kolmogorov-Smirnov test:"),
                tags$ul(
                  tags$li("The solid line shows your data's empirical cumulative distribution."),
                  tags$li("The dashed line shows the cumulative distribution function of a normal distribution."),
                  tags$li("The maximum vertical distance between these lines is the D statistic used in the test."),
                  tags$li("For normal data, these lines should be very close to each other.")
                )
              ))
            )
          )
        )
      )
    )
  )
)

server <- function(input, output, session) {
  # Example data
  example_data <- "8.44\n7.16\n16.94\n9.59\n13.25\n12.94\n11\n5.61\n10.6\n12.81"

  # Track input method
  input_method <- reactiveVal("manual")
  
  # Function to clear file inputs
  clear_file_inputs <- function() {
    # character(0) empties the list; choices = NULL would leave it unchanged
    updateSelectInput(session, "selected_var", choices = character(0))
    reset("file_upload")
  }
  
  # Function to clear text inputs
  clear_text_inputs <- function() {
    updateTextAreaInput(session, "data_input", value = "")
  }

  # When example data is used, clear file inputs and set text inputs
  observeEvent(input$use_example, {
    input_method("manual")
    clear_file_inputs()
    updateTextAreaInput(session, "data_input", value = example_data)
  })

  # When file is uploaded, clear text inputs and set file method
  observeEvent(input$file_upload, {
    if (!is.null(input$file_upload)) {
      input_method("file")
      clear_text_inputs()
    }
  })

  # When clear file button is clicked, clear file and set manual method
  observeEvent(input$clear_file, {
    input_method("manual")
    clear_file_inputs()
  })
  
  # When text input changes, clear file inputs if it has content
  observeEvent(input$data_input, {
    if (!is.null(input$data_input) && nchar(input$data_input) > 0) {
      input_method("manual")
      clear_file_inputs()
    }
  }, ignoreInit = TRUE)

  # Process uploaded file
  file_data <- reactive({
    req(input$file_upload)
    tryCatch({
      vroom::vroom(input$file_upload$datapath, delim = NULL, col_names = input$header, show_col_types = FALSE)
    }, error = function(e) {
      showNotification(paste("File read error:", e$message), type = "error")
      NULL
    })
  })

  # Update variable selection dropdown with numeric columns from uploaded file
  observe({
    df <- file_data()
    if (!is.null(df)) {
      num_vars <- names(df)[sapply(df, is.numeric)]
      updateSelectInput(session, "selected_var", choices = num_vars)
    }
  })

  output$file_uploaded <- reactive({
    !is.null(input$file_upload)
  })
  outputOptions(output, "file_uploaded", suspendWhenHidden = FALSE)

  # Function to parse text input
  parse_text_input <- function(text) {
    if (is.null(text) || text == "") return(NULL)
    input_lines <- strsplit(text, "\\r?\\n")[[1]]
    input_lines <- input_lines[input_lines != ""]
    numeric_values <- suppressWarnings(as.numeric(input_lines))
    if (all(is.na(numeric_values))) return(NULL)
    return(na.omit(numeric_values))
  }

  # Get data values based on input method
  data_values <- reactive({
    if (input_method() == "file" && !is.null(file_data()) &&
        !is.null(input$selected_var) && nzchar(input$selected_var)) {
      df <- file_data()
      return(na.omit(df[[input$selected_var]]))
    } else {
      return(parse_text_input(input$data_input))
    }
  })

  # Validate input data
  validate_data <- reactive({
    values <- data_values()
    
    if (is.null(values)) {
      return("Error: Please check your input. Make sure all values are numeric.")
    }
    
    if (length(values) < 5) {
      return("Error: At least 5 values are recommended for the Kolmogorov-Smirnov test.")
    }
    
    if (length(unique(values)) == 1) {
      return("Error: All values are identical. The Kolmogorov-Smirnov test requires variation in the data.")
    }
    
    return(NULL)
  })

  # Display error messages
  output$error_message <- renderUI({
    error <- validate_data()
    if (!is.null(error) && input$run_test > 0) {
      if (startsWith(error, "Warning")) {
        div(class = "alert alert-warning", error)
      } else {
        div(class = "alert alert-danger", error)
      }
    }
  })

  # Run the Kolmogorov-Smirnov test
  test_result <- eventReactive(input$run_test, {
    error <- validate_data()
    if (!is.null(error) && startsWith(error, "Error")) return(NULL)
    
    values <- data_values()
    
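    # Compare against a normal distribution with the sample mean and SD.
    # Because these parameters are estimated from the same data, the standard
    # K-S p-value is conservative (the Lilliefors correction addresses this).
    # exact = NULL lets ks.test() use exact p-values only when it can
    # (n < 100 and no ties); exact = FALSE forces the asymptotic p-value.
    ks.test(values, "pnorm", mean = mean(values), sd = sd(values), 
            alternative = input$alternative,
            exact = if (isTRUE(input$exact)) NULL else FALSE)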
  })

  # Critical D value for the KS test (asymptotic approximation).
  # Note: not adjusted for the estimated mean/SD, and less accurate for small n.
  critical_d <- reactive({
    req(test_result())
    n <- length(data_values())
    
    # D_crit = c(alpha) / sqrt(n), with c(alpha) = sqrt(-0.5 * log(alpha / 2))
    # for two-sided tests and sqrt(-0.5 * log(alpha)) for one-sided tests.
    # This reproduces the familiar constants (e.g. 1.36 for a two-sided
    # test at alpha = 0.05) and works for any alpha on the slider.
    tail_prob <- if (input$alternative == "two.sided") input$alpha / 2 else input$alpha
    sqrt(-0.5 * log(tail_prob) / n)
  })

  # Display test results
  output$test_results <- renderPrint({
    # req() stops silently; returning NULL here would print "NULL"
    req(test_result())
    
    result <- test_result()
    values <- data_values()
    
    # Calculate skewness and kurtosis if e1071 is available
    skew_val <- tryCatch({
      e1071::skewness(values)
    }, error = function(e) {
      NA
    })
    
    kurt_val <- tryCatch({
      e1071::kurtosis(values)
    }, error = function(e) {
      NA
    })
    
    cat("Kolmogorov-Smirnov Normality Test Results:\n\n")
    cat("D statistic:", round(result$statistic, 4), "\n")
    cat("p-value:", format.pval(result$p.value, digits = 4), "\n\n")
    
    cat("Data Summary:\n")
    cat("Sample size:", length(values), "\n")
    cat("Mean:", round(mean(values), 4), "\n")
    cat("Median:", round(median(values), 4), "\n")
    cat("Standard deviation:", round(sd(values), 4), "\n")
    
    if (!is.na(skew_val)) {
      cat("Skewness:", round(skew_val, 4), "\n")
    }
    
    if (!is.na(kurt_val)) {
      cat("Kurtosis:", round(kurt_val, 4), "\n")
    }
    
    cat("\nCritical value (D critical at α =", input$alpha, "):", round(critical_d(), 4), "\n\n")
    
    cat("Test Interpretation:\n")
    if (result$p.value < input$alpha) {
      cat("The p-value (", format.pval(result$p.value, digits = 4), 
          ") is less than the significance level (", input$alpha, ").\n", sep = "")
      cat("We reject the null hypothesis. There is significant evidence\n")
      cat("to suggest the data does not follow a normal distribution.")
    } else {
      cat("The p-value (", format.pval(result$p.value, digits = 4), 
          ") is greater than or equal to the significance level (", input$alpha, ").\n", sep = "")
      cat("We fail to reject the null hypothesis. There is not enough evidence\n")
      cat("to suggest the data deviates from a normal distribution.")
    }
  })

  # Generate histogram
  output$histogram <- renderPlot({
    req(input$run_test > 0)
    error <- validate_data()
    if (!is.null(error) && startsWith(error, "Error")) return(NULL)
    
    values <- data_values()
    
    p <- ggplot(data.frame(x = values), aes(x = x)) +
      geom_histogram(aes(y = after_stat(density)), bins = input$bins, 
                     fill = "#5dade2", color = "#2874a6", alpha = 0.7) +
      labs(title = "Distribution of Data",
           subtitle = paste("Kolmogorov-Smirnov test: D =", round(test_result()$statistic, 4), 
                           ", p =", format.pval(test_result()$p.value, digits = 4)),
           x = "Value", y = "Density") +
      theme_minimal(base_size = 14)
    
    if (input$show_density) {
      p <- p + geom_density(color = "#c0392b", linewidth = 1.2)
    }
    
    if (input$show_normal) {
      p <- p + stat_function(fun = dnorm, args = list(mean = mean(values), sd = sd(values)), 
                          color = "#2471a3", linewidth = 1.2, linetype = "dashed")
    }
    
    p
  })

  # Generate Q-Q plot
  output$qqplot <- renderPlot({
    req(input$run_test > 0)
    error <- validate_data()
    if (!is.null(error) && startsWith(error, "Error")) return(NULL)
    
    values <- data_values()
    
    ggplot(data.frame(x = values), aes(sample = x)) +
      stat_qq() +
      stat_qq_line(color = "#c0392b") +
      labs(title = "Normal Q-Q Plot",
           subtitle = paste("Kolmogorov-Smirnov test: D =", round(test_result()$statistic, 4), 
                           ", p =", format.pval(test_result()$p.value, digits = 4)),
           x = "Theoretical Quantiles", 
           y = "Sample Quantiles") +
      theme_minimal(base_size = 14)
  })

  # Generate ECDF plot
  output$ecdf_plot <- renderPlot({
    req(input$run_test > 0)
    error <- validate_data()
    if (!is.null(error) && startsWith(error, "Error")) return(NULL)
    
    values <- data_values()
    
    # Create data for plotting
    mu <- mean(values)
    sigma <- sd(values)
    Fn <- ecdf(values)                 # empirical CDF (handles tied values)
    u <- sort(unique(values))          # points where the ECDF jumps
    upper <- Fn(u)                     # ECDF value at (just after) each jump
    lower <- c(0, head(upper, -1))     # ECDF value just before each jump
    
    ecdf_df <- data.frame(x = u, y = upper)
    
    # Calculate theoretical normal CDF
    theoretical_x <- seq(min(values) - sigma, max(values) + sigma, length.out = 100)
    theoretical_df <- data.frame(
      x = theoretical_x,
      y = pnorm(theoretical_x, mean = mu, sd = sigma)
    )
    
    # Locate the maximum vertical distance (two-sided D). Because the ECDF
    # jumps at each value, compare the normal CDF with the ECDF both after
    # and before each jump.
    cdf_at_u <- pnorm(u, mean = mu, sd = sigma)
    gap_after <- upper - cdf_at_u
    gap_before <- cdf_at_u - lower
    if (max(gap_after) >= max(gap_before)) {
      i <- which.max(gap_after)
      empirical_y <- upper[i]
    } else {
      i <- which.max(gap_before)
      empirical_y <- lower[i]
    }
    
    max_diff_point <- data.frame(x = u[i], empirical_y = empirical_y,
                                 theoretical_y = cdf_at_u[i])
    
    # Plot
    ggplot() +
      # Empirical CDF
      geom_step(data = ecdf_df, aes(x = x, y = y), color = "#c0392b", linewidth = 1.2) +
      # Theoretical normal CDF
      geom_line(data = theoretical_df, aes(x = x, y = y), 
               color = "#2471a3", linewidth = 1.2, linetype = "dashed") +
      # Maximum difference point
      geom_segment(aes(x = max_diff_point$x, y = max_diff_point$empirical_y,
                      xend = max_diff_point$x, yend = max_diff_point$theoretical_y),
                  color = "#27ae60", linewidth = 1, linetype = "dotted") +
      geom_point(aes(x = max_diff_point$x, y = max_diff_point$empirical_y), 
                color = "#c0392b", size = 3) +
      geom_point(aes(x = max_diff_point$x, y = max_diff_point$theoretical_y), 
                color = "#2471a3", size = 3) +
      # Labels
      labs(title = "Empirical vs Theoretical Normal CDF",
           subtitle = paste("D statistic =", round(test_result()$statistic, 4), 
                           "(maximum vertical distance)"),
           x = "Value", 
           y = "Cumulative Probability") +
      annotate("text", x = max_diff_point$x + 0.5*sd(values), 
              y = mean(c(max_diff_point$empirical_y, max_diff_point$theoretical_y)),
              label = paste("D =", round(test_result()$statistic, 4)),
              color = "#27ae60", fontface = "bold") +
      theme_minimal(base_size = 14)
  })
}

shinyApp(ui = ui, server = server)

How the Kolmogorov-Smirnov Test Works

The Kolmogorov-Smirnov test compares your data’s cumulative distribution function to a specified theoretical distribution (typically the normal distribution for normality testing).

flowchart TB
    A[Your Data] --> B[Calculate Empirical\nCumulative Distribution]
    C[Normal Distribution\nwith same mean and SD] --> D[Calculate Theoretical\nCumulative Distribution]
    B --> E[Find Maximum Vertical Distance\nbetween distributions]
    D --> E
    E --> F[Compare D statistic\nto critical value]
    F --> G{p < 0.05?}
    G --> |Yes| H[Reject H₀\nData is not normal]
    G --> |No| I[Fail to reject H₀\nData may be normal]

Mathematical Procedure

1. Calculate the empirical cumulative distribution function (ECDF) of your data:

\[F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I_{X_i \leq x}\]

where \(I_{X_i \leq x}\) is the indicator function, equal to 1 if \(X_i \leq x\) and 0 otherwise.

2. Calculate the theoretical cumulative distribution function (CDF) of the normal distribution:

\[F(x) = \Phi\left(\frac{x - \mu}{\sigma}\right)\]

where \(\Phi\) is the standard normal CDF. In this calculator, \(\mu\) is set to the sample mean and \(\sigma\) to the sample standard deviation.

3. Calculate the test statistic D, the maximum vertical distance between the two functions:

\[D = \sup_x |F_n(x) - F(x)|\]

4. Calculate the p-value by comparing D with its sampling distribution under \(H_0\). The standard Kolmogorov distribution assumes \(\mu\) and \(\sigma\) are known in advance. When they are estimated from the data, the resulting p-value is conservative.

5. Make a decision:
- If p < \(\alpha\) (typically 0.05): reject the null hypothesis (the data is not normal)
- If p ≥ \(\alpha\): fail to reject the null hypothesis (the data may be normal)
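Steps 1 to 3 can be checked by hand. The ECDF jumps at each sorted observation \(x_{(i)}\), so the supremum is reached either just after a jump (\(i/n - F(x_{(i)})\)) or just before it (\(F(x_{(i)}) - (i-1)/n\)). The following sketch computes D this way and compares it with `ks.test()`. The helper name `ks_d_normal` is hypothetical.

```r
# Hypothetical helper: one-sample K-S statistic D against a normal CDF
# whose mean and SD are estimated from the sample
ks_d_normal <- function(x) {
  x <- sort(x)
  n <- length(x)
  cdf <- pnorm(x, mean = mean(x), sd = sd(x))  # F(x_(i))
  d_plus  <- max(seq_len(n) / n - cdf)        # gap just after each jump
  d_minus <- max(cdf - (seq_len(n) - 1) / n)  # gap just before each jump
  max(d_plus, d_minus)
}

x <- c(8.44, 7.16, 16.94, 9.59, 13.25, 12.94, 11, 5.61, 10.6, 12.81)
d_manual  <- ks_d_normal(x)
d_builtin <- unname(ks.test(x, "pnorm", mean = mean(x), sd = sd(x))$statistic)
all.equal(d_manual, d_builtin)  # TRUE
```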

Kolmogorov-Smirnov vs. Other Normality Tests

The K-S test has different properties compared to other normality tests:

Test	Strengths	Limitations
Kolmogorov-Smirnov	Works well for large samples; shows where distributions differ; can compare any continuous distributions	Less powerful for small samples; conservative when the mean and SD are estimated from the data
Shapiro-Wilk	Most powerful for small to medium samples	Sample-size limits in common implementations (R's shapiro.test() accepts 3 to 5000 values); complex calculation
Anderson-Darling	More sensitive to deviations in the tails	Complex calculation; less intuitive interpretation
Lilliefors	Modification of K-S specifically for normality testing	Still not as powerful as Shapiro-Wilk for smaller samples
D’Agostino-Pearson	Based on skewness and kurtosis; good overall power	Requires larger samples for reliable results

Important Considerations

Sample size effects:
- For very small samples (n < 5), the test has low power
- With large samples (n > 300), even minor deviations from normality may be detected as significant
- The K-S test is generally most appropriate for moderate to large sample sizes
Parameter estimation:
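- When the K-S test for normality uses parameters estimated from your data (the sample mean and SD), its standard p-values are too large, so it rejects normality less often than the nominal α (the test is conservative)
- The Lilliefors correction (for example, lillie.test() in the R package nortest) or a parametric bootstrap gives correctly calibrated p-values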
Visual assessment:
- The ECDF plot provides a visual representation of exactly where distributions differ
- Always supplement the test with visual methods (histograms, Q-Q plots)
Applications beyond normality:
- The K-S test can compare two samples, or a sample against any fully specified continuous distribution
- This makes it more versatile than tests that are specific to normality
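A parametric bootstrap is one way to correct for estimated parameters. The idea is to simulate the null distribution of D while re-estimating the mean and SD in every simulated sample, which is the step the plain K-S p-value ignores. The sketch below is a Monte Carlo version of the Lilliefors test. It is not the nortest implementation, which uses an analytic approximation. The function names and the default `B = 2000` are illustrative assumptions.

```r
# Hypothetical helper: K-S statistic against a normal CDF whose mean and SD
# are estimated from the sample itself
ks_d_estimated <- function(v) {
  v <- sort(v)
  n <- length(v)
  cdf <- pnorm(v, mean = mean(v), sd = sd(v))
  max(seq_len(n) / n - cdf, cdf - (seq_len(n) - 1) / n)
}

# Monte Carlo (parametric bootstrap) p-value, i.e. a simulated Lilliefors test.
# D computed with estimated mean/SD does not depend on the true mean/SD,
# so simulating from the standard normal is enough.
lilliefors_mc <- function(x, B = 2000) {
  d_obs <- ks_d_estimated(x)
  d_sim <- replicate(B, ks_d_estimated(rnorm(length(x))))
  list(statistic = d_obs,
       p.value = (1 + sum(d_sim >= d_obs)) / (B + 1))
}

set.seed(123)
x <- c(8.44, 7.16, 16.94, 9.59, 13.25, 12.94, 11, 5.61, 10.6, 12.81)
mc <- lilliefors_mc(x)
plain <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))
c(D = mc$statistic, p_lilliefors_mc = mc$p.value, p_unadjusted = plain$p.value)
```

The Monte Carlo p-value varies slightly with the seed. Increasing `B` reduces that noise at the cost of run time.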

Example 1: Testing Normality of Student Heights

A researcher wants to determine if student height measurements follow a normal distribution before performing parametric analyses.

Data (heights in cm, sample of 30 students): 165, 172, 168, 175, 171, 163, 169, 170, 178, 167, 173, 180, 166, 174, 172, 169, 177, 168, 173, 171, 169, 175, 172, 170, 174, 168, 176, 171, 173, 170

Analysis Steps:

Run Kolmogorov-Smirnov test:
- D = 0.0646, p > 0.99
Calculate descriptive statistics:
- Mean = 171.3 cm
- Median = 171.0 cm
- Standard deviation = 3.88 cm
- Skewness = 0.116 (very slight positive skew)
- Kurtosis = -0.394 (excess kurtosis; slightly platykurtic)
Visual assessment:
- Histogram shows roughly bell-shaped distribution
- Q-Q plot points follow closely along the reference line
- ECDF plot shows a small maximum deviation (D = 0.0646) between the empirical and theoretical distributions

Results:

D = 0.0646, p > 0.99
Interpretation: Since p > 0.05, we fail to reject the null hypothesis. There is insufficient evidence to claim that the height data deviates from a normal distribution. Because heights were recorded to the nearest centimeter, the data contain tied values; R's ks.test() warns about ties and reports an asymptotic p-value.

How to Report: "The Kolmogorov-Smirnov test was conducted to evaluate the normality of student height data. Results indicated that the heights (M = 171.3 cm, SD = 3.88 cm) were approximately normally distributed, D(30) = 0.0646, p > 0.99. Visual inspection of the histogram and ECDF plot supported this conclusion."
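The example can be reproduced in base R from the 30 listed heights. Heights are rounded to whole centimeters, so ties are present, and `ks.test()` emits a warning about them. The sketch wraps the call in `suppressWarnings()` for brevity. It assumes the same estimated-parameter setup as the calculator.

```r
heights <- c(165, 172, 168, 175, 171, 163, 169, 170, 178, 167,
             173, 180, 166, 174, 172, 169, 177, 168, 173, 171,
             169, 175, 172, 170, 174, 168, 176, 171, 173, 170)

# Descriptive statistics (sd() uses the n - 1 denominator)
round(c(mean = mean(heights), median = median(heights), sd = sd(heights)), 2)

# K-S test against a normal with the sample mean and SD; ties trigger a
# warning and an asymptotic p-value, so the warning is suppressed here
res <- suppressWarnings(
  ks.test(heights, "pnorm", mean = mean(heights), sd = sd(heights))
)
round(unname(res$statistic), 4)
res$p.value
```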

Example 2: Testing Normality of Response Times

A psychologist wants to check if reaction time data from an experiment follows a normal distribution.

Data Summary:

Sample size: 40 participants
Kolmogorov-Smirnov test: D = 0.1862, p = 0.0012
Mean reaction time: 342 ms
Median reaction time: 328 ms
Skewness: 1.25 (positive skew)

Results:

D = 0.1862, p = 0.0012
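Interpretation: Since p < 0.05, we reject the null hypothesis. There is significant evidence that the reaction time data deviates from a normal distribution. Note that a p-value this small is consistent with the Lilliefors-corrected K-S test, which accounts for estimating the mean and SD from the data. With the unadjusted K-S distribution, D = 0.1862 with n = 40 falls below the asymptotic 5% critical value (1.36/√40 ≈ 0.215), so the unadjusted test would not reject normality at α = 0.05.

How to Report: "The Kolmogorov-Smirnov test with Lilliefors correction indicated that reaction times were not normally distributed, D(40) = 0.1862, p = 0.0012. The ECDF plot revealed that the maximum deviation occurred in the lower tail of the distribution, with the empirical distribution showing more values in the lower range than would be expected in a normal distribution. This positive skew (skewness = 1.25) suggests that most responses were relatively fast, with fewer but more extreme slow responses."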
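For comparison, the unadjusted K-S p-value for D = 0.1862 and n = 40 can be approximated with the asymptotic Kolmogorov distribution, \(P(D > d) \approx 2\sum_{k\ge 1}(-1)^{k-1}e^{-2k^2 n d^2}\). This is a sketch: the helper name is hypothetical, and the asymptotic series is only an approximation to the exact small-sample p-value.

```r
# Hypothetical helper: asymptotic p-value of the unadjusted one-sample,
# two-sided K-S test from the Kolmogorov limiting distribution
ks_asymptotic_p <- function(d, n, terms = 100) {
  k <- seq_len(terms)
  p <- 2 * sum((-1)^(k - 1) * exp(-2 * k^2 * n * d^2))
  min(max(p, 0), 1)  # keep the result inside [0, 1]
}

p_unadjusted <- ks_asymptotic_p(0.1862, 40)
p_unadjusted       # roughly 0.12, well above 0.05
1.36 / sqrt(40)    # asymptotic 5% critical value, about 0.215
```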

How to Report Kolmogorov-Smirnov Test Results

When reporting the results of a Kolmogorov-Smirnov test in academic papers or research reports, include the following elements:

"The Kolmogorov-Smirnov test was used to assess the normality of [variable]. Results indicated 
that the data [was/was not] normally distributed, D([sample size]) = [D statistic], p = [p-value]."

For example:

"The Kolmogorov-Smirnov test was used to assess the normality of blood pressure measurements. Results indicated 
that the data was normally distributed, D(45) = 0.091, p = 0.847."
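The template can be filled in programmatically. The sketch below uses a hypothetical helper, `report_ks_normality()`, which runs the test with estimated mean and SD and formats D to three decimals. It writes "p < 0.001" for very small p-values. The wording follows the template above. The default `alpha = 0.05` and the variable label are assumptions.

```r
# Hypothetical helper: run the K-S normality test and fill in the template
report_ks_normality <- function(x, variable, alpha = 0.05) {
  res <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))
  verdict <- if (res$p.value < alpha) "was not" else "was"
  p_txt <- if (res$p.value < 0.001) "p < 0.001" else sprintf("p = %.3f", res$p.value)
  sprintf(paste0("The Kolmogorov-Smirnov test was used to assess the normality of %s. ",
                 "Results indicated that the data %s normally distributed, D(%d) = %.3f, %s."),
          variable, verdict, length(x), unname(res$statistic), p_txt)
}

x <- c(8.44, 7.16, 16.94, 9.59, 13.25, 12.94, 11, 5.61, 10.6, 12.81)
s <- report_ks_normality(x, "the example measurements")
cat(s, "\n")
```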

Additional information to consider including:

Descriptive statistics (mean, median, standard deviation)
Maximum deviation location (where in the distribution the D statistic occurred)
Brief description of visual assessment (histogram shape, ECDF plot)
Critical D value based on sample size and significance level

APA Style Reporting

For APA style papers (7th edition), report the Kolmogorov-Smirnov test results as follows:

We conducted a Kolmogorov-Smirnov test to examine whether [variable] was normally distributed. 
Results indicated that [variable] [was/was not] approximately normally distributed, 
D([sample size]) = [D statistic], p = [p-value].

Reporting in Tables

When reporting multiple Kolmogorov-Smirnov test results in a table, include these columns:

Variable tested
Sample size
D statistic
Critical D value
p-value
Normality conclusion (Yes/No based on significance level)
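A table with these columns can be assembled for several variables at once. The sketch uses R's built-in `iris` measurements purely as illustrative data. The helper `ks_table()` is hypothetical. The critical D uses the same two-sided asymptotic approximation as the calculator. Because the `iris` values are rounded and contain ties, `ks.test()` warnings are suppressed.

```r
# Hypothetical helper: one row of K-S normality results per numeric column
ks_table <- function(df, alpha = 0.05) {
  rows <- lapply(names(df), function(v) {
    x <- df[[v]][!is.na(df[[v]])]
    res <- suppressWarnings(ks.test(x, "pnorm", mean = mean(x), sd = sd(x)))
    n <- length(x)
    data.frame(
      variable   = v,
      n          = n,
      D          = round(unname(res$statistic), 4),
      D_critical = round(sqrt(-0.5 * log(alpha / 2) / n), 4),  # two-sided, asymptotic
      p_value    = signif(res$p.value, 4),
      normal     = if (res$p.value >= alpha) "Yes" else "No"
    )
  })
  do.call(rbind, rows)
}

tab <- ks_table(iris[, 1:4])
tab
```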

Test Your Understanding

1. What does the D statistic in the Kolmogorov-Smirnov test represent?
- A. The variance between groups
- B. The maximum vertical distance between the empirical and theoretical CDFs
- C. The correlation between the sample and a normal distribution
- D. The difference between the sample mean and the population mean

2. What range of values can the Kolmogorov-Smirnov D statistic take?
- A. -1 to +1
- B. 0 to 1
- C. 0 to infinity
- D. -infinity to +infinity

3. A researcher finds D = 0.12, p = 0.08 when testing a dataset. What can they conclude?
- A. The data is significantly different from a normal distribution
- B. The data is perfectly normal
- C. There is not enough evidence to conclude the data deviates from normality
- D. The sample size is too small for the test to be valid

4. For which sample size is the Kolmogorov-Smirnov test generally most appropriate?
- A. n = 8
- B. n = 30
- C. n = 3
- D. n = 2

5. What happens to the critical D value as sample size increases?
- A. It increases
- B. It decreases
- C. It remains the same
- D. It becomes more variable

Answers: 1-B, 2-B, 3-C, 4-B, 5-B

Common Questions About the Kolmogorov-Smirnov Test

What’s the difference between the Kolmogorov-Smirnov test and the Shapiro-Wilk test?

The Kolmogorov-Smirnov (K-S) test measures the maximum distance between your data's empirical CDF and a theoretical normal CDF, while the Shapiro-Wilk test is based on the correlation between your ordered data and the corresponding normal scores. The Shapiro-Wilk test is generally more powerful for small to medium samples, while the K-S test is easier to visualize and can compare any two continuous distributions, not just test for normality.

What sample size is appropriate for the Kolmogorov-Smirnov test?

The K-S test can be used with a wide range of sample sizes, but it works best with moderate to large samples (n ≥ 20). For very small samples, the Shapiro-Wilk test is generally more powerful. The K-S test has no upper limit on sample size, which makes it useful for very large datasets where other tests may be restricted (for example, R's shapiro.test() accepts at most 5000 values).
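The critical D value shrinks as the sample grows, roughly in proportion to \(1/\sqrt{n}\). The sketch below uses the two-sided asymptotic formula \(D_{crit} = \sqrt{-0.5\ln(\alpha/2)/n}\), the same approximation the calculator reports. It is not adjusted for estimated parameters and is only approximate for small n. The helper name is hypothetical.

```r
# Hypothetical helper: two-sided asymptotic critical value of D
ks_critical_d <- function(n, alpha = 0.05) sqrt(-0.5 * log(alpha / 2) / n)

n <- c(10, 20, 50, 100, 500, 1000)
data.frame(n = n, D_critical = round(ks_critical_d(n), 3))
```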

What does it mean if my Kolmogorov-Smirnov test is significant (p < 0.05)?

A significant result (p < 0.05) indicates that your data significantly deviates from a normal distribution. The ECDF plot shows where the largest deviation (the D statistic) occurs. Keep in mind that the K-S statistic is most sensitive to differences near the center of the distribution and relatively insensitive in the tails, where the Anderson-Darling test does better. A significant result suggests you should either transform your data to achieve normality or consider using non-parametric statistical methods.

Can I use the Kolmogorov-Smirnov test to compare two samples?

Yes, the two-sample K-S test is specifically designed to compare two empirical distributions to determine if they come from the same distribution. This variant doesn’t require specifying a theoretical distribution and is useful for comparing groups without assuming a particular distribution shape. Our calculator focuses on the one-sample test for normality, but the two-sample test uses the same fundamental approach of finding the maximum difference between cumulative distributions.
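In R, the two-sample version uses the same `ks.test()` function with a second numeric vector instead of a distribution name. A minimal sketch with simulated data follows. The group sizes, the seed, and the choice of a normal versus an exponential sample are arbitrary illustrations.

```r
set.seed(42)
group_a <- rnorm(60, mean = 0, sd = 1)  # symmetric sample
group_b <- rexp(60, rate = 1)           # right-skewed sample

# Two-sample K-S test: maximum vertical distance between the two ECDFs
res <- ks.test(group_a, group_b)
res$statistic
res$p.value
```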

Is the Kolmogorov-Smirnov test always accurate for testing normality?

The standard K-S test can be conservative when testing for normality with estimated parameters (when you use the sample mean and standard deviation). The Lilliefors correction addresses this issue specifically for normality testing. Without this correction, the K-S test might fail to reject the null hypothesis even when the data is not normal. For this reason, the Shapiro-Wilk test is often preferred specifically for normality testing, while the K-S test has broader applications for distribution comparisons.

What should I do if my data is not normally distributed according to the K-S test?

If your data is not normally distributed, you have several options:

1. Transform the data using methods like log, square root, or Box-Cox transformations
2. Use non-parametric tests that don't assume normality
3. Continue with parametric tests if your sample size is large enough to rely on the Central Limit Theorem
4. Use robust statistical methods that are less sensitive to violations of normality

The ECDF plot can help guide this decision by showing where the deviation from normality occurs, which might suggest appropriate transformations.
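A quick way to check a transformation is to re-run the test on the transformed values. The sketch below builds a deterministic, right-skewed sample from log-normal quantiles; this is illustrative data, not real measurements. It then compares D before and after a log transformation, with the mean and SD estimated in each case as the calculator does.

```r
# Deterministic right-skewed sample: log-normal quantiles (illustrative data)
x <- exp(qnorm(ppoints(50)))

d_raw <- unname(ks.test(x, "pnorm", mean = mean(x), sd = sd(x))$statistic)
d_log <- unname(ks.test(log(x), "pnorm",
                        mean = mean(log(x)), sd = sd(log(x)))$statistic)

# A much smaller D after the log transform indicates a closer fit to normality
c(raw = d_raw, log = d_log)
```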

Examples of When to Use the Kolmogorov-Smirnov Test

Before parametric tests: To verify distribution assumptions for t-tests, ANOVA, or regression
Quality control: To check if production measurements follow a specified distribution
Financial analysis: To test if returns follow a normal or other theoretical distribution
Machine learning: To verify distribution assumptions in various algorithms
Environmental monitoring: To check if pollution measurements follow expected distributions
Comparing distributions: To test if two samples come from the same distribution
Model validation: To check if residuals follow a normal distribution
Time series analysis: To test distributional assumptions of error terms
Simulation studies: To verify if generated random numbers follow the intended distribution
After transformations: To check if transformed data now follows a normal distribution

References

Kolmogorov, A. N. (1933). Sulla determinazione empirica di una legge di distribuzione. Giornale dell’Istituto Italiano degli Attuari, 4, 83-91.
Smirnov, N. (1948). Table for estimating the goodness of fit of empirical distributions. The Annals of Mathematical Statistics, 19(2), 279-281.
Massey, F. J. (1951). The Kolmogorov-Smirnov test for goodness of fit. Journal of the American Statistical Association, 46(253), 68-78.
Lilliefors, H. W. (1967). On the Kolmogorov-Smirnov test for normality with mean and variance unknown. Journal of the American Statistical Association, 62(318), 399-402.
Stephens, M. A. (1974). EDF statistics for goodness of fit and some comparisons. Journal of the American Statistical Association, 69(347), 730-737.
Razali, N. M., & Wah, Y. B. (2011). Power comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests. Journal of Statistical Modeling and Analytics, 2(1), 21-33.

Reuse

CC BY-NC-SA 4.0

Citation

BibTeX citation:

@online{kassambara2025,
  author = {Kassambara, Alboukadel},
  title = {Kolmogorov-Smirnov {Normality} {Test} {Calculator} \textbar{}
    {Check} {Data} {Distribution}},
  date = {2025-04-10},
  url = {https://www.datanovia.com/apps/statfusion/analysis/inferential/goodness-fit/normality/kolmogorov-smirnov-normality-test.html},
  langid = {en}
}

For attribution, please cite this work as:

Kassambara, Alboukadel. 2025. “Kolmogorov-Smirnov Normality Test Calculator | Check Data Distribution.” April 10, 2025. https://www.datanovia.com/apps/statfusion/analysis/inferential/goodness-fit/normality/kolmogorov-smirnov-normality-test.html.