flowchart TB
A[Your Data] --> B[Calculate Empirical\nCumulative Distribution]
C[Normal Distribution\nwith same mean and SD] --> D[Calculate Theoretical\nCumulative Distribution]
B --> E[Find Maximum Vertical Distance\nbetween distributions]
D --> E
E --> F[Compare D statistic\nto critical value]
F --> G{p < 0.05?}
G --> |Yes| H[Reject H₀\nData is not normal]
G --> |No| I[Fail to reject H₀\nData may be normal]
Key Takeaways: Kolmogorov-Smirnov Normality Test
- Purpose: Test whether data follows a normal distribution or compare two distributions
- When to use: For checking normality, especially with larger sample sizes
- What it measures: Maximum vertical distance between empirical and theoretical cumulative distributions
- Null hypothesis: The data follows a normal distribution (\(H_0\): data is normal)
- Alternative hypothesis: The data does not follow a normal distribution (\(H_1\): data is not normal)
- Interpretation: If p < 0.05, the data significantly deviates from normality
- Advantages: Works well for large samples; can be used to compare any two distributions
- Visual component: Shows exactly where distributions differ the most
What is the Kolmogorov-Smirnov Test?
The Kolmogorov-Smirnov (K-S) test is a nonparametric statistical method that measures the maximum difference between an empirical distribution function and a theoretical distribution function. When used as a normality test, it compares the empirical distribution of your data with a normal distribution to assess whether the data could plausibly have been drawn from one.
When to use the Kolmogorov-Smirnov normality test:
- When testing if data follows a normal distribution, especially with larger samples
- When comparing two sample distributions to see if they differ
- When you need to visualize exactly where distributions differ
- As an alternative to the Shapiro-Wilk test, particularly with larger datasets
- When checking assumptions for parametric statistical tests
This online calculator allows you to quickly perform a Kolmogorov-Smirnov normality test, visualize your data distribution, and interpret the results with confidence.
#| '!! shinylive warning !!': |
#| shinylive does not work in self-contained HTML documents.
#| Please set `embed-resources: false` in your metadata.
#| standalone: true
#| viewerHeight: 1300
library(shiny)
library(bslib)
library(ggplot2)
library(bsicons)
library(vroom)
library(shinyjs)
ui <- page_sidebar(
title = "Kolmogorov-Smirnov Normality Test",
useShinyjs(), # Enable shinyjs for dynamic UI updates
sidebar = sidebar(
width = 400,
card(
card_header("Data Input"),
accordion(
accordion_panel(
"Manual Input",
textAreaInput("data_input", "Enter your data (one value per row):", rows = 8,
placeholder = "Paste values here..."),
div(
actionLink("use_example", "Use example data", style = "color:#0275d8;"),
tags$span(bs_icon("file-earmark-text"), style = "margin-left: 5px; color: #0275d8;")
)
),
accordion_panel(
"File Upload",
fileInput("file_upload", "Upload CSV or TXT file:",
accept = c("text/csv", "text/plain", ".csv", ".txt")),
checkboxInput("header", "File has header", TRUE),
conditionalPanel(
condition = "output.file_uploaded",
div(
selectInput("selected_var", "Select variable:", choices = NULL),
actionButton("clear_file", "Clear File", class = "btn-danger btn-sm")
)
)
),
id = "input_method",
open = 1
),
# Advanced Options accordion
accordion(
accordion_panel(
"Advanced Options",
card(
card_header("Significance Level:"),
card_body(
sliderInput("alpha", NULL, min = 0.01, max = 0.10, value = 0.05, step = 0.01)
)
),
card(
card_header("Test Options:"),
card_body(
selectInput("alternative", "Alternative Hypothesis:",
choices = c("Two-sided" = "two.sided",
"Less (ECDF lies below the normal CDF)" = "less",
"Greater (ECDF lies above the normal CDF)" = "greater"),
selected = "two.sided"),
checkboxInput("exact", "Use exact p-values when possible", TRUE)
)
),
card(
card_header("Plot Options:"),
card_body(
checkboxInput("show_density", "Show density curve", TRUE),
checkboxInput("show_normal", "Show normal curve", TRUE),
sliderInput("bins", "Number of histogram bins:", min = 5, max = 50, value = 20)
)
)
),
open = FALSE
),
actionButton("run_test", "Run Test", class = "btn btn-primary")
),
hr(),
card(
card_header("Interpretation"),
card_body(
div(class = "alert alert-info",
tags$ul(
tags$li(tags$b("Null hypothesis (H₀):"), " The data follows a normal distribution."),
tags$li(tags$b("Alternative hypothesis (H₁):"), " The data does not follow a normal distribution."),
tags$li("If p-value ≥ 0.05, there is not enough evidence to reject normality."),
tags$li("If p-value < 0.05, the data significantly deviates from a normal distribution."),
tags$li("The Kolmogorov-Smirnov test compares your data's cumulative distribution to a theoretical normal distribution.")
)
)
)
)
),
layout_column_wrap(
width = 1,
card(
card_header("Test Results"),
card_body(
navset_tab(
nav_panel("Results", uiOutput("error_message"), verbatimTextOutput("test_results")),
nav_panel("Explanation", div(style = "font-size: 0.9rem;",
p("The Kolmogorov-Smirnov test compares your data to a normal distribution:"),
tags$ul(
tags$li("It measures the maximum vertical distance (D statistic) between the empirical cumulative distribution function (ECDF) of your data and the cumulative distribution function (CDF) of a normal distribution."),
tags$li("Unlike the Shapiro-Wilk test, it works well for larger sample sizes."),
tags$li("The test is less powerful than Shapiro-Wilk for small to medium samples."),
tags$li("Visual inspection (histograms, Q-Q plots, and ECDF plots) should always supplement this test.")
)
))
)
)
),
card(
card_header("Visual Assessment"),
card_body(
navset_tab(
nav_panel("Histogram",
navset_tab(
nav_panel("Plot", plotOutput("histogram")),
nav_panel("Explanation", div(style = "font-size: 0.9rem;",
p("The histogram helps visualize the shape of your data distribution:"),
tags$ul(
tags$li("For normal data, the histogram should appear approximately bell-shaped and symmetric."),
tags$li("The red curve shows the kernel density estimate of your data."),
tags$li("The blue dashed curve shows a normal distribution with the same mean and standard deviation as your data."),
tags$li("Compare these curves to assess normality visually.")
)
))
)
),
nav_panel("Q-Q Plot",
navset_tab(
nav_panel("Plot", plotOutput("qqplot")),
nav_panel("Explanation", div(style = "font-size: 0.9rem;",
p("The Q-Q (Quantile-Quantile) plot compares your data's quantiles against theoretical quantiles from a normal distribution:"),
tags$ul(
tags$li("If points closely follow the diagonal reference line, the data is approximately normal."),
tags$li("Systematic deviations from the line indicate non-normality."),
tags$li("An S-shaped pattern (points bending away from the line at both ends) suggests heavier or lighter tails than normal."),
tags$li("A consistently curved (bow-shaped) pattern indicates skewness.")
)
))
)
),
nav_panel("ECDF Plot",
navset_tab(
nav_panel("Plot", plotOutput("ecdf_plot")),
nav_panel("Explanation", div(style = "font-size: 0.9rem;",
p("The ECDF (Empirical Cumulative Distribution Function) plot is central to the Kolmogorov-Smirnov test:"),
tags$ul(
tags$li("The solid line shows your data's empirical cumulative distribution."),
tags$li("The dashed line shows the cumulative distribution function of a normal distribution."),
tags$li("The maximum vertical distance between these lines is the D statistic used in the test."),
tags$li("For normal data, these lines should be very close to each other.")
)
))
)
)
)
)
)
)
)
server <- function(input, output, session) {
# Example data
example_data <- "8.44\n7.16\n16.94\n9.59\n13.25\n12.94\n11\n5.61\n10.6\n12.81"
# Track input method
input_method <- reactiveVal("manual")
# Function to clear file inputs
clear_file_inputs <- function() {
updateSelectInput(session, "selected_var", choices = NULL)
reset("file_upload")
}
# Function to clear text inputs
clear_text_inputs <- function() {
updateTextAreaInput(session, "data_input", value = "")
}
# When example data is used, clear file inputs and set text inputs
observeEvent(input$use_example, {
input_method("manual")
clear_file_inputs()
updateTextAreaInput(session, "data_input", value = example_data)
})
# When file is uploaded, clear text inputs and set file method
observeEvent(input$file_upload, {
if (!is.null(input$file_upload)) {
input_method("file")
clear_text_inputs()
}
})
# When clear file button is clicked, clear file and set manual method
observeEvent(input$clear_file, {
input_method("manual")
clear_file_inputs()
})
# When text input changes, clear file inputs if it has content
observeEvent(input$data_input, {
if (!is.null(input$data_input) && nchar(input$data_input) > 0) {
input_method("manual")
clear_file_inputs()
}
}, ignoreInit = TRUE)
# Process uploaded file
file_data <- reactive({
req(input$file_upload)
tryCatch({
vroom::vroom(input$file_upload$datapath, delim = NULL, col_names = input$header, show_col_types = FALSE)
}, error = function(e) {
showNotification(paste("File read error:", e$message), type = "error")
NULL
})
})
# Update variable selection dropdown with numeric columns from uploaded file
observe({
df <- file_data()
if (!is.null(df)) {
num_vars <- names(df)[sapply(df, is.numeric)]
updateSelectInput(session, "selected_var", choices = num_vars)
}
})
output$file_uploaded <- reactive({
!is.null(input$file_upload)
})
outputOptions(output, "file_uploaded", suspendWhenHidden = FALSE)
# Function to parse text input
parse_text_input <- function(text) {
if (is.null(text) || text == "") return(NULL)
input_lines <- strsplit(text, "\\r?\\n")[[1]]
input_lines <- input_lines[input_lines != ""]
numeric_values <- suppressWarnings(as.numeric(input_lines))
if (all(is.na(numeric_values))) return(NULL)
return(na.omit(numeric_values))
}
# Get data values based on input method
data_values <- reactive({
if (input_method() == "file" && !is.null(file_data()) && !is.null(input$selected_var)) {
df <- file_data()
return(na.omit(df[[input$selected_var]]))
} else {
return(parse_text_input(input$data_input))
}
})
# Validate input data
validate_data <- reactive({
values <- data_values()
if (is.null(values)) {
return("Error: Please check your input. Make sure all values are numeric.")
}
if (length(values) < 5) {
return("Error: At least 5 values are recommended for the Kolmogorov-Smirnov test.")
}
if (length(unique(values)) == 1) {
return("Error: All values are identical. The Kolmogorov-Smirnov test requires variation in the data.")
}
return(NULL)
})
# Display error messages
output$error_message <- renderUI({
error <- validate_data()
if (!is.null(error) && input$run_test > 0) {
if (startsWith(error, "Warning")) {
div(class = "alert alert-warning", error)
} else {
div(class = "alert alert-danger", error)
}
}
})
# Run the Kolmogorov-Smirnov test
test_result <- eventReactive(input$run_test, {
error <- validate_data()
if (!is.null(error) && startsWith(error, "Error")) return(NULL)
values <- data_values()
# Compare against a normal distribution whose mean and SD are estimated from
# the data; estimating the parameters makes the reported p-value conservative
ks.test(values, "pnorm", mean = mean(values), sd = sd(values),
alternative = input$alternative, exact = input$exact)
})
# Calculate critical D value for the KS test
critical_d <- reactive({
req(test_result())
values <- data_values()
n <- length(values)
# Asymptotic critical D value at the chosen significance level.
# The familiar constants 1.36 / sqrt(n) (two-sided, alpha = 0.05) and
# 1.22 / sqrt(n) (one-sided, alpha = 0.05) are special cases of this formula,
# so it gives a consistent value for every alpha the slider allows.
# It assumes a fully specified normal distribution and is only approximate
# for small samples.
tail_prob <- if (input$alternative == "two.sided") input$alpha / 2 else input$alpha
sqrt(-0.5 * log(tail_prob) / n)
})
# Display test results
output$test_results <- renderPrint({
if (is.null(test_result())) return(NULL)
result <- test_result()
values <- data_values()
# Calculate skewness and kurtosis if e1071 is available
skew_val <- tryCatch({
e1071::skewness(values)
}, error = function(e) {
NA
})
kurt_val <- tryCatch({
e1071::kurtosis(values)
}, error = function(e) {
NA
})
cat("Kolmogorov-Smirnov Normality Test Results:\n\n")
cat("D statistic:", round(result$statistic, 4), "\n")
cat("p-value:", format.pval(result$p.value, digits = 4), "\n\n")
cat("Data Summary:\n")
cat("Sample size:", length(values), "\n")
cat("Mean:", round(mean(values), 4), "\n")
cat("Median:", round(median(values), 4), "\n")
cat("Standard deviation:", round(sd(values), 4), "\n")
if (!is.na(skew_val)) {
cat("Skewness:", round(skew_val, 4), "\n")
}
if (!is.na(kurt_val)) {
cat("Kurtosis:", round(kurt_val, 4), "\n")
}
cat("\nCritical value (D critical at α =", input$alpha, "):", round(critical_d(), 4), "\n\n")
cat("Test Interpretation:\n")
if (result$p.value < input$alpha) {
cat("The p-value (", format.pval(result$p.value, digits = 4),
") is less than the significance level (", input$alpha, ").\n", sep = "")
cat("We reject the null hypothesis. There is significant evidence\n")
cat("to suggest the data does not follow a normal distribution.")
} else {
cat("The p-value (", format.pval(result$p.value, digits = 4),
") is greater than or equal to the significance level (", input$alpha, ").\n", sep = "")
cat("We fail to reject the null hypothesis. There is not enough evidence\n")
cat("to suggest the data deviates from a normal distribution.")
}
})
# Generate histogram
output$histogram <- renderPlot({
req(input$run_test > 0)
error <- validate_data()
if (!is.null(error) && startsWith(error, "Error")) return(NULL)
values <- data_values()
p <- ggplot(data.frame(x = values), aes(x = x)) +
geom_histogram(aes(y = after_stat(density)), bins = input$bins,
fill = "#5dade2", color = "#2874a6", alpha = 0.7) +
labs(title = "Distribution of Data",
subtitle = paste("Kolmogorov-Smirnov test: D =", round(test_result()$statistic, 4),
", p =", format.pval(test_result()$p.value, digits = 4)),
x = "Value", y = "Density") +
theme_minimal(base_size = 14)
if (input$show_density) {
p <- p + geom_density(color = "#c0392b", linewidth = 1.2)
}
if (input$show_normal) {
p <- p + stat_function(fun = dnorm, args = list(mean = mean(values), sd = sd(values)),
color = "#2471a3", linewidth = 1.2, linetype = "dashed")
}
p
})
# Generate Q-Q plot
output$qqplot <- renderPlot({
req(input$run_test > 0)
error <- validate_data()
if (!is.null(error) && startsWith(error, "Error")) return(NULL)
values <- data_values()
ggplot(data.frame(x = values), aes(sample = x)) +
stat_qq() +
stat_qq_line(color = "#c0392b") +
labs(title = "Normal Q-Q Plot",
subtitle = paste("Kolmogorov-Smirnov test: D =", round(test_result()$statistic, 4),
", p =", format.pval(test_result()$p.value, digits = 4)),
x = "Theoretical Quantiles",
y = "Sample Quantiles") +
theme_minimal(base_size = 14)
})
# Generate ECDF plot
output$ecdf_plot <- renderPlot({
req(input$run_test > 0)
error <- validate_data()
if (!is.null(error) && startsWith(error, "Error")) return(NULL)
values <- data_values()
# Create data frame for plotting
n <- length(values)
sorted_values <- sort(values)
# Calculate empirical CDF
ecdf_df <- data.frame(
x = sorted_values,
y = (1:n) / n  # avoids floating-point length issues that seq(by = 1/n) can cause
)
# Calculate theoretical normal CDF
theoretical_x <- seq(min(values) - sd(values), max(values) + sd(values), length.out = 100)
theoretical_df <- data.frame(
x = theoretical_x,
y = pnorm(theoretical_x, mean = mean(values), sd = sd(values))
)
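# Locate the maximum difference (two-sided D statistic) for visualization.
# The ECDF is a step function, so compare the normal CDF with the ECDF value
# just after each jump (i/n) and just before it ((i - 1)/n).
theoretical_y_at_observed <- pnorm(sorted_values, mean = mean(values), sd = sd(values))
ecdf_after <- (1:n) / n
ecdf_before <- (0:(n - 1)) / n
use_after <- abs(ecdf_after - theoretical_y_at_observed) >=
abs(theoretical_y_at_observed - ecdf_before)
diff_df <- data.frame(
x = sorted_values,
empirical_y = ifelse(use_after, ecdf_after, ecdf_before),
theoretical_y = theoretical_y_at_observed
)
diff_df$diff <- abs(diff_df$empirical_y - diff_df$theoretical_y)
max_diff_point <- diff_df[which.max(diff_df$diff), ]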
# Plot
ggplot() +
# Empirical CDF
geom_step(data = ecdf_df, aes(x = x, y = y), color = "#c0392b", linewidth = 1.2) +
# Theoretical normal CDF
geom_line(data = theoretical_df, aes(x = x, y = y),
color = "#2471a3", linewidth = 1.2, linetype = "dashed") +
# Maximum difference point
geom_segment(aes(x = max_diff_point$x, y = max_diff_point$empirical_y,
xend = max_diff_point$x, yend = max_diff_point$theoretical_y),
color = "#27ae60", linewidth = 1, linetype = "dotted") +
geom_point(aes(x = max_diff_point$x, y = max_diff_point$empirical_y),
color = "#c0392b", size = 3) +
geom_point(aes(x = max_diff_point$x, y = max_diff_point$theoretical_y),
color = "#2471a3", size = 3) +
# Labels
labs(title = "Empirical vs Theoretical Normal CDF",
subtitle = paste("D statistic =", round(test_result()$statistic, 4),
"(maximum vertical distance)"),
x = "Value",
y = "Cumulative Probability") +
annotate("text", x = max_diff_point$x + 0.5*sd(values),
y = mean(c(max_diff_point$empirical_y, max_diff_point$theoretical_y)),
label = paste("D =", round(test_result()$statistic, 4)),
color = "#27ae60", fontface = "bold") +
theme_minimal(base_size = 14)
})
}
shinyApp(ui = ui, server = server)
How the Kolmogorov-Smirnov Test Works
The Kolmogorov-Smirnov test compares your data’s cumulative distribution function to a specified theoretical distribution (typically the normal distribution for normality testing).
Mathematical Procedure
1. Calculate the empirical cumulative distribution function (ECDF) of your data:
\[F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I_{X_i \leq x}\]
Where \(I_{X_i \leq x}\) is the indicator function equal to 1 if \(X_i \leq x\) and 0 otherwise.
2. Calculate the theoretical cumulative distribution function (CDF) of the normal distribution:
\[F(x) = \Phi\left(\frac{x - \mu}{\sigma}\right)\]
Where \(\Phi\) is the standard normal CDF, and \(\mu\) and \(\sigma\) are estimated by the sample mean and the sample standard deviation.
3. Calculate the test statistic D (maximum vertical distance):
\[D = \max_x |F_n(x) - F(x)|\]
4. Calculate the p-value by comparing the D statistic to its sampling distribution under the null hypothesis.
5. Make a decision:
- If p < \(\alpha\) (typically 0.05): Reject the null hypothesis (data is not normal)
- If p ≥ \(\alpha\): Fail to reject the null hypothesis (data may be normal)
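The procedure above can be sketched in a few lines of base R. This is a minimal illustration, not the calculator's internal code: it uses the calculator's built-in example values, computes D by hand, and compares it with `ks.test()`. The variable names (`d_plus`, `d_minus`, `d_manual`) are chosen here for illustration.

```r
# Example values taken from the calculator's "Use example data" link
x <- c(8.44, 7.16, 16.94, 9.59, 13.25, 12.94, 11, 5.61, 10.6, 12.81)
n <- length(x)
x_sorted <- sort(x)

# Theoretical normal CDF with mean and SD estimated from the sample
f_theory <- pnorm(x_sorted, mean = mean(x), sd = sd(x))

# The ECDF jumps at every observation, so check both sides of each jump
d_plus  <- max((1:n) / n - f_theory)        # ECDF above the normal CDF
d_minus <- max(f_theory - (0:(n - 1)) / n)  # ECDF below the normal CDF
d_manual <- max(d_plus, d_minus)

# Built-in one-sample K-S test for comparison
res <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))

cat("Manual D: ", round(d_manual, 4), "\n")
cat("ks.test D:", round(unname(res$statistic), 4), "\n")
cat("p-value:  ", format.pval(res$p.value, digits = 4), "\n")
```

Checking both sides of each jump matters because the largest gap can occur just before an observation, where the ECDF has not yet stepped up.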
Kolmogorov-Smirnov vs. Other Normality Tests
The K-S test has different properties compared to other normality tests:
Test | Strengths | Limitations |
---|---|---|
Kolmogorov-Smirnov | Works well for large samples; shows where distributions differ; can compare any distributions | Less powerful for small samples; may be conservative when parameters are estimated |
Shapiro-Wilk | Most powerful for small to medium samples | Maximum sample size limitations; complex calculation |
Anderson-Darling | More sensitive to deviations in the tails | Complex calculation; less intuitive interpretation |
Lilliefors | Modification of K-S specifically for normality testing | Still not as powerful as Shapiro-Wilk for smaller samples |
D’Agostino-Pearson | Based on skewness and kurtosis; good overall power | Requires larger samples for reliable results |
Important Considerations
- Sample size effects:
  - For very small samples (n < 5), the test has low power
  - With large samples (n > 300), even minor deviations from normality may be detected as significant
  - The K-S test is generally most appropriate for moderate to large sample sizes
- Parameter estimation:
  - When using K-S test for normality with estimated parameters (mean and SD from your data), the test becomes conservative
  - The Lilliefors correction or using bootstrap methods can improve accuracy
- Visual assessment:
  - The ECDF plot provides a visual representation of exactly where distributions differ
  - Always supplement the test with visual methods (histograms, Q-Q plots)
- Applications beyond normality:
  - The K-S test can compare two samples or a sample against any theoretical distribution
  - This makes it more versatile than tests specific to normality
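The parameter-estimation point can be illustrated with a simulation. The sketch below approximates a Lilliefors-type p-value by Monte Carlo: it simulates normal samples, re-estimates the mean and SD for each one, and checks how often the simulated D is at least as large as the observed D. The helper names `ks_stat` and `ks_mc_pvalue`, the simulated exponential data, and the number of simulations are assumptions made for this example only.

```r
# D statistic with mean and SD estimated from the sample (hypothetical helper)
ks_stat <- function(x) {
  n <- length(x)
  f <- pnorm(sort(x), mean = mean(x), sd = sd(x))
  max(c((1:n) / n - f, f - (0:(n - 1)) / n))
}

# Monte Carlo p-value under H0 (hypothetical helper).
# D is unchanged by shifting or rescaling the data, so simulating from the
# standard normal is sufficient.
ks_mc_pvalue <- function(x, n_sim = 2000) {
  d_obs <- ks_stat(x)
  d_sim <- replicate(n_sim, ks_stat(rnorm(length(x))))
  # Add-one correction keeps the estimate strictly positive
  (sum(d_sim >= d_obs) + 1) / (n_sim + 1)
}

set.seed(123)
x <- rexp(40)  # skewed example data, assumed for illustration

p_naive <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))$p.value
p_mc <- ks_mc_pvalue(x)

cat("Standard K-S p-value:", format.pval(p_naive, digits = 3), "\n")
cat("Monte Carlo p-value: ", format.pval(p_mc, digits = 3), "\n")
```

Because the standard p-value ignores the fact that the parameters were fitted to the same data, the simulated p-value is usually the smaller of the two.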
Example 1: Testing Normality of Student Heights
A researcher wants to determine if student height measurements follow a normal distribution before performing parametric analyses.
Data (heights in cm, sample of 30 students): 165, 172, 168, 175, 171, 163, 169, 170, 178, 167, 173, 180, 166, 174, 172, 169, 177, 168, 173, 171, 169, 175, 172, 170, 174, 168, 176, 171, 173, 170
Analysis Steps:
- Run Kolmogorov-Smirnov test:
  - D = 0.0943, p = 0.9501
- Calculate descriptive statistics:
  - Mean = 171.3 cm
  - Median = 171.0 cm
  - Standard deviation = 3.97 cm
  - Skewness = 0.115 (very slight positive skew)
  - Kurtosis = -0.423 (slightly platykurtic)
- Visual assessment:
  - Histogram shows roughly bell-shaped distribution
  - Q-Q plot points follow closely along the reference line
  - ECDF plot shows small maximum deviation (D = 0.0943) between empirical and theoretical distributions
Results:
- D = 0.0943, p = 0.9501
- Interpretation: Since p > 0.05, we fail to reject the null hypothesis. There is insufficient evidence to claim that the height data deviates from a normal distribution.
How to Report: “The Kolmogorov-Smirnov test was conducted to evaluate the normality of student height data. The heights (M = 171.3 cm, SD = 3.97 cm) did not deviate significantly from a normal distribution, D(30) = 0.0943, p = 0.95. Visual inspection of the histogram and ECDF plot was consistent with this conclusion.”
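For readers who want to run this example outside the calculator, a minimal base R sketch follows. It uses the 30 heights listed above. The data contain repeated values, so `ks.test()` warns about ties; the warning is suppressed here for readability. Output is not reproduced, and rounded values may differ slightly from the figures quoted above depending on the software and options used.

```r
# Heights in cm, copied from Example 1
heights <- c(165, 172, 168, 175, 171, 163, 169, 170, 178, 167,
             173, 180, 166, 174, 172, 169, 177, 168, 173, 171,
             169, 175, 172, 170, 174, 168, 176, 171, 173, 170)

# One-sample K-S test against a normal with the sample mean and SD.
# suppressWarnings() hides the tie warning caused by repeated heights.
res <- suppressWarnings(
  ks.test(heights, "pnorm", mean = mean(heights), sd = sd(heights))
)

cat("n:      ", length(heights), "\n")
cat("Mean:   ", round(mean(heights), 2), "\n")
cat("Median: ", round(median(heights), 2), "\n")
cat("SD:     ", round(sd(heights), 2), "\n")
cat("D:      ", round(unname(res$statistic), 4), "\n")
cat("p-value:", format.pval(res$p.value, digits = 4), "\n")
```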
Example 2: Testing Normality of Response Times
A psychologist wants to check if reaction time data from an experiment follows a normal distribution.
Data Summary:
- Sample size: 40 participants
- Kolmogorov-Smirnov test: D = 0.1862, p = 0.0012
- Mean reaction time: 342 ms
- Median reaction time: 328 ms
- Skewness: 1.25 (positive skew)
Results:
- D = 0.1862, p = 0.0012
- Interpretation: Since p < 0.05, we reject the null hypothesis. There is significant evidence that the reaction time data deviates from a normal distribution.
How to Report: “The Kolmogorov-Smirnov test indicated that reaction times were not normally distributed, D(40) = 0.1862, p = 0.0012. The ECDF plot revealed that the maximum deviation occurred in the lower tail of the distribution, with the empirical distribution showing more values in the lower range than would be expected in a normal distribution. This positive skew (skewness = 1.25) suggests that most responses were relatively fast, with fewer but more extreme slow responses.”
How to Report Kolmogorov-Smirnov Test Results
When reporting the results of a Kolmogorov-Smirnov test in academic papers or research reports, include the following elements:
"The Kolmogorov-Smirnov test was used to assess the normality of [variable]. Results indicated that the data [was/was not] normally distributed, D([sample size]) = [D statistic], p = [p-value]."
For example:
"The Kolmogorov-Smirnov test was used to assess the normality of blood pressure measurements. Results indicated that the data was normally distributed, D(45) = 0.091, p = 0.847."
Additional information to consider including:
- Descriptive statistics (mean, median, standard deviation)
- Maximum deviation location (where in the distribution the D statistic occurred)
- Brief description of visual assessment (histogram shape, ECDF plot)
- Critical D value based on sample size and significance level
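If you report a critical D value, the asymptotic formula used by the calculator's code for two-sided tests can be reproduced as below. The function name `ks_critical_d` is made up for this sketch. The formula assumes a fully specified reference distribution, so with an estimated mean and SD it should be read as an approximate, conservative threshold.

```r
# Asymptotic two-sided critical value: sqrt(-0.5 * log(alpha / 2) / n)
# (hypothetical helper name)
ks_critical_d <- function(n, alpha = 0.05) {
  sqrt(-0.5 * log(alpha / 2) / n)
}

sizes <- c(10, 30, 100, 300)
crit_table <- data.frame(
  n = sizes,
  critical_d = round(ks_critical_d(sizes), 4)
)
print(crit_table, row.names = FALSE)
```

At alpha = 0.05 this reduces to roughly 1.36 / sqrt(n), so the threshold shrinks as the sample grows.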
APA Style Reporting
For APA style papers (7th edition), report the Kolmogorov-Smirnov test results as follows:
We conducted a Kolmogorov-Smirnov test to examine whether [variable] was normally distributed. Results indicated that [variable] [was/was not] approximately normally distributed, D([sample size]) = [D statistic], p = [p-value].
Reporting in Tables
When reporting multiple Kolmogorov-Smirnov test results in a table, include these columns:
- Variable tested
- Sample size
- D statistic
- Critical D value
- p-value
- Normality conclusion (Yes/No based on significance level)
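A table with these columns can be assembled programmatically. The following sketch is one possible approach in base R. The data frame `dat`, its two simulated columns, and the column names of the output are all assumptions for illustration, and the critical value uses the asymptotic two-sided formula.

```r
set.seed(42)
# Simulated example variables (assumed, not real data)
dat <- data.frame(
  normal_var = rnorm(50, mean = 100, sd = 15),
  skewed_var = rexp(50, rate = 1 / 20)
)
alpha <- 0.05

# Run the test on each column and collect one row per variable
rows <- lapply(names(dat), function(v) {
  x <- dat[[v]]
  res <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))
  data.frame(
    variable   = v,
    n          = length(x),
    D          = round(unname(res$statistic), 4),
    critical_D = round(sqrt(-0.5 * log(alpha / 2) / length(x)), 4),
    p_value    = signif(res$p.value, 3),
    normal     = ifelse(res$p.value >= alpha, "Yes", "No")
  )
})

results_table <- do.call(rbind, rows)
print(results_table, row.names = FALSE)
```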
Test Your Understanding
1. What does the D statistic in the Kolmogorov-Smirnov test represent?
- A. The variance between groups
- B. The maximum vertical distance between the empirical and theoretical CDFs
- C. The correlation between the sample and a normal distribution
- D. The difference between the sample mean and the population mean
2. What range of values can the Kolmogorov-Smirnov D statistic take?
- A. -1 to +1
- B. 0 to 1
- C. 0 to infinity
- D. -infinity to +infinity
3. A researcher finds D = 0.12, p = 0.08 when testing a dataset. What can they conclude?
- A. The data is significantly different from a normal distribution
- B. The data is perfectly normal
- C. There is not enough evidence to conclude the data deviates from normality
- D. The sample size is too small for the test to be valid
4. For which sample size is the Kolmogorov-Smirnov test generally most appropriate?
- A. n = 8
- B. n = 30
- C. n = 3
- D. n = 2
5. What happens to the critical D value as sample size increases?
- A. It increases
- B. It decreases
- C. It remains the same
- D. It becomes more variable
Answers: 1-B, 2-B, 3-C, 4-B, 5-B
Common Questions About the Kolmogorov-Smirnov Test
The Kolmogorov-Smirnov (K-S) test measures the maximum distance between your data’s empirical CDF and a theoretical normal CDF, while the Shapiro-Wilk test is based on the correlation between your data and normal scores. The Shapiro-Wilk test is generally more powerful for small to medium samples, while the K-S test is easier to visualize and can be used to compare any two distributions, not just testing for normality.
The K-S test can be used with a wide range of sample sizes, but it works best with moderate to large samples (n ≥ 20). For very small samples, the Shapiro-Wilk test is generally more powerful. However, the K-S test has no upper limit on sample size, making it useful for very large datasets where other tests might have computational limitations.
A significant result (p < 0.05) indicates that your data significantly deviates from a normal distribution. The D statistic and ECDF plot can show exactly where this deviation occurs (often in the tails or center of the distribution). This suggests you should either transform your data to achieve normality or consider using non-parametric statistical methods.
Yes, the two-sample K-S test is specifically designed to compare two empirical distributions to determine if they come from the same distribution. This variant doesn’t require specifying a theoretical distribution and is useful for comparing groups without assuming a particular distribution shape. Our calculator focuses on the one-sample test for normality, but the two-sample test uses the same fundamental approach of finding the maximum difference between cumulative distributions.
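As a brief illustration of the two-sample variant, base R's `ks.test()` accepts a second numeric vector in place of a distribution name. The two simulated groups below are assumptions for this sketch. The manual check computes the largest gap between the two empirical CDFs over the pooled observations.

```r
set.seed(1)
# Two simulated groups with different means (assumed example data)
group_a <- rnorm(60, mean = 50, sd = 10)
group_b <- rnorm(60, mean = 58, sd = 10)

# Two-sample K-S test: no theoretical distribution is specified
res2 <- ks.test(group_a, group_b)

# Manual D: largest gap between the two ECDFs at the pooled data points
pooled <- sort(c(group_a, group_b))
d_two_sample <- max(abs(ecdf(group_a)(pooled) - ecdf(group_b)(pooled)))

cat("ks.test D:", round(unname(res2$statistic), 4), "\n")
cat("Manual D: ", round(d_two_sample, 4), "\n")
cat("p-value:  ", format.pval(res2$p.value, digits = 4), "\n")
```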
The standard K-S test can be conservative when testing for normality with estimated parameters (when you use the sample mean and standard deviation). The Lilliefors correction addresses this issue specifically for normality testing. Without this correction, the K-S test might fail to reject the null hypothesis even when the data is not normal. For this reason, the Shapiro-Wilk test is often preferred specifically for normality testing, while the K-S test has broader applications for distribution comparisons.
If your data is not normally distributed, you have several options:
- Transform the data using methods like log, square root, or Box-Cox transformations
- Use non-parametric tests that don’t assume normality
- Continue with parametric tests if your sample size is large enough to rely on the Central Limit Theorem
- Use robust statistical methods that are less sensitive to violations of normality
The K-S test can help guide this decision by showing where the deviation from normality occurs, which might suggest appropriate transformations.
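As a sketch of the transformation option, the example below tests simulated right-skewed data before and after a log transformation. The log-normal data and the helper name `ks_norm` are assumptions for illustration; with real data the appropriate transformation depends on the shape of the deviation.

```r
set.seed(7)
# Simulated right-skewed data (log-normal), assumed for illustration
x <- rlnorm(80, meanlog = 3, sdlog = 0.6)

# K-S normality test with mean and SD estimated from the data (hypothetical helper)
ks_norm <- function(v) {
  ks.test(v, "pnorm", mean = mean(v), sd = sd(v))
}

before <- ks_norm(x)       # raw values
after  <- ks_norm(log(x))  # log-transformed values

cat("Raw data:        D =", round(unname(before$statistic), 4),
    " p =", format.pval(before$p.value, digits = 3), "\n")
cat("Log-transformed: D =", round(unname(after$statistic), 4),
    " p =", format.pval(after$p.value, digits = 3), "\n")
```

A smaller D after transformation indicates that the transformed values sit closer to a normal distribution, though the Q-Q plot and ECDF plot should still be checked.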
Examples of When to Use the Kolmogorov-Smirnov Test
- Before parametric tests: To verify distribution assumptions for t-tests, ANOVA, or regression
- Quality control: To check if production measurements follow a specified distribution
- Financial analysis: To test if returns follow a normal or other theoretical distribution
- Machine learning: To verify distribution assumptions in various algorithms
- Environmental monitoring: To check if pollution measurements follow expected distributions
- Comparing distributions: To test if two samples come from the same distribution
- Model validation: To check if residuals follow a normal distribution
- Time series analysis: To test distributional assumptions of error terms
- Simulation studies: To verify if generated random numbers follow the intended distribution
- After transformations: To check if transformed data now follows a normal distribution
References
- Kolmogorov, A. N. (1933). Sulla determinazione empirica di una legge di distribuzione. Giornale dell’Istituto Italiano degli Attuari, 4, 83-91.
- Smirnov, N. (1948). Table for estimating the goodness of fit of empirical distributions. The Annals of Mathematical Statistics, 19(2), 279-281.
- Massey, F. J. (1951). The Kolmogorov-Smirnov test for goodness of fit. Journal of the American Statistical Association, 46(253), 68-78.
- Lilliefors, H. W. (1967). On the Kolmogorov-Smirnov test for normality with mean and variance unknown. Journal of the American Statistical Association, 62(318), 399-402.
- Stephens, M. A. (1974). EDF statistics for goodness of fit and some comparisons. Journal of the American Statistical Association, 69(347), 730-737.
- Razali, N. M., & Wah, Y. B. (2011). Power comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests. Journal of Statistical Modeling and Analytics, 2(1), 21-33.
Citation
@online{kassambara2025,
author = {Kassambara, Alboukadel},
title = {Kolmogorov-Smirnov {Normality} {Test} {Calculator} \textbar{}
{Check} {Data} {Distribution}},
date = {2025-04-10},
url = {https://www.datanovia.com/apps/statfusion/analysis/inferential/goodness-fit/normality/kolmogorov-smirnov-normality-test.html},
langid = {en}
}