Non-Parametric Statistical Tests
Statistical tools for analyzing data that doesn’t meet the assumptions of parametric tests. StatFusion’s non-parametric calculators provide robust analysis options for non-normal data, ordinal measurements, or small samples.
Non-parametric tests make fewer assumptions about your data than their parametric counterparts. They’re essential when working with data that doesn’t follow a normal distribution, contains outliers, uses ordinal scales, or comes from small samples. These distribution-free methods provide valid statistical inference even when parametric assumptions are violated.
What Are Non-Parametric Tests?
Non-parametric tests (also called distribution-free methods) are statistical procedures that don’t require specific assumptions about the underlying population distribution. Unlike parametric tests such as the t-test and ANOVA, which assume a normal distribution, non-parametric tests typically:
- Work with ranks rather than raw values
- Make fewer assumptions about the shape of distributions
- Are robust to outliers and skewed data
- Can be used with ordinal data and small samples
- Maintain valid results when parametric assumptions are violated
While non-parametric tests are generally less powerful than their parametric equivalents when parametric assumptions are met, they provide more reliable results when those assumptions are violated.
When to Use Non-Parametric Tests
Non-parametric methods are appropriate in several situations:
- Non-normal data: When your data deviates substantially from a normal distribution
- Small sample sizes: When you have too few observations to verify normality
- Ordinal data: When data is measured on an ordinal scale (rankings, Likert scales)
- Presence of outliers: When data contains extreme values that can’t be removed
- Heterogeneous variances: When group variances differ substantially
Each non-parametric test is the alternative to a specific parametric test, designed for similar research questions but with fewer assumptions about the data.
Available Non-Parametric Tests
StatFusion offers a comprehensive suite of non-parametric tests organized by purpose and design.
One-Sample Tests
Tests for comparing a single sample to a hypothesized value or distribution.
Median Tests
Compare sample median to a hypothesized value:
- Wilcoxon Signed-Rank Test (Single) - Non-parametric alternative to one-sample t-test
- Sign Test - Test if median differs from a hypothesized value
Distribution Tests
Compare sample distribution to a theoretical distribution:
- Kolmogorov-Smirnov One-Sample Test - Compare sample to theoretical distribution
- Chi-Square Goodness-of-Fit Test - Test if categorical data follows expected proportions
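As a rough illustration, both one-sample distribution tests can be run in a few lines of Python with SciPy; the sample, the reference normal distribution, and the expected category proportions below are made-up examples, not StatFusion output.

```python
# Sketch: one-sample distribution tests with SciPy (illustrative data only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(loc=10, scale=2, size=40)   # sample to be tested

# Kolmogorov-Smirnov one-sample test against a fully specified normal distribution
ks_stat, ks_p = stats.kstest(x, "norm", args=(10, 2))
print(f"KS test: D = {ks_stat:.3f}, p = {ks_p:.3f}")

# Chi-square goodness-of-fit: do observed counts match expected proportions?
observed = [18, 22, 20, 40]                # observed category counts
expected = [25, 25, 25, 25]                # e.g. a hypothesis of equal proportions
chi2_stat, chi2_p = stats.chisquare(observed, f_exp=expected)
print(f"Chi-square GOF: chi2 = {chi2_stat:.2f}, p = {chi2_p:.4f}")
```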
The Wilcoxon Signed-Rank Test and the Sign Test both compare a sample to a hypothesized median, but they differ in how they use the data:
Wilcoxon Signed-Rank Test:
- Uses both the sign and magnitude of differences
- More powerful when the distribution is symmetric
- Non-parametric alternative to the one-sample t-test
- Assumes symmetric distribution of differences (though not necessarily normal)
Sign Test:
- Uses only the sign (direction) of differences, not their magnitude
- Less powerful but makes fewer assumptions
- More robust when the distribution is asymmetric
- Appropriate when you can only determine if values are higher or lower than the median
The Wilcoxon test is generally preferred unless the distribution is highly skewed or you’re working with ordinal data where only the direction (not magnitude) can be determined.
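A minimal Python sketch of both one-sample tests, assuming an illustrative sample and a hypothesized median of 50 (neither comes from StatFusion): the Wilcoxon test ranks the signed differences, while the sign test reduces each observation to a direction and applies a binomial test.

```python
# Sketch: one-sample Wilcoxon signed-rank test vs. sign test (illustrative data).
import numpy as np
from scipy import stats

x = np.array([47.2, 51.8, 49.5, 53.1, 55.0, 48.3, 52.6, 50.9, 46.7, 54.4])
m0 = 50  # hypothesized median

# Wilcoxon signed-rank: uses both the sign and the magnitude of the differences
w_stat, w_p = stats.wilcoxon(x - m0)
print(f"Wilcoxon signed-rank: W = {w_stat:.1f}, p = {w_p:.3f}")

# Sign test: uses only the direction of the differences (binomial test on the signs)
diffs = x - m0
n_pos = int(np.sum(diffs > 0))
n_nonzero = int(np.sum(diffs != 0))  # zero differences are discarded
sign_p = stats.binomtest(n_pos, n_nonzero, p=0.5).pvalue
print(f"Sign test: {n_pos}/{n_nonzero} positive differences, p = {sign_p:.3f}")
```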
Two-Sample Tests
Tests for comparing two samples or groups.
Independent Samples Tests
For comparing two unrelated groups:
- Mann-Whitney U Test - Non-parametric alternative to independent t-test
- Wilcoxon Rank-Sum Test - Equivalent to the Mann-Whitney U test
- Kolmogorov-Smirnov Two-Sample Test - Compare shapes of distributions
- Mood’s Median Test - Compare medians with minimal assumptions
Paired Samples Tests
For comparing two related measurements:
- Wilcoxon Signed-Rank Test (Paired) - Non-parametric alternative to paired t-test
- Sign Test (Paired) - Simple test for paired differences
- McNemar’s Test - Compare paired proportions (binary outcomes)
The Mann-Whitney U test and Wilcoxon Rank-Sum test are mathematically equivalent procedures with different historical origins and slightly different computational approaches. Both:
- Compare distributions between two independent groups
- Use ranks rather than raw values
- Serve as non-parametric alternatives to the independent samples t-test
- Test whether one group tends to have larger values than the other
These tests are suitable when:
- Data doesn’t meet normality assumptions
- Sample sizes are small
- Data contains outliers
- You’re working with ordinal data
The default interpretation tests whether one population tends to have larger values than the other, though additional assumptions allow testing for differences in medians specifically.
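Here is a small SciPy sketch of a Mann-Whitney U test on two illustrative samples. The probability-of-superiority calculation assumes the returned statistic is U for the first group, which is how recent SciPy versions report it.

```python
# Sketch: Mann-Whitney U test for two independent groups (illustrative data).
import numpy as np
from scipy import stats

group_a = np.array([12.1, 14.3, 11.8, 15.2, 13.7, 16.4, 12.9, 14.8])
group_b = np.array([10.2, 11.5, 9.8, 12.3, 10.9, 11.1, 9.4])

u_stat, p_value = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

# Probability of superiority: the chance that a random value from group_a
# exceeds a random value from group_b (u_stat is U for group_a here).
prob_superiority = u_stat / (len(group_a) * len(group_b))

print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_value:.4f}")
print(f"Probability of superiority = {prob_superiority:.2f}")
```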
The Wilcoxon Signed-Rank Test for paired samples is the non-parametric equivalent of the paired samples t-test. It’s used when:
- You have two related measurements (before/after, matched pairs)
- The differences don’t follow a normal distribution
- Your sample size is small
- The data contains outliers
The test works by:
1. Calculating the differences between paired measurements
2. Ranking the absolute differences
3. Summing the ranks for positive and negative differences
4. Using the smaller sum as the test statistic
A significant result indicates that one measurement tends to be consistently higher or lower than the other. While it doesn’t compare means specifically (like the t-test), it tells you about the consistent direction and magnitude of differences between paired observations.
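A minimal sketch with SciPy, assuming illustrative before/after measurements for ten subjects; the library performs the ranking and summing steps described above internally.

```python
# Sketch: Wilcoxon signed-rank test for paired (before/after) measurements.
import numpy as np
from scipy import stats

before = np.array([78, 85, 90, 72, 88, 95, 80, 83, 76, 91])
after = np.array([82, 87, 95, 75, 86, 99, 84, 88, 80, 94])

# SciPy ranks the absolute differences and sums the positive and negative ranks,
# mirroring steps 1-4 above.
stat, p_value = stats.wilcoxon(before, after)
print(f"Wilcoxon signed-rank (paired): W = {stat:.1f}, p = {p_value:.4f}")
```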
Multiple-Sample Tests
Tests for comparing three or more groups or samples.
Independent Groups Tests
For comparing multiple unrelated groups:
- Kruskal-Wallis Test - Non-parametric alternative to one-way ANOVA
- Jonckheere-Terpstra Test - Test for ordered alternatives across groups
- Median Test - Compare medians across multiple groups
The Kruskal-Wallis test is the non-parametric alternative to one-way ANOVA, used when:
- Comparing three or more independent groups
- The data doesn’t follow a normal distribution
- There are outliers that can’t be removed
- The data is ordinal or the variances are unequal
How it works:
1. All observations are ranked from lowest to highest, ignoring group membership
2. The ranks are summed within each group
3. The test statistic (H) is calculated based on these rank sums
4. A significant result indicates that at least one group differs from the others
Like ANOVA, a significant Kruskal-Wallis result doesn’t tell you which specific groups differ. You’ll need to perform post-hoc tests (like Dunn’s test) to identify the specific differences between groups.
The Kruskal-Wallis test is particularly useful in biomedical research, psychology, ecology, and other fields where data often doesn’t meet parametric assumptions.
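A short SciPy sketch with three illustrative groups follows; the commented-out Dunn’s post-hoc call assumes the third-party scikit-posthocs package is available.

```python
# Sketch: Kruskal-Wallis test for three independent groups (illustrative data).
from scipy import stats

g1 = [23, 41, 54, 66, 78]
g2 = [45, 55, 60, 70, 72]
g3 = [18, 30, 34, 40, 44]

h_stat, p_value = stats.kruskal(g1, g2, g3)
print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_value:.4f}")

# Pairwise post-hoc comparisons (only meaningful if the omnibus test is significant),
# e.g. Dunn's test via the scikit-posthocs package (assumed installed):
# import scikit_posthocs as sp
# sp.posthoc_dunn([g1, g2, g3], p_adjust="bonferroni")
```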
The Friedman test is the non-parametric alternative to repeated measures ANOVA, used when:
- The same subjects are measured under multiple conditions or time points
- The data doesn’t follow a normal distribution
- The sample size is small
- The measurements are ordinal
How it works:
1. Data is arranged in a table with subjects as rows and conditions as columns
2. Values are ranked across conditions separately for each subject
3. The test statistic is calculated based on the sums of ranks for each condition
4. A significant result indicates differences across conditions
The Friedman test is commonly used in:
- Clinical trials measuring patient responses across multiple time points
- Sensory evaluation studies comparing multiple products
- Behavioral studies with repeated measurements under different conditions
- Educational research tracking performance across different methods
Follow up a significant Friedman test with post-hoc tests (like Nemenyi’s test) to identify specific differences between conditions.
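A brief SciPy sketch, assuming illustrative scores for six subjects measured under three conditions:

```python
# Sketch: Friedman test for repeated measures across three conditions.
# Each list holds one condition's scores for the same six subjects (illustrative).
from scipy import stats

cond_1 = [7.0, 9.9, 8.5, 5.1, 10.3, 8.9]
cond_2 = [5.3, 5.7, 4.7, 3.5, 7.7, 6.0]
cond_3 = [4.9, 7.6, 5.5, 2.8, 8.4, 6.2]

chi2_stat, p_value = stats.friedmanchisquare(cond_1, cond_2, cond_3)
print(f"Friedman chi-square = {chi2_stat:.2f}, p = {p_value:.4f}")
```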
Post-Hoc Tests for Non-Parametric Analyses
Follow-up tests after finding significant differences in multiple-group non-parametric tests.
Post-Hoc for Kruskal-Wallis
Follow-up tests after significant Kruskal-Wallis results:
- Dunn’s Test - Pairwise comparisons after Kruskal-Wallis
- Conover-Iman Test - More powerful alternative to Dunn’s test
- Steel-Dwass-Critchlow-Fligner Test - Controls familywise error rate
Post-Hoc for Friedman
Follow-up tests after significant Friedman results:
- Nemenyi Test - Pairwise comparisons after Friedman
- Conover Test for Friedman - More powerful than Nemenyi
- Wilcoxon Signed-Rank with Bonferroni - Multiple pairwise comparisons
Correlation and Association Tests
Non-parametric tests for measuring relationships between variables.
Correlation Tests
Measure monotonic relationships:
- Spearman’s Rank Correlation - Non-parametric measure of association based on ranks
- Kendall’s Tau - Alternative rank correlation with different properties
- Goodman-Kruskal Gamma - Measure of association for ordinal variables
Categorical Association Tests
Test relationships between categorical variables:
- Chi-Square Test - Test independence between categorical variables
- Fisher’s Exact Test - Alternative for small expected frequencies
- Cramer’s V - Effect size for Chi-Square tests
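As a rough sketch, the chi-square test of independence, Fisher’s exact test, and Cramér’s V can all be computed with SciPy and NumPy; the 2×2 table below contains made-up counts.

```python
# Sketch: tests of association for a 2x2 contingency table (illustrative counts).
import numpy as np
from scipy import stats

table = np.array([[12, 8],
                  [5, 15]])

# Chi-square test of independence (appropriate when expected counts are large enough)
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)
print(f"Chi-square: chi2({dof}) = {chi2:.2f}, p = {p_chi2:.4f}")

# Fisher's exact test (preferred for small expected frequencies in 2x2 tables)
odds_ratio, p_fisher = stats.fisher_exact(table)
print(f"Fisher's exact: OR = {odds_ratio:.2f}, p = {p_fisher:.4f}")

# Cramer's V as an effect size for the chi-square test
n = table.sum()
cramers_v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))
print(f"Cramer's V = {cramers_v:.2f}")
```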
Both Spearman’s rank correlation (ρ) and Kendall’s Tau (τ) are non-parametric measures of association based on ranks, but they have different properties:
Spearman’s Correlation:
- More widely used and recognized
- Easier to calculate and interpret (similar to Pearson’s r)
- Measures the strength of monotonic relationships
- More sensitive to errors and outliers
Kendall’s Tau:
- More robust to outliers and errors in data
- Better statistical properties for small samples
- More accurate p-values for non-normal distributions
- Better interpretation for ordinal data
- More directly interpretable as probability of concordance minus probability of discordance
When to choose Kendall’s Tau:
- When you have small sample sizes
- When you have many tied ranks
- When robustness is a primary concern
- When you’re working with ordinal data
When to choose Spearman’s:
- When communicating to audiences familiar with correlation
- When computational simplicity is important
- When you want direct comparison with Pearson’s r
In practice, both measures often lead to similar conclusions about statistical significance, though the numerical values differ.
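A quick SciPy sketch comparing the two coefficients on the same illustrative data; tau typically comes out smaller in magnitude than rho even when both tests reach the same conclusion.

```python
# Sketch: Spearman's rho vs. Kendall's tau on the same illustrative data.
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([2, 1, 4, 3, 6, 5, 8, 7, 10, 9])

rho, rho_p = stats.spearmanr(x, y)
tau, tau_p = stats.kendalltau(x, y)

print(f"Spearman's rho = {rho:.3f}, p = {rho_p:.4f}")
print(f"Kendall's tau  = {tau:.3f}, p = {tau_p:.4f}")
```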
How to Choose the Right Non-Parametric Test
Selecting the appropriate non-parametric test depends on your research question and study design:
Decision Guide for Non-Parametric Tests
- Identify your research question and study design (comparison, relationship, etc.)
- Determine what parametric test would be appropriate if assumptions were met
- Select the corresponding non-parametric alternative based on your data characteristics
For Comparing Groups/Samples
Instead of One-Sample t-Test:
- Wilcoxon Signed-Rank Test (Single)
- Sign Test (for highly skewed distributions)
Instead of Independent Samples t-Test:
- Mann-Whitney U Test (equivalently, the Wilcoxon Rank-Sum Test)
- Mood’s Median Test (minimal assumptions)
Instead of Paired Samples t-Test:
- Wilcoxon Signed-Rank Test (Paired)
- Sign Test (Paired) (for highly skewed distributions)
Instead of One-Way ANOVA:
- Kruskal-Wallis Test
- Median Test (more robust but less powerful)
Instead of Repeated Measures ANOVA:
- Friedman Test
- Cochran’s Q Test (for binary outcomes)
For Measuring Relationships
Instead of Pearson Correlation:
- Spearman’s Rank Correlation
- Kendall’s Tau
Instead of Linear Regression:
- Theil-Sen Estimator
- Quantile Regression
For Categorical Variables:
- Chi-Square Test (large samples)
- Fisher’s Exact Test (small samples)
- McNemar’s Test (paired proportions)
For Survival Analysis:
- Log-Rank Test
- Cox Proportional Hazards (semi-parametric)
Advantages and Limitations of Non-Parametric Tests
Advantages
- Fewer assumptions about the underlying distribution
- Robust to outliers and extreme values
- Applicable to ordinal data (rankings, Likert scales)
- Valid for small samples where normality is difficult to verify
- Simple interpretations often based on medians or ranks
- Useful when transformations fail to normalize data
Limitations
- Less statistical power when parametric assumptions are actually met
- Less precise confidence intervals
- Limited multivariate methods compared to parametric statistics
- Reduced ability to control for covariates
- Less familiar to many readers of research
- Testing different hypotheses than parametric equivalents in some cases
Common Questions About Non-Parametric Tests
Are non-parametric tests less powerful than parametric tests?
Non-parametric tests are generally less powerful than their parametric equivalents when all parametric assumptions are met. However, when assumptions are violated, non-parametric tests can be more powerful because:
- They maintain accurate Type I error rates
- They’re less influenced by outliers and extreme values
- They can detect differences in distribution shape, not just central tendency
For data that moderately violates normality with large sample sizes, parametric tests remain robust due to the Central Limit Theorem. However, for small samples, substantial non-normality, or ordinal data, non-parametric tests often provide better statistical inference.
In practice, the decision should consider both the nature of your data and the specific research question you’re addressing.
Do non-parametric tests test the same hypotheses as parametric tests?
Not exactly. Parametric and non-parametric tests often test different aspects of the data:
Parametric tests typically compare means:
- t-test: difference in means between groups
- ANOVA: differences in means across multiple groups
Non-parametric tests typically compare:
- Medians (in some cases)
- Overall distributions (stochastic dominance)
- Probability that a random value from one group exceeds a random value from another
For example, the Mann-Whitney U test doesn’t directly test for differences in medians (contrary to common belief) unless you assume identical distribution shapes. Instead, it tests whether one distribution is stochastically greater than the other.
This distinction matters for interpretation. A significant non-parametric result does not necessarily indicate a difference in medians, but rather that values in one group tend to be larger than in the other.
How should I report the results of non-parametric tests?
For APA style (7th edition), report:
Mann-Whitney U test:
A Mann-Whitney U test indicated that [variable] was significantly [higher/lower] for [Group 1] (Median = [value]) than for [Group 2] (Median = [value]), U = [value], p = [value], r = [effect size].
Wilcoxon Signed-Rank test:
A Wilcoxon signed-rank test showed that [intervention/condition] elicited a statistically significant change in [variable] (Z = [value], p = [value], r = [effect size]), with median [increasing/decreasing] from [value] to [value].
Kruskal-Wallis test:
A Kruskal-Wallis test showed a statistically significant difference in [variable] between the different [groups/conditions], H([df]) = [value], p = [value], with a mean rank of [value] for [Group 1], [value] for [Group 2], and [value] for [Group 3].
Always include:
- Test name and statistic
- p-value
- Effect size when possible
- Descriptive statistics (typically medians rather than means)
- Relevant degrees of freedom
- Post-hoc test results if applicable
What effect sizes should I report with non-parametric tests?
Common effect size measures for non-parametric tests include:
For Mann-Whitney U or Wilcoxon Rank-Sum test:
- r = Z / √N (where Z is the standardized test statistic and N is the total sample size)
- Small effect: r ≈ 0.1
- Medium effect: r ≈ 0.3
- Large effect: r ≈ 0.5
- Probability of superiority: Probability that a random observation from one group exceeds a random observation from the other
For Wilcoxon Signed-Rank test:
- r = Z / √N (where N is the total number of observations)
- Matched-pairs rank biserial correlation
For Kruskal-Wallis test:
- η²_H = (H − k + 1) / (n − k) (where H is the test statistic, k is the number of groups, and n is the total sample size)
- ε² = H / [(n² − 1) / (n + 1)], equivalently H(n + 1) / (n² − 1)
For Friedman test:
- Kendall’s W (coefficient of concordance), computed as W = χ²_r / (n(k − 1)) (where χ²_r is the Friedman chi-square statistic, n is the number of subjects, and k is the number of conditions)
These effect sizes help quantify the magnitude of observed effects, complementing p-values by indicating practical significance.
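As a sketch of how the formulas above translate into code (the test statistics and sample sizes used here are placeholders, not real results):

```python
# Sketch: effect sizes for non-parametric tests, computed from placeholder statistics.
import math

# r = Z / sqrt(N) for a Mann-Whitney U test, using the normal approximation to get Z
n1, n2 = 20, 22
u = 120.0
mu_u = n1 * n2 / 2
sigma_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (u - mu_u) / sigma_u
r = abs(z) / math.sqrt(n1 + n2)

# Eta-squared and epsilon-squared for a Kruskal-Wallis test
h, k, n = 9.4, 3, 45                       # H statistic, number of groups, total N
eta_sq_h = (h - k + 1) / (n - k)
eps_sq = h / ((n**2 - 1) / (n + 1))

# Kendall's W from a Friedman chi-square
chi2_r, n_subjects, k_conditions = 12.3, 15, 4
kendalls_w = chi2_r / (n_subjects * (k_conditions - 1))

print(f"r = {r:.2f}, eta^2_H = {eta_sq_h:.2f}, epsilon^2 = {eps_sq:.2f}, W = {kendalls_w:.2f}")
```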
Should I just use non-parametric tests all the time to be safe?
While it might seem conservative to use non-parametric tests by default, this approach has important limitations:
Potential drawbacks:
- Lower statistical power when parametric assumptions are met
- More limited ability to handle complex designs (multiple factors, covariates)
- Less precise confidence intervals
- Testing different hypotheses than parametric equivalents in some cases
- Less familiar to many readers and reviewers
A better approach:
- Check if your data reasonably meets parametric assumptions
- Use parametric tests when appropriate, as they often provide more informative results
- Reserve non-parametric tests for cases where assumptions are clearly violated
- Consider transformations to normalize data when possible
- Report the rationale for your test selection
In modern practice, many statisticians recommend using:
- Welch’s t-test (robust to unequal variances) rather than Student’s t-test
- Bootstrapping or permutation tests (computationally intensive but highly flexible)
- Mixed models (can handle various data structures)
These approaches often provide good alternatives that balance robustness and statistical power.