Inferential Statistics Tools
Perform hypothesis testing and inferential statistical analysis directly in your browser. StatFusion’s inferential statistics tools help you draw conclusions about populations based on sample data, with no software installation required.
Inferential statistics allows researchers to make probability-based judgments about populations using sample data. These tools help you test hypotheses, calculate p-values, and determine statistical significance for your research questions.
What is Inferential Statistics?
Inferential statistics uses sample data to make generalizations or inferences about a population. Unlike descriptive statistics, which simply summarizes data, inferential statistics allows researchers to:
- Test hypotheses about population parameters
- Estimate population parameters with confidence intervals
- Compare groups to assess if differences are statistically significant
- Determine relationships between variables and assess their significance
The tools in this section help you perform these inferential tasks with proper statistical rigor while providing clear interpretations of your results.
Available Inferential Tests
StatFusion offers a comprehensive suite of inferential statistical tests organized by category and purpose.
Mean Comparisons
Tests for comparing means between groups or against known values.
One Sample Tests
Compare a sample mean to a known or hypothesized value:
- One-Sample t-Test - Test if a sample mean differs from a specified value
- One-Sample z-Test - Compare a sample mean to a known value when population standard deviation is known
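As a quick way to cross-check a browser result, a one-sample t-test is a one-liner in a standard statistics library. A minimal sketch using SciPy (the library choice and the sample values are illustrative, not part of StatFusion):

```python
# One-sample t-test: does the sample mean differ from a hypothesized value?
from scipy import stats

sample = [5.1, 4.9, 5.4, 5.0, 5.3, 4.8, 5.2, 5.5]

# H0: the population mean equals 5.0
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```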
Two Sample Tests
Compare means between two groups:
- Independent Samples t-Test - Compare means between two unrelated groups
- Paired Samples t-Test - Compare means between two related measurements
- Welch’s t-Test - Compare means when variances differ
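To illustrate how the standard and Welch's variants relate, here is a sketch using SciPy (illustrative data): the only difference is the equal-variance assumption.

```python
# Independent-samples t-test, with and without the equal-variance assumption
from scipy import stats

group_a = [12.1, 11.8, 12.4, 12.9, 11.5, 12.2]
group_b = [13.0, 13.4, 12.8, 13.6, 13.1, 12.9]

t_student, p_student = stats.ttest_ind(group_a, group_b)               # assumes equal variances
t_welch, p_welch = stats.ttest_ind(group_a, group_b, equal_var=False)  # Welch's t-test

# For two related measurements (e.g., before/after), use stats.ttest_rel(before, after)
```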
Multiple Sample Tests
Compare means across more than two groups:
- One-Way ANOVA - Compare means across multiple independent groups
- Repeated Measures ANOVA - Compare means across multiple related measurements
- Factorial ANOVA - Test effects of multiple factors and their interactions
- ANCOVA - Compare means while controlling for covariates
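As a minimal illustration, a one-way ANOVA in SciPy (invented data):

```python
# One-way ANOVA: are all group means equal?
from scipy import stats

g1 = [23, 25, 21, 22, 24]
g2 = [28, 27, 29, 30, 26]
g3 = [22, 20, 23, 21, 24]

f_stat, p_value = stats.f_oneway(g1, g2, g3)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```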
Post-Hoc Tests
Follow-up tests after finding significant effects in multiple comparisons:
- Tukey’s HSD - Compare all pairs of means while controlling Type I error
- Bonferroni Test - Adjust p-values for multiple comparisons
- Dunnett’s Test - Compare each group to a control group
- Scheffé’s Test - Test all possible contrasts
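For example, after a significant one-way ANOVA, Tukey's HSD compares every pair of groups while keeping the family-wise error rate at alpha. A sketch using SciPy's tukey_hsd (available in SciPy 1.8+; the data are illustrative):

```python
# Tukey's HSD: all pairwise comparisons with family-wise error control
from scipy import stats

g1 = [23, 25, 21, 22, 24]
g2 = [28, 27, 29, 30, 26]
g3 = [22, 20, 23, 21, 24]

result = stats.tukey_hsd(g1, g2, g3)
print(result)          # pairwise differences, adjusted p-values, confidence intervals
print(result.pvalue)   # matrix of adjusted p-values
```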
Non-Parametric Tests
Distribution-free alternatives when parametric assumptions aren’t met.
One Sample Tests
Compare a sample to a hypothesized distribution or value:
- Wilcoxon Signed Rank Test (Single) - Non-parametric alternative to one-sample t-test
- Sign Test - Test median against a hypothesized value
Two Sample Tests
Compare two samples without normality assumptions:
- [Mann-Whitney U Test](/apps/statfusion/analysis/inferential/non-parametric/two-sample/mann-whitney-u-test-calculator-nonparametric.qmd) - Non-parametric alternative to independent t-test
- Wilcoxon Signed Rank Test (Paired) - Non-parametric alternative to paired t-test
- Kolmogorov-Smirnov Test - Compare two sample distributions
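A sketch of the most common two-sample cases using SciPy (illustrative data):

```python
# Mann-Whitney U: two independent samples, no normality assumption
from scipy import stats

group_a = [3.1, 2.8, 3.4, 2.9, 3.6]
group_b = [4.0, 3.8, 4.2, 3.9, 4.5]

u_stat, p_value = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

# Paired samples: stats.wilcoxon(before, after)
# Comparing whole distributions: stats.ks_2samp(group_a, group_b)
```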
Multiple Sample Tests
Compare more than two samples without normality assumptions:
- Kruskal-Wallis Test - Non-parametric alternative to one-way ANOVA
- Friedman Test - Non-parametric alternative to repeated measures ANOVA
- Jonckheere-Terpstra Test - Test for ordered alternatives across groups
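For instance, a Kruskal-Wallis test in SciPy (illustrative data):

```python
# Kruskal-Wallis H test: rank-based alternative to one-way ANOVA
from scipy import stats

g1 = [7, 9, 6, 8, 7]
g2 = [12, 11, 13, 10, 12]
g3 = [9, 8, 10, 9, 11]

h_stat, p_value = stats.kruskal(g1, g2, g3)

# Repeated measurements across three or more conditions:
# stats.friedmanchisquare(cond1, cond2, cond3)
```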
Non-Parametric Post-Hoc Tests
Follow-up tests for non-parametric multiple comparisons:
- Dunn’s Test - Post-hoc test after Kruskal-Wallis
- Conover-Iman Test - Alternative post-hoc for non-parametric ANOVA
- Nemenyi Test - Post-hoc test after Friedman test
Correlation & Association Tests
Tests for measuring and testing relationships between variables.
Correlation Tests
Measure and test linear and monotonic relationships:
- Pearson Correlation - Test linear relationship between continuous variables
- Spearman Correlation - Test monotonic relationship (rank correlation)
- Kendall’s Tau - Non-parametric measure of ordinal association
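All three can be computed on the same paired data; a SciPy sketch (invented values):

```python
# Pearson (linear), Spearman (monotonic, rank-based), Kendall (ordinal)
from scipy import stats

x = [1.0, 2.1, 2.9, 4.2, 5.1, 6.0]
y = [2.3, 3.9, 6.2, 8.1, 9.8, 12.4]

r, p_r = stats.pearsonr(x, y)
rho, p_rho = stats.spearmanr(x, y)
tau, p_tau = stats.kendalltau(x, y)
```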
Categorical Association Tests
Examine relationships between categorical variables:
- Chi-Square Test - Test independence between categorical variables
- Fisher’s Exact Test - Alternative to Chi-Square for small samples
- Cochran-Mantel-Haenszel Test - Test conditional independence
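A sketch for a 2x2 contingency table using SciPy (illustrative counts):

```python
# Chi-square test of independence on a 2x2 contingency table
from scipy import stats

table = [[30, 10],
         [20, 40]]

chi2, p, dof, expected = stats.chi2_contingency(table)

# With small expected counts (rule of thumb: any expected cell < 5),
# prefer Fisher's exact test for 2x2 tables:
odds_ratio, p_exact = stats.fisher_exact(table)
```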
Proportion Tests
Tests for comparing proportions within and between samples.
Single and Two-Sample Proportion Tests
Test proportions against a reference or between groups:
- One-Sample Proportion Test - Test a proportion against a hypothesized value
- Two-Sample Proportion Test - Compare proportions between independent samples
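Both cases are available in statsmodels (a separate Python library, assumed installed; the counts below are invented):

```python
# z-tests for proportions with statsmodels
from statsmodels.stats.proportion import proportions_ztest

# One sample: is a 45/100 success rate different from 0.5?
z1, p1 = proportions_ztest(count=45, nobs=100, value=0.5)

# Two samples: compare 45/100 against 58/120
z2, p2 = proportions_ztest(count=[45, 58], nobs=[100, 120])
```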
Variance Tests
Tests for comparing variances between samples.
Two-Sample Variance Tests
Compare variances between two samples:
- F-Test - Compare variances of two normally distributed populations
- Levene’s Test - Less sensitive to departures from normality
Multiple-Sample Variance Tests
Compare variances across multiple groups:
- Bartlett’s Test - Test for equal variances across groups (assumes normality)
- Brown-Forsythe Test - Robust test for homogeneity of variance
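SciPy exposes Levene's and Bartlett's tests directly; the classic two-sample F-test has no dedicated function, but it is just the ratio of sample variances referred to an F distribution. A sketch (illustrative data):

```python
import numpy as np
from scipy import stats

g1 = [10.1, 9.8, 10.4, 10.0, 9.9]
g2 = [12.0, 8.5, 11.2, 9.0, 13.1]

# Two-sample F-test (assumes normality): ratio of sample variances
f_ratio = np.var(g1, ddof=1) / np.var(g2, ddof=1)
df1, df2 = len(g1) - 1, len(g2) - 1
p_f = 2 * min(stats.f.cdf(f_ratio, df1, df2), stats.f.sf(f_ratio, df1, df2))

# Robust alternatives; levene's default center="median" is the Brown-Forsythe variant
stat_lev, p_lev = stats.levene(g1, g2)
stat_bart, p_bart = stats.bartlett(g1, g2)  # accepts two or more groups; assumes normality
```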
Goodness-of-Fit Tests
Tests for determining if data follows specified distributions.
Distribution Tests
Test if data follows theoretical distributions:
- Chi-Square Goodness of Fit - Test if categorical data follows expected frequencies
- Anderson-Darling Test - Test if data follows a specified continuous distribution
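A chi-square goodness-of-fit sketch using SciPy (invented counts; note that observed and expected totals must match):

```python
# Chi-square goodness of fit: do observed counts match expected frequencies?
from scipy import stats

observed = [18, 22, 20, 40]   # category counts
expected = [25, 25, 25, 25]   # H0: equal frequencies across four categories

chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)

# Continuous distributions: stats.anderson(data, dist="norm") runs Anderson-Darling
```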
Normality Tests
Specific tests for normal distribution:
- Shapiro-Wilk Test - Powerful test for normality
- Kolmogorov-Smirnov Normality Test - Compare data to normal distribution
- Q-Q Plot Analysis - Visual assessment of normality
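A normality check combining a formal test with a visual one, sketched with SciPy and matplotlib (both assumed installed; data invented):

```python
# Shapiro-Wilk test plus a Q-Q plot
import matplotlib.pyplot as plt
from scipy import stats

data = [4.9, 5.1, 5.0, 5.3, 4.8, 5.2, 5.1, 4.7, 5.4, 5.0]

w_stat, p_value = stats.shapiro(data)   # small p suggests departure from normality

stats.probplot(data, dist="norm", plot=plt)   # points near the line support normality
plt.show()
```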
How to Choose the Right Inferential Test
Selecting the appropriate statistical test is crucial for valid analysis.
Decision Guide for Common Inferential Tests
- First, identify your research question and type of data
- Then, check if parametric assumptions are met for your data
- Finally, select the appropriate test based on your specific scenario
Comparing Means
One sample vs. known value:
- Parametric: One-Sample t-Test
- Non-parametric: Wilcoxon Signed Rank Test (Single)
Two independent groups:
- Parametric: Independent Samples t-Test
- Non-parametric: Mann-Whitney U Test
Two related/paired groups:
- Parametric: Paired Samples t-Test
- Non-parametric: Wilcoxon Signed Rank Test (Paired)
Three+ independent groups:
- Parametric: One-Way ANOVA
- Non-parametric: Kruskal-Wallis Test
Three+ related groups:
- Parametric: Repeated Measures ANOVA
- Non-parametric: Friedman Test
Testing Relationships
Two continuous variables:
- Parametric: Pearson Correlation
- Non-parametric: Spearman Correlation
Two categorical variables:
- Large samples: Chi-Square Test
- Small samples: Fisher’s Exact Test
Comparing proportions:
- Single proportion: One-Sample Proportion Test
- Two independent proportions: Two-Sample Proportion Test
- Paired proportions: McNemar’s Test
Comparing variances:
- Two groups (normal): F-Test
- Two groups (robust): Levene’s Test
- Multiple groups (normal): Bartlett’s Test
- Multiple groups (robust): Brown-Forsythe Test
When to Use Parametric vs. Non-parametric Tests
Use parametric tests when:
- Your data are approximately normally distributed
- Sample sizes are large enough (a common rule of thumb is n ≥ 30 per group)
- Variables are measured on an interval or ratio scale
- Variances are approximately equal (for certain tests)
Use non-parametric tests when:
- Your data deviate markedly from a normal distribution
- Sample sizes are small
- Variables are measured on an ordinal or nominal scale
- Data contains outliers that cannot be removed
Understanding Inferential Statistics Concepts
What is a p-value?
A p-value represents the probability of observing results at least as extreme as those in your sample, assuming the null hypothesis is true. A small p-value (typically ≤ 0.05) suggests that your sample results would be unlikely if the null hypothesis were true, leading to its rejection.
Common misinterpretations:
- A p-value is NOT the probability that the null hypothesis is true
- A p-value is NOT the probability that your findings occurred by chance
- A p-value does NOT measure the size or importance of an effect
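To make “at least as extreme” concrete: a two-sided p-value is the total tail probability of the test statistic’s null distribution. A sketch with SciPy (the t statistic and degrees of freedom are invented):

```python
# Two-sided p-value for an observed t statistic under the null distribution
from scipy import stats

t_observed, df = 2.31, 14
p_two_sided = 2 * stats.t.sf(abs(t_observed), df)   # sf(x) = P(T > x), the upper tail
print(f"p = {p_two_sided:.3f}")                     # roughly 0.037
```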
Statistical Power
Statistical power is the probability that a test will correctly reject the null hypothesis when the alternative hypothesis is true. In simpler terms, it’s the likelihood of detecting an effect when one truly exists.
Power is affected by:
- Sample size (larger samples increase power)
- Effect size (larger effects are easier to detect)
- Significance level (a less stringent level increases power)
- Variability in the data (less variability increases power)
A power of 0.8 (80%) is commonly targeted in research design, meaning you have an 80% chance of detecting an effect if it exists.
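For example, the sample size required to hit that 80% target can be solved directly; a sketch using statsmodels (assumed installed):

```python
# Sample size per group for an independent-samples t-test
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.5,   # medium effect (Cohen's d)
    alpha=0.05,        # significance level
    power=0.80,        # desired power
)
print(f"n per group: {n_per_group:.1f}")   # roughly 64
```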
Type I and Type II Errors
Type I Error (False Positive): Rejecting the null hypothesis when it is actually true. The probability of a Type I error is equal to your significance level (alpha), typically 0.05.
Type II Error (False Negative): Failing to reject the null hypothesis when it is actually false. The probability of a Type II error is denoted as beta, and 1-beta equals statistical power.
These errors represent a trade-off: decreasing one type of error typically increases the other. Researchers must balance both based on the relative costs of each error type in their specific context.
Effect Size
Effect size measures the magnitude of an observed effect, independent of sample size. Unlike p-values, which can be significant for trivial effects with large samples, effect sizes help quantify how meaningful a difference or relationship is in practical terms.
Common effect size measures include:
- Cohen’s d (for t-tests): ~0.2 (small), ~0.5 (medium), ~0.8 (large)
- Eta-squared or Partial eta-squared (for ANOVA): ~0.01 (small), ~0.06 (medium), ~0.14 (large)
- Correlation coefficient r: ~0.1 (small), ~0.3 (medium), ~0.5 (large)
Reporting effect sizes alongside p-values provides a more complete picture of your results and facilitates meta-analysis.
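Cohen’s d, for instance, is just the mean difference divided by the pooled standard deviation; a short sketch (invented data):

```python
# Cohen's d from two independent samples
import numpy as np

def cohens_d(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n_a, n_b = len(a), len(b)
    pooled_sd = np.sqrt(((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1))
                        / (n_a + n_b - 2))
    return (a.mean() - b.mean()) / pooled_sd

d = cohens_d([12.1, 11.8, 12.4, 12.9], [13.0, 13.4, 12.8, 13.6])
# |d| is about 2.1 here: a large standardized difference; the sign shows direction
```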
Statistical vs. Practical Significance
Statistical significance indicates that an observed effect would be unlikely under the null hypothesis, based on a p-value threshold (usually p < 0.05). However, with large samples, even tiny, practically meaningless effects can be statistically significant.
Practical significance refers to whether an effect is large enough to matter in a real-world context. This judgment depends on the specific field, research question, and potential applications.
Best practice is to consider both: use statistical significance to determine if an effect is reliable, and effect size and domain knowledge to evaluate practical significance.
Common Mistakes in Inferential Statistics
Avoid these common pitfalls when conducting inferential analyses:
- p-hacking: Performing multiple tests and only reporting significant results
- HARKing (Hypothesizing After Results are Known): Presenting post-hoc hypotheses as if they were a priori
- Ignoring assumptions: Failing to check if your data meets test assumptions
- Misinterpreting p-values: Treating p > 0.05 as “proving” the null hypothesis
- Neglecting effect sizes: Focusing only on p-values without considering magnitude
- Inappropriate test selection: Using tests that don’t match your data or research question
- Overlooking multiple comparisons: Not adjusting for increased Type I error risk (see the sketch below)
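Corrections for multiple comparisons are straightforward to apply; a sketch using statsmodels (assumed installed; the p-values are invented):

```python
# Adjusting a family of p-values for multiple comparisons
from statsmodels.stats.multitest import multipletests

raw_p = [0.003, 0.021, 0.048, 0.160, 0.740]

# Bonferroni: strict family-wise error control
reject, p_bonf, _, _ = multipletests(raw_p, alpha=0.05, method="bonferroni")

# Benjamini-Hochberg: controls the false discovery rate (less conservative)
reject_bh, p_bh, _, _ = multipletests(raw_p, alpha=0.05, method="fdr_bh")
```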
StatFusion’s tools are designed to help you avoid these mistakes by guiding you through appropriate test selection, assumption checking, and result interpretation.
Further Reading and Resources
Enhance your understanding of inferential statistics with these recommended resources:
- An Introduction to Statistical Learning
- Statistics How To
- UCLA’s Statistical Methods and Data Analytics Resources
- Khan Academy: Inferential Statistics
Citation
@online{kassambara2025,
  author = {Kassambara, Alboukadel},
  title = {Inferential Statistics Tools | Hypothesis Testing \& Statistical Tests},
  date = {2025-04-10},
  url = {https://www.datanovia.com/apps/statfusion/analysis/inferential/index.html},
  langid = {en}
}