Calculate p-values from Z-scores or Z-scores from p-values for statistical hypothesis testing. Determine statistical significance with detailed explanations.
A p-value (probability value) is a fundamental concept in statistical hypothesis testing that helps determine whether observed results are statistically significant. It represents the probability of obtaining test results at least as extreme as the observed results, assuming the null hypothesis is true.
P-values are crucial for making informed decisions in research, quality control, A/B testing, and many other fields where statistical evidence is required to support or reject claims about populations based on sample data.
P-value: the probability of getting results as extreme as, or more extreme than, those observed, given that the null hypothesis is true
Common significance levels: α = 0.05, 0.01, 0.001
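The two conversions the calculator performs can be sketched with SciPy's standard normal distribution; the function names `p_from_z` and `z_from_p` below are illustrative, not part of any particular library.

```python
# Minimal sketch of the Z-score <-> p-value conversions (SciPy assumed available);
# p_from_z / z_from_p are illustrative names, not a library API.
from scipy.stats import norm

def p_from_z(z, two_tailed=True):
    """P-value for a Z-score under the standard normal distribution."""
    one_tail = norm.sf(abs(z))           # P(Z >= |z|), upper-tail area
    return 2 * one_tail if two_tailed else one_tail

def z_from_p(p, two_tailed=True):
    """Critical Z-score (absolute value) for a given p-value."""
    tail = p / 2 if two_tailed else p
    return norm.isf(tail)                # inverse survival function

print(p_from_z(1.96))   # ~0.05 (two-tailed)
print(z_from_p(0.05))   # ~1.96
```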
| P-value Range | Interpretation | Evidence Against H₀ | Decision |
|---|---|---|---|
| p > 0.10 | Not significant | Little to no evidence | Fail to reject H₀ |
| 0.05 < p ≤ 0.10 | Marginally significant | Weak evidence | Consider more data |
| 0.01 < p ≤ 0.05 | Significant | Moderate evidence | Reject H₀ |
| 0.001 < p ≤ 0.01 | Highly significant | Strong evidence | Reject H₀ |
| p ≤ 0.001 | Very highly significant | Very strong evidence | Reject H₀ |
| Z-score | Two-tail P-value | One-tail P-value | Significance Level |
|---|---|---|---|
| ±1.645 | 0.100 | 0.050 | 10% (two-tail), 5% (one-tail) |
| ±1.960 | 0.050 | 0.025 | 5% (two-tail), 2.5% (one-tail) |
| ±2.326 | 0.020 | 0.010 | 2% (two-tail), 1% (one-tail) |
| ±2.576 | 0.010 | 0.005 | 1% (two-tail), 0.5% (one-tail) |
| ±3.291 | 0.001 | 0.0005 | 0.1% (two-tail), 0.05% (one-tail) |
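The critical values above can be reproduced directly by inverting the two-tail p-values (a quick check, assuming SciPy is available):

```python
# Quick check of the critical-value table (SciPy assumed available)
from scipy.stats import norm

for p_two in (0.10, 0.05, 0.02, 0.01, 0.001):
    z = norm.isf(p_two / 2)   # two-tail critical Z-score
    print(f"±{z:.3f}  two-tail p = {p_two}, one-tail p = {p_two / 2}")
```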
Significance Level (α): the probability of a Type I error (rejecting a true H₀).
Z-score formula for sample mean testing: Z = (x̄ − μ₀) / (σ / √n), where x̄ is the sample mean, μ₀ the hypothesized population mean, σ the population standard deviation, and n the sample size.
P-value calculation using the standard normal distribution: two-tail p = 2·P(Z ≥ |z|); one-tail p = P(Z ≥ z) for a right-tailed test or P(Z ≤ z) for a left-tailed test.
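A worked one-sample Z-test using this formula, with made-up sample numbers:

```python
# One-sample Z-test sketch; the sample summary values are hypothetical
import math
from scipy.stats import norm

x_bar, mu0, sigma, n = 103.2, 100.0, 15.0, 50   # hypothetical sample summary
z = (x_bar - mu0) / (sigma / math.sqrt(n))      # Z = (x̄ − μ₀) / (σ/√n)
p_two_tail = 2 * norm.sf(abs(z))                # two-tail p-value

print(f"z = {z:.2f}, p = {p_two_tail:.3f}")     # z ≈ 1.51, p ≈ 0.13 → fail to reject H₀ at α = 0.05
```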
| Application | Test Type | Example Hypothesis | P-value Type |
|---|---|---|---|
| A/B Testing | Two-sample | Conversion rates are different | Two-tail |
| Quality Control | One-sample | Process mean ≠ target | Two-tail |
| Drug Efficacy | Clinical trial | Treatment > placebo | One-tail (right) |
| Academic Research | Various | Effect exists | Two-tail (usually) |
| Finance | Risk analysis | Returns > benchmark | One-tail (right) |
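For the A/B testing row, one common approach is a two-proportion Z-test with a pooled standard error; the conversion counts below are hypothetical.

```python
# Two-proportion Z-test sketch with a pooled standard error; counts are hypothetical
import math
from scipy.stats import norm

conv_a, n_a = 120, 2400     # control: conversions, visitors
conv_b, n_b = 150, 2380     # variant: conversions, visitors

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)                      # pooled rate under H₀
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))   # pooled standard error
z = (p_b - p_a) / se
p_value = 2 * norm.sf(abs(z))                                 # two-tail p-value

print(f"z = {z:.2f}, p = {p_value:.3f}")
```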
Statistical Significance: Unlikely to occur by chance alone (low p-value).
Practical Significance: Large enough difference to matter in real-world applications.
Cohen's d for Effect Size: d = (M₁ − M₂) / s_pooled, the standardized difference between two means.
| Cohen's d | Effect Size | Interpretation |
|---|---|---|
| 0.2 | Small | Noticeable to experts |
| 0.5 | Medium | Noticeable to most people |
| 0.8 | Large | Obvious to everyone |
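A short sketch of Cohen's d for two independent groups using the pooled standard deviation; the data arrays are placeholders.

```python
# Cohen's d (pooled-SD version) for two independent samples; data are placeholders
import numpy as np

def cohens_d(group1, group2):
    """d = (mean1 - mean2) / pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = np.var(group1, ddof=1), np.var(group2, ddof=1)     # sample variances
    pooled_sd = np.sqrt(((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2))
    return (np.mean(group1) - np.mean(group2)) / pooled_sd

treatment = np.array([5.1, 4.8, 5.6, 5.3, 4.9])
control   = np.array([4.2, 4.5, 4.1, 4.7, 4.4])
print(f"Cohen's d = {cohens_d(treatment, control):.2f}")
```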
Statistical Power (1 − β): the probability of correctly rejecting a false H₀.
Factors Affecting Power:
| Factor | How to Increase Power | Trade-offs |
|---|---|---|
| Sample size | Larger n | Higher cost, more time |
| Effect size | Larger effects are easier to detect | Cannot control the true effect |
| Significance level | Higher α (e.g., 0.10) | More Type I errors |
| Variability | Lower σ | Often beyond researcher control |
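How these factors interact can be seen in a rough power calculation for a two-sided one-sample Z-test; this is a simplified sketch, and the effect, σ, and n values are illustrative.

```python
# Rough power sketch for a two-sided one-sample Z-test; effect, sigma, n are illustrative
import math
from scipy.stats import norm

def power_z_test(effect, sigma, n, alpha=0.05):
    """Power = P(reject H0 | true mean shifted by `effect`)."""
    z_crit = norm.isf(alpha / 2)               # two-sided critical value
    shift = effect / (sigma / math.sqrt(n))    # standardized shift of the test statistic
    # probability the statistic lands beyond either critical value
    return norm.sf(z_crit - shift) + norm.cdf(-z_crit - shift)

print(f"n = 30:  power ≈ {power_z_test(effect=5, sigma=15, n=30):.2f}")
print(f"n = 100: power ≈ {power_z_test(effect=5, sigma=15, n=100):.2f}")
```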
Problem: Testing multiple hypotheses increases the chance of false positives.
| Correction Method | Formula | When to Use |
|---|---|---|
| Bonferroni | Reject if p ≤ α/m (m = number of tests) | Conservative; independent tests |
| Holm-Bonferroni | Reject the i-th smallest p if p₍ᵢ₎ ≤ α/(m − i + 1), stepping down | Less conservative than Bonferroni |
| False Discovery Rate (Benjamini–Hochberg) | Reject the i-th smallest p if p₍ᵢ₎ ≤ (i/m)·α | Exploratory research, many tests |
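A small sketch applying the Bonferroni and Holm-Bonferroni rules to a hypothetical set of p-values at a family-wise α of 0.05:

```python
# Bonferroni and Holm-Bonferroni adjustments on hypothetical p-values
p_values = [0.001, 0.008, 0.039, 0.041, 0.20]
alpha, m = 0.05, len(p_values)

# Bonferroni: compare every p-value to alpha / m
bonferroni = [p <= alpha / m for p in p_values]

# Holm-Bonferroni: sort ascending, compare the rank-th smallest p to alpha / (m - rank),
# and stop rejecting at the first failure
holm = [False] * m
for rank, idx in enumerate(sorted(range(m), key=lambda i: p_values[i])):
    if p_values[idx] <= alpha / (m - rank):
        holm[idx] = True
    else:
        break

print("Bonferroni rejections:", bonferroni)   # [True, True, False, False, False]
print("Holm rejections:      ", holm)
```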
Confidence Interval: Range of plausible values for the parameter.
Relationship: If the 95% CI excludes the null value, then p < 0.05 for the corresponding two-tail test.
Advantages of CIs: Show effect size magnitude, precision of estimate, and statistical significance simultaneously.
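A brief illustration of this relationship for the one-sample Z case, with made-up numbers: the 95% CI excludes the null value exactly when the two-tail p-value falls below 0.05.

```python
# 95% CI vs. two-tail p-value for a one-sample Z case; numbers are illustrative
import math
from scipy.stats import norm

x_bar, mu0, sigma, n = 52.4, 50.0, 8.0, 64   # hypothetical sample summary
se = sigma / math.sqrt(n)
z_crit = norm.isf(0.025)                     # ≈ 1.96 for a 95% CI

ci_low, ci_high = x_bar - z_crit * se, x_bar + z_crit * se
z = (x_bar - mu0) / se
p = 2 * norm.sf(abs(z))                      # two-tail p-value

print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f})")   # excludes 50 ...
print(f"p-value: {p:.4f}")                        # ... so p < 0.05
```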