# Benjamini-Hochberg FDR: The Multiple-Testing Correction Every Backtester Forgets

> Why testing 100,000 indicator configurations without FDR correction guarantees you will "discover" false signals — and how to fix it

**Author:** Student One Research  
**Published:** April 22, 2026 (2026-04-22)  
**Reading time:** 7 min  
**Tags:** statistics, FDR, multiple testing, Benjamini-Hochberg, hypothesis testing  
**Canonical URL:** https://dashboard.studentone.tech/blog/benjamini-hochberg-fdr-multiple-testing-quant  
**License:** CC BY 4.0

---

Suppose you test one trading signal at p < 0.05. A p-value of 0.05 means that, under the null hypothesis (no real edge), there is a 5% probability of observing a result at least this extreme by chance. Run the test once and that is a tolerable error rate. Run it 100,000 times and you expect about 5,000 false positives even when nothing real is happening.

This is not a subtle effect. It is the dominant source of "discovered" strategies that fail in live trading. Every exhaustive parameter sweep that does not correct for multiple testing is producing a list dominated by noise survivors.
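The 5,000 figure is easy to check with a small simulation. The sketch below assumes every null hypothesis is true and that p-values are therefore uniform on [0, 1]; the function name and seed are illustrative only.

```python
import random


def count_null_false_positives(n_tests: int, alpha: float, seed: int = 0) -> int:
    """Count tests with p < alpha when no configuration has a real edge."""
    rng = random.Random(seed)
    # Under a true null with a continuous test statistic, the p-value is Uniform(0, 1),
    # so each test falls below alpha with probability alpha.
    return sum(1 for _ in range(n_tests) if rng.random() < alpha)


if __name__ == "__main__":
    hits = count_null_false_positives(n_tests=100_000, alpha=0.05)
    print(f"{hits} of 100,000 pure-noise tests pass p < 0.05")
```

The count varies with the seed but stays close to 5,000, because each of the 100,000 tests independently passes with probability 0.05.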

## What FDR Controls

The False Discovery Rate (FDR) is the expected proportion of false positives among all results called significant. If you call 100 configurations "significant" with FDR controlled at 5%, you expect on average no more than about 5 of them to be false positives. The remaining 95 or so are expected to reflect genuine effects, although the procedure does not tell you which ones.
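In symbols, the definition is usually written as follows. The notation is the common textbook one rather than anything specific to this article: R is the number of configurations called significant, and V is the number of those that are actually null. The family-wise error rate (FWER) is shown alongside for comparison.

```latex
\mathrm{FDR} = \mathbb{E}\!\left[\frac{V}{\max(R,\,1)}\right],
\qquad
\mathrm{FWER} = \Pr(V \ge 1)
```

The max(R, 1) in the denominator simply defines the proportion as zero when nothing is called significant.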

FDR is the appropriate target for exploratory parameter sweeps. Stricter family-wise error rate (FWER) controls such as Bonferroni and Holm become so conservative at large test counts that almost nothing is declared significant, real effects included. FDR keeps statistical power while bounding false discoveries proportionally.
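A quick way to see the difference in strictness is to compare per-test cutoffs. The sketch below uses the standard Bonferroni cutoff α / m and the BH cutoff k × α / m for the k-th smallest p-value; the function names are illustrative.

```python
def bonferroni_threshold(alpha: float, m: int) -> float:
    """Single cutoff applied to every one of the m tests."""
    return alpha / m


def bh_threshold(alpha: float, m: int, rank: int) -> float:
    """Cutoff that BH compares against the rank-th smallest p-value."""
    if not 1 <= rank <= m:
        raise ValueError("rank must be between 1 and m")
    return rank * alpha / m


if __name__ == "__main__":
    m, alpha = 100_000, 0.05
    print("Bonferroni:", bonferroni_threshold(alpha, m))
    for rank in (1, 100, 10_000, m):
        print(f"BH at rank {rank}:", bh_threshold(alpha, m, rank))
```

With m = 100,000 and α = 0.05, Bonferroni demands p ≤ 5e-7 from every test. BH starts at the same value for the smallest p-value and relaxes linearly, reaching α itself at the largest rank.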

## The Benjamini-Hochberg Procedure

The mechanics:

- Run all m hypothesis tests, collect p-values

- Sort p-values ascending: p(1) ≤ p(2) ≤ ... ≤ p(m)

- For each rank k, compute the BH threshold: k × α / m

- Find the largest k such that p(k) ≤ k × α / m

- Reject the null for all tests with rank ≤ k

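The result: a calibrated set of "discovered" configurations where the expected false-positive proportion is bounded by α. The bound holds when the tests are independent or positively dependent; under arbitrary dependence, the more conservative Benjamini-Yekutieli variant is the one that carries the guarantee.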
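The steps above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a production implementation: the function name and return shape are assumptions, and it performs no input validation beyond the empty case.

```python
from typing import List, Sequence, Tuple


def benjamini_hochberg(
    p_values: Sequence[float], alpha: float = 0.05
) -> Tuple[List[bool], float]:
    """Return (reject flags in input order, p-value cutoff) for the BH step-up rule."""
    m = len(p_values)
    if m == 0:
        return [], 0.0

    # Find the largest rank k whose sorted p-value sits at or below k * alpha / m.
    largest_k = 0
    for k, p in enumerate(sorted(p_values), start=1):
        if p <= k * alpha / m:
            largest_k = k

    if largest_k == 0:
        return [False] * m, 0.0

    # Every test at or below that rank is rejected, even if it missed its own cutoff.
    threshold = largest_k * alpha / m
    return [p <= threshold for p in p_values], threshold


if __name__ == "__main__":
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
    reject, cutoff = benjamini_hochberg(pvals, alpha=0.05)
    print(sum(reject), "discoveries at cutoff", cutoff)
```

In the example, five of the ten p-values fall below 0.05, but only the two smallest survive BH at α = 0.05. Note the "largest k" rule: the scan keeps going after a failure, so a p-value that misses its own cutoff is still rejected if a larger one passes.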

## What This Looks Like in Practice

Suppose you run an exhaustive RSI sweep: periods 2 to 14,000, with oversold and overbought thresholds in 1-point increments. That's roughly 14,000 × 100 × 100 = 140 million configurations. Without FDR, even at p < […]

In typical sweeps, the BH-corrected threshold ends up at p < […]

Retail backtesting platforms skip FDR correction for three reasons:

- Marketing — "we found 1.4 million profitable configurations" sells better than "we found 47 statistically defensible configurations"

- Workflow — single-pass optimizers produce one "best" configuration, not a corrected family of survivors, so there is no list to correct

- Methodological awareness — many platform developers come from software engineering backgrounds, not biostatistics, where FDR has been standard practice for two decades

The result: unless a retail platform documents its multiple-testing correction, assume that any "AI-discovered strategy" or "optimized indicator preset" it offers was found without one. In that case, the statistical claim is empty.

## Romano-Wolf and Other Alternatives

For very high-dimensional parameter spaces with strong dependence between tests (for example, consecutive periods of the same indicator), the Romano-Wolf bootstrap procedure controls the family-wise error rate while accounting for cross-test correlation, which makes it less conservative than Bonferroni or Holm. Student One supports both BH-FDR and Romano-Wolf gates, with BH as the default and Romano-Wolf available when the configuration space exhibits high correlation.
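To make the idea concrete, here is a heavily simplified sketch of a Romano-Wolf style step-down using bootstrap max-|t| statistics. It is not Student One's implementation. It assumes each configuration is summarized by a column of per-period returns, tests whether each column's mean is zero, and resamples rows independently, which ignores serial dependence (a block bootstrap is the usual choice for time series). All names are illustrative.

```python
import math
import random
from typing import List, Sequence


def abs_t_stats(rows: Sequence[Sequence[float]], centers: Sequence[float]) -> List[float]:
    """|t|-statistic of each column mean against the given center."""
    n = len(rows)
    stats = []
    for j, center in enumerate(centers):
        col = [row[j] for row in rows]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / (n - 1)
        stats.append(abs(mean - center) / math.sqrt(var / n))
    return stats


def romano_wolf_adjusted_p(
    rows: Sequence[Sequence[float]], n_boot: int = 300, seed: int = 0
) -> List[float]:
    """Step-down adjusted p-values for H0: the mean of each column is zero."""
    rng = random.Random(seed)
    n, m = len(rows), len(rows[0])
    observed = abs_t_stats(rows, [0.0] * m)
    sample_means = [sum(row[j] for row in rows) / n for j in range(m)]

    # Resample whole rows so correlation across configurations is preserved.
    # Centering at the sample mean makes the bootstrap statistics mimic the null.
    boot = []
    for _ in range(n_boot):
        resampled = [rows[rng.randrange(n)] for _ in range(n)]
        boot.append(abs_t_stats(resampled, sample_means))

    # Step down from the most significant test, comparing each statistic with the
    # bootstrap maximum over the hypotheses that are still in play.
    order = sorted(range(m), key=lambda j: -observed[j])
    adjusted = [0.0] * m
    running_max = 0.0
    for step, j in enumerate(order):
        remaining = order[step:]
        exceed = sum(1 for b in boot if max(b[i] for i in remaining) >= observed[j])
        running_max = max(running_max, (1 + exceed) / (n_boot + 1))  # keep monotone
        adjusted[j] = running_max
    return adjusted
```

Because the maximum is taken over correlated columns, highly redundant configurations do not inflate the correction the way they would under Bonferroni, where each one counts as a separate test.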

## How Student One Applies FDR

Every exhaustive sweep runs through the FDR gate automatically. The output is two lists: configurations called significant after BH correction, and configurations that did not pass the gate. The output metadata documents:

- Total tests performed (m)

- Target FDR level (α)

- The actual corrected p-value threshold

- Per-configuration raw p-value and BH-adjusted q-value

- Citation: Benjamini, Y. and Hochberg, Y. (1995), "Controlling the False Discovery Rate"

This is the structure expected by institutional research workflows and academic peer review.
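As a rough illustration of what such a report could contain, the sketch below computes BH-adjusted q-values and assembles the fields listed above. The dictionary keys and function names are hypothetical and are not Student One's actual output schema.

```python
from typing import Dict, List, Sequence


def bh_adjusted_q_values(p_values: Sequence[float]) -> List[float]:
    """BH-adjusted q-value per test: the smallest FDR level at which it is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the minimum of
    # p * m / rank over its own rank and all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


def fdr_report(p_values: Sequence[float], alpha: float = 0.05) -> Dict[str, object]:
    """Assemble illustrative metadata; key names are assumptions."""
    q = bh_adjusted_q_values(p_values)
    significant = [qi <= alpha for qi in q]
    passing = [p for p, s in zip(p_values, significant) if s]
    return {
        "total_tests": len(p_values),
        "target_fdr": alpha,
        # Largest raw p-value that still passed; None if nothing did.
        "p_value_threshold": max(passing) if passing else None,
        "results": [
            {"raw_p": p, "q_value": qi, "significant": s}
            for p, qi, s in zip(p_values, q, significant)
        ],
        "citation": 'Benjamini, Y. and Hochberg, Y. (1995), "Controlling the False Discovery Rate"',
    }
```

Reporting q-values alongside raw p-values lets a reader re-threshold at a different FDR level without rerunning the sweep: a configuration is significant at level α exactly when its q-value is at or below α.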

## Summary

Without multiple-testing correction, exhaustive parameter enumeration is just an industrial-scale fishing expedition. With Benjamini-Hochberg FDR (or Romano-Wolf for high-correlation spaces), the same enumeration becomes a calibrated statistical procedure. The difference is whether your discovered signals survive live deployment or fail within weeks.

---

## Cite this article

Student One Research (2026). *Benjamini-Hochberg FDR: The Multiple-Testing Correction Every Backtester Forgets*. Student One Research Blog. https://dashboard.studentone.tech/blog/benjamini-hochberg-fdr-multiple-testing-quant
