What is ESER™ (Enterprise Statistical Exploration Report)?

ESER™ is an exhaustive finite-domain parameter enumeration report. It sweeps every valid indicator parameter configuration across your specified asset universe at native 1-minute resolution. Unlike optimization, ESER delivers the complete solution set — every configuration that meets your statistical thresholds — so you can assess robustness rather than relying on a single optimized result.

How does Student One handle data security?

Student One operates under an ephemeral compute paradigm. Client data transits through cryptographically isolated compute boundaries with zero persistent storage. Upon completion, deterministic purge operations eliminate all intermediate artifacts. Every engagement includes a compute lifecycle certificate and data-use attestation as cryptographic proof of data destruction.

Who is Student One for?

Student One serves funds managing $50 million to $1 billion that need institutional-grade quantitative research without building internal supercomputing infrastructure. Our clients include hedge funds, multi-strategy desks, family offices, endowments, registered investment advisors (SEC, FCA, SEBI), and proprietary trading firms.

What is the difference between Student One and a backtesting platform?

Backtesting platforms optimize signal, position sizing, and risk simultaneously — joint optimization with maximum degrees of freedom that maximizes overfitting risk. Student One separates signal discovery entirely: we enumerate 1M+ parameter configurations through advanced robustness gates (walk-forward survival, permutation null with BH-FDR, auto OOS split) and deliver only statistically validated anomalies. Your quants then apply position sizing and risk to vetted signals — not curve-fitted backtest outputs. The backtest should not be the research.

Does Student One offer a Machine API for trading bots and AI agents?

Yes. Our machine-native API serves trading bots, AI agents, LLM platforms (Claude, GPT, Gemini via MCP tool definitions), and data pipelines. Authenticate with X-Api-Key, submit OHLCV via presigned S3 URLs, and receive exhaustive results via polling or webhooks. OpenAPI 3.1 spec available. Croissant ML-compliant datasets for ML pipeline interop. No human interaction required.

What are the statistical robustness gates?

Foundation: Win-Rate Gate, Recurrence Gate, Excursion Gate (MFE), Per-Regime MFE Gate. Differentiators: Time-of-Day Buckets, Day-of-Week Mask, Volume Confirmation, Volatility Regime, Third-Indicator Regime Gate (Bonferroni-corrected). Advanced: Walk-Forward Survival (Pardo 2008), Permutation Null with Benjamini-Hochberg FDR (Hansen 2005, Romano-Wolf 2005), Cluster Stability (DBSCAN), Auto OOS Split with zero re-optimization (López de Prado 2018). Each gate carries its academic citation and produces auditable metadata.

Benjamini-Hochberg FDR: The Multiple-Testing Correction Every Backtester Forgets

Why testing 100,000 indicator configurations without FDR correction guarantees you will "discover" false signals — and how to fix it

Student One Research · April 22, 2026 · 7 min read

statistics · FDR · multiple testing · Benjamini-Hochberg · hypothesis testing

If you test one trading signal at p < 0.05, you have a 5% chance of a false positive. If you test 100,000 signals at p < 0.05, you have a near-certainty of thousands of false positives. The Benjamini-Hochberg False Discovery Rate (FDR) procedure is the standard statistical correction for this — and almost no retail backtesting platform applies it.

The Multiple-Testing Problem

A p-value of 0.05 means: under the null hypothesis (no real edge), there is a 5% probability of observing a result at least this extreme by chance. Run the test once and that is a tolerable error rate. Run it 100,000 times and you expect about 5,000 false positives (100,000 × 0.05) even when nothing real is happening.
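The arithmetic above can be checked with a short simulation. This is a minimal sketch that assumes every tested configuration is a true null, so each p-value is uniform on [0, 1]. The function name and the seed are illustrative only.

```python
import numpy as np

def count_null_false_positives(n_tests: int, alpha: float, seed: int = 0) -> int:
    """Count p-values below alpha when every null hypothesis is true."""
    rng = np.random.default_rng(seed)
    # Assumption: under a true null, a calibrated p-value is uniform on [0, 1].
    p_values = rng.uniform(0.0, 1.0, size=n_tests)
    return int(np.sum(p_values < alpha))

n_tests, alpha = 100_000, 0.05
observed = count_null_false_positives(n_tests, alpha)
expected = n_tests * alpha  # linearity of expectation: m * alpha
print(f"expected about {expected:.0f} false positives, simulated {observed}")
```

The simulated count fluctuates around the expected value from seed to seed, but it never gets close to zero: with no correction, thousands of pure-noise configurations pass.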

This is not a subtle effect. It is the dominant source of "discovered" strategies that fail in live trading. Every exhaustive parameter sweep that does not correct for multiple testing is producing a list dominated by noise survivors.

What FDR Controls

The False Discovery Rate is the expected proportion of false positives among all positive results. If you call 100 configurations "significant" with FDR controlled at 5%, the expected number of false positives among them is at most 5. That bound holds on average across repeated experiments, not for any single list, and it does not tell you which of the 100 are the false ones.

FDR is the appropriate target for exploratory parameter sweeps. Family-wise error rate (FWER) controls such as Bonferroni and Holm are stricter: they bound the probability of even one false positive, and at large test counts they become so conservative that almost no configuration passes. FDR keeps statistical power while bounding false discoveries proportionally.
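The two error rates can be written side by side. The symbols here are an assumption of this sketch, following common textbook notation: V is the number of false positives and R is the total number of configurations called significant.

```latex
\mathrm{FDR} = \mathbb{E}\!\left[\frac{V}{\max(R, 1)}\right],
\qquad
\mathrm{FWER} = \Pr(V \ge 1)
```

Because V / max(R, 1) is never larger than the indicator of V ≥ 1, FDR ≤ FWER always holds. Any procedure that controls FWER at level α therefore also controls FDR at α, which is why FWER control is the stricter and less powerful requirement.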

The Benjamini-Hochberg Procedure

The mechanics:

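Run all m hypothesis tests and collect the p-values
Sort the p-values ascending: p_(1) ≤ p_(2) ≤ ... ≤ p_(m)
For each rank k, compute the BH threshold: k × α / m
Find k*, the largest rank k such that p_(k) ≤ k × α / m
Reject the null for every test with rank ≤ k*, including any whose p-value exceeds its own rank's threshold; if no rank qualifies, reject nothing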

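The result: a calibrated set of "discovered" configurations where the expected false-positive proportion is bounded by α. The guarantee is proven for independent tests and for tests with positive dependence; it is not guaranteed under arbitrary dependence.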
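The steps above can be sketched in a few lines of NumPy. This is an illustrative implementation, not Student One's production code; the function name, return shape, and toy p-values are assumptions made for the example.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return (reject_mask, k_star, threshold) for the BH step-up procedure."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                      # sort p-values ascending
    sorted_p = p[order]
    bh_line = alpha * np.arange(1, m + 1) / m  # threshold k * alpha / m per rank
    passing = np.nonzero(sorted_p <= bh_line)[0]
    reject = np.zeros(m, dtype=bool)
    if passing.size == 0:
        return reject, 0, 0.0                  # no rank qualifies: reject nothing
    k_star = int(passing[-1]) + 1              # largest passing rank (1-based)
    reject[order[:k_star]] = True              # reject every rank <= k_star
    return reject, k_star, float(bh_line[k_star - 1])

# Toy input, deliberately unsorted.
p_values = [0.9, 0.013, 0.011, 0.5, 0.012]
reject, k_star, threshold = benjamini_hochberg(p_values, alpha=0.05)
print(reject, k_star, threshold)
```

In this toy input the smallest p-value (0.011) misses its own rank-1 threshold of 0.01, yet it is still rejected because rank 3 passes (0.013 ≤ 0.03). That is the step-up behavior: the cutoff is set by the largest passing rank, not the first failure.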

What This Looks Like in Practice

Suppose you run an exhaustive RSI sweep: periods 2 to 14,000, with oversold and overbought thresholds in 1-point increments. That is roughly 14,000 × 100 × 100 = 140 million configurations. If none of them has a real edge, a fixed cutoff of p < 0.01 still passes about 1.4 million of them (140 million × 0.01), all false positives. With BH-FDR at α = 0.05, the procedure computes a much tighter, data-dependent per-test threshold so that the expected fraction of false positives among called survivors stays at or below 5%.

In typical sweeps, the BH-corrected threshold ends up at p < 1e-7 or tighter. The number of "significant" configurations drops from millions to dozens or hundreds — and those that remain carry real statistical evidence, not noise.
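The scale of these numbers follows directly from the BH threshold k × α / m. The sketch below reuses the article's rounded figure of 140 million configurations; the survivor counts are hypothetical values picked for illustration, not results from any real sweep.

```python
def bh_cutoff(k_survivors: int, alpha: float, m_tests: int) -> float:
    """Per-test p-value cutoff implied by BH when k_survivors configurations pass."""
    return k_survivors * alpha / m_tests

m_tests = 14_000 * 100 * 100          # rounded size of the RSI sweep
naive_false_positives = m_tests * 0.01  # all-null case at a fixed p < 0.01 cutoff

print(f"configurations tested: {m_tests:,}")
print(f"expected false positives at p < 0.01: {naive_false_positives:,.0f}")
for k in (10, 47, 280):  # hypothetical survivor counts
    print(f"{k:>4} survivors -> cutoff p <= {bh_cutoff(k, 0.05, m_tests):.2e}")
```

With a few dozen to a few hundred survivors, the implied cutoff sits at or below 1e-7, several orders of magnitude tighter than a conventional 0.01 or 0.05 threshold.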

Why Platforms Skip This

Retail backtesting platforms skip FDR correction for three reasons:

Marketing — "we found 1.4 million profitable configurations" sells better than "we found 47 statistically defensible configurations"
Workflow — single-pass optimizers produce one "best" configuration, not a corrected family of survivors, so there is no list to correct
Methodological awareness — many platform developers come from software engineering backgrounds, not biostatistics, where FDR has been standard practice for two decades

The result: the typical "AI-discovered strategy" or "optimized indicator preset" you encounter on a retail platform was found without multiple-testing correction. Without that correction, the statistical claim is empty.

Romano-Wolf and Other Alternatives

For very high-dimensional parameter spaces with strong dependence structure (where individual tests are not independent), the Romano-Wolf bootstrap procedure provides tighter family-wise error control while accounting for cross-test correlation. Student One supports both BH-FDR and Romano-Wolf gates, with BH as the default and Romano-Wolf available when the configuration space exhibits high correlation (e.g., consecutive periods of the same indicator).

How Student One Applies FDR

Every exhaustive sweep runs through the FDR gate automatically. The output is two lists: configurations called significant after BH correction, and configurations that failed the gate (their null hypothesis was not rejected). The output metadata documents:

Total tests performed (m)
Target FDR level (α)
The actual corrected p-value threshold
Per-configuration raw p-value and BH-adjusted q-value
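Citation: Benjamini, Y. and Hochberg, Y. (1995), "Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing," Journal of the Royal Statistical Society: Series B, 57(1), 289-300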

This is the structure expected by institutional research workflows and academic peer review.
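A report with the fields listed above could be assembled roughly as follows. This is a sketch under stated assumptions: the dictionary keys, function names, and configuration labels are hypothetical and do not describe Student One's actual output schema. The q-value used here is the standard BH-adjusted p-value, the smallest FDR level at which a given test would be called significant.

```python
import numpy as np

def bh_adjusted_q_values(p_values):
    """BH-adjusted p-values: q_(i) = min over j >= i of m * p_(j) / j, capped at 1."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Running minimum from the largest rank down keeps q monotone in p.
    q_sorted = np.minimum(np.minimum.accumulate(scaled[::-1])[::-1], 1.0)
    q = np.empty(m)
    q[order] = q_sorted  # restore the original input order
    return q

def fdr_gate_report(config_ids, p_values, alpha=0.05):
    """Build a metadata dict (hypothetical field names) for one sweep."""
    p = np.asarray(p_values, dtype=float)
    q = bh_adjusted_q_values(p)
    significant = q <= alpha
    k_star = int(significant.sum())
    return {
        "total_tests": int(p.size),
        "target_fdr": alpha,
        "p_value_threshold": k_star * alpha / p.size,  # k* * alpha / m
        "rows": [
            {"config": c, "p_value": float(pv), "q_value": float(qv),
             "significant": bool(s)}
            for c, pv, qv, s in zip(config_ids, p, q, significant)
        ],
    }

report = fdr_gate_report(["cfg-a", "cfg-b", "cfg-c", "cfg-d", "cfg-e"],
                         [0.9, 0.013, 0.011, 0.5, 0.012], alpha=0.05)
for row in report["rows"]:
    print(row)
```

Calling a configuration significant when its q-value is at most α gives the same survivor set as the rank-based BH procedure, so a reader can re-derive the significant list from the per-configuration rows alone.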

Summary

Without multiple-testing correction, exhaustive parameter enumeration is just an industrial-scale fishing expedition. With Benjamini-Hochberg FDR (or Romano-Wolf for high-correlation spaces), the same enumeration becomes a calibrated statistical procedure. The difference is whether your discovered signals survive live deployment or fail within weeks.