# You Don’t Have to Walk-Forward. Here Are the Alternatives — Expanding Windows, Anchored CV, and CPCV.

> Walk-forward is the default out-of-sample protocol in retail quant. It is also the most data-hungry, the most parameter-sensitive, and the easiest to abuse. Three alternatives — anchored expanding windows, purged K-fold, and combinatorial purged CV — cover the cases walk-forward handles badly.

**Author:** Student One Research  
**Published:** June 5, 2026 (2026-06-05)  
**Reading time:** 11 min read  
**Tags:** Out-of-Sample, Walk-Forward, Expanding Window, Cross-Validation, Methodology  
**Canonical URL:** https://dashboard.studentone.tech/blog/walk-forward-alternative-expanding-window-anchored-cv  
**License:** CC BY 4.0

---

Walk-forward is the default out-of-sample protocol in retail and prosumer quant. It is sold as *the* rigorous OOS test — the one Pardo wrote a book about in 1992, the one every backtest framework ships a built-in for. It is also the most data-hungry of the popular OOS protocols, the one most sensitive to fold-size choices, and the one most easily abused by re-running with a different fold count until the strategy survives. There are at least three alternatives that handle the cases walk-forward handles badly. None of them are exotic. Most quants either don’t know they exist or have been told they’re too expensive to run.

## What Walk-Forward Actually Does

True rolling walk-forward partitions the calendar into K equal-sized blocks. At step *k* you train on block *k−1* and test on block *k*. The window moves forward; nothing accumulates. After K−1 steps you have K−1 disjoint test scores. You aggregate (median, mean, sign-flip rate) and call it the OOS performance.
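The block arithmetic above can be sketched in a few lines. This is a minimal illustration, not any framework's built-in: the function name `rolling_walk_forward` and its arguments are hypothetical, and it works on integer sample positions rather than calendar dates.

```python
from typing import Iterator, Tuple

import numpy as np


def rolling_walk_forward(n_samples: int, n_blocks: int) -> Iterator[Tuple[np.ndarray, np.ndarray]]:
    """Yield (train_idx, test_idx) pairs: train on block k-1, test on block k."""
    # Split the sample positions into n_blocks contiguous, near-equal blocks.
    blocks = np.array_split(np.arange(n_samples), n_blocks)
    for k in range(1, n_blocks):
        # Nothing accumulates: the training set is only the previous block.
        yield blocks[k - 1], blocks[k]


wf_splits = list(rolling_walk_forward(n_samples=12, n_blocks=4))
for train_idx, test_idx in wf_splits:
    print(train_idx.tolist(), "->", test_idx.tolist())
```

With K=4 blocks this yields K−1=3 train/test pairs, and every test block is disjoint from the others.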

This is not the only thing labelled "walk-forward" in the wild. Until recently, our own platform shipped an **anchored expanding-window** variant under the same name (the engine’s walk-forward gate v1.x). The anchored variant trains on *everything up to time t*, tests on the next block, then expands. v2.0.0 (April 2026) switched to true rolling because the two protocols answer different questions and conflating them was a methodological bug. Most TradingView/QuantConnect strategy testers also conflate them. Read the source, not the marketing.

## Where Walk-Forward Fails

Walk-forward has three failure modes that are baked into its structure, not into a poor implementation.

### 1. The fold count is a researcher degree of freedom

K=4? K=6? K=10? Each gives a different OOS score. If the operator can re-run with a different K until the strategy survives, the OOS test is no longer out-of-sample — it is a hyperparameter the human optimised over. The honest fix is to commit to K before running the search, and the honest defence is to insist the platform records every choice. Most implementations don’t.

### 2. Each test fold is a single sample

A 6-fold walk-forward gives you 5 OOS test scores. Five points is enough to compute a median. It is not enough to compute a confidence interval, a sign-flip rate with any precision, or a meaningful sample variance. The "test" is structurally underpowered for any short calendar.
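One way to see how little five scores can say: under a null where each fold score is equally likely to be positive or negative, the most extreme outcome available (all five scores sharing one sign) still has a two-sided sign-test p-value of 2 × (1/2)^5. A minimal sketch of that arithmetic, using a hypothetical helper name:

```python
def min_two_sided_sign_test_p(n_scores: int) -> float:
    """Smallest two-sided sign-test p-value reachable with n_scores fold scores.

    Assumes the null gives each score a 1/2 chance of either sign, and that
    every observed score has the same sign (the best case for the strategy).
    """
    return min(1.0, 2 * 0.5 ** n_scores)


# K=6 folds give 5 scores; K=10 folds give 9 scores.
for n in (5, 9):
    print(n, min_two_sided_sign_test_p(n))
```

For five scores the floor is 0.0625, so even a perfect 5-for-5 record cannot clear a conventional 5% two-sided threshold.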

### 3. Information bleeds across the boundary

If a trade opens in fold *k−1* and closes in fold *k*, the test fold contains an event whose entry timing was visible during training. This is classical CV leakage and it is silent: the framework does not warn you. Cleaning it requires a **purge** step (drop train events whose exit_day falls inside the test window) and an **embargo** step (drop train events near the test boundary even if they don’t span it). Most retail walk-forward implementations skip both. Our engine’s walk-forward gate exposes a `purge_overlapping_events` flag that defaults off for backwards compatibility — turn it on.
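A minimal sketch of the two cleaning steps, assuming each trade is reduced to an integer `entry_day` and `exit_day`. The `Event` class and `purge_and_embargo` function are hypothetical names for illustration and are not the engine's `purge_overlapping_events` implementation. The purge here drops any train event whose holding interval touches the test window, which includes the "exit falls inside the test window" case; the embargo is assumed to apply on both sides of the window.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    entry_day: int  # day index on which the trade opens
    exit_day: int   # day index on which the trade closes (>= entry_day)


def purge_and_embargo(train: List[Event], test_start: int, test_end: int,
                      embargo_days: int = 0) -> List[Event]:
    """Return the train events that survive the purge and the embargo."""
    kept = []
    for ev in train:
        # Purge: the holding interval intersects [test_start, test_end].
        overlaps_test = ev.entry_day <= test_end and ev.exit_day >= test_start
        # Embargo (assumed symmetric): the event ends just before the window
        # or starts just after it, within embargo_days of the boundary.
        in_embargo = (test_start - embargo_days <= ev.exit_day < test_start
                      or test_end < ev.entry_day <= test_end + embargo_days)
        if not (overlaps_test or in_embargo):
            kept.append(ev)
    return kept


train_events = [Event(10, 20), Event(80, 97), Event(95, 105), Event(60, 70)]
kept = purge_and_embargo(train_events, test_start=100, test_end=149, embargo_days=5)
print(kept)
```

In this toy input, the trade held from day 95 to day 105 is purged because it spans the boundary, and the trade that closes on day 97 is embargoed because it ends within five days of the test window.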

## Alternative 1: Anchored Expanding Windows

The expanding-window protocol trains on *all data from t₀ to tₖ₋₁*, tests on block *k*, then expands the training set to include block *k* and tests on block *k+1*. The training set grows; the test window slides.
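As a sketch of how the growing training set looks in index terms (the function name `anchored_expanding_window` is hypothetical, and blocks are taken over sample positions rather than dates):

```python
from typing import Iterator, Tuple

import numpy as np


def anchored_expanding_window(n_samples: int, n_blocks: int) -> Iterator[Tuple[np.ndarray, np.ndarray]]:
    """Yield (train_idx, test_idx) pairs: train on blocks 0..k-1, test on block k."""
    blocks = np.array_split(np.arange(n_samples), n_blocks)
    for k in range(1, n_blocks):
        # The training set is anchored at the start and absorbs every earlier block.
        yield np.concatenate(blocks[:k]), blocks[k]


ew_splits = list(anchored_expanding_window(n_samples=12, n_blocks=4))
for train_idx, test_idx in ew_splits:
    print(len(train_idx), "train samples ->", test_idx.tolist())
```

The test blocks are the same ones a rolling scheme would use; only the training side differs, growing from 3 to 6 to 9 samples in this toy case.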

This matches the question a real operator faces in production: *"I have N years of data; how does my parameter estimate stabilise as N grows?"* Walk-forward, with its fixed-size training window, throws away the oldest data at every step — which is wrong if the underlying process is stationary, and right only if you specifically believe regime turnover is faster than the training window.

When to prefer expanding-window over walk-forward:

- Slow-moving signals. Macro overlays, weekly-bar mean reversion, seasonal patterns. The marginal value of an extra year of training data is real; throwing it away is malpractice.

- Short calendars. A 4-year history split into 6 walk-forward folds gives 8-month training windows. You can’t fit anything stable on 8 months of daily bars. The expanding window starts narrow but grows.

- Parameter-stability questions. If you want to prove "my optimal RSI period is stable across time," the expanding window’s monotonically-growing training set is the natural diagnostic.

When walk-forward is correct and expanding-window is wrong:

- Known non-stationarity. Crypto pre-2018 has nothing in common with crypto post-2022. Including it in the training set drags the parameter estimate toward a regime that no longer exists.

- Regime-conditional signals. A volatility-breakout strategy that only fires in high-VIX years should be tested on rolling windows that contain comparable VIX states, not on a training set diluted by years of low-VIX paint-drying.

The takeaway is not "use one or the other." It is that the choice between rolling and anchored is a *statement about your prior on stationarity*, and you owe yourself an honest answer to that question before you pick the protocol.

## Alternative 2: Purged K-Fold (López de Prado)

Walk-forward and expanding windows both impose a single chronological direction. The training set is always before the test set. This is the right constraint for a real trading system. It is also the wrong constraint for asking *"is my signal robust across the calendar"*, because it gives you only K−1 looks and they are all in the same direction.

Purged K-fold cross-validation, formalised by Marcos López de Prado in *Advances in Financial Machine Learning* (2018, ch. 7), keeps the K-fold idea from classical ML but adds two corrections:

- Purge. Drop training events whose [entry_day, exit_day] interval intersects the test fold. This eliminates the leak walk-forward also needs to fix.

- Embargo. Drop training events for a buffer of *e* days after the test fold ends. This handles the case where the test fold’s closing trades carry residual information into the next training window.

You get K test scores instead of K−1, the test folds are interspersed throughout the calendar (not just the last K−1 blocks), and the protocol gives you a meaningfully larger sample than walk-forward at the same K. Our engine implements this as the `purged_kfold` gate with `n_folds=5, embargo_pct=0.01` as defaults.
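A minimal sketch of a purged K-fold splitter over trade events follows. It is not the `purged_kfold` gate itself: the function name `purged_kfold_splits` is hypothetical, the default arguments only mirror the values quoted above, and treating `embargo_pct` as a fraction of the total calendar span is an assumption.

```python
import numpy as np


def purged_kfold_splits(entry_day, exit_day, n_folds=5, embargo_pct=0.01):
    """Yield (train_idx, test_idx) over events, with purge and embargo applied.

    entry_day / exit_day: integer day index per event, sorted by entry_day.
    embargo_pct is read (an assumption) as a fraction of the calendar span.
    """
    entry_day = np.asarray(entry_day)
    exit_day = np.asarray(exit_day)
    all_idx = np.arange(len(entry_day))
    span = int(exit_day.max() - entry_day.min() + 1)
    embargo_days = int(np.ceil(embargo_pct * span))
    for test_idx in np.array_split(all_idx, n_folds):
        test_start = entry_day[test_idx].min()
        test_end = exit_day[test_idx].max()
        in_test = np.isin(all_idx, test_idx)
        # Purge: the event's holding interval intersects the test window.
        overlaps = (entry_day <= test_end) & (exit_day >= test_start)
        # Embargo: the event opens within embargo_days after the window ends.
        embargoed = (entry_day > test_end) & (entry_day <= test_end + embargo_days)
        yield all_idx[~in_test & ~overlaps & ~embargoed], test_idx


# Toy data: one trade every 5 days, each held for 7 days, so neighbours overlap.
entry = np.arange(0, 500, 5)
exit_ = entry + 7
pk_splits = list(purged_kfold_splits(entry, exit_))
print([len(train_idx) for train_idx, _ in pk_splits])
```

Each of the five folds tests 20 events, and the training sets come out slightly smaller than the naive 80 because the events straddling or trailing each test window are removed.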

The cost: K-fold is not a forecasting protocol. Some test folds sit in the past relative to their training data. If your strategy depends on features that drift unidirectionally (e.g. average ticker liquidity has grown 20× over a decade), purged K-fold gives you robustness scores, not realistic forecasting scores. Use it as a *complement* to walk-forward, not a replacement.

## Alternative 3: Combinatorial Purged Cross-Validation (CPCV)

Walk-forward gives K−1 OOS samples. Purged K-fold gives K. **Combinatorial purged CV** gives `C(K, K/2)` samples — enumerating every way to split K calendar blocks into a training half and a test half, with purge + embargo on every split.

For K=14 this is C(14, 7) = 3,432 distinct OOS evaluations of the same strategy. Each one is a legitimate purged train/test split. The aggregate gives you something walk-forward cannot: a *distribution* of OOS scores broad enough to detect selection-process overfitting at the strategy-search level. This is the basis of the Probability of Backtest Overfitting (PBO) test (Bailey, Borwein, López de Prado, Zhu 2017), implemented as the `pbo` gate in our engine.
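The count is easy to check by enumeration. The sketch below only lists which blocks land in the training half and which in the test half; purge and embargo are deliberately left out, and `cpcv_block_splits` is a hypothetical name.

```python
from itertools import combinations
from math import comb


def cpcv_block_splits(n_blocks: int):
    """Yield (train_blocks, test_blocks) for every half/half split of the blocks."""
    if n_blocks % 2:
        raise ValueError("n_blocks must be even for a half/half split")
    all_blocks = set(range(n_blocks))
    for test_blocks in combinations(range(n_blocks), n_blocks // 2):
        train_blocks = tuple(sorted(all_blocks - set(test_blocks)))
        yield train_blocks, test_blocks


n_splits = sum(1 for _ in cpcv_block_splits(14))
print(n_splits, comb(14, 7))  # both are 3432
```

The count grows quickly with K (K=16 already gives 12,870 splits), which is where the compute cost of the protocol comes from.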

CPCV answers a different question from walk-forward: not "did this strategy survive last year?" but "if I had searched a strategy space and reported the in-sample winner, how often would that winner have ranked below the median in a randomly-chosen test half?" If that probability is above 0.5, your search procedure is systematically overfitting and the specific winner you ship is statistically a fluke.
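That question can be sketched directly. The code below is a simplified illustration, not the engine's `pbo` gate: it assumes a matrix of per-period returns with one column per candidate strategy, scores strategies by mean return (an illustrative choice), and skips purge and embargo. The function name is hypothetical.

```python
from itertools import combinations

import numpy as np


def probability_of_backtest_overfitting(returns: np.ndarray, n_blocks: int) -> float:
    """Estimate PBO from a (T x N) matrix: T periods, N candidate strategies.

    For every half/half split of the blocks, pick the in-sample winner and
    record whether it ranks below the median out-of-sample.
    """
    n_periods, n_strategies = returns.shape
    blocks = np.array_split(np.arange(n_periods), n_blocks)
    below_median = []
    for is_blocks in combinations(range(n_blocks), n_blocks // 2):
        oos_blocks = [b for b in range(n_blocks) if b not in is_blocks]
        is_rows = np.concatenate([blocks[b] for b in is_blocks])
        oos_rows = np.concatenate([blocks[b] for b in oos_blocks])
        is_perf = returns[is_rows].mean(axis=0)
        oos_perf = returns[oos_rows].mean(axis=0)
        winner = int(np.argmax(is_perf))
        # Out-of-sample rank of the in-sample winner: 1 = worst, N = best.
        oos_rank = 1 + int(np.sum(oos_perf < oos_perf[winner]))
        omega = oos_rank / (n_strategies + 1)  # relative rank in (0, 1)
        below_median.append(omega < 0.5)
    return float(np.mean(below_median))


# Case 1: strategy 0 has a real, constant edge over four flat siblings.
edge = np.zeros((40, 5))
edge[:, 0] = 1.0
pbo_edge = probability_of_backtest_overfitting(edge, n_blocks=4)

# Case 2: whichever strategy wins in one half loses in the other.
flip = np.array([[1, -1, 0], [1, -1, 0], [-1, 1, 0], [-1, 1, 0]], dtype=float)
pbo_flip = probability_of_backtest_overfitting(flip, n_blocks=2)
print(pbo_edge, pbo_flip)
```

The two toy inputs bracket the range: a persistent edge gives a PBO of 0.0, while a search space whose winner flips between halves gives 1.0. Real strategy searches land somewhere in between.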

This is the test that catches what every other test misses: *the multiple-testing problem applied to the strategy-selection step itself*. It is also expensive (a 0.45× cost multiplier in our pricing model, vs 0.04× for walk-forward), which is why most retail platforms don’t ship it. We do.

## The Decision Tree

| Question | Right OOS protocol |
| --- | --- |
| Will my parameter estimate hold next quarter? | Rolling walk-forward with purge |
| Does adding more history stabilise the estimate? | Anchored expanding window |
| Is the signal robust across the calendar (regime-agnostic)? | Purged K-fold |
| Is my *strategy-search procedure* overfit? | CPCV / PBO |
| Do trades span fold boundaries, or does the strategy use sub-daily features or a slow signal? | Always purge + embargo, regardless of protocol |

None of these tests can be skipped because the others are run. They answer different questions. A strategy that passes walk-forward and fails PBO is a strategy that survives one disjoint test period but is one of dozens of equally-good-looking siblings in the search space: the survival was a selection effect, not signal. A strategy that passes purged K-fold and fails walk-forward is regime-robust but stale. Both diagnoses are useful; neither substitutes for the other.

## What "Out-of-Sample" Means in Practice

The cleanest mental model: walk-forward is a *forecasting* protocol, expanding window is a *parameter-stability* protocol, purged K-fold is a *signal-robustness* protocol, and CPCV is a *selection-process* protocol. All four are out-of-sample. Each tells you a different thing. Reaching for "walk-forward" by reflex because it’s the one Pardo wrote about in 1992 is an answer that’s more than thirty years out of date.

The point of running OOS validation isn’t to produce a single thumbs-up. It’s to construct a battery of orthogonal tests such that a strategy passing all of them is genuinely hard to fake. Walk-forward alone is not that battery. It’s one component of it. The [full menu of OOS tests we run on every Student One job](https://dashboard.studentone.tech/blog/out-of-sample-tests-counter-overfitting-menu) is that battery, and it’s longer than four lines.

If you are running one OOS test on your strategy and shipping the winner, you are shipping noise more often than you think. The fix is not subtle. It is to run more tests, on more independent splits, with more honest purge + embargo logic, and to hand the choice of which fold-count and which split-count to a system that cannot be retried until the answer comes out right.

That system exists. The defaults are wrong. The alternatives are not.

---

## Cite this article

Student One Research (2026). *You Don’t Have to Walk-Forward. Here Are the Alternatives — Expanding Windows, Anchored CV, and CPCV.* Student One Research Blog. https://dashboard.studentone.tech/blog/walk-forward-alternative-expanding-window-anchored-cv