XTradeGrok Blog

How to Backtest a Trading Strategy in the UK: A Practitioner’s Guide

Risk warning. Backtested results are historical and do not guarantee future performance. The most common reason backtested strategies fail in live trading is methodological error during the backtest itself — this article exists to help you avoid the most damaging of those errors. Trading involves risk and most retail traders lose money.

Why backtesting matters more than most retail traders realise

Backtesting is the discipline of running a trading strategy against historical market data to estimate how it would have performed. Done properly, it is the single most valuable activity a systematic trader can undertake before risking capital. Done improperly — which is how most retail backtests are done — it produces confidence in strategies that fail immediately in live trading and consumes capital that would have been better preserved.

This article is written for the UK practitioner: someone with the patience and technical inclination to build something rigorous, working with UK-relevant assets (FTSE indices, GBP/USD, gilts, UK equities, crypto traded through UK-accessible venues) and considering live deployment through UK-regulated channels. The principles are universal but the data sources, tax considerations, and execution constraints we will discuss are specific to the UK environment in 2026.

Step 1: Define the strategy specifically enough to test

Most backtests fail before any code is written, because the strategy specification is too vague to test. “Buy when momentum is positive and sell when it turns negative” is not a specification — it is an idea. A specification answers, before any data is touched: which exact assets are traded, on which exact timeframe, with which exact entry condition (formulated as a numerical rule), with which exact exit conditions (both profit-take and stop-loss as numerical rules), with which exact position sizing rule, and how the strategy handles edge cases (no signal for weeks, multiple signals firing simultaneously, gaps over weekends).

Practitioner note. A specification you cannot translate directly into Python or pseudocode is not a specification. If, when you sit down to write the backtest, you find yourself making judgement calls about what the strategy should do in specific situations, the strategy is underspecified. Stop, return to the specification, and resolve those ambiguities before continuing.

A worked example. “GBP/USD mean reversion” is an idea. The corresponding specification: buy GBP/USD when the 5-minute close is more than 2.5 standard deviations below the 50-period rolling mean, with the standard deviation calculated on the same 50-period window; close the position when the price returns to within 1 standard deviation of the mean, or 60 minutes have elapsed without reversion, or the price moves a further 1.5 standard deviations against the position (stop-loss). Position size: risk 1% of capital per trade, so the size is 1% of capital divided by the stop-loss distance (in pips, converted to account currency at the pip value), capped at 5% of capital notional. Operating window: London and New York sessions only (08:00–20:00 UK time). Multiple signals: only one position open at a time; signals firing while a position is open are ignored.

The second version is testable; the first is not. Notice that the second version makes more decisions — timeframe, exact thresholds, time-stop, session restrictions — and each of those decisions is a parameter that the backtest will reveal as more or less important. The discipline of writing the specification forces clarity that often improves the strategy itself before any data is touched.
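One way to check that a specification is complete is to write it down as data plus a pure function. The sketch below encodes the entry side of the GBP/USD example under a few labelled assumptions: the names MeanReversionSpec, zscore and entry_signal are hypothetical, the rolling window is assumed to include the bar that has just closed, and the population standard deviation is used. Exits and position sizing are left out to keep it short.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class MeanReversionSpec:
    window: int = 50              # rolling window, in 5-minute bars
    entry_z: float = 2.5          # enter long this many SDs below the mean
    exit_z: float = 1.0           # exit once price is back within this many SDs
    stop_z: float = 1.5           # further adverse move (in SDs) that stops out
    max_hold_minutes: int = 60    # time-stop
    session_start_hour: int = 8   # UK time, inclusive
    session_end_hour: int = 20    # UK time, exclusive


def zscore(closes, window):
    """Z-score of the latest close against the trailing window.

    Assumption: the window includes the latest close itself.
    Returns None when there is too little data or no variation.
    """
    sample = closes[-window:]
    if len(sample) < window:
        return None
    sd = pstdev(sample)
    if sd == 0:
        return None
    return (sample[-1] - mean(sample)) / sd


def entry_signal(closes, hour_uk, spec, position_open):
    """True when the specification says to open a long position."""
    if position_open:  # only one position at a time
        return False
    if not (spec.session_start_hour <= hour_uk < spec.session_end_hour):
        return False
    z = zscore(closes, spec.window)
    return z is not None and z < -spec.entry_z
```

Every threshold now has a name, so each one can be varied deliberately later rather than by accident.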

Step 2: Choose data sources appropriate to your assets

Backtest quality is limited by data quality. Free data sources work for initial exploration; serious backtests require paid data with sufficient history, granularity, and accuracy.

UK equities and indices

For FTSE 100, FTSE 250, and individual UK equities, the standard institutional sources are LSEG Data & Analytics (formerly Refinitiv) and Bloomberg, with lower-cost alternatives such as Nasdaq Data Link (formerly Quandl), Stooq, and EOD Historical Data. Yahoo Finance provides free daily data adequate for initial exploration, but it can contain corporate-action errors and lacks the granularity for serious intraday work. For UK practitioners, EOD Historical Data is a reasonable middle-ground source covering UK equities with adjustments for splits and dividends.

Forex

Major forex pairs (GBP/USD, EUR/GBP, EUR/USD) are reasonably well-served by free tick data from Dukascopy and historical data from sources like HistData. Spread data is critical for forex backtests — a strategy that ignores realistic spreads will overestimate performance significantly. Use bid-ask spread data from a representative period rather than mid-prices.
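The effect of the spread is easy to see with a little arithmetic. The sketch below uses invented quotes and hypothetical function names to compare a long trade filled at the ask and closed at the bid with the same trade marked at mid-prices.

```python
def long_pnl_pips(entry_ask, exit_bid, pip=0.0001):
    """P&L in pips for a long trade: buy at the ask, sell at the bid."""
    return (exit_bid - entry_ask) / pip


def mid(bid, ask):
    """Mid-price between bid and ask."""
    return (bid + ask) / 2


# Invented GBP/USD quotes with a 1-pip spread at both entry and exit.
entry_bid, entry_ask = 1.2500, 1.2501
exit_bid, exit_ask = 1.2510, 1.2511

realistic = long_pnl_pips(entry_ask, exit_bid)
mid_based = (mid(exit_bid, exit_ask) - mid(entry_bid, entry_ask)) / 0.0001

# The difference is the spread paid over the round trip.
spread_cost = mid_based - realistic
```

With these numbers the mid-price version reports roughly one pip more profit per round trip than the trade could actually have earned, and that overstatement compounds with every trade in the backtest.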

Crypto

High-quality crypto data is freely available from major exchanges via their APIs (Binance, Coinbase, Kraken). The challenge is choosing which exchange to treat as the source of truth, since prices vary across exchanges and a backtest using one exchange’s data does not perfectly predict execution on another. Best practice is to backtest on the data of the exchange you will actually trade on, accept some degradation when generalising to other venues, and account for execution costs realistically.

Granularity

Daily data is sufficient for swing strategies operating on multi-day timeframes. Hourly data is needed for intraday strategies with holding periods of hours. Minute-level or tick data is needed for short-term strategies, and even then the trader should be honest about whether retail execution can capture the edges visible at sub-minute timeframes. In most cases it cannot.

Step 3: Avoid the seven backtest poisons

Most backtests that look promising and fail in live trading fail because of methodology errors that inflate historical performance. Seven errors account for the bulk of these failures. Before any backtest is treated as evidence, all seven must be addressed explicitly.

Look-ahead bias

The strategy uses information that would not have been available at the moment of decision. The canonical example is calculating a moving average that includes the current bar’s close and then entering at a price from earlier in that same bar, such as its open. Less obvious cases include using survivorship-biased data (testing on the FTSE 100 components today rather than the FTSE 100 components historically), or dating earnings figures to the quarter they describe rather than to the later date on which they were published.
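The moving-average case can be made concrete. In the sketch below (hypothetical function names, a simple price-above-average rule chosen only for illustration), the first signal uses only bars that had closed before bar i opened, while the second quietly includes bar i's own close.

```python
from statistics import mean


def trailing_mean(values, end, window):
    """Mean of the `window` values ending at index `end` (inclusive).

    Returns None when there are not enough values.
    """
    start = end - window + 1
    if start < 0:
        return None
    return mean(values[start:end + 1])


def long_signal_no_lookahead(closes, i, window):
    """Decide the position for bar i from bars 0..i-1 only."""
    ma = trailing_mean(closes, i - 1, window)
    return ma is not None and closes[i - 1] > ma


def long_signal_with_lookahead(closes, i, window):
    """Biased version: uses bar i's close, which is unknown at bar i's open."""
    ma = trailing_mean(closes, i, window)
    return ma is not None and closes[i] > ma


closes = [10, 10, 10, 13]   # bar 3 jumps sharply
honest = long_signal_no_lookahead(closes, 3, window=3)
biased = long_signal_with_lookahead(closes, 3, window=3)
```

Here the biased version is long for the jump on bar 3 and the honest version is not, which is how a single misplaced index inflates a backtest.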

Survivorship bias

Testing the strategy on assets that exist today and quietly excluding those that delisted, went bankrupt, or were acquired. A long-only equity strategy backtested on the current FTSE 100 will tend to outperform the same strategy applied to the historical index, because the constituents that performed worst have been removed from the index over time.
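The usual remedy is a point-in-time universe: for each backtest date, trade only what was in the index on that date. A minimal sketch follows, assuming you have a membership table with join and leave dates; the tickers, dates and the universe_on helper are invented for illustration.

```python
from datetime import date

# (ticker, joined_index, left_index); left_index is None for current members.
# Tickers and dates are invented for illustration.
MEMBERSHIPS = [
    ("AAA", date(2005, 1, 1), None),
    ("BBB", date(2005, 1, 1), date(2016, 6, 20)),  # later removed
    ("CCC", date(2019, 3, 18), None),              # joined later
]


def universe_on(memberships, as_of):
    """Tickers that were index members on the date `as_of`."""
    return sorted(
        ticker
        for ticker, joined, left in memberships
        if joined <= as_of and (left is None or as_of < left)
    )


universe_2010 = universe_on(MEMBERSHIPS, date(2010, 1, 1))
universe_2024 = universe_on(MEMBERSHIPS, date(2024, 1, 1))
```

A backtest that used the 2024 list for 2010 would drop BBB, the stock that later left the index, and would hold CCC years before it qualified.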

Overfitting

Optimising parameters until the backtest performs spectacularly on the in-sample data but fails out of sample. The defence is a rigorous hold-out: set aside a portion of the data (typically the most recent 20–30%) and never look at it during development. Optimise on the remaining data. When the strategy is finalised, run it once on the held-out data. If performance degrades dramatically, the in-sample performance was overfitted.
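The split has to be chronological, never shuffled, because the held-out block must lie entirely after the development data. A minimal sketch, with chronological_split as a hypothetical helper and a 25% hold-out chosen from the range above:

```python
def chronological_split(rows, holdout_fraction=0.25):
    """Split time-ordered rows into (development, holdout).

    The holdout is the most recent `holdout_fraction` of the data.
    Rows are assumed to be sorted oldest first.
    """
    if not 0 < holdout_fraction < 1:
        raise ValueError("holdout_fraction must be between 0 and 1")
    cut = int(len(rows) * (1 - holdout_fraction))
    return rows[:cut], rows[cut:]


bars = list(range(100))                 # stand-in for 100 time-ordered bars
development, holdout = chronological_split(bars)
```

In practice it helps to write the hold-out to a separate file as soon as the data is downloaded, so that looking at it takes a deliberate act.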

Optimisation on noise

Tweaking parameters until performance metrics improve on a single dataset, when the improvements reflect random variation rather than a real edge. The defence is to ensure parameter choices are stable: a strategy that works at parameter X but fails at X+1% has not found a real edge; it has found a coincidence. Real edges are robust to small parameter changes.
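Parameter stability can be checked mechanically by scoring the neighbours of the chosen value. The sketch below is one possible test, not a standard one: is_parameter_stable is a hypothetical helper, the 30% tolerance is an arbitrary assumption, and the two score functions are toy stand-ins for a real backtest metric.

```python
def is_parameter_stable(score_fn, best, step, max_relative_drop=0.3, neighbours=2):
    """True if the score near `best` stays within a tolerance of the best score.

    score_fn: maps a parameter value to a backtest metric (higher is better).
    step: spacing between neighbouring parameter values to probe.
    """
    base = score_fn(best)
    if base <= 0:
        return False
    floor = base * (1 - max_relative_drop)
    for k in range(1, neighbours + 1):
        for candidate in (best - k * step, best + k * step):
            if score_fn(candidate) < floor:
                return False
    return True


def smooth(p):
    """Toy metric that degrades gently away from 50."""
    return 1.0 - 0.001 * (p - 50) ** 2


def spiky(p):
    """Toy metric that is excellent at exactly 50 and poor everywhere else."""
    return 2.0 if p == 50 else 0.1


smooth_ok = is_parameter_stable(smooth, best=50, step=1)
spiky_ok = is_parameter_stable(spiky, best=50, step=1)
```

The spiky case has the higher headline score and is the one to distrust.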

Ignoring transaction costs

Spreads, commissions, slippage, financing costs on overnight positions, exchange fees. A strategy with a theoretical 2% monthly edge can become a 0.5% loss after realistic costs. UK retail traders face spread widening on smaller account sizes, FX conversion costs on non-GBP assets, and stamp duty on UK equity purchases (0.5% on most LSE-listed shares, exempt for AIM-listed). All of this enters the backtest.
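The arithmetic behind that reversal is simple. The sketch below uses a deliberately crude additive approximation and invented inputs: 25 round trips a month at an all-in 0.10% each is an assumption chosen to reproduce the 2% to minus 0.5% example, and net_monthly_return is a hypothetical helper.

```python
def net_monthly_return(gross, round_trips, cost_per_round_trip, stamp_duty_rate=0.0):
    """Approximate net return after costs, using simple addition.

    cost_per_round_trip: spread + commission + slippage, as a fraction.
    stamp_duty_rate: charged once per round trip, on the purchase.
    """
    return gross - round_trips * (cost_per_round_trip + stamp_duty_rate)


# Frequent trading with no stamp duty: 2% gross, 25 round trips at 0.10% each.
active_net = net_monthly_return(0.02, 25, 0.001)

# Infrequent UK equity trading: 2 round trips, 0.10% costs plus 0.5% stamp duty.
equity_net = net_monthly_return(0.02, 2, 0.001, stamp_duty_rate=0.005)
```

The first case ends the month at about minus 0.5%; the second keeps about 0.8% of the 2%, with stamp duty accounting for most of what was lost.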

Unrealistic execution assumptions

Assuming you can buy at the exact close price of a candle when in reality you would buy at the next open, with slippage, on whatever liquidity exists at that moment. Best practice: assume entries and exits occur at the next bar’s open after a signal, with realistic slippage applied (typically 0.05–0.1% for liquid assets, more for illiquid ones). The cost of this realism is lower backtested performance; the benefit is performance that survives live trading.
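The next-bar-open rule takes only a few lines. The sketch below assumes bars are dictionaries with an "open" key and uses 0.05% slippage from the range above; fill_price is a hypothetical helper and the prices are invented.

```python
def fill_price(bars, signal_index, side, slippage=0.0005):
    """Price at which a signal generated on bar `signal_index` is filled.

    The fill happens at the NEXT bar's open, moved against the trader by
    `slippage`. Returns None if there is no next bar to fill on.
    """
    next_index = signal_index + 1
    if next_index >= len(bars):
        return None
    next_open = bars[next_index]["open"]
    if side == "buy":
        return next_open * (1 + slippage)   # pay up
    return next_open * (1 - slippage)       # sell lower


bars = [
    {"open": 100.0, "close": 101.0},   # signal fires on this bar's close
    {"open": 102.0, "close": 103.0},   # fill happens at this bar's open
]
buy_fill = fill_price(bars, 0, "buy")
sell_fill = fill_price(bars, 0, "sell")
```

The signal bar closed at 101, but the buy is filled a little above 102: the overnight or inter-bar gap and the slippage both land on the trader, as they would live.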

Insufficient historical coverage

Backtesting only on recent benign conditions. A strategy that performs spectacularly across 2023–2025 may not have been tested through 2008, 2018 Q4, March 2020, or the 2022 inflation regime change. Best practice is to test across multiple market regimes and explicitly identify which regimes the strategy is built for and which it is likely to fail in. A strategy that only works in one regime is not a tradeable strategy; it is a regime-specific tactic.

Step 4: Walk-forward testing

Walk-forward testing is the standard professional methodology for validating that a strategy generalises beyond its in-sample performance. The approach is to divide the historical data into sequential segments, optimise the strategy on the first segment, test it on the second (without further optimisation), then move the window forward and repeat.

A worked example. Ten years of data divided into ten one-year windows. Optimise on years 1–3, test on year 4. Optimise on years 2–4, test on year 5. Continue until year 10 has been tested, which gives seven test windows. The aggregated test-window performance is your honest estimate of how the strategy will perform out of sample. If the walk-forward performance is dramatically worse than the in-sample performance, the strategy is overfitted; if it is comparable, that is evidence of a real edge that survives periodic re-optimisation of its parameters.
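The window schedule in this example can be generated rather than written out by hand. A minimal sketch, with walk_forward_windows as a hypothetical helper and periods numbered from 1 to match the text:

```python
def walk_forward_windows(n_periods, train_len, test_len=1):
    """Yield (train_periods, test_periods) pairs for rolling walk-forward.

    Periods are numbered 1..n_periods. Each step optimises on `train_len`
    periods, tests on the following `test_len`, then rolls forward by `test_len`.
    """
    first = 1
    while first + train_len + test_len - 1 <= n_periods:
        train = list(range(first, first + train_len))
        test = list(range(first + train_len, first + train_len + test_len))
        yield train, test
        first += test_len


windows = list(walk_forward_windows(n_periods=10, train_len=3))
```

For ten years and a three-year training window this produces seven pairs, starting with years 1 to 3 tested on year 4 and ending with years 7 to 9 tested on year 10. Only the results from the test periods should be aggregated.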

Practitioner note. Walk-forward testing can be carried out with the libraries most UK quant developers use (backtrader, vectorbt, QuantConnect’s Lean engine), although the amount of built-in support varies and some of the windowing may need to be scripted by hand. The activation cost is modest; the methodological improvement is substantial. If your backtest framework cannot accommodate walk-forward testing, you are working with the wrong framework.

Step 5: Stress test against scenarios

A strategy with strong walk-forward performance still needs stress testing. The question is not whether the strategy worked on the historical record; it is how the strategy would behave in conditions the historical record does not contain.

Standard stress tests: how does the strategy perform if a major asset’s typical volatility doubles? If correlations between asset classes break down? If a major venue suffers an extended outage? If a Bank of England decision moves a target asset by 5% in a minute? These are not fully simulatable but the trader can ask each question and identify which would cause the strategy to fail catastrophically. A strategy with multiple identifiable failure modes needs either further hardening or a tighter risk-management framework around it. A strategy where you cannot identify any failure modes has not been examined hard enough.
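The first of these questions can be roughly approximated by rescaling the strategy's historical returns and recomputing the drawdown. The sketch below is deliberately crude: multiplying every return by two doubles the average return as well as the volatility, and it ignores any change in the strategy's behaviour. The function names and return series are invented.

```python
def max_drawdown(returns):
    """Largest peak-to-trough fall in compounded equity, as a fraction."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        worst = max(worst, 1 - equity / peak)
    return worst


def scale_returns(returns, factor=2.0):
    """Crude volatility stress: multiply every period return by `factor`."""
    return [r * factor for r in returns]


period_returns = [0.02, -0.05, -0.05, 0.03]   # invented strategy returns
base_dd = max_drawdown(period_returns)
stressed_dd = max_drawdown(scale_returns(period_returns))
```

In this toy series the drawdown goes from just under 10% to 19%. If the stressed figure would breach your risk limits, the position sizing needs revisiting before the strategy goes live.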

Step 6: Paper trade before going live

Even after a clean backtest and walk-forward validation, paper trading the strategy in real-time market conditions for at least one to three months catches issues no backtest will. Bugs in the implementation that did not manifest on historical data. Execution behaviour that differs from backtest assumptions. Operational issues with the broker or the data feed. Personal psychological responses to the strategy’s actual rhythm of wins and losses, which is rarely what the backtest histogram suggested.

The temptation to skip paper trading is large — the strategy looks good, capital is available, the trader is keen to start — and skipping it is one of the larger contributors to first-deployment losses. The cost of paper trading is time. The cost of skipping it is capital, often substantial.

What good backtested performance actually looks like

A common misconception is that good backtested performance means very high returns. It does not. Good backtested performance is characterised by: returns sufficient to compensate for risk and costs; consistent performance across walk-forward windows; manageable drawdowns relative to position sizing; performance that survives realistic transaction-cost assumptions; and clear identification of which market regimes the strategy fits. A strategy showing 8% annual returns with a 10% maximum drawdown across multiple regimes is worth more than a strategy showing 30% annual returns with 50% drawdowns concentrated in one historical period.

The Sharpe ratio is the standard summary metric: annualised excess return (return above the risk-free rate) divided by the annualised standard deviation of returns. A Sharpe of 1.0 is reasonable for a single retail strategy, 1.5 is good, and 2.0+ across a long history is exceptional. Backtests showing Sharpe ratios of 4 or higher are almost always overfitted; they do not survive contact with live markets, and the trader who deploys capital based on them learns expensively.
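For reference, the calculation from a series of periodic returns looks like this. The sketch assumes daily returns with 252 trading periods per year, a flat risk-free rate, and the sample standard deviation; annualised_sharpe is a hypothetical helper and the returns are invented.

```python
from math import sqrt
from statistics import mean, stdev


def annualised_sharpe(period_returns, periods_per_year=252, risk_free_annual=0.0):
    """Annualised Sharpe ratio from per-period returns.

    The risk-free rate is spread evenly across periods, and the per-period
    ratio is scaled by the square root of the number of periods per year.
    """
    rf_per_period = risk_free_annual / periods_per_year
    excess = [r - rf_per_period for r in period_returns]
    sd = stdev(excess)
    if sd == 0:
        raise ValueError("returns have zero variance")
    return mean(excess) / sd * sqrt(periods_per_year)


daily_returns = [0.01, -0.01, 0.02, 0.0]   # invented, far too short to be meaningful
sharpe = annualised_sharpe(daily_returns)
```

A handful of observations produces wildly unstable values, which is one reason the ratio only means something across a long history.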

Frequently asked questions

What software should I use for backtesting?

For Python practitioners: backtrader and vectorbt are widely used libraries; vectorbt is faster for parameter sweeps, backtrader is more flexible for complex strategy logic. For users who prefer a hosted or platform-integrated environment: QuantConnect (which still requires Python or C#), TradingView (with Pine Script), and MetaTrader’s Strategy Tester are reasonable alternatives, though each involves at least some scripting. The platform matters less than the discipline; a rigorous backtest in MetaTrader is more useful than a sloppy one in Python.

How much historical data do I need?

Enough to span multiple market regimes. For UK assets, this typically means at least 10 years and ideally 15–20 years to span low-volatility and high-volatility periods, monetary tightening and easing cycles, and at least one major market dislocation. For crypto, the available history is shorter; backtests confined to 2017–2023 will not have seen the 2025–2026 macro environment.

How do I test for overfitting?

Hold-out validation is the simplest approach: never look at the most recent 20–30% of data during development. Walk-forward testing is more rigorous and is the standard professional approach. If the strategy has a real edge, both should produce out-of-sample performance similar to its in-sample performance. Dramatic degradation between in-sample and out-of-sample performance is the diagnostic for overfitting.

What is a realistic Sharpe ratio for a retail strategy?

A Sharpe of 0.7–1.5 across a meaningful out-of-sample period is the realistic range for a single well-executed retail strategy. A figure above 2.0 across a long history is rare and should be checked for overfitting before it is believed. Combining multiple uncorrelated strategies can produce portfolio Sharpes higher than any individual strategy’s, which is one of the structural arguments for running more than one strategy at the retail level.
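The diversification effect has a simple closed form in the idealised case. The sketch below assumes equal weights, equal volatility, the same Sharpe for every strategy, and a single average pairwise correlation; real portfolios satisfy none of these exactly, and portfolio_sharpe is a hypothetical helper.

```python
from math import sqrt


def portfolio_sharpe(single_sharpe, n_strategies, avg_correlation=0.0):
    """Sharpe of an equal-weight blend of n similar strategies.

    Idealised assumptions: identical Sharpe and volatility for each strategy,
    and one common pairwise correlation between them.
    """
    denominator = 1 + (n_strategies - 1) * avg_correlation
    if denominator <= 0:
        raise ValueError("correlation too negative for this many strategies")
    return single_sharpe * sqrt(n_strategies / denominator)


uncorrelated = portfolio_sharpe(1.0, 4, avg_correlation=0.0)
identical = portfolio_sharpe(1.0, 4, avg_correlation=1.0)
```

Four uncorrelated strategies with a Sharpe of 1.0 each blend to 2.0 under these assumptions, while four perfectly correlated ones stay at 1.0. The benefit comes from the lack of correlation, not from the number of strategies.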

Should I share my backtest results with anyone before going live?

A second pair of eyes catches errors that authorial blindness misses. Showing your backtest methodology and results to a knowledgeable peer — not asking whether to trade, but asking whether the methodology has gaps — routinely catches look-ahead bias, survivorship bias, or unrealistic execution assumptions that the original developer did not see. The xTradeGrok community and similar UK practitioner forums are reasonable venues for this kind of review.
