Survivorship bias is a simple idea, but it’s worth bringing up now, right after the previous notes on overfitting.
Survivorship bias exaggerates the backtested performance of long-only strategies, especially those with long holding periods (years).
Most data sets do not include companies that went bankrupt or were delisted because their share prices fell too low. Survivorship bias describes how the companies we see today are the ones that did well enough in the past to still be around. The failures have disappeared from the data.
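For example, if you take ten years of data for all stocks currently included in the S&P 500, the performance will typically look better than the actual ten-year return of the S&P 500, because today's list leaves out the companies that dropped out of the index along the way.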
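To make the effect concrete, here is a toy simulation in Python. It is my own sketch, not real market data: the drift, volatility, and delisting threshold are made-up assumptions, and `simulate_universe` and `mean_final_value` are hypothetical helper names. It compares the average outcome of every stock that started in the universe against the average of only the ones still listed at the end.

```python
import random

# Assumption: a stock that falls below 20% of its starting value gets delisted.
DELIST_THRESHOLD = 0.2


def simulate_universe(n_stocks=500, n_years=10, seed=0):
    """Return (final_value, survived) for each hypothetical stock, starting at 1.0."""
    rng = random.Random(seed)
    universe = []
    for _ in range(n_stocks):
        value, survived = 1.0, True
        for _ in range(n_years):
            annual_return = rng.gauss(0.07, 0.30)  # made-up drift and volatility
            value = max(value * (1.0 + annual_return), 0.0)  # price cannot go negative
            if value < DELIST_THRESHOLD:
                survived = False  # dropped from the data set from here on
                break
        universe.append((value, survived))
    return universe


def mean_final_value(universe):
    """Equal-weighted average final value of a list of (value, survived) pairs."""
    return sum(value for value, _ in universe) / len(universe)


universe = simulate_universe()
survivors = [stock for stock in universe if stock[1]]
n_delisted = len(universe) - len(survivors)

full_mean = mean_final_value(universe)       # what an investor actually got
survivor_mean = mean_final_value(survivors)  # what a survivor-only data set shows

print(f"delisted: {n_delisted} of {len(universe)}")
print(f"mean final value, full universe:  {full_mean:.3f}")
print(f"mean final value, survivors only: {survivor_mean:.3f}")
```

The survivors-only average comes out higher because the worst outcomes were filtered out before the average was taken. A backtest on "current constituents" does the same filtering.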
Another example: if your strategy only applies to one stock, the model you build from its historical data will not include the possibility of bankruptcy. From the perspective of a time series, bankruptcy is a floor on the semi-random walk. The model might predict that the stock will drop by $1 when it is at $0.25, not understanding that the price can’t go below $0.00. This is not the usual sense of survivorship bias described above, but the idea generalizes.
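The $0.25 case can be sketched in a few lines of Python. The function names here are hypothetical, and neither fix is a recommendation for any particular model. They only show that an additive price forecast can step through the floor, while a clamped or log-return forecast cannot.

```python
import math


def naive_additive_forecast(price, predicted_change):
    """Additive model: knows nothing about the floor at zero."""
    return price + predicted_change


def floored_forecast(price, predicted_change):
    """Same model, but clamped at the bankruptcy floor of $0.00."""
    return max(price + predicted_change, 0.0)


def log_return_forecast(price, predicted_log_return):
    """Multiplicative model: the price shrinks toward zero but never crosses it."""
    return price * math.exp(predicted_log_return)


price = 0.25
print(naive_additive_forecast(price, -1.00))  # impossible negative price
print(floored_forecast(price, -1.00))         # stops at the floor
print(log_return_forecast(price, -4.0))       # small, but still positive
```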
I know some professional system development and backtesting products, such as MarketQA, use complete data sets to eliminate potential survivorship bias. For a retail trader, all you can do is keep it in mind. Fortunately, survivorship bias does not have much influence on most algorithmic systems, which are usually high-frequency or at least hold for 1>