I previously posted a note on decision trees, then explained how they can be improved by model averaging using ensembles of trees trained on bootstrap samples. I then implemented it in Matlab, and now I'm finally sharing it here coded in R, with an example to walk through. This should be the simplest way to learn how a trading system like this works, and it's open source.
As I've mentioned multiple times, machine learning systems can take in essentially any data and automatically harvest as much alpha as possible from it. In trading, the differences between an advanced algorithm (tree bagging, SVM, etc.) and a primitive one (linear regression, nearest neighbors, LDA, etc.) usually come down to finding more complex nonlinear patterns, better control of overfitting, and, of course, slower runtimes.
In this example we're going to try to squeeze some alpha out of GLD, an actively traded gold ETF. After a little bit of thought, we decide on some inputs that might have predictive value because of what we know about macroeconomics and the market. We decide to feed our system data on the movements of two big gold miners, Freeport-McMoRan (FCX) and Rio Tinto's ADR (RTP), bonds (DHY), the performance of the financial sector (XLF), and the S&P 500 (SPY).
I recommend using factors such as bond prices, the overall market, and the prices of relevant commodities in any machine learning system, because I've found they often improve performance. If you look at the sample data, which you should have downloaded above, you'll see I've compiled all of this for you. We will use weekly periods and backtest as far back as 2001.
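If you'd rather assemble a similar dataset yourself instead of using my files, a sketch along these lines with the quantmod package would get you most of the way there. Note this is not the loader included in the download (my sample data also has extra engineered features like FCXHighLow), and some tickers, such as RTP, may no longer resolve:

# Sketch only -- not the loader shipped with the download. Builds a weekly
# factor matrix similar to the sample data using quantmod.
library(quantmod)
symbols <- c("GLD", "FCX", "RTP", "DHY", "XLF", "SPY")
getSymbols(symbols, from = "2001-01-01", auto.assign = TRUE)

# Merge each symbol's weekly returns into one table, aligned by date
rets <- do.call(merge, lapply(symbols, function(s)
  weeklyReturn(Ad(get(s)), type = "arithmetic")))
colnames(rets) <- symbols
rets <- na.omit(rets)

data    <- as.matrix(rets[, -1])       # predictor columns (miners, bonds, ...)
returns <- as.numeric(rets[, "GLD"])   # GLD weekly returns for the equity curve
targets <- ifelse(returns > 0, 1, -1)  # +1 = long, -1 = short labels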
To run the system from the code I've provided, open R (on Windows it's RGui; download the installer here) and first enter the command > setwd('C:\\[whatever folder you downloaded the files into]'). This sets the working directory. Next, copy and paste or type > source('rungoldtreesys.r'). This loads the data and the tree bagging system code. Now you can backtest the system using whatever parameters you'd like. In the continuation of the example below, I ran it with this command and these parameters: > factormodel.tree(data, targets, returns, btsamples=130, horizon=1, trainperiods=8, leverage='kelly', keepNFeatures=10, treesInBag=40, endPd=150).
Now I'll explain how to interpret the results. While the system is backtesting, it prints its prediction at each period so you can see how fast it's running. The final text output summarizes all the predictions and confidence values, plus the overall accuracy as the fraction of correct calls. There are also three plots: the estimated importance of each variable, the decrease in the out-of-bootstrap-sample error rate as trees are added to the ensemble (shown only for the first backtest period, just to give an idea; you don't want hundreds of charts), and the equity curve. Here's one I got estimating variable importance:
We find that the previous period's return is the most useful, followed by the previous 4 weeks' return of FCX and bond yield levels. FCXHighLow is the difference between FCX's weekly high and low, and FCXVolNorm is the volume of shares traded in the week for FCX; both were found to be useless, as we might expect.
Read more about tree bagging to learn exactly how importance is measured. Next, we look at the error rate of the ensemble as more decision trees are "grown":
During feature selection the error rate falls and then rises, since the ensemble gets "confused" by the useless variables we identified above. Then, in the actual model building, the accuracy finishes at about 57.5%. This is just the model built by the backtester to predict one period into the future. The real power of ensemble/bagging learners is that the error keeps falling as more components are added, up to a point.
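If you want to reproduce these two diagnostics outside my code, the randomForest package gives you both almost for free: setting mtry to the number of predictors turns a random forest into plain tree bagging. This is a sketch of the idea, not the routine rungoldtreesys.r actually uses, and it assumes the data and targets variables built earlier:

# Sketch: bagged classification trees via randomForest. mtry = all
# predictors means no random feature subsetting, i.e., plain bagging.
library(randomForest)
fit <- randomForest(x = data, y = factor(targets),
                    ntree = 40,          # treesInBag in the run above
                    mtry = ncol(data),   # use every predictor -> bagging
                    importance = TRUE)   # permutation-based importance

varImpPlot(fit)  # estimated importance of each variable
plot(fit)        # out-of-bag error rate as trees are added
print(fit)       # confusion matrix and overall OOB error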
Finally, let's look at the equity curve using the parameters above. There is another parameter, the random seed, which controls how the bootstrap samples are chosen, so results vary from run to run.
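If you need repeatable runs, you can pin R's random number generator before calling the backtester (assuming your version doesn't already expose the seed as an argument):

> set.seed(123)   # any fixed integer makes the bootstrap sampling repeatable
> factormodel.tree(data, targets, returns, btsamples=130, horizon=1, trainperiods=8, leverage='kelly', keepNFeatures=10, treesInBag=40, endPd=150)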
Over 120 weeks (about 2 years), the system made about 90%. Two things to keep in mind: this ignores trading costs (which should be negligible in this case, because it's weekly trading of just a single security), and, more importantly, it is based on full Kelly betting, which is probably too volatile for a human to tolerate; above we see a 40% drawdown. However, when searching for alpha it's good to have sensitive tools.
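For reference, with an even-payoff binary bet the full Kelly fraction is simply f* = 2p - 1, where p is the probability of a correct call. Here's a quick sketch using the ~57.5% accuracy from above (a simplification; the system's 'kelly' option sizes bets from its own per-period confidence values, which may differ):

> p <- 0.575        # estimated probability of calling the direction right
> f_star <- 2*p - 1 # full Kelly fraction for an even-payoff bet
> f_star            # 0.15, i.e., stake 15% of equity per trade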
If you give this system data and buy/short targets, it will pull as much alpha from the data as is possible for the underlying algorithm.
Finally, I’ll explain the system’s parameters so you can experiment with and modify the code yourself.
data : All the data you're feeding the system, one feature per column
targets : Either 1 or -1 for a long or short position, aligned in time with the corresponding data
returns : Similar to targets, but used for building the equity curve
backtest : Don't mess with this one; possible future functionality
verbose : Whether to print each period's prediction while backtesting
btsamples : The number of periods to evaluate the system on
skip : Use only every skip-th data point (the other skip-1 are ignored). Lets you test over a long period faster
horizon : Number of periods ahead to predict (equivalently, the number of lags applied to the data)
dataperiods : Don't mess with this one; possible future functionality
speedUpFactor : Whether or not to train a new model every backtest period; not tested, likely broken
trainperiods : Number of periods of data to train on. More = more focus on the past, less = shorter memory
leverage : Either 'kelly' or a positive decimal number, e.g. 2 means 2X returns
keepNFeatures : Number of features to retain after feature selection
treesInBag : Number of trees to grow. More trees smooth the confidence values but take longer
startPd & endPd : Used to test over a specific interval. No date functionality; kept simple for now.
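To give you a concrete starting point for modifications, here is a stripped-down sketch of the core idea: a walk-forward loop over bagged rpart trees. This illustrates the technique, not the actual internals of factormodel.tree, and the function names here are made up:

# Illustrative only: a bare-bones walk-forward bagged-tree predictor.
# Not the internals of factormodel.tree; helper names are hypothetical.
library(rpart)

# Train 'trees' trees on bootstrap samples of the window, average the votes
bagged_vote <- function(train_x, train_y, test_x, trees = 40) {
  n <- nrow(train_x)
  votes <- replicate(trees, {
    idx <- sample(n, n, replace = TRUE)   # bootstrap sample, with replacement
    df  <- data.frame(train_x[idx, , drop = FALSE], y = factor(train_y[idx]))
    fit <- rpart(y ~ ., data = df,
                 control = rpart.control(minsplit = 5, cp = 0))
    pred <- predict(fit, data.frame(test_x), type = "class")
    as.numeric(as.character(pred))        # -1 or +1
  })
  mean(votes)                             # confidence in [-1, 1]
}

# Walk forward: train on the last 'trainperiods' rows, predict the next one
backtest_signals <- function(data, targets, trainperiods = 8,
                             btsamples = 130, treesInBag = 40) {
  t_end   <- nrow(data) - 1
  t_start <- t_end - btsamples + 1
  sapply(t_start:t_end, function(t) {
    win  <- (t - trainperiods + 1):t
    conf <- bagged_vote(data[win, , drop = FALSE], targets[win],
                        data[t + 1, , drop = FALSE], treesInBag)
    sign(conf)                            # +1 = go long, -1 = go short
  })
}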
Please leave a message if you have any suggestions, questions, or ideas.