Many research papers cover the prediction of financial time series, but only a few of them discuss its application in a real trading strategy. Most of the time, the research only reports the model's performance metrics (accuracy, RMSE, …) without trying to turn it into a profitable strategy.
When we talk about financial time series, we talk about stochastic processes, which means dealing with a lot of randomness. For this reason, it is unrealistic to expect an accuracy similar to those obtained in many other applications of Deep Learning. So do not expect 80% accuracy when predicting whether the market will go up or down over a given time horizon. Predicting financial time series is known to be one of the hardest tasks in Machine Learning. The goal is instead to find a model that gives you a small edge compared to a pure random guess, or to a pure Buy and Hold strategy when we speak about the stock market.
There are also many things to take care of regarding data processing when applying Machine Learning to financial time series. If you are a beginner in this field of research, or if your model gives good metrics but poor real trading results, our paper covering this subject, Financial Time Series Data Processing for Machine Learning, can help you understand what is going wrong.
For years now, in collaboration with two French engineering schools, CentraleSupélec and Polytechnique, the Lusis AI Department has been working on financial time series prediction with applicable trading strategies in mind.
In this article and future ones, the reader will find methods and results that can help a lot with their own research. But because, much more than in any other domain, an AI-based trading strategy that works can be directly turned into money, we will obviously never disclose the most important details that would enable reproducing our models. If you ever find a reproducible model with very good metrics in a paper, be careful and do your own research and backtesting before investing with it.
Predicting GBPUSD intraday trend
In this article we illustrate the application of Deep Learning to build a trading strategy.
We first create and evaluate a model predicting intraday trends on GBPUSD. Then we backtest a strategy based solely on the model's predictions before running it in real time.
- Dataset : GBPUSD one-hour OHLC data between 04/11/2011 and 01/30/2018, i.e. 41,401 one-hour OHLC bars covering about 7 years of data
- Training Set : 2011–2014
- Validation Set : 2015
- Test Set : 2016–2018
- Label : whether the closing price is Up or Down N bars later; our predictor is a classifier
After determining the label, we need to check the proportion of the Up and Down classes in the dataset. If the labels were unbalanced, we would need to rebalance them or use another metric than accuracy. Here we found 50.3% Up and 49.7% Down, so accuracy is a reliable metric for evaluating our model.
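As a minimal sketch of this labeling step (the real prediction horizon stays undisclosed, so `horizon` and the prices below are toy placeholders), the Up/Down label and its class balance can be computed like this:

```python
import numpy as np

def label_up_down(closes, horizon):
    """Label each bar 1 (Up) if the close `horizon` bars later is higher,
    else 0 (Down). The last `horizon` bars have no future close and are dropped."""
    closes = np.asarray(closes, dtype=float)
    future = closes[horizon:]
    current = closes[:-horizon]
    return (future > current).astype(int)

# Toy price series (the real dataset is hourly GBPUSD OHLC closes).
closes = [1.30, 1.31, 1.29, 1.32, 1.33, 1.31, 1.34]
labels = label_up_down(closes, horizon=2)
up_ratio = labels.mean()  # proportion of the Up class
print(f"Up: {up_ratio:.1%}, Down: {1 - up_ratio:.1%}")
```

A strongly skewed `up_ratio` would be the signal to rebalance or switch metrics; on our data it is close to 0.5.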
Here, for confidentiality reasons, we will not mention the following :
- prediction horizon (N bars)
- size of the time slices used as input for the model
- features (raw or derived) we use
- filters or any other pre-processing applied
- exact model hyperparameters
But we will show that surprisingly, a simple model can perform well.
In most cases, when we speak about time series prediction, we first think of using an LSTM or a 1D CNN. Here, we wanted to first evaluate a MLNN to compare it with the other two. But after some research, we found a way to make this model accurate enough to build a trading strategy on it.
The following diagram illustrates a MLNN similar to the one we use:
The model's output layer has two units, one for the Up class and one for the Down class. It uses a softmax activation function. The loss function is categorical cross-entropy.
Why do we use such a multi-unit output with softmax instead of a single unit with a sigmoid activation?
The reason is that it makes it easier to create generic functions for testing various models with labels having more than two classes. The simplest case is Up/Down/Neutral, but we could also have more levels of Up and Down (strong, medium, weak, …).
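To illustrate the point (this helper is our own sketch, not the actual Lusis code; Keras also ships `to_categorical` for the same job), a single generic one-hot encoder covers two classes or more, so the softmax output layer only needs as many units as there are classes:

```python
import numpy as np

def one_hot(labels, classes):
    """Map class names to one-hot rows. The softmax output layer just needs
    len(classes) units, whatever the label scheme."""
    index = {c: i for i, c in enumerate(classes)}
    out = np.zeros((len(labels), len(classes)))
    for row, lab in zip(out, labels):
        row[index[lab]] = 1.0
    return out

# The same helper covers Up/Down and Up/Down/Neutral without any code change.
two = one_hot(["Up", "Down", "Up"], classes=["Up", "Down"])
three = one_hot(["Up", "Neutral", "Down"], classes=["Up", "Down", "Neutral"])
```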
In general, we always start by creating a set of utility functions to make variations of a model easier and faster to build and test. This speeds up iterations a lot, so we can then concentrate our efforts on the most important things: the model itself and its performance metrics.
As for the other points we can disclose: the activation function of the Dense layers is tanh, we use the Adam optimizer, and dropout layers for regularization.
We train the model for 500 epochs with a batch size of 64 and get the following results:
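A Keras sketch in the spirit of what was just described follows. The input width, layer sizes and dropout rate are placeholders, not our real hyperparameters, which stay undisclosed; only the tanh activations, the two-unit softmax output, the categorical cross-entropy loss and the Adam optimizer come from the text above.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(input_dim, hidden=(64, 32), dropout=0.2):
    """MLNN sketch: tanh Dense layers with dropout, a two-unit softmax output.
    `input_dim`, `hidden` and `dropout` are illustrative placeholders."""
    model = keras.Sequential([keras.Input(shape=(input_dim,))])
    for units in hidden:
        model.add(layers.Dense(units, activation="tanh"))
        model.add(layers.Dropout(dropout))
    model.add(layers.Dense(2, activation="softmax"))  # Up / Down
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Training as described in the article: 500 epochs, batch size 64.
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=500, batch_size=64)
```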
The loss values have nothing to do with what we usually get in many other Deep Learning applications. Keep in mind that financial markets are a very noisy context where we can only expect to get a small edge.
We can observe that the loss improves slightly up to epoch 300, then starts to move up. This indicates some overfitting persists.
There are several ways to fight overfitting including:
- Getting more data : here we only use GBPUSD, and we cannot easily apply data augmentation techniques like those found in image classifiers. A flipped cat picture is still a cat, but it is not that simple for financial time series: any small modification of the data can result in very different situations, and we can never be sure the label will be the same as for the original sequence. Capturing a longer history or adding data from other currency pairs can help
- Simplifying the model : this is often the most efficient way to fight overfitting, before resorting to regularization; here it is already done
- Regularization : here we only kept dropout after testing L1 and L2 regularization
- Early stopping : we do not use it for the moment, as we need to capture the best metrics around a given epoch (here somewhere between 250 and 350), and not only at one precise epoch like 300. As this method does not natively exist in Keras, we would need to implement a custom one
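The window-restricted selection we have in mind can be sketched framework-agnostically (this is our own illustrative logic, not an existing Keras callback): a custom callback would snapshot the weights at each epoch inside the window and restore the snapshot of the epoch this function selects.

```python
def best_epoch_in_window(val_metrics, start, end, higher_is_better=True):
    """Pick the best epoch inside [start, end) from a per-epoch metric history,
    ignoring anything outside the window."""
    window = list(enumerate(val_metrics))[start:end]
    key = (lambda kv: kv[1]) if higher_is_better else (lambda kv: -kv[1])
    return max(window, key=key)[0]

# Toy validation-accuracy history: an early spike at epoch 1 is deliberately
# ignored because we only search epochs 250-350, as discussed above.
history = [0.50] * 500
history[1] = 0.99          # outside the window, so not selected
history[300] = 0.61        # best inside the window
best = best_epoch_in_window(history, 250, 350)  # → 300
```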
So we have already applied two of these approaches, and before experimenting with more data, we check the accuracy and see that we do better than a random guess, with 0.56 on the validation set. For some epochs the model even did better, with an accuracy exceeding 0.6, but before trying to capture it with an optimal early stopping, we first want to check how a trading strategy based on this predictor behaves.
So we backtest it.
We backtest the model on 3 years of data that were not used for training, so between 2015 and early 2018.
The strategy built from the model is very simple: we buy when the model predicts an Up trend and sell short when it predicts a Down trend.
In order to evaluate the pure model performance, we don’t set any stop loss or take profit. We simply close the position N bars after the entry, where N is the prediction horizon.
We also don’t set any spread for the first backtest in order to get the raw metrics.
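The rules just described can be sketched as a small loop (this is an illustration, not the Lusis Backtest Engine; the prices, signals and `horizon` below are toy placeholders, and the real prediction horizon stays undisclosed):

```python
def backtest(closes, signals, horizon, pip=0.0001, spread_pips=0.0):
    """Minimal sketch of the rules above: go long on an Up signal (+1), short
    on a Down signal (-1), and close exactly `horizon` bars later.
    Returns the per-trade P&L in pips, with no stop loss or take profit."""
    pnl = []
    for i, side in enumerate(signals):
        if i + horizon >= len(closes):
            break  # not enough future bars left to close the trade
        move = (closes[i + horizon] - closes[i]) / pip
        pnl.append(side * move - spread_pips)
    return pnl

# Toy run: one long and one short signal on a short GBPUSD-like series,
# with the zero spread used for this first raw-metrics backtest.
closes = [1.3000, 1.3010, 1.3005, 1.2990]
trades = backtest(closes, signals=[+1, -1], horizon=2)
```

Setting `spread_pips` to a broker-like value is how the later, more realistic backtest would be sketched with the same function.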
All the results below are expressed in pips. We use a trade size of 10,000 GBP, so each pip represents 1 USD. We start with a 10,000 USD account. Before speaking about leverage, every backtest must first start with non-leveraged trades in order to evaluate the risk that can be taken.
Just one word about the Quality Ratio metric: you will not find it in the literature, as it is a proprietary formula. The description below is extracted from the Lusis Backtest Engine documentation.
These first results seem very good. We also observe that the Average Trade Win/Loss is higher than the spread generally observed on GBPUSD with good brokers, which is generally below 0.8 pips.
The largest losing trade observed was down 156 pips, i.e. 1.56% of the initial capital, and the maximum drawdown is 4%.
As a reference, major authors in the trading community often recommend not trading a strategy whose drawdown exceeds 30%, and never risking more than 2% per trade.
Here our metrics are already below these two limits.
On financial markets, and especially on Forex, not all weekdays are equivalent in terms of risk and behavior. So let's check the P&L per day of the week.
Here 0 stands for Monday and 4 for Friday. We can easily see that Friday is a losing day. So let's just add a filter in the strategy itself to remove it and run the backtest again.
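As an illustrative sketch of this per-weekday breakdown and filter (the trade records below are made up; the real ones come from the backtest):

```python
from collections import defaultdict
from datetime import datetime

def pnl_by_weekday(trades):
    """Aggregate trade P&L (in pips) by entry weekday: 0 = Monday ... 4 = Friday."""
    totals = defaultdict(float)
    for entry_time, pips in trades:
        totals[entry_time.weekday()] += pips
    return dict(totals)

def drop_weekday(trades, weekday):
    """The filter added to the strategy: skip entries on the given weekday."""
    return [t for t in trades if t[0].weekday() != weekday]

# Toy trades: (entry time, P&L in pips). 2016-01-04 is a Monday.
trades = [(datetime(2016, 1, 4, 9), 12.0),    # Monday
          (datetime(2016, 1, 8, 9), -20.0),   # Friday
          (datetime(2016, 1, 15, 9), -5.0)]   # Friday
filtered = drop_weekday(trades, weekday=4)    # remove the losing Fridays
```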
Here we can immediately see an improvement in the P&L and the Average Trade Win/Loss, and the maximum drawdown went down to 3%. The equity curve also looks smoother.
The largest losing trade is still quite high at 151 pips. The most basic way to improve this consists of adding a stop loss to the strategy. Some people also add take profits, but many authors consider that an error: the best trend-following strategies often make their profits for the year from only a small number of big moves, and getting out of them too early never gives you any chance to take big profits.
To find where to place the stop loss, let's have a look at the MAE/MFE plot. MAE and MFE respectively stand for Maximum Adverse Excursion and Maximum Favorable Excursion: the plot shows the P&L achieved by every winning and losing trade compared to its maximum potential loss or gain since it was opened.
Here we can see that only a few trades whose MAE exceeded about 50 pips still ended as small winners. So we can say that after reaching a potential 50-pip loss, there is nearly no chance for a trade to recover and become a winner.
But we also see that some trades that went down by more than 50 pips partially recovered their losses before being closed. This means such a stop loss will not necessarily improve the strategy; the only way to make sure is to run a backtest.
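For reference, MAE and MFE for a single trade can be computed like this (an illustrative sketch with made-up prices, not the engine's implementation):

```python
def mae_mfe(entry, closes, side, pip=0.0001):
    """Maximum Adverse / Favorable Excursion of one trade, in pips.
    `closes` are the prices seen while the trade was open; `side` is +1 for
    a long position and -1 for a short."""
    moves = [side * (c - entry) / pip for c in closes]
    mae = -min(min(moves), 0.0)   # worst unrealized loss, as a positive number
    mfe = max(max(moves), 0.0)    # best unrealized gain while the trade was open
    return mae, mfe

# Long trade entered at 1.3000 that dips 15 pips before rallying 30.
mae, mfe = mae_mfe(1.3000, [1.2995, 1.2985, 1.3010, 1.3030], side=+1)
```

A trade with MAE above the candidate stop distance would have been stopped out, which is exactly what the backtest with a stop loss verifies.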
The 50-pip stop loss does not really improve the strategy; it even degrades some of the metrics. But we also get a higher QR, meaning we can obtain an even more "tradable" equity curve.
Our Backtest Engine can run an optimizer to find the best stop loss distance to use. Anyway, for the moment, and because the difference is not that big, we prefer to first evaluate a 100% pure Deep Learning approach.
Now we add a spread of 0.8 pips and run the backtest again, without any Stop Loss.
The strategy stays profitable and is not much affected by the spread. So we decide to run it in real time on a trading platform.
Real time Run
Here are the real-time results in pips since the beginning of the year.
The results are still quite good and the strategy is still profitable.
The recent drop at the end of July comes from the latest news on Brexit. This is a very interesting situation that gave us some ideas to test in order to improve the model's performance.
Conclusion and next steps
This article covered the creation of a Deep Learning based trading strategy and how we achieved a full backtest process to make sure that beyond the performance metrics, the model can be profitable for real time trading.
Our next steps are now to :
- Add more data to try to improve the accuracy
- Apply the same model to a portfolio of instruments
- Work on LSTM and 1D CNN models, as they are the natural choice for such time series problems
- Work on Hybrid approaches with multiple inputs
Some of these works are already in progress.
Thank you for reading.
All the work in this article was performed under JupyterLab with TensorFlow/Keras.
The backtests were performed with a personal engine created by the author under the MIT licence, initially written in Golang, then ported to Python specifically to be used with machine learning models from TensorFlow, PyTorch or Scikit-Learn. Lusis greatly improved this software by adding an automated trading feature enabling any strategy to run on a production system without modification. Currently, this real-time trading can work with any broker using Lusis Trading Platform Technology.