An Introduction to Recurrent Neural Networks

Last time I explained the statistical arbitrage strategies applied by quantitative hedge funds. I mentioned machine learning algorithms such as Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM), which are used as a foundation for predicting stock price action. This time I will dig deeper and explore what RNNs (mainly) and LSTMs are, and how they work in practice.

Normally when learning about machine learning, we start with linear and multiple linear regression. These are fundamental concepts for building a feed-forward neural network, which allows information to flow only in the forward direction: from the input layer, through the hidden layers, to the output layer. There are no cycles or loops in the network. Below is the flow chart of a feed-forward neural network:


The problem with a feed-forward network (FFN) is that its decisions are based only on the current input. It does not memorize past data, so it has no way to look ahead. That is why it is used in general regression and classification problems but cannot be applied to predicting stock prices.
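To make the "forward only" idea concrete, here is a minimal sketch of a feed-forward pass in Python/NumPy. This is my own illustration with made-up layer sizes, not code from any particular library. Notice that nothing is carried over between calls: each prediction sees only the current input.

```python
import numpy as np

# A tiny feed-forward network: input(3) -> hidden(4) -> output(1).
# The weights are random placeholders, just to show the data flow.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)
W2, b2 = rng.standard_normal((1, 4)), np.zeros(1)

def forward(x):
    h = np.tanh(W1 @ x + b1)   # hidden layer
    return W2 @ h + b2         # output layer; nothing loops back

# Each call is independent: the network keeps no memory of earlier inputs.
print(forward(np.array([0.5, -1.2, 0.3])))
print(forward(np.array([0.5, -1.2, 0.3])))  # same input, same output, no state
```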

Don’t worry, this is where RNNs come into play. The first question is: why do we use an RNN or LSTM (a derivative of the RNN) to predict stock prices? If we treat the stock price at every second as sequential, or time-series, data, and traditional feed-forward networks cannot learn from it or predict with it, then we need a mechanism that can retain past, or historical, information to forecast the future price. That is why we use the recurrent neural network algorithm to deal with time-series data.

Let’s focus on the intuition behind RNNs first. For example, suppose we took a single snapshot of a ball moving through time, like the picture below, and we want to predict where the ball is moving.


Well, we can make guesses. However, these would just be random guesses and would not mean much to a data scientist. Without knowing where the ball has been, we have no data about where it is going.

Let’s try this: take multiple snapshots of the ball moving.


Now it seems we have enough information to make a better prediction, because it at least looks like sequential data now: imagine every ball stores its information at that split second. In the sequence, the data about the ball at any specific time is connected to its last state and provides the input for the next state. This is the basic reason why RNNs are good at processing sequence data for predictions.

The question remains: how does it work in practice?

Let’s bring in the concept of sequential memory, which is how the human brain works. Try reciting the alphabet. Pretty easy, right? What if we do it in reverse order in our heads? It is doable, but a little harder. This is sequential memory at work: we learned the alphabet as a sequence, and sequential memory is a mechanism that makes it easier for our brains to recognize sequence patterns.

 

If you look at the chart above, the left-hand side is a traditional feed-forward network, whereas the right-hand side is a recurrent neural network. The loop in the RNN represents saving the previous state, so that at any given time t the RNN effectively works with input(t) plus the hidden state from t-1, which carries the previous data.

Below is a code snippet showing the general workflow of an RNN. It is a minimal sketch of my own in Python/NumPy, with made-up sizes and random placeholder weights, not a production implementation:
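```python
import numpy as np

# One recurrent cell processing a toy sequence step by step.
rng = np.random.default_rng(0)
input_size, hidden_size = 8, 16
Wxh = rng.standard_normal((hidden_size, input_size)) * 0.01   # input -> hidden
Whh = rng.standard_normal((hidden_size, hidden_size)) * 0.01  # hidden -> hidden
bh = np.zeros(hidden_size)

def rnn_step(x, h_prev):
    # The new hidden state combines the current input with the
    # previous hidden state -- this is the "loop" in the diagram.
    return np.tanh(Wxh @ x + Whh @ h_prev + bh)

sequence = [rng.standard_normal(input_size) for _ in range(5)]  # toy time series
h = np.zeros(hidden_size)  # the memory carried between steps
for x_t in sequence:
    h = rnn_step(x_t, h)

# h now summarizes the whole sequence and could feed an output layer.
print(h[:4])
```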


The Problem with RNNs

Suppose we let an RNN read in a sentence, as the picture shows. We notice that the distribution of color (representing the memory at each loop) shifts toward the most recent token, the question mark, and the memory for the word “what” has become negligible by the most recent state.

Therefore, as the RNN processes more steps, it has trouble retaining information from the earlier steps (this is the well-known vanishing gradient problem). It does not keep enough information to learn long-term dependencies; in other words, it suffers from short-term memory.
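A quick numerical toy (my own, not from the diagram above) makes this fading visible: push a state through many recurrent steps and watch how little of the original signal survives.

```python
import numpy as np

# Toy illustration of short-term memory: the first input's influence
# shrinks as it is re-mixed and squashed at every step.
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 16)) * 0.1  # small recurrent weights

h = rng.standard_normal(16)  # pretend this encodes the word "what"
for step in range(1, 31):
    h = np.tanh(W @ h)
    if step % 10 == 0:
        print(f"step {step}: |h| = {np.linalg.norm(h):.6f}")
# The norm decays toward zero, just like the color for "what"
# fading out of the diagram above.
```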

 

LSTM


LSTM is a special kind of RNN that is capable of learning long-term dependencies by remembering information for longer periods. Instead of having a single neural network layer like a standard RNN, an LSTM cell has four interacting layers (a forget gate, an input gate, a candidate layer, and an output gate) working together to store past information.
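As a rough sketch (my own simplified version, following the standard LSTM equations rather than any particular library), a single LSTM step looks like this. The cell state c is the long-term memory that the four layers cooperate to maintain:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
hidden_size, input_size = 16, 8
# One weight matrix per layer, each acting on [h_prev, x] concatenated.
Wf, Wi, Wc, Wo = (rng.standard_normal((hidden_size, hidden_size + input_size)) * 0.01
                  for _ in range(4))
bf = bi = bc = bo = np.zeros(hidden_size)

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([h_prev, x])
    f = sigmoid(Wf @ z + bf)        # forget gate: what to erase from memory
    i = sigmoid(Wi @ z + bi)        # input gate: what new info to let in
    c_hat = np.tanh(Wc @ z + bc)    # candidate values to add to memory
    c = f * c_prev + i * c_hat      # long-term cell state update
    o = sigmoid(Wo @ z + bo)        # output gate: what to expose
    h = o * np.tanh(c)              # new hidden state
    return h, c

# Usage: the cell state c can carry information across many steps.
h = c = np.zeros(hidden_size)
for x_t in [rng.standard_normal(input_size) for _ in range(5)]:
    h, c = lstm_step(x_t, h, c)
```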

Theoretically speaking, we can use an LSTM to store a large amount of historical data to predict future price action. It sounds perfect, right? However, LSTMs are only used in high-frequency trading, which means they achieve an acceptable success rate only when predicting the stock price over the next nanoseconds, never mind minutes. In other words, that is the limit of our current computing capability to foresee the “future” in the stock market.

 

Conclusion

I am very fortunate to have discussed these topics with some of the best professionals in the industry and to have gained their insights about the applications of machine learning and AI in this complicated, never-ending project. However, I am always optimistic about the future. If one day the prediction horizon improves from milliseconds to minutes (mid-frequency), we as humans will have taken another step forward. Lastly, special thanks to the online contributors below who helped me understand the basics of this cool stuff!

 


References:

https://www.simplilearn.com/tutorials/deep-learning-tutorial/rnn

https://machinelearningmastery.com/calculus-in-action-neural-networks/

https://www.youtube.com/watch?v=LHXXI4-IEns