STOCK TREND PREDICTION FRAMEWORK BASED ON LINE SEGMENT ALGORITHM AND DEEP LEARNING

by Zhaowei Liang
B.Eng., South China University of Technology, 2004
M.Eng., Pittsburg State University, 2006

PROJECT SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE

UNIVERSITY OF NORTHERN BRITISH COLUMBIA
November 2021
© Zhaowei Liang, 2021

Abstract

Stock forecasting is a complicated task because of the noisy and volatile nature of price data. How to eliminate this noise effectively has attracted attention from both investors and researchers. This report presents a novel de-noising technique named the Line Segment Algorithm (LSA). Unlike signal-processing methods, LSA is based on the characteristics of financial time series. First, the algorithm identifies the shape patterns of a historical stock price series and labels them as turning points or false alarms. Then, a stock trend prediction framework is built and trained on the shape patterns extracted by the algorithm. The resulting model can predict whether a given shape pattern is a turning point. To evaluate its performance, experiments on real stock data were carried out with LSTM and Random Forest models, respectively. The results show that LSA is effective, yielding better prediction accuracy. It provides a new perspective for stock trend analysis and can also be applied in actual stock trading.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
Chapter 1: Introduction
  1.1 Feature Extraction
  1.2 Model Enhancement
Chapter 2: Background
  2.1 Definition of Trend
  2.2 Relation Between Trend and Noise
  2.3 Price VS Trend Prediction
  2.4 Data Granularity
Chapter 3: Methodology
  3.1 Contain Relation
  3.2 Top/Bottom Shape Patterns
  3.3 Duration and Strength
  3.4 Procedure of LSA
  3.5 Turning Point and False Alarm
  3.6 Feature Selection and Processing
  3.7 Process of Feature Extraction
Chapter 4: Experiment Design
  4.1 Data Source
  4.2 Model 1: LSTM
    4.2.1 Model Structure
    4.2.2 Experiment Result
    4.2.3 Positive Findings and Limitations
  4.3 Model 2: Random Forest
    4.3.1 Model Evaluation
    4.3.2 Experiment Result
Chapter 5: Conclusion
Bibliography
Appendix

List of Tables

Table 3.1: Selected technical indicators and their description
Table 4.1: Source of dataset
Table 4.2: Comparison with existing models*
Table 4.3: Comparison of single layer and stacked structure
Table 4.4: Comparison of continuous and discrete values
Table 4.5: Confusion matrix for two-class classification
Table 4.6: Classification report

List of Figures

Figure 2.1: New definition of trend
Figure 2.2: Comparison of new definition and traditional one
Figure 2.3: Trend and noise
Figure 2.4: Trend-noise conversion
Figure 2.5: Price VS trend
Figure 2.6: Scale down daily K line into 30-minute
Figure 3.1: Line Segment Algorithm
Figure 3.2: Merge operation
Figure 3.3: Recursive merge operation
Figure 3.4: Type of shape pattern
Figure 3.5: Shape patterns in stock
Figure 3.6: Duration and strength of a downtrend
Figure 3.7: Overlap of Top/Bottom Shape
Figure 3.8: How Duration affects LSA: D=3
Figure 3.9: How Duration affects LSA: D=4
Figure 3.10: Flow diagram of LSA
Figure 3.11: Turning points and false alarm
Figure 3.12: Process of feature extraction
Figure 4.1: Slide window
Figure 4.2: Model structure
Figure 4.3: Accuracy 66.05% at test data size of 53
Figure 4.4: Predict result before classification
Figure 4.5: Predict result after classification
Figure 4.6: Loss curve
Figure 4.7: Prediction result with univariate input feature
Figure 4.8: Prediction result with multivariate input features
Figure 4.9: Accuracy result
Figure 4.10: Confusion matrix output
Figure 4.11: Out-Of-Bag error score
Figure 4.12: Feature importance chart

Acknowledgements

I would like to take this opportunity to express my thanks to my supervisor Professor Chen and to Dr. Jiang for giving me the opportunity to choose this interesting topic and for providing invaluable support throughout this research. Interest is one of the most important motivational factors, and it inspires me to come up with new ideas. Besides my advisor, I would also like to thank my supervisory committee member, Dr. Fu, for his insightful comments and advice from his professional finance perspective. Last but not least, I would like to give special thanks to my wife for taking care of our two kids alone in China so that I could concentrate on my research. Many thanks to my parents, my brother, and my sister for their continuous support. You all make my life rich and meaningful.

Chapter 1: Introduction

With the rapid development of computer technology, researchers began using machine learning to predict stock prices and fluctuations as early as 1970 [1], but how to accurately predict non-stationary financial time series is still an open question. Over the past decades, stock prediction has been studied from many perspectives, among which feature extraction and model enhancement are the two most important directions.
1.1 Feature Extraction

Feature extraction is one of the most important parts of the stock prediction process. A set of high-quality features is the key to a successful prediction model. There are mainly two methods to extract features.

The first is to use statistical data directly as input features. For example, some researchers explore the correlation between statistical data and stock prices. Statistical data including technical indicators [2], macroeconomic factors [3] and investors' sentiment [1] have been incorporated into prediction models [4]. This approach is popular because the data are easy to collect and do not require heavy computation.

The other is to extract features in a fuzzy way. Researchers do not care what the features might look like; they take advantage of hybrid neural networks to find hidden features and feed them into the prediction model. For example, Wei Bao et al. [3] applied an autoencoder to generate deep high-level features for predicting the stock price. In [5], the author proposed an approach to convert a time series into an image so that a CNN can extract useful features from financial variables. CNNs are used for feature extraction because they have achieved great success in computer vision and image processing. One study criticized that this approach neglects the potential influence of correlated stock markets; to address this problem, a 3-dimensional input tensor construction approach was designed in [6].

However, no matter which method is applied, the common problem is that most of them fail to consider the noise in the data. To illustrate, we surveyed 47 papers on stock prediction: 45 of them train their models on untreated daily trading data, which makes the model absorb noise and results in unsatisfactory forecast accuracy. Only 2 papers [3, 7] really tried to de-noise the data before feeding it to the model.
Although these two papers de-noised the data with the wavelet transform and Principal Component Analysis (PCA), respectively, they still have no clear definition of what the noise actually looks like. Wavelets and PCA come from communication electronics. They may work well for signals, but that does not mean they are suitable for stocks, since noise and fluctuation in financial time series have a different context from those in signal processing.

1.2 Model Enhancement

Researchers have put a great deal of effort into model enhancement. Most early studies used traditional statistical models, such as ARIMA [8-10], to predict stock prices. ARIMA is computationally efficient, but one disadvantage is that it assumes the training data follow a known statistical distribution and are stationary, which is often difficult to satisfy in nonlinear financial time series [11]. In [12-13], the authors used an SVM-based approach for stock prediction, but SVMs have low computational efficiency, especially when modeling large-scale financial time series [14]. Besides, Göçken, Mustafa [15] designed a Harmony Search and Genetic Algorithm approach to enhance a traditional ANN model. Although an ANN can learn any nonlinear relationship, there are still problems in practical applications, including a low convergence rate [16], difficulty in determining the optimal model structure, and overfitting. In [17], Siripurapu et al. attempted to convert price sequences into pictures and then used a CNN to learn useful features from a single time scale in the price series. But financial time series contain features at multiple time scales, and it is not reasonable to study only one of them.

Random forest (RF) has also drawn much attention in stock prediction. It is an ensemble learning algorithm that can evaluate the contribution of each feature. Many studies show that RF is superior to the models mentioned above.
Patel and Shah [18] applied SVM, ANN and RF to predict the trend of the stock market based on the historical data of the S&P Index, Infosys and other stocks from the Indian stock market. The results show that the RF model has the highest accuracy. Khaiadem et al. [19] used the RF algorithm to predict the next day's stock price trend with the historical data of Apple, Amazon and Microsoft, and also obtained a relatively high accuracy rate. Compared to the other models, RF does not suffer from overfitting, sensitivity to missing data, or complex computation. It was considered one of the best models for time series forecasting, especially stock trend prediction. This opinion held until RNNs, especially LSTM, were released.

Due to its internal memory mechanism [20-21], LSTM works well on sequence data with long-term dependencies. In [22], it was applied to predict out-of-sample directional movements for the constituent stocks of the S&P 500. The results showed that LSTM outperformed memory-free classification methods. Wei Bao et al. [3] used wavelet transforms and stacked autoencoders to extract features from technical indicators and then fed them to an LSTM model to learn time dependencies for stock price prediction. In [23], history information and the limit order book were fed to an LSTM model to determine stock price movements. All these studies proved that LSTM can successfully extract hidden time dependencies in financial time series. However, there has been no big advancement since then. Even though researchers have put a great deal of effort into enhancing LSTM models through structure modifications or hybrid neural networks, such as Bi-LSTM, ConvLSTM, CuLSTM, CNN-LSTM and WSAE-LSTM, the performance is only slightly improved.

In the light of the above literature review, we summarize three points:

(1) There seems to be a bottleneck in model enhancement. Unless a new model structure is released, there will be little improvement from manipulating the current ones.
(2) Compared to model enhancement, feature extraction might contribute more to the overall prediction performance, since it determines the upper limit of a prediction model.

(3) In order to reach the true features of a stock, we have to find an effective technique to eliminate the noise, because noise deteriorates performance and makes prediction complicated.

Based on these findings, our prediction framework concentrates on feature extraction and develops a new technique named the Line Segment Algorithm (LSA) to target the noise in stock prices from a financial perspective. To match the algorithm, we selected a stacked LSTM structure with discrete multivariate values, an area seldom covered by other papers, as the prediction model. Random Forest is also applied to evaluate and compare the performance of LSA, because LSTM and RF are two of the most popular models in stock forecasting. This report aims at developing a simple but effective prediction framework and figuring out whether the proposed LSA enhances prediction performance. Its limitations will also be discussed.

The remainder of this report is organized as follows. Chapter 2 introduces background knowledge, including the definition of trend, the relation between trend and noise, and prediction granularity. Chapter 3 is the main content of the report: the methodology of LSA, such as the contain relation, turning points and false alarms, and the procedure of the algorithm. Chapter 4 presents the details of our experiment design; the performance of LSTM and Random Forest is analyzed there. Finally, a brief conclusion and future work are given in Chapter 5.

Chapter 2: Background

A stock price pattern is different from a time-series signal: its financial attributes make it inappropriate to analyze with signal-processing methods.
Therefore, to better understand the principle of the LSA algorithm, we first need to introduce some basic knowledge.

2.1 Definition of Trend

The trend of a stock market refers to the upward or downward direction of the future price series. Financial time series traditionally define the daily trend by the close price: if the close price is higher than the previous one, the trend is up, and vice versa. However, in our model, the definition of trend refers not to the close price but to the high and low prices.

Up trend: Hk-1 < Hk and Lk-1 < Lk (2.1)

Down trend: Hk-1 > Hk and Lk-1 > Lk (2.2)

Take figure 2.1 as an example: when both the high and low prices are higher than the previous ones, it is an up trend (left). Similarly, when both the high and low prices are lower than the previous ones, it is a down trend (right).

Figure 2.1: New definition of trend

The following figure illustrates the differences. Under the traditional definition, the green candlestick on the left is down because its close price is lower than the previous one. But under the new definition, it is an up trend, since its high and low prices are higher than the previous ones. The same situation applies to the other three candlesticks in the dashed oval.

Figure 2.2: Comparison of new definition and traditional one

There are two reasons for the new trend definition. (1) Using the high/low price to describe daily stock movement is more practical than the close price. People are simply used to deciding the daily trend by the close price, but that convention means nothing to a computer. (2) The new trend definition is part of LSA, which will be introduced to eliminate the noise in the following sections, and some technical indicators are also calculated based on the new definition.

2.2 Relation Between Trend and Noise

Stock prices are non-stationary time series consisting of trend and noise. A trend describes the main direction in which the stock price is moving. Conversely, noise moves around the trend in a fluctuating or oscillating way. As figure 2.3 shows, the stock price moves up in a swing.
Prices have direction and tend to travel in observable trends, but they are not necessarily predictable.

Figure 2.3: Trend and noise

In fact, a trend is a higher level of noise, and noise is a lower level of trend. The most distinctive differences between trend and noise are their duration (the length of time) and strength (the gap between top/bottom shapes). The duration and strength of noise are relatively shorter and weaker than those of trends.

Actual Stock Price Pattern = Trend + Noise (a lower level of trend) (2.3)

Figure 2.4: Trend-noise conversion

2.3 Price VS Trend Prediction

There are mainly two categories of stock forecasting [24-25]. The first focuses on short-term stock price prediction; many papers [9-10, 26-27] apply deep learning to predict stock prices. Price forecasting is popular and straightforward. However, we believe it amounts to using a simple method to deal with a very complicated problem: the actual stock price always carries noise, and its exact value is unpredictable.

The second category focuses on the stock trend and then determines stock trading signals. As a matter of fact, price is an infinite continuous attribute, while trend is a category. Trend prediction simplifies the problem because it belongs to the field of classification. In actual practice, an investor does not have to know the exact future price; a trend prediction is good enough to help them make decisions.

Figure 2.5: Price VS trend

To sum up, it is impossible to predict absolute prices: models should not be used to predict stock prices; they should predict stock trends instead. Until now, there has been no clear and systematic rule on how to predict a stock trend. In this report, we try to cast some light on it. To be more precise, what we predict is the turning point of the trend, including the top shape pattern and the bottom shape pattern.
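Before moving on, the high/low trend definition of Section 2.1 (equations 2.1 and 2.2) can be sketched in a few lines of Python. This is only an illustrative sketch: the `Candle` type and function name are assumptions for this example, not code from the report.

```python
from dataclasses import dataclass

@dataclass
class Candle:
    high: float
    low: float

def daily_trend(prev: Candle, curr: Candle) -> str:
    """Classify the move from prev to curr using high/low prices only."""
    if curr.high > prev.high and curr.low > prev.low:
        return "up"       # eq. 2.1: both high and low rise
    if curr.high < prev.high and curr.low < prev.low:
        return "down"     # eq. 2.2: both high and low fall
    return "contain"      # ranges nest/overlap; handled in Section 3.1

print(daily_trend(Candle(10.0, 9.0), Candle(10.5, 9.4)))  # up
```

Note that the third branch, where neither condition holds, is exactly the "contain" situation that Section 3.1 resolves with the merge operation.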
2.4 Data Granularity

Nowadays, stock trades can take place at very high frequency when the market is open. The data granularity can vary: second, minute, hour, day, or even a fixed investment period. Most papers fail to consider the effect of data granularity and apply daily trading data to predict the following daily price movement. If the granularity of the training data is the same as that of the prediction, we consider the resolution of the training data too low. In other words, it is not a good idea to use daily K-line data to predict the next day's price movement, since there is not enough detailed information. Instead, 30-minute K-line data is a better choice, because each daily K line can be expanded into eight 30-minute K lines.

Figure 2.6: Scale down daily K line into 30-minute

LSA chooses the trading day as the training granularity and the trading week (5 days = 1 week) as the prediction granularity. That means it predicts the trend of a stock over a predefined period measured in trading weeks. Certainly, short-term or even high-frequency stock prediction is also possible; accordingly, higher-resolution data, such as 30-minute or even 5-minute data, needs to be fed into the model.

Chapter 3: Methodology

The basic idea of LSA is to split a financial time series into several line segments. As shown in figure 3.1, the stock trend pattern can be split into rising and falling segments.

Figure 3.1: Line Segment Algorithm

Unlike other de-noising methods, LSA is based on the characteristics of financial time series [24]. It greatly reduces the size of the original data and provides an easy and efficient technique for stock trend analysis. The following sections give the details.

3.1 Contain Relation

For a pair of consecutive candlesticks, if the price range of one candlestick is a subset of the other, we say the two candlesticks are in a "contain" relation. The contain relation is calculated from the high/low prices.
[Lk-1, Hk-1] ⊃ [Lk, Hk] or [Lk, Hk] ⊃ [Lk-1, Hk-1] (3.1)

When supply and demand are equally strong, the market will not show a clear direction. Instead, it will produce many consecutive parallel candlesticks (oscillations) and cause the learning model to generate false predictions. The purpose of the contain relation is to define the valid candlesticks and eliminate the oscillations.

Figure 3.2: Merge operation

Two candlesticks in a contain relation can be merged into one. This process is called the merge operation. In an up-trend merge (equation 3.2), the merged high (or low) price equals the larger of the two high (or low) prices. A similar process applies to the down-trend merge (equation 3.3).

Hmerge = Max(Hk-1, Hk) and Lmerge = Max(Lk-1, Lk) (3.2)

Hmerge = Min(Hk-1, Hk) and Lmerge = Min(Lk-1, Lk) (3.3)

If more than two candlesticks are in a contain relation, for example, if after the first merge the new candlestick contains the next one, the merge operation is applied recursively until no contained candlestick remains. The merged candlestick represents the original ones in the contain relation. In effect, the merge operation eliminates stock fluctuations by extending the trend duration.

Figure 3.3: Recursive merge operation

3.2 Top/Bottom Shape Patterns

A shape pattern describes the beginning or the end of a trend. There are two kinds of shape patterns: the top shape and the bottom shape. A downtrend begins with a top shape and ends with a bottom shape, while an uptrend begins with a bottom shape and ends with a top shape. A shape pattern is composed of at least 6 consecutive candlesticks. The following definitions are used to identify top/bottom shape patterns. A candlestick whose high price is the highest among the previous 5 and the next 1 candlesticks is a top shape.
Hk = Max(Hk-4, Hk-3, Hk-2, Hk-1, Hk+1) (3.4)

Lk = Max(Lk-4, Lk-3, Lk-2, Lk-1, Lk+1) (3.5)

Similarly, a candlestick whose low price is the lowest among the previous 5 and the next 1 candlesticks is a bottom shape.

Lk = Min(Lk-4, Lk-3, Lk-2, Lk-1, Lk+1) (3.6)

Hk = Min(Hk-4, Hk-3, Hk-2, Hk-1, Hk+1) (3.7)

Figure 3.4: Type of shape pattern

Figure 3.4 shows the ideal shape patterns. In actual practice, top and bottom shape patterns usually stick together, as in the following figure.

Figure 3.5: Shape patterns in stock

Top/bottom shape patterns are fundamental to the algorithm. They help us analyze the stock in the following ways: (1) they are very effective indicators for buying and selling points; (2) between each pair of top/bottom or bottom/top shapes there is a trend; (3) trend and noise can be classified by the duration and strength between each pair of top/bottom shape patterns.

3.3 Duration and Strength

The purpose of strength and duration is to further separate noise from trend. Each trend begins and ends with a pair of top/bottom shapes, which represent the two endpoints of the trend. We define the number of candlesticks between the shapes as the trend duration.

D = Count(top_current - bottom_previous) or (3.8)
D = Count(bottom_current - top_previous) (3.9)

For example, the trend duration of the downtrend below is D = 5.

Figure 3.6: Duration and strength of a downtrend

Similarly, for each pair of top/bottom shapes, the gap between them represents the strength of the trend.

S = (Lk0_previous_top - Hk0_bottom) / Hk0_bottom or (3.10)
S = (Hk0_previous_bottom - Lk0_top) / Lk0_top (3.11)

If S < 0, the pair of top/bottom shapes overlap each other, as illustrated in figure 3.7: the high price of the bottom shape overlaps the low price of the top shape. If 0 ≤ S < 1.5%, we consider the trend weak. If S ≥ 5%, the trend is strong.
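The contain-relation merge of Section 3.1 (equations 3.1 to 3.3) and the downtrend strength measure above (equation 3.10) can be sketched as follows. This is an illustrative sketch under assumptions: the dictionary-based candle representation and function names are invented for the example, and the merge direction is fixed to the up-trend rule of equation 3.2 for simplicity (a full implementation would pick equation 3.2 or 3.3 from the trend preceding the pair).

```python
def contains(a, b):
    """True if candle a's [low, high] range is a superset of b's (eq. 3.1)."""
    return a["low"] <= b["low"] and a["high"] >= b["high"]

def merge_candles(candles):
    """Repeatedly merge consecutive candles in a contain relation.

    Assumes an up-trend merge (eq. 3.2): merged high/low take the larger
    of the two highs/lows. Merging is applied again whenever the merged
    candle contains the next one, matching the recursive rule in the text.
    """
    merged = [candles[0]]
    for c in candles[1:]:
        prev = merged[-1]
        if contains(prev, c) or contains(c, prev):
            merged[-1] = {"high": max(prev["high"], c["high"]),
                          "low": max(prev["low"], c["low"])}
        else:
            merged.append(c)
    return merged

def strength(prev_top_low, bottom_high):
    """Downtrend strength: relative gap from a top to the next bottom (eq. 3.10)."""
    return (prev_top_low - bottom_high) / bottom_high
```

For example, `merge_candles([{"high": 10.0, "low": 9.0}, {"high": 9.5, "low": 9.2}])` collapses the contained second candle into `{"high": 10.0, "low": 9.2}`, and a negative `strength` value signals the overlapping case shown in figure 3.7.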
Figure 3.7: Overlap of Top/Bottom Shape

Noise and volatility in stock trend forecasting are major challenges because they hinder the extraction of useful information [28]. To address this problem, a valid trend must meet the following requirements in our framework; otherwise it is considered noise and eliminated by LSA.

D > 4 (after merge operation) (3.12)
S > 1.5% (after merge operation) (3.13)

The reason for D > 4 is that the prediction granularity should be higher than the daily level. The input features are calculated from daily trading data, so the prediction level should be the weekly trend, and 5 trading days equal one week in the stock market. Similarly, supposing the transaction fee of each buy/sell operation is 1.5%, the strength should be greater than that; otherwise there is no profit. Both duration and strength can be adjusted according to the characteristics of a specific stock. Figures 3.8 and 3.9 compare the trend line segments under two different duration values. Strength works in the same way as duration; for more information on how strength affects the algorithm, please refer to the appendix.

Figure 3.8: How Duration affects LSA: D=3

Figure 3.9: How Duration affects LSA: D=4

3.4 Procedure of LSA

The LSA algorithm proceeds as follows:

1. Combine the consecutive K lines that are in a contain relation
2. Calculate all shape patterns, including top/bottom shape patterns
3. Delete the shape patterns that fail to meet the duration and strength criteria
4. Classify the turning points and the false alarms
5. Eliminate repeated invalid shape patterns in consecutive order
6. Connect the turning points with line segments

Figure 3.10: Flow diagram of LSA

3.5 Turning Point and False Alarm

After step 3 in the diagram above, we obtain the shape patterns. If the stock trend rises (or falls) to a certain point and then begins to fall (or rise), that point is a turning point.
In other words, in a run of consecutive identical shape patterns, the last one is the turning point.

Turning point = the last one of consecutive top/bottom shapes (3.14)

Everyone wants to buy or sell stocks at turning points to maximize profit. However, the turning points shown in the figure are theoretical points. In actual practice, we usually buy or sell at the next candlestick, since a top/bottom shape cannot be confirmed until the next candlestick appears; we never know whether a candlestick is a top/bottom shape until it reveals itself. We call this One-Day-Delay (ODD). With ODD, the trading signal is delayed by one day (or 30 minutes, depending on the data granularity) so that we can determine whether the turning point shows up or not. The purpose of ODD is to avoid prediction errors.

In fact, even after a turning point is formed, it can sometimes turn out to be a mistake as new shape patterns are generated. The reason is that when a shape pattern different from the previous one appears after 5 valid candlesticks, the trend is considered ready to complete at any time. However, its complete state can be broken by another new top/bottom shape pattern showing up afterwards; the trend then returns to its incomplete state. We call this phenomenon a false alarm (the smaller white dots).

False alarms = all shape patterns between two consecutive turning points (3.15)

Figure 3.11: Turning points and false alarm

False alarms are bad for our model. What makes prediction difficult is that we can only make a judgment based on past data, that is, the data on the left of the time axis. So false alarms are impossible to eliminate: you never know whether a shape pattern is a false alarm until you get another shape pattern. It is tempting to use future data to build a perfect prediction model, but that is not practical. Therefore, we classify the shape patterns into two groups: turning points and false alarms.
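The labelling rule above (equations 3.14 and 3.15) can be sketched as a small retrospective labeller. This is an assumption-laden sketch: the function name is invented, and note that in live use the label of the most recent shape is exactly what ODD leaves undecided; here the final shape is labelled as if the sequence were complete.

```python
def label_shapes(shapes):
    """shapes: list of 'top'/'bottom' pattern types in time order.

    Returns a parallel list of labels: in a run of identical consecutive
    shapes, only the last one is a 'turning_point' (eq. 3.14); the rest
    are 'false_alarm' (eq. 3.15). Retrospective labelling only: in live
    trading the final shape stays unknown until the next pattern appears.
    """
    labels = []
    for i, s in enumerate(shapes):
        nxt = shapes[i + 1] if i + 1 < len(shapes) else None
        labels.append("turning_point" if nxt != s else "false_alarm")
    return labels

print(label_shapes(["top", "top", "bottom", "top", "top"]))
# ['false_alarm', 'turning_point', 'turning_point', 'false_alarm', 'turning_point']
```

These retrospective labels are what the prediction model is trained on: given the features of a newly formed shape pattern, it tries to predict which of the two labels the pattern will eventually receive.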
We aim to apply deep learning models to distinguish false alarms from turning points and improve the overall prediction performance. Our prediction model is quite straightforward. After de-noising by LSA, the data are greatly compressed: instead of feeding the whole original series into the model, we input only the features of the shape patterns. Each line segment is composed of more than 5 K-lines, so the size of the data fed to the model is less than one fifth of the original data, which greatly improves efficiency.

3.6 Feature Selection and Processing

Features are essential to performance. Previous studies were mostly conducted by feeding daily trading data, such as the open or close price, as input. That is one of the reasons those models fail to reach high accuracy. To avoid "rubbish in, rubbish out", features are chosen based on the following criteria: (1) only trend-related technical indicators are selected, and (2) all selected technical indicators are converted into trend deterministic data. To increase the amount of information in the input features, besides the common technical analysis indicators, this report also adds some relatively novel indicators based on the LSA algorithm, such as SAM_SHAP, SAM_NUM and TBS. The following table describes them in detail.
Technical indicator | Description
RSI       | Momentum indicator that determines whether a stock is overbought or oversold
K_SO      | Measures the level of the close price relative to the previous turning point
R_WIL     | Judges the trend of short-term market behavior
DIF       | Difference between the fast and slow moving averages; reflects the price trend
DEA       | Oscillator that fluctuates above and below the zero line
MACD      | Displays trend-following and momentum characteristics
CHG_RATE  | Percentage of price change compared to the last turning point
SAM_SHAP  | Identifies whether the shape pattern is the same as the previous one
SAM_NUM   | Counts the number of identical consecutive shape patterns
TBS       | Identifies top shape patterns, bottom shape patterns and false alarms

Table 3.1: Selected technical indicators and their descriptions

After the technical indicators are calculated, we go one step further by converting the continuous-valued inputs into discrete trend deterministic values "0" or "1", where "0" means a down trend and "1" indicates an up trend. This conversion is based on the fact that each continuous value, when compared with its previous one, indicates a future up or down trend. Since we have to find the correlation between the input trends and the output trend, this is easier with trend deterministic data. Take the price change rate as an example: it is a better momentum-based indicator than the price itself. In fact, each input parameter in its discrete form indicates a possible up or down trend determined by its inherent property. For example, RSI is a technical curve based on the ratio of rises to falls over a certain period. It reflects the prosperity of the stock market and is generally used for identifying overbought and oversold points. It ranges between 0 and 100. If RSI is over 70, the stock is overbought and may go down in the near future; therefore, we set the value to "1". If it is below 30, the stock is oversold, and we set it to "0".
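The RSI conversion just described, together with the comparison rule for mid-range values given in the next paragraph, can be sketched as follows. The function name and sample values are our own illustration of the rule in Section 3.6, not the thesis code.

```python
def rsi_to_trend(rsi_t, rsi_prev):
    """Convert a continuous RSI value into trend deterministic data:
    RSI > 70 (overbought) -> "1"; RSI < 30 (oversold) -> "0";
    otherwise compare with the previous value (rising -> "1")."""
    if rsi_t > 70:
        return 1
    if rsi_t < 30:
        return 0
    return 1 if rsi_t > rsi_prev else 0

# Overbought, oversold, rising mid-range, falling mid-range:
print([rsi_to_trend(t, p) for t, p in [(75, 60), (25, 40), (55, 50), (45, 50)]])
# -> [1, 0, 1, 0]
```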
For values between 30 and 70, if the RSI at time t is greater than that at time t-1, we set it to "1"; otherwise, we set it to "0". The other technical indicators follow a similar process.

3.7 Process of Feature Extraction

The whole process of feature extraction is shown in the chart. First, the original data go through preprocessing by LSA, and then all technical indicators are calculated. Stock trends are generated based on our new definition. All the continuous technical indicators are converted into discrete values and fed into the deep learning network.

Figure 3.12: Process of feature extraction

In short, there are three main steps in our feature engineering process: (1) the merge operation eliminates low-level noise, while the strength and duration criteria remove high-level fluctuations; (2) the model is trained with daily data to predict the weekly trend, with duration greater than 5, which ensures the data resolution is high enough; (3) the model predicts the trend instead of the price, using trend deterministic discrete data rather than continuous data.

In summary, LSA effectively eliminates noise and fluctuations. It simplifies the prediction problem by identifying the turning points of the stock trend. After the data are preprocessed by LSA, only the shape-pattern data need to be fed into the model.

Chapter 4: Experiment Design

The outputs of LSA are the top/bottom shape patterns. Random Forest or LSTM is then applied to separate false alarms from turning points. RF is a simple but efficient algorithm for classification problems, while LSTM is famous for its internal memory mechanism in time series prediction. The prediction accuracy of both models will be compared and analyzed. We present the details of how we obtain the predicted values and evaluate the performance of each model.
4.1 Data Source

To ensure that the data are representative, our dataset is composed of 6 stock indices from 3 different markets, covering more than 20 years of data. All data were obtained from Yahoo Finance. The data are divided into a training set and a test set. To predict the future stock trend, only data from the past can be used.

  Index   | Duration  | Region
1 Dow     | 1992-2021 | United States
2 Nasdaq  | 1971-2021 | United States
3 S&P 500 | 1992-2021 | United States
4 HSI     | 1987-2021 | Hong Kong
5 SSEC    | 1997-2021 | China
6 SCI     | 1991-2021 | China

Table 4.1: Source of dataset

Regarding the labeling of the prediction data, all top/bottom shape patterns were classified into two classes: turning points and false alarms. Our goal is to predict whether a shape pattern is a turning point or a false alarm, so we are solving a classification problem. In this report, the label is defined as:

label = 1 if the shape pattern is a turning point; 0 otherwise (false alarm)

4.2 Model 1: LSTM

LSTM has the capability of learning long-term dependencies. It is popular in language translation, since it can effectively extract feature dependencies from the content of previous sentences. Financial time series are in fact similar to language: the current stock price depends not only on yesterday's price but also on the prices before yesterday.

4.2.1 Model Structure

All features of the historical data must be sliced into samples of a predefined size. The window size is the amount of data covered by the sliding window for each row of the training set; it can be adjusted as needed. The default is 6 consecutive shape patterns for training and 1 for prediction. Converting the shape patterns into days, more than 30 days of trading data are used to predict the price movement 5 days later.

Figure 4.1: Sliding window

In our model, we applied a stacked LSTM structure. Studies [29-30] indicate that depth is key to many challenging prediction problems.
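The sliding-window slicing of Section 4.2.1 can be sketched as below. This is an assumed minimal implementation (the function name and the toy feature rows are ours): each training sample covers 6 consecutive shape patterns and its target is the label of the 7th.

```python
def make_windows(features, labels, window=6):
    """Slice pattern-level feature rows into (X, y) samples: each sample
    uses `window` consecutive shape patterns to predict the label of the
    pattern that immediately follows the window."""
    X, y = [], []
    for i in range(len(features) - window):
        X.append(features[i:i + window])
        y.append(labels[i + window])
    return X, y

# Toy data: 8 shape patterns, each with 2 discrete features and a 1/0 label.
feats = [[0, 1], [1, 1], [1, 0], [0, 0], [1, 1], [0, 1], [1, 0], [0, 1]]
labs = [0, 1, 0, 0, 1, 0, 1, 0]
X, y = make_windows(feats, labs)
print(len(X), y)  # 8 patterns with window 6 -> 2 samples; y is [1, 0]
```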
Increasing the depth of the network lets the model learn features with fewer neurons, in less time and with higher accuracy. LSTM is prone to overfitting, so dropout is used as a regularization technique: it prevents overfitting by randomly ignoring certain nodes in a layer during training. In our model, there are 4 LSTM layers, and a dropout layer is attached to each of them.

Figure 4.2: Model structure

4.2.2 Experiment Result

Our algorithm works well and successfully predicts the movement of the stock. The test set is 28 shape patterns by default. We also applied different test sizes to see how they affect the accuracy: as the size of the test data grows from 39 to 80, the accuracy drops from 69.23% to 63.75%. Figure 4.3 shows one of the prediction results, where "1" means turning point and "0" means false alarm. For more information about the accuracy at different test sizes, please refer to the appendix.

Figure 4.3: Accuracy 66.05% at a test data size of 53

The average accuracy is about 66.04%. Compared with existing models, our result is slightly better.

  Model                   | Accuracy (%)
1 Simplistic Model        | 54.82
2 SVM                     | 61.98
3 LSTM                    | 65.05
4 CNN                     | 59.34
5 Multiple Pipeline Model | 63.30
6 NFNN                    | 65.93
7 Our Proposed Model      | 66.04

Table 4.2: Comparison with existing models (cited from [31] for comparison)

The following figure visualizes the comparison between the real trend and the predicted trend. As we can see, most of the predicted values are close to "0" or "1", but some lie in between. That is because the output dense layer originally generates continuous values, so a classification step with a threshold is applied to its output.

Figure 4.4: Prediction result before classification

The output of the LSTM is the probability of the classification result. The threshold is 0.5 by default: when the output probability is greater than 0.5, the pattern is considered a turning point; otherwise, it is a false alarm.
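The default thresholding and the accuracy calculation used in the experiments can be sketched as follows; the function names and the probability values are illustrative only, not the thesis code.

```python
def classify(probabilities, threshold=0.5):
    """Turn the network's continuous outputs into class labels:
    probability > threshold -> turning point (1), else false alarm (0)."""
    return [1 if p > threshold else 0 for p in probabilities]

def accuracy(predicted, actual):
    """Number of correct predictions divided by the test data size."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

probs = [0.91, 0.34, 0.62, 0.08, 0.55]
preds = classify(probs)                   # -> [1, 0, 1, 0, 1]
print(accuracy(preds, [1, 0, 0, 0, 1]))   # 4 of 5 correct -> 0.8
```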
In actual practice, the threshold can be adjusted according to the characteristics of the stock and the investment strategy.

Figure 4.5: Prediction result after classification

Figure 4.5 shows the final result after classification with a threshold of 0.5. The accuracy rate is calculated as the number of correct predictions divided by the total test data size. After examining the accuracy at different test sizes, we also analyzed how the number of epochs affects the prediction results. When the number of epochs is less than 50, all the prediction values concentrate around the center, with almost no gap between them; obviously, the model is underfit. As the number of epochs increases, the gap begins to widen. As the figure shows, the loss curve begins to converge when the number of epochs goes beyond 1800.

Figure 4.6: Loss curve

The following is a comparison of univariate and multivariate prediction results. There are 10 input features in our model; we also tried using only one feature instead. As the result shows, the prediction values then centralize between 0.2 and 0.5, suggesting the model is underfit. We found that as the number of features increases, the gap becomes wider. Therefore, we conclude that the more features we draw from the data, the less training data we may require.

Figure 4.7: Prediction result with a univariate input feature

Figure 4.8: Prediction result with multivariate input features

We evaluated the performance of the model with different numbers of layers. As shown in the following table, the more layers, the better the accuracy. Each additional layer improves the accuracy, but the improvement rate decreases as the number of layers increases. While it is not theoretically clear what additional power is gained by the deeper architecture, it has been observed empirically that deep RNNs work better than shallower ones on some tasks [32].
Number of layers | 1     | 2     | 3     | 4
Accuracy (%)     | 57.79 | 65.47 | 69.03 | 70.12
Improvement (%)  | N/A   | 13.29 | 5.44  | 1.60

Table 4.3: Comparison of the single-layer and stacked structures

We also compare the performance of LSTM when the inputs are represented as real continuous values versus discrete trend deterministic data. The results show that the accuracy is already good when the model is trained with continuous-valued input, but the performance improves further when discrete deterministic data are applied. It is not a big change, but it is noticeable. The reason behind the improved performance is that the discrete deterministic data help the model classify the inherent trend that each technical indicator shows.

             | Discrete values | Continuous values
Accuracy (%) | 68.92           | 64.11
Time         | 27m42s          | 33m57s

Table 4.4: Comparison of continuous and discrete values

4.2.3 Positive Findings and Limitations

In summary, there are three positive findings: (1) multivariate input is better than a single feature, especially when the size of the data is relatively small; (2) a stacked LSTM model is superior to a single-layer model at extracting hidden features; (3) in financial time series, discrete input is more suitable for trend prediction, while continuous values are appropriate for price prediction.

However, based on the prediction results, there is not much difference between our proposed model and the other LSTM models. Does this mean that LSA makes no noticeable contribution to our framework? The answer is that LSA greatly decreases the size of the training data, resulting in data insufficiency. Take Nasdaq as an example: its 1971-2021 history amounts to 12,758 daily trading records, which seems a decent amount. However, after processing by LSA, only the shape patterns are left, and the number of data points is reduced to 1,831. LSTM requires a relatively large amount of data to perform well, and there are approximately 18,000 parameters in our model.
But the size of the training data is only 1,831. Even though we apply multivariate input to expand the data, that is just barely enough to make the model work. This also explains why we use multivariate input in our model.

4.3 Model 2: Random Forest

Unlike a neural network, which requires large amounts of data to train, Random Forest works well even with small to medium data, which is a great advantage in our situation. It is an ensemble supervised machine learning algorithm that aggregates multiple decision trees to make more stable and accurate predictions. It is widely used because of its flexibility, simplicity, and often high-quality results.

4.3.1 Model Evaluation

In our report, two-class classification is used to evaluate the prediction performance. As shown in Table 4.5, TP, FP, FN and TN mean true positive, false positive, false negative and true negative, respectively.

                | Predicted Positive | Predicted Negative
Actual Positive | TP                 | FN
Actual Negative | FP                 | TN

Table 4.5: Confusion matrix for two-class classification

To get a more detailed overview of how the model performed, we present a classification report that computes the Accuracy, Specificity, Precision, and Recall.

Accuracy measures the portion of all testing samples classified correctly:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Recall measures the ability of a classifier to correctly identify positive labels:

Recall = TP / (TP + FN)

Specificity measures the classifier's ability to correctly identify negative labels:

Specificity = TN / (TN + FP)

Precision measures the proportion of correctly identified samples among all samples classified as positive:

Precision = TP / (TP + FP)

4.3.2 Experiment Result

The accuracy_score function calculates the accuracy, defined as the fraction of correct predictions the model makes on the test set. The accuracy of our model is around 70.12%, which is quite good.
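The four metrics above, plus the F1-score introduced in the next subsection, can be computed directly from the confusion-matrix counts. The sketch below uses counts we back-derived to be consistent with Table 4.6 (taking the Top/Btm Shape class as positive); they are an assumption for illustration, not numbers reported elsewhere in this report.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, recall, specificity, precision (Section 4.3.1) and the
    F1-score (harmonic mean of precision and recall) from the four
    confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "recall": recall,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Assumed counts consistent with Table 4.6: 87 of 141 positives and
# 151 of 198 negatives classified correctly.
m = classification_metrics(tp=87, fp=47, fn=54, tn=151)
print(round(m["accuracy"], 4), round(m["f1"], 4))  # -> 0.7021 0.6327
```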
Figure 4.9: Accuracy result*

* Acknowledgement: part of the code was cited and modified from Alex Reed's "sigma_coding_youtube": https://github.com/areed1192/sigma_coding_youtube/blob/master/python/python-data-science/machinelearning/random-forest/random_forest_price_prediction.ipynb

Accuracy is an important evaluation indicator; however, it may not be suitable for an imbalanced dataset [33-34]. Precision is intuitively the ability of the classifier not to label a negative sample as positive. The best value is 1, and the worst value is 0. If the accuracy is high, our model is classifying items correctly; in some cases, however, a model may have low precision or high recall.

              | precision | recall   | f1-score | support
False Alarm   | 0.736585  | 0.762626 | 0.749380 | 198
Top/Btm Shape | 0.649254  | 0.617021 | 0.632727 | 141
accuracy      |           |          | 0.702065 | 339
macro avg     | 0.692920  | 0.689824 | 0.691053 | 339
weighted avg  | 0.700262  | 0.702065 | 0.700861 | 339

Table 4.6: Classification report

It is difficult to compare two models when one has low precision and high recall, or vice versa. To make the results comparable, a metric called the F1-score is applied. It uses the harmonic mean in place of the arithmetic mean, punishing extreme values more, which helps measure recall and precision at the same time. The following confusion matrix visualizes our result.

Figure 4.10: Confusion matrix output

Compared to the validation set, the Out-of-Bag (OOB) error score uses a sample of data that was not necessarily used during the model's analysis. Therefore, the OOB sample set is more random and more varied than the validation set. In other words, the OOB score usually gives a relatively smaller accuracy score. Figure 4.11 shows the Random Forest OOB error score for our model; it is slightly bigger than the accuracy score.
Figure 4.11: Out-of-Bag error score

The chart visualizes the feature importance. In this way, we can see how much each feature contributes to the overall performance.

Figure 4.12: Feature importance chart

Chapter 5: Conclusion

In this report, we presented a stock prediction framework based on a novel de-noising algorithm and deep learning. First, we developed LSA to identify the top/bottom shape patterns. Then, the features of the shape patterns were fed to LSTM and Random Forest, respectively, to identify turning points and false alarms. Although the actual potential performance of LSA on LSTM is still unclear, since the data are insufficient and lower-level data (such as 30-minute bars) are difficult to obtain publicly, the Random Forest experiments show that LSA plays a significant role in improving stock trend prediction performance.

Compared to other mathematical approaches, such as the Fourier Transform, the Wavelet Transform and Principal Component Analysis, LSA is a de-noising method built from a financial perspective. It not only efficiently eliminates the noise but also greatly shrinks the size of the data. Furthermore, this framework provides an alternative way to analyze stocks, which might benefit traditional prediction models. It is also applicable to other time series, such as futures, options, temperature, etc.

Certain parameters of the algorithm affect the classification accuracy. For example, the values of strength and duration have great impact on the shape patterns. Meanwhile, up trends and down trends are actually inconsistent in strength and duration; instead of using the same strength and duration criteria for both, it would be better to use two separate sets of values. Therefore, in future work, the optimization of the LSA parameters can be considered as the starting point for further improving the performance of the algorithm.

Bibliography

[1] Ji, Xuan, Jiachen Wang, and Zhijun Yan.
"A stock price prediction method based on deep learning technology." International Journal of Crowd Science 5.1 (2021): 55-72.

[2] Shynkevich, Yauheniya, T. Martin McGinnity, Sonya A. Coleman, Ammar Belatreche, and Yuhua Li. "Forecasting price movements using technical indicators: Investigating the impact of varying input window length." Neurocomputing 264 (2017): 71-88.

[3] Bao, Wei, Jun Yue, and Yulei Rao. "A deep learning framework for financial time series using stacked autoencoders and long-short term memory." PloS one 12.7 (2017): e0180944.

[4] Cervelló-Royo, Roberto, Francisco Guijarro, and Karolina Michniuk. "Stock market trading rule based on pattern recognition and technical analysis: Forecasting the DJIA index with intraday data." Expert Systems with Applications 42.14 (2015): 5963-5975.

[5] Sezer, Omer Berat, and Ahmet Murat Ozbayoglu. "Algorithmic financial trading with deep convolutional neural networks: Time series to image conversion approach." Applied Soft Computing 70 (2018): 525-538.

[6] Hoseinzade, Ehsan, and Saman Haratizadeh. "CNNpred: CNN-based stock market prediction using a diverse set of variables." Expert Systems with Applications 129 (2019): 273-285.

[7] Ma, Yilin, Ruizhu Han, and Xiaoling Fu. "Stock prediction based on random forest and LSTM neural network." 2019 19th International Conference on Control, Automation and Systems (ICCAS). IEEE, (2019): 126-130.

[8] Booth, G. Geoffrey, Teppo Martikainen, Salil K. Sarkar, Ilkka Virtanen, and Paavo Yli-Olli. "Nonlinear dependence in Finnish stock returns." European Journal of Operational Research 74.2 (1994): 273-283.

[9] Pai, Ping-Feng, and Chih-Sheng Lin. "A hybrid ARIMA and support vector machines model in stock price forecasting." Omega 33.6 (2005): 497-505.

[10] Adebiyi, Ayodele A., Aderemi O. Adewumi, and Charles K. Ayo. "Comparison of ARIMA and artificial neural networks models for stock price prediction." Journal of Applied Mathematics (2014): 1-7.

[11] Le, Linh, and Ying Xie.
"Recurrent embedding kernel for predicting stock daily direction." 2018 IEEE/ACM 5th International Conference on Big Data Computing Applications and Technologies (BDCAT). IEEE, (2018): 160-166.

[12] Lin, Yuling, Haixiang Guo, and Jinglu Hu. "An SVM-based approach for stock market trend prediction." The 2013 International Joint Conference on Neural Networks (IJCNN). IEEE, (2013): 1-7.

[13] Kara, Yakup, Melek Acar Boyacioglu, and Ömer Kaan Baykan. "Predicting direction of stock price index movement using artificial neural networks and support vector machines: The sample of the Istanbul Stock Exchange." Expert Systems with Applications 38.5 (2011): 5311-5319.

[14] Niu, Tong, Jianzhou Wang, Haiyan Lu, Wendong Yang, and Pei Du. "Developing a deep learning framework with two-stage feature selection for multivariate financial time series forecasting." Expert Systems with Applications 148 (2020): 113237.

[15] Göçken, Mustafa, Mehmet Özçalıcı, Aslı Boru, and Ayşe Tuğba Dosdoğru. "Integrating metaheuristics and artificial neural networks for improved stock price prediction." Expert Systems with Applications 44 (2016): 320-331.

[16] Han, Min, Shuhui Zhang, Meiling Xu, Tie Qiu, and Ning Wang. "Multivariate chaotic time series online prediction based on improved kernel recursive least squares algorithm." IEEE Transactions on Cybernetics 49.4 (2018): 1160-1172.

[17] Siripurapu, Ashwin. "Convolutional networks for stock trading." Stanford Univ Dep Comput Sci 1.2 (2014): 1-6.

[18] Patel, Jigar, Sahil Shah, Priyank Thakkar, and Ketan Kotecha. "Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques." Expert Systems with Applications 42.1 (2015): 259-268.

[19] Khaidem, Luckyson, Snehanshu Saha, and Sudeepa Roy Dey. "Predicting the direction of stock market prices using random forest." Computing Research Repository 3 (2016): 1605-1624.

[20] Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory."
Neural Computation 9.8 (1997): 1735-1780.

[21] Chung, Junyoung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. "Empirical evaluation of gated recurrent neural networks on sequence modeling." arXiv preprint arXiv:1412.3555 (2014).

[22] Fischer, Thomas, and Christopher Krauss. "Deep learning with long short-term memory networks for financial market predictions." European Journal of Operational Research 270.2 (2018): 654-669.

[23] Sirignano, Justin, and Rama Cont. "Universal features of price formation in financial markets: perspectives from deep learning." Quantitative Finance 19.9 (2019): 1449-1459.

[24] Luo, Linkai, and Xi Chen. "Integrating piecewise linear representation and weighted support vector machine for stock trading signal prediction." Applied Soft Computing 13.2 (2013): 806-816.

[25] Chang, Pei-Chann, Chin-Yuan Fan, and Chen-Hao Liu. "Integrating a piecewise linear representation method and a neural network model for stock trading points prediction." IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 39.1 (2008): 80-92.

[26] Guo, Yanhui, Siming Han, Chuanhe Shen, Ying Li, Xijie Yin, and Yu Bai. "An adaptive SVR for high-frequency stock price forecasting." IEEE Access 6 (2018): 11397-11404.

[27] Cao, Jiasheng, and Jinghan Wang. "Stock price forecasting model based on modified convolution neural network and financial time series analysis." International Journal of Communication Systems 32.12 (2019): e3987.

[28] Wang, Baohua, Hejiao Huang, and Xiaolong Wang. "A novel text mining approach to financial time series forecasting." Neurocomputing 83 (2012): 136-145.

[29] Hermans, Michiel, and Benjamin Schrauwen. "Training and analysing deep recurrent neural networks." Advances in Neural Information Processing Systems 26 (2013): 190-198.

[30] Pascanu, Razvan, Caglar Gulcehre, Kyunghyun Cho, and Yoshua Bengio. "How to construct deep recurrent neural networks." arXiv preprint arXiv:1312.6026 (2013).
[31] Hao, Yaping, and Qiang Gao. "Predicting the trend of stock market index using the hybrid neural network based on multiple time scale feature learning." Applied Sciences 10.11 (2020): 3961.

[32] Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." Advances in Neural Information Processing Systems (2014): 3104-3112.

[33] Kara, Yakup, Melek Acar Boyacioglu, and Ömer Kaan Baykan. "Predicting direction of stock price index movement using artificial neural networks and support vector machines: The sample of the Istanbul Stock Exchange." Expert Systems with Applications 38.5 (2011): 5311-5319.

[34] Shi, Lukui, Zhijiao Qin, and Huiqiang Yan. "Stock turning point prediction method based on minimum variance." Application Research of Computers 34.11 (2017): 3373-3378.

Appendix

How strength affects LSA: the figure above uses S = 1.5%; the figure below uses S = 1.0%.

Test Data: 39, Accuracy: 69.23%
Test Data: 66, Accuracy: 65.15%
Test Data: 80, Accuracy: 63.75%