STOCK TREND PREDICTION FRAMEWORK BASED ON LINE SEGMENT ALGORITHM AND DEEP LEARNING

by Zhaowei Liang
B.Eng., South China University of Technology, 2004
M.Eng., Pittsburg State University, 2006

PROJECT SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE

UNIVERSITY OF NORTHERN BRITISH COLUMBIA
November 2021
© Zhaowei Liang, 2021

Abstract

Stock forecasting is a complicated task because of the noisy and volatile nature of price data. How to eliminate this noise effectively has attracted attention from both investors and researchers. This report presents a novel de-noising technique named the Line Segment Algorithm (LSA). Unlike signal-processing methods, LSA is based on the characteristics of financial time series. First, the algorithm identifies the shape patterns of a historical stock price series and labels them as turning points or false alarms. Then, a stock trend prediction framework is built and trained on the shape patterns extracted by the algorithm. The resulting model can predict whether a given shape pattern is a turning point. To evaluate its performance, experiments on real stock data were carried out with LSTM and Random Forest models, respectively. The results show that LSA is effective, yielding better prediction accuracy. It provides a new perspective for stock trend analysis and can also be applied in actual stock trading.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
Chapter 1: Introduction
  1.1 Feature Extraction
  1.2 Model Enhancement
Chapter 2: Background
  2.1 Definition of Trend
  2.2 Relation Between Trend and Noise
  2.3 Price VS Trend Prediction
  2.4 Data Granularity
Chapter 3: Methodology
  3.1 Contain Relation
  3.2 Top/Bottom Shape Patterns
  3.3 Duration and Strength
  3.4 Procedure of LSA
  3.5 Turning Point and False Alarm
  3.6 Feature Selection and Processing
  3.7 Process of Feature Extraction
Chapter 4: Experiment Design
  4.1 Data Source
  4.2 Model 1: LSTM
    4.2.1 Model Structure
    4.2.2 Experiment Result
    4.2.3 Positive Findings and Limitations
  4.3 Model 2: Random Forest
    4.3.1 Model Evaluation
    4.3.2 Experiment Result
Chapter 5: Conclusion
Bibliography
Appendix

List of Tables

Table 3.1: Selected technical indicators and their description
Table 4.1: Source of dataset
Table 4.2: Comparison with existing models*
Table 4.3: Comparison of single layer and stacked structure
Table 4.4: Comparison of continuous and discrete values
Table 4.5: Confusion matrix for two-class classification
Table 4.6: Classification report

List of Figures

Figure 2.1: New definition of trend
Figure 2.2: Comparison of new definition and traditional one
Figure 2.3: Trend and noise
Figure 2.4: Trend-noise conversion
Figure 2.5: Price VS trend
Figure 2.6: Scale down daily K line into 30-minute
Figure 3.1: Line Segment Algorithm
Figure 3.2: Merge operation
Figure 3.3: Recursive merge operation
Figure 3.4: Type of shape pattern
Figure 3.5: Shape patterns in stock
Figure 3.6: Duration and strength of a downtrend
Figure 3.7: Overlap of Top/Bottom Shape
Figure 3.8: How Duration affects LSA: D=3
Figure 3.9: How Duration affects LSA: D=4
Figure 3.10: Flow diagram of LSA
Figure 3.11: Turning points and false alarm
Figure 3.12: Process of feature extraction
Figure 4.1: Slide window
Figure 4.2: Model structure
Figure 4.3: Accuracy 66.05% at test data size of 53
Figure 4.4: Predict result before classification
Figure 4.5: Predict result after classification
Figure 4.6: Loss curve
Figure 4.7: Prediction result with univariate input feature
Figure 4.8: Prediction result with multivariate input features
Figure 4.9: Accuracy result
Figure 4.10: Confusion matrix output
Figure 4.11: Out-Of-Bag error score
Figure 4.12: Feature importance chart

Acknowledgements

I would like to take this opportunity to express my thanks to my supervisor Professor Chen and to Dr. Jiang for giving me the opportunity to choose this interesting topic and for providing invaluable support throughout this research. Interest is one of the most important motivational factors, and it inspires me to come up with new ideas. Besides my advisor, I would also like to thank my supervisory committee member, Dr. Fu, for his insightful comments and advice from his professional finance perspective. Last but not least, I would like to give special thanks to my wife for taking care of our two kids alone in China so that I could concentrate on my research. Many thanks to my parents, my brother, and my sister for their continuous support. You all make my life rich and meaningful.

Chapter 1: Introduction

With the rapid development of computer technology, researchers began using machine learning to predict stock prices and fluctuations as early as 1970 [1], but how to accurately predict non-stationary financial time series is still an open question. Over the past decades, stock prediction has been studied from many perspectives, among which feature extraction and model enhancement are the two most important directions.
1.1 Feature Extraction

Feature extraction is one of the most important parts of the stock prediction process. A set of high-quality features is the key to a successful prediction model. There are mainly two methods to extract features.

The first is to use statistical data directly as input features. For example, some researchers explore the correlation between statistical data and stock prices. Statistical data including technical indicators [2], macroeconomic factors [3] and investors' sentiment [1] have been incorporated into prediction models [4]. This approach is popular because the data are easy to collect and do not require heavy computation.

The other is to extract features in a fuzzy way. Researchers do not care what the features might look like; they take advantage of hybrid neural networks to find hidden features and feed them into the prediction model. For example, Wei Bao et al. [3] applied an autoencoder to generate deep high-level features for predicting the stock price. In [5], the author proposed an approach to convert a time series into an image so that a CNN can extract useful features from financial variables. CNNs are used for feature extraction because they have achieved great success in computer vision and image processing. One study criticized that this approach neglects the potential influence of correlated stock markets; to address this problem, a 3-dimensional input tensor construction approach was designed in [6].

However, no matter which method is applied, the common problem is that most of them fail to consider the noise in the data. To illustrate, we surveyed 47 papers on stock prediction: 45 of them train their models on untreated daily trading data, which makes the model absorb noise and results in unsatisfactory forecast accuracy. Only 2 papers [3, 7] really tried to de-noise the data before feeding it to the model.
Although these two papers de-noised the data with the wavelet transform and Principal Component Analysis (PCA), respectively, they still have no clear definition of what the noise actually looks like. Wavelets and PCA come from communication electronics. They may work well for signals, but that does not mean they are suitable for stocks, since noise and fluctuation in financial time series have a different context from those in signal processing.

1.2 Model Enhancement

Researchers have put a great deal of effort into model enhancement. Most early studies used traditional statistical models, such as ARIMA [8-10], to predict stock prices. ARIMA is computationally efficient, but one disadvantage is that it assumes the training data follow a known statistical distribution and are stationary, which is often difficult to satisfy in nonlinear financial time series [11]. In [12-13], the authors used an SVM-based approach for stock prediction, but SVMs have low computational efficiency, especially when modeling large-scale financial time series [14]. Besides, Göçken, Mustafa [15] designed a Harmony Search and Genetic Algorithm approach to enhance a traditional ANN model. Although an ANN can learn any nonlinear relationship, there are still problems in practical applications, including a low convergence rate [16], difficulty in determining the optimal model structure, and overfitting. In [17], Siripurapu et al. attempted to convert price sequences into pictures and then used a CNN to learn useful features from a single time scale in the price series. But financial time series contain features at multiple time scales, and it is not reasonable to study only one of them.

Random forest (RF) has also drawn much attention in stock prediction. It is an ensemble learning algorithm that can evaluate the contribution of each feature. Many studies show that RF is superior to the models mentioned above.
Patel and Shah [18] applied SVM, ANN and RF to predict the trend of the stock market based on the historical data of the S&P Index, Infosys and other stocks from the Indian stock market. The results show that the RF model has the highest accuracy. Khaiadem et al. [19] used the RF algorithm to predict the next day's stock price trend with the historical data of Apple, Amazon and Microsoft, and also obtained a relatively high accuracy rate. Compared to the other models, RF does not suffer from overfitting, sensitivity to missing data, or complex computation. It was considered one of the best models for time series forecasting, especially stock trend prediction. This opinion held until RNNs, especially LSTM, were released.

Due to its internal memory mechanism [20-21], LSTM works well on sequence data with long-term dependencies. In [22], it was applied to predict out-of-sample directional movements for the constituent stocks of the S&P 500. The results showed that LSTM outperformed memory-free classification methods. Wei Bao et al. [3] used wavelet transforms and stacked autoencoders to extract features from technical indicators and then fed them to an LSTM model to learn time dependencies for stock price prediction. In [23], history information and the limit order book were fed to an LSTM model to determine stock price movements. All these studies proved that LSTM can successfully extract hidden time dependencies in financial time series. However, there has been no big advancement since then. Even though researchers have put a great deal of effort into enhancing LSTM models through structure modifications or hybrid neural networks, such as Bi-LSTM, ConvLSTM, CuLSTM, CNN-LSTM and WSAE-LSTM, the performance is only slightly improved.

In the light of the above literature review, we summarize three points:

(1) There seems to be a bottleneck in model enhancement. Unless a new model structure is released, there will be little improvement from manipulating the current ones.
(2) Compared to model enhancement, feature extraction might contribute more to the overall prediction performance, since it determines the upper limit of a prediction model.

(3) In order to reach the true features of a stock, we have to find an effective technique to eliminate the noise, because noise deteriorates performance and makes prediction complicated.

Based on these findings, our prediction framework concentrates on feature extraction and develops a new technique named the Line Segment Algorithm (LSA) to target the noise in stock prices from a financial perspective. To match the algorithm, we selected a stacked LSTM structure with discrete multivariate values, an area seldom covered by other papers, as the prediction model. Random Forest is also applied to evaluate and compare the performance of LSA, because LSTM and RF are two of the most popular models in stock forecasting. This report aims at developing a simple but effective prediction framework and figuring out whether the proposed LSA enhances prediction performance. Its limitations will also be discussed.

The remainder of this report is organized as follows. Chapter 2 introduces background knowledge, including the definition of trend, the relation between trend and noise, and prediction granularity. Chapter 3 is the main content of the report: the methodology of LSA, such as the contain relation, turning points and false alarms, and the procedure of the algorithm. Chapter 4 presents the details of our experiment design; the performance of LSTM and Random Forest is analyzed there. Finally, a brief conclusion and future work are given in Chapter 5.

Chapter 2: Background

A stock price pattern is different from a time-series signal: its financial attributes make it inappropriate to analyze with signal-processing methods.
Therefore, to better understand the principle of the LSA algorithm, we first need to introduce some basic knowledge.

2.1 Definition of Trend

The trend of a stock market refers to the upward or downward direction of the future price series. Financial time series traditionally define the daily trend by the close price: if the close price is higher than the previous one, the trend is up, and vice versa. However, in our model, the definition of trend refers not to the close price but to the high and low prices.

Up trend: Hk-1 < Hk and Lk-1 < Lk (2.1)

Down trend: Hk-1 > Hk and Lk-1 > Lk (2.2)

Take figure 2.1 as an example: when both the high and low prices are higher than the previous ones, it is an up trend (left). Similarly, when both the high and low prices are lower than the previous ones, it is a down trend (right).

Figure 2.1: New definition of trend

The following figure illustrates the differences. Under the traditional definition, the green candlestick on the left is down because its close price is lower than the previous one. But under the new definition, it is an up trend, since its high and low prices are higher than the previous ones. The same situation applies to the other three candlesticks in the dashed oval.

Figure 2.2: Comparison of new definition and traditional one

There are two reasons for the new trend definition. (1) Using the high/low price to describe daily stock movement is more practical than the close price. People are simply used to deciding the daily trend by the close price, but that convention means nothing to a computer. (2) The new trend definition is part of LSA, which will be introduced to eliminate the noise in the following sections, and some technical indicators are also calculated based on the new definition.

2.2 Relation Between Trend and Noise

Stock prices are non-stationary time series consisting of trend and noise. A trend describes the main direction in which the stock price is moving. Conversely, noise moves around the trend in a fluctuating or oscillating way. As figure 2.3 shows, the stock price moves up in a swing.
Prices have direction and tend to travel in observable trends, but they are not necessarily predictable.

Figure 2.3: Trend and noise

In fact, a trend is a higher level of noise, and noise is a lower level of trend. The most distinctive differences between trend and noise are their duration (the length of time) and strength (the gap between top/bottom shapes). The duration and strength of noise are relatively shorter and weaker than those of trends.

Actual Stock Price Pattern = Trend + Noise (a lower level of trend) (2.3)

Figure 2.4: Trend-noise conversion

2.3 Price VS Trend Prediction

There are mainly two categories of stock forecasting [24-25]. The first focuses on short-term stock price prediction; many papers [9-10, 26-27] apply deep learning to predict stock prices. Price forecasting is popular and straightforward. However, we believe it amounts to using a simple method to deal with a very complicated problem: the actual stock price always carries noise, and its exact value is unpredictable.

The second category focuses on the stock trend and then determines stock trading signals. As a matter of fact, price is an infinite continuous attribute, while trend is a category. Trend prediction simplifies the problem because it belongs to the field of classification. In actual practice, an investor does not have to know the exact future price; a trend prediction is good enough to help them make decisions.

Figure 2.5: Price VS trend

To sum up, it is impossible to predict absolute prices: models should not be used to predict stock prices; they should predict stock trends instead. Until now, there has been no clear and systematic rule on how to predict a stock trend. In this report, we try to cast some light on it. To be more precise, what we predict is the turning point of the trend, including the top shape pattern and the bottom shape pattern.
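Before moving on, the high/low trend definition of Section 2.1 (equations 2.1 and 2.2) can be sketched in a few lines of Python. This is only an illustrative sketch: the `Candle` type and function name are assumptions for this example, not code from the report.

```python
from dataclasses import dataclass

@dataclass
class Candle:
    high: float
    low: float

def daily_trend(prev: Candle, curr: Candle) -> str:
    """Classify the move from prev to curr using high/low prices only."""
    if curr.high > prev.high and curr.low > prev.low:
        return "up"       # eq. 2.1: both high and low rise
    if curr.high < prev.high and curr.low < prev.low:
        return "down"     # eq. 2.2: both high and low fall
    return "contain"      # ranges nest/overlap; handled in Section 3.1

print(daily_trend(Candle(10.0, 9.0), Candle(10.5, 9.4)))  # up
```

Note that the third branch, where neither condition holds, is exactly the "contain" situation that Section 3.1 resolves with the merge operation.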
2.4 Data Granularity

Nowadays, stock trades can take place at very high frequency when the market is open. The data granularity can vary: second, minute, hour, day, or even a fixed investment period. Most papers fail to consider the effect of data granularity and apply daily trading data to predict the following daily price movement. If the granularity of the training data is the same as that of the prediction, we consider the resolution of the training data too low. In other words, it is not a good idea to use daily K-line data to predict the next day's price movement, since there is not enough detailed information. Instead, 30-minute K-line data is a better choice, because each daily K line can be expanded into eight 30-minute K lines.

Figure 2.6: Scale down daily K line into 30-minute

LSA chooses the trading day as the training granularity and the trading week (5 days = 1 week) as the prediction granularity. That means it predicts the trend of a stock over a predefined period measured in trading weeks. Certainly, short-term or even high-frequency stock prediction is also possible; accordingly, higher-resolution data, such as 30-minute or even 5-minute data, needs to be fed into the model.

Chapter 3: Methodology

The basic idea of LSA is to split a financial time series into several line segments. As shown in figure 3.1, the stock trend pattern can be split into rising and falling segments.

Figure 3.1: Line Segment Algorithm

Unlike other de-noising methods, LSA is based on the characteristics of financial time series [24]. It greatly reduces the size of the original data and provides an easy and efficient technique for stock trend analysis. The following sections give the details.

3.1 Contain Relation

For a pair of consecutive candlesticks, if the price range of one candlestick is a subset of the other, we say the two candlesticks are in a "contain" relation. The contain relation is calculated from the high/low prices.
[Lk-1, Hk-1] ⊃ [Lk, Hk] or [Lk, Hk] ⊃ [Lk-1, Hk-1] (3.1)

When supply and demand are equally strong, the market will not show a clear direction. Instead, it will produce many consecutive parallel candlesticks (oscillations) and cause the learning model to generate false predictions. The purpose of the contain relation is to define the valid candlesticks and eliminate the oscillations.

Figure 3.2: Merge operation

Two candlesticks in a contain relation can be merged into one. This process is called the merge operation. In an up-trend merge (equation 3.2), the merged high (or low) price equals the larger of the two high (or low) prices. A similar process applies to the down-trend merge (equation 3.3).

Hmerge = Max(Hk-1, Hk) and Lmerge = Max(Lk-1, Lk) (3.2)

Hmerge = Min(Hk-1, Hk) and Lmerge = Min(Lk-1, Lk) (3.3)

If more than two candlesticks are in a contain relation, for example, if after the first merge the new candlestick contains the next one, the merge operation is applied recursively until no contained candlestick remains. The merged candlestick represents the original ones in the contain relation. In effect, the merge operation eliminates stock fluctuations by extending the trend duration.

Figure 3.3: Recursive merge operation

3.2 Top/Bottom Shape Patterns

A shape pattern describes the beginning or the end of a trend. There are two kinds of shape patterns: the top shape and the bottom shape. A downtrend begins with a top shape and ends with a bottom shape, while an uptrend begins with a bottom shape and ends with a top shape. A shape pattern is composed of at least 6 consecutive candlesticks. The following definitions are used to identify top/bottom shape patterns. A candlestick whose high price is the highest among the previous 5 and the next 1 candlesticks is a top shape.
Hk = Max(Hk-4, Hk-3, Hk-2, Hk-1, Hk+1) (3.4)

Lk = Max(Lk-4, Lk-3, Lk-2, Lk-1, Lk+1) (3.5)

Similarly, a candlestick whose low price is the lowest among the previous 5 and the next 1 candlesticks is a bottom shape.

Lk = Min(Lk-4, Lk-3, Lk-2, Lk-1, Lk+1) (3.6)

Hk = Min(Hk-4, Hk-3, Hk-2, Hk-1, Hk+1) (3.7)

Figure 3.4: Type of shape pattern

Figure 3.4 shows the ideal shape patterns. In actual practice, top and bottom shape patterns usually stick together, as in the following figure.

Figure 3.5: Shape patterns in stock

Top/bottom shape patterns are fundamental to the algorithm. They help us analyze the stock in the following ways: (1) they are very effective indicators for buying and selling points; (2) between each pair of top/bottom or bottom/top shapes there is a trend; (3) trend and noise can be classified by the duration and strength between each pair of top/bottom shape patterns.

3.3 Duration and Strength

The purpose of strength and duration is to further separate noise from trend. Each trend begins and ends with a pair of top/bottom shapes, which represent the two endpoints of the trend. We define the number of candlesticks between the shapes as the trend duration.

D = Count(top_current - bottom_previous) or (3.8)
D = Count(bottom_current - top_previous) (3.9)

For example, the trend duration of the downtrend below is D = 5.

Figure 3.6: Duration and strength of a downtrend

Similarly, for each pair of top/bottom shapes, the gap between them represents the strength of the trend.

S = (Lk0_previous_top - Hk0_bottom) / Hk0_bottom or (3.10)
S = (Hk0_previous_bottom - Lk0_top) / Lk0_top (3.11)

If S < 0, the pair of top/bottom shapes overlap each other, as illustrated in figure 3.7: the high price of the bottom shape overlaps the low price of the top shape. If 0 ≤ S < 1.5%, we consider the trend weak. If S ≥ 5%, the trend is strong.
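The contain-relation merge of Section 3.1 (equations 3.1 to 3.3) and the downtrend strength measure above (equation 3.10) can be sketched as follows. This is an illustrative sketch under assumptions: the dictionary-based candle representation and function names are invented for the example, and the merge direction is fixed to the up-trend rule of equation 3.2 for simplicity (a full implementation would pick equation 3.2 or 3.3 from the trend preceding the pair).

```python
def contains(a, b):
    """True if candle a's [low, high] range is a superset of b's (eq. 3.1)."""
    return a["low"] <= b["low"] and a["high"] >= b["high"]

def merge_candles(candles):
    """Repeatedly merge consecutive candles in a contain relation.

    Assumes an up-trend merge (eq. 3.2): merged high/low take the larger
    of the two highs/lows. Merging is applied again whenever the merged
    candle contains the next one, matching the recursive rule in the text.
    """
    merged = [candles[0]]
    for c in candles[1:]:
        prev = merged[-1]
        if contains(prev, c) or contains(c, prev):
            merged[-1] = {"high": max(prev["high"], c["high"]),
                          "low": max(prev["low"], c["low"])}
        else:
            merged.append(c)
    return merged

def strength(prev_top_low, bottom_high):
    """Downtrend strength: relative gap from a top to the next bottom (eq. 3.10)."""
    return (prev_top_low - bottom_high) / bottom_high
```

For example, `merge_candles([{"high": 10.0, "low": 9.0}, {"high": 9.5, "low": 9.2}])` collapses the contained second candle into `{"high": 10.0, "low": 9.2}`, and a negative `strength` value signals the overlapping case shown in figure 3.7.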
Figure 3.7: Overlap of Top/Bottom Shape

Noise and volatility in stock trend forecasting are major challenges because they hinder the extraction of useful information [28]. To address this problem, a valid trend must meet the following requirements in our framework; otherwise it is considered noise and eliminated by LSA.

D > 4 (after merge operation) (3.12)
S > 1.5% (after merge operation) (3.13)

The reason for D > 4 is that the prediction granularity should be higher than the daily level. The input features are calculated from daily trading data, so the prediction level should be the weekly trend, and 5 trading days equal one week in the stock market. Similarly, supposing the transaction fee of each buy/sell operation is 1.5%, the strength should be greater than that; otherwise there is no profit. Both duration and strength can be adjusted according to the characteristics of a specific stock. Figures 3.8 and 3.9 compare the trend line segments under two different duration values. Strength works in the same way as duration; for more information on how strength affects the algorithm, please refer to the appendix.

Figure 3.8: How Duration affects LSA: D=3

Figure 3.9: How Duration affects LSA: D=4

3.4 Procedure of LSA

The LSA algorithm proceeds as follows:

1. Combine the consecutive K lines that are in a contain relation
2. Calculate all shape patterns, including top/bottom shape patterns
3. Delete the shape patterns that fail to meet the duration and strength criteria
4. Classify the turning points and the false alarms
5. Eliminate repeated invalid shape patterns in consecutive order
6. Connect the turning points with line segments

Figure 3.10: Flow diagram of LSA

3.5 Turning Point and False Alarm

After step 3 in the diagram above, we obtain the shape patterns. If the stock trend rises (or falls) to a certain point and then begins to fall (or rise), that point is a turning point.
In other words, in a run of consecutive identical shape patterns, the last one is the turning point.

Turning point = the last one of consecutive top/bottom shapes (3.14)

Everyone wants to buy or sell stocks at turning points to maximize profit. However, the turning points shown in the figure are theoretical points. In actual practice, we usually buy or sell at the next candlestick, since a top/bottom shape cannot be confirmed until the next candlestick appears; we never know whether a candlestick is a top/bottom shape until it reveals itself. We call this One-Day-Delay (ODD). With ODD, the trading signal is delayed by one day (or 30 minutes, depending on the data granularity) so that we can determine whether the turning point shows up or not. The purpose of ODD is to avoid prediction errors.

In fact, even after a turning point is formed, it can sometimes turn out to be a mistake as new shape patterns are generated. The reason is that when a shape pattern different from the previous one appears after 5 valid candlesticks, the trend is considered ready to complete at any time. However, its complete state can be broken by another new top/bottom shape pattern showing up afterwards; the trend then returns to its incomplete state. We call this phenomenon a false alarm (the smaller white dots).

False alarms = all shape patterns between two consecutive turning points (3.15)

Figure 3.11: Turning points and false alarm

False alarms are bad for our model. What makes prediction difficult is that we can only make a judgment based on past data, that is, the data on the left of the time axis. So false alarms are impossible to eliminate: you never know whether a shape pattern is a false alarm until you get another shape pattern. It is tempting to use future data to build a perfect prediction model, but that is not practical. Therefore, we classify the shape patterns into two groups: turning points and false alarms.
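The labelling rule above (equations 3.14 and 3.15) can be sketched as a small retrospective labeller. This is an assumption-laden sketch: the function name is invented, and note that in live use the label of the most recent shape is exactly what ODD leaves undecided; here the final shape is labelled as if the sequence were complete.

```python
def label_shapes(shapes):
    """shapes: list of 'top'/'bottom' pattern types in time order.

    Returns a parallel list of labels: in a run of identical consecutive
    shapes, only the last one is a 'turning_point' (eq. 3.14); the rest
    are 'false_alarm' (eq. 3.15). Retrospective labelling only: in live
    trading the final shape stays unknown until the next pattern appears.
    """
    labels = []
    for i, s in enumerate(shapes):
        nxt = shapes[i + 1] if i + 1 < len(shapes) else None
        labels.append("turning_point" if nxt != s else "false_alarm")
    return labels

print(label_shapes(["top", "top", "bottom", "top", "top"]))
# ['false_alarm', 'turning_point', 'turning_point', 'false_alarm', 'turning_point']
```

These retrospective labels are what the prediction model is trained on: given the features of a newly formed shape pattern, it tries to predict which of the two labels the pattern will eventually receive.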
We aim to apply deep learning models to distinguish false alarms from turning points and improve the overall prediction performance. Our prediction model is quite straightforward. After de-noising by LSA, the data are greatly compressed: instead of feeding the whole original series into the model, we input only the features of the shape patterns. Each line segment is composed of more than 5 K-lines, so the size of the data fed to the model is less than one fifth of the original data, which greatly improves efficiency.

3.6 Feature Selection and Processing

Features are essential to performance. Previous studies were mostly conducted by feeding daily trading data, such as the open or close price, as input. That is one of the reasons those models fail to reach high accuracy. To avoid "rubbish in, rubbish out", features are chosen based on the following criteria: (1) only trend-related technical indicators are selected, and (2) all selected technical indicators are converted into trend deterministic data. To increase the amount of information in the input features, besides the common technical analysis indicators, this report also adds some relatively novel indicators based on the LSA algorithm, such as SAM_SHAP, SAM_NUM and TBS. The following table describes them in detail.
Technical indicator | Description
RSI       | Momentum indicator that determines whether a stock is overbought or oversold
K_SO      | Measures the level of the close price relative to the previous turning point
R_WIL     | Judges the trend of short-term market behavior
DIF       | Difference between the fast and slow moving averages; reflects the price trend
DEA       | Oscillator that fluctuates above and below the zero line
MACD      | Displays trend-following and momentum characteristics
CHG_RATE  | Percentage of price change compared to the last turning point
SAM_SHAP  | Identifies whether the shape pattern is the same as the previous one
SAM_NUM   | Counts the number of identical consecutive shape patterns
TBS       | Identifies top shape patterns, bottom shape patterns and false alarms

Table 3.1: Selected technical indicators and their descriptions

After the technical indicators are calculated, we go one step further by converting the continuous-valued inputs into discrete trend deterministic values "0" or "1", where "0" means a down trend and "1" indicates an up trend. This conversion is based on the fact that each continuous value, when compared with its previous one, indicates a future up or down trend. Since we have to find the correlation between the input trends and the output trend, this is easier with trend deterministic data. Take the price change rate as an example: it is a better momentum-based indicator than the price itself. In fact, each input parameter in its discrete form indicates a possible up or down trend determined by its inherent property. For example, RSI is a technical curve based on the ratio of rises to falls over a certain period. It reflects the prosperity of the stock market and is generally used for identifying overbought and oversold points. It ranges between 0 and 100. If RSI is over 70, the stock is overbought and may go down in the near future; therefore, we set the value to "1". If it is below 30, the stock is oversold, and we set it to "0".
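The RSI conversion just described, together with the comparison rule for mid-range values given in the next paragraph, can be sketched as follows. The function name and sample values are our own illustration of the rule in Section 3.6, not the thesis code.

```python
def rsi_to_trend(rsi_t, rsi_prev):
    """Convert a continuous RSI value into trend deterministic data:
    RSI > 70 (overbought) -> "1"; RSI < 30 (oversold) -> "0";
    otherwise compare with the previous value (rising -> "1")."""
    if rsi_t > 70:
        return 1
    if rsi_t < 30:
        return 0
    return 1 if rsi_t > rsi_prev else 0

# Overbought, oversold, rising mid-range, falling mid-range:
print([rsi_to_trend(t, p) for t, p in [(75, 60), (25, 40), (55, 50), (45, 50)]])
# -> [1, 0, 1, 0]
```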
For values between 30 and 70, if the RSI at time t is greater than that at time t-1, we set it to "1"; otherwise, we set it to "0". The other technical indicators follow a similar process.

3.7 Process of Feature Extraction

The whole process of feature extraction is shown in the chart. First, the original data go through preprocessing by LSA, and then all technical indicators are calculated. Stock trends are generated based on our new definition. All the continuous technical indicators are converted into discrete values and fed into the deep learning network.

Figure 3.12: Process of feature extraction

In short, there are three main steps in our feature engineering process: (1) the merge operation eliminates low-level noise, while the strength and duration criteria remove high-level fluctuations; (2) the model is trained with daily data to predict the weekly trend, with duration greater than 5, which ensures the data resolution is high enough; (3) the model predicts the trend instead of the price, using trend deterministic discrete data rather than continuous data.

In summary, LSA effectively eliminates noise and fluctuations. It simplifies the prediction problem by identifying the turning points of the stock trend. After the data are preprocessed by LSA, only the shape-pattern data need to be fed into the model.

Chapter 4: Experiment Design

The outputs of LSA are the top/bottom shape patterns. Random Forest or LSTM is then applied to separate false alarms from turning points. RF is a simple but efficient algorithm for classification problems, while LSTM is famous for its internal memory mechanism in time series prediction. The prediction accuracy of both models will be compared and analyzed. We present the details of how we obtain the predicted values and evaluate the performance of each model.
4.1 Data Source

To ensure that the data are representative, our dataset is composed of 6 stock indices from 3 different markets, covering more than 20 years of data. All data were obtained from Yahoo Finance. The data are divided into a training set and a test set. To predict the future stock trend, only data from the past can be used.

  Index   | Duration  | Region
1 Dow     | 1992-2021 | United States
2 Nasdaq  | 1971-2021 | United States
3 S&P 500 | 1992-2021 | United States
4 HSI     | 1987-2021 | Hong Kong
5 SSEC    | 1997-2021 | China
6 SCI     | 1991-2021 | China

Table 4.1: Source of dataset

Regarding the labeling of the prediction data, all top/bottom shape patterns were classified into two classes: turning points and false alarms. Our goal is to predict whether a shape pattern is a turning point or a false alarm, so we are solving a classification problem. In this report, the label is defined as:

label = 1 if the shape pattern is a turning point; 0 otherwise (false alarm)

4.2 Model 1: LSTM

LSTM has the capability of learning long-term dependencies. It is popular in language translation, since it can effectively extract feature dependencies from the content of previous sentences. Financial time series are in fact similar to language: the current stock price depends not only on yesterday's price but also on the prices before yesterday.

4.2.1 Model Structure

All features of the historical data must be sliced into samples of a predefined size. The window size is the amount of data covered by the sliding window for each row of the training set; it can be adjusted as needed. The default is 6 consecutive shape patterns for training and 1 for prediction. Converting the shape patterns into days, more than 30 days of trading data are used to predict the price movement 5 days later.

Figure 4.1: Sliding window

In our model, we applied a stacked LSTM structure. Studies [29-30] indicate that depth is key to many challenging prediction problems.
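The sliding-window slicing of Section 4.2.1 can be sketched as below. This is an assumed minimal implementation (the function name and the toy feature rows are ours): each training sample covers 6 consecutive shape patterns and its target is the label of the 7th.

```python
def make_windows(features, labels, window=6):
    """Slice pattern-level feature rows into (X, y) samples: each sample
    uses `window` consecutive shape patterns to predict the label of the
    pattern that immediately follows the window."""
    X, y = [], []
    for i in range(len(features) - window):
        X.append(features[i:i + window])
        y.append(labels[i + window])
    return X, y

# Toy data: 8 shape patterns, each with 2 discrete features and a 1/0 label.
feats = [[0, 1], [1, 1], [1, 0], [0, 0], [1, 1], [0, 1], [1, 0], [0, 1]]
labs = [0, 1, 0, 0, 1, 0, 1, 0]
X, y = make_windows(feats, labs)
print(len(X), y)  # 8 patterns with window 6 -> 2 samples; y is [1, 0]
```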
Increasing the depth of the network lets the model learn features with fewer neurons, in less time and with higher accuracy. LSTM is prone to overfitting, so dropout is used as a regularization technique: it prevents overfitting by randomly ignoring certain nodes in a layer during training. In our model, there are 4 LSTM layers, and a dropout layer is attached to each of them.

Figure 4.2: Model structure

4.2.2 Experiment Result

Our algorithm works well and successfully predicts the movement of the stock. The test set is 28 shape patterns by default. We also applied different test sizes to see how they affect the accuracy: as the size of the test data grows from 39 to 80, the accuracy drops from 69.23% to 63.75%. Figure 4.3 shows one of the prediction results, where "1" means turning point and "0" means false alarm. For more information about the accuracy at different test sizes, please refer to the appendix.

Figure 4.3: Accuracy 66.05% at a test data size of 53

The average accuracy is about 66.04%. Compared with existing models, our result is slightly better.

  Model                   | Accuracy (%)
1 Simplistic Model        | 54.82
2 SVM                     | 61.98
3 LSTM                    | 65.05
4 CNN                     | 59.34
5 Multiple Pipeline Model | 63.30
6 NFNN                    | 65.93
7 Our Proposed Model      | 66.04

Table 4.2: Comparison with existing models (cited from [31] for comparison)

The following figure visualizes the comparison between the real trend and the predicted trend. As we can see, most of the predicted values are close to "0" or "1", but some lie in between. That is because the output dense layer originally generates continuous values, so a classification step with a threshold is applied to its output.

Figure 4.4: Prediction result before classification

The output of the LSTM is the probability of the classification result. The threshold is 0.5 by default: when the output probability is greater than 0.5, the pattern is considered a turning point; otherwise, it is a false alarm.
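The default thresholding and the accuracy calculation used in the experiments can be sketched as follows; the function names and the probability values are illustrative only, not the thesis code.

```python
def classify(probabilities, threshold=0.5):
    """Turn the network's continuous outputs into class labels:
    probability > threshold -> turning point (1), else false alarm (0)."""
    return [1 if p > threshold else 0 for p in probabilities]

def accuracy(predicted, actual):
    """Number of correct predictions divided by the test data size."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

probs = [0.91, 0.34, 0.62, 0.08, 0.55]
preds = classify(probs)                   # -> [1, 0, 1, 0, 1]
print(accuracy(preds, [1, 0, 0, 0, 1]))   # 4 of 5 correct -> 0.8
```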
In actual practice, the threshold can be adjusted according to the characteristics of the stock and the investment strategy.

Figure 4.5: Prediction result after classification

Figure 4.5 shows the final result after classification with a threshold of 0.5. The accuracy rate is calculated as the number of correct predictions divided by the total test data size. After examining the accuracy at different test sizes, we also analyzed how the number of epochs affects the prediction results. When the number of epochs is less than 50, all the prediction values concentrate around the center, with almost no gap between them; obviously, the model is underfit. As the number of epochs increases, the gap begins to widen. As the figure shows, the loss curve begins to converge when the number of epochs goes beyond 1800.

Figure 4.6: Loss curve

The following is a comparison of univariate and multivariate prediction results. There are 10 input features in our model; we also tried using only one feature instead. As the result shows, the prediction values then centralize between 0.2 and 0.5, suggesting the model is underfit. We found that as the number of features increases, the gap becomes wider. Therefore, we conclude that the more features we draw from the data, the less training data we may require.

Figure 4.7: Prediction result with a univariate input feature

Figure 4.8: Prediction result with multivariate input features

We evaluated the performance of the model with different numbers of layers. As shown in the following table, the more layers, the better the accuracy. Each additional layer improves the accuracy, but the improvement rate decreases as the number of layers increases. While it is not theoretically clear what additional power is gained by the deeper architecture, it has been observed empirically that deep RNNs work better than shallower ones on some tasks [32].
Number of layers | 1     | 2     | 3     | 4
Accuracy (%)     | 57.79 | 65.47 | 69.03 | 70.12
Improvement (%)  | N/A   | 13.29 | 5.44  | 1.60

Table 4.3: Comparison of the single-layer and stacked structures

We also compare the performance of LSTM when the inputs are represented as real continuous values versus discrete trend deterministic data. The results show that the accuracy is already good when the model is trained with continuous-valued input, but the performance improves further when discrete deterministic data are applied. It is not a big change, but it is noticeable. The reason behind the improved performance is that the discrete deterministic data help the model classify the inherent trend that each technical indicator shows.

             | Discrete values | Continuous values
Accuracy (%) | 68.92           | 64.11
Time         | 27m42s          | 33m57s

Table 4.4: Comparison of continuous and discrete values

4.2.3 Positive Findings and Limitations

In summary, there are three positive findings: (1) multivariate input is better than a single feature, especially when the size of the data is relatively small; (2) a stacked LSTM model is superior to a single-layer model at extracting hidden features; (3) in financial time series, discrete input is more suitable for trend prediction, while continuous values are appropriate for price prediction.

However, based on the prediction results, there is not much difference between our proposed model and the other LSTM models. Does this mean that LSA makes no noticeable contribution to our framework? The answer is that LSA greatly decreases the size of the training data, resulting in data insufficiency. Take Nasdaq as an example: its 1971-2021 history amounts to 12,758 daily trading records, which seems a decent amount. However, after processing by LSA, only the shape patterns are left, and the number of data points is reduced to 1,831. LSTM requires a relatively large amount of data to perform well, and there are approximately 18,000 parameters in our model.
But the size of the training data is only 1,831. Even though we apply multivariate input to expand the data, that is just barely enough to make the model work. This also explains why we use multivariate input in our model.

4.3 Model 2: Random Forest

Unlike a neural network, which requires large amounts of data to train, Random Forest works well even with small to medium data, which is a great advantage in our situation. It is an ensemble supervised machine learning algorithm that aggregates multiple decision trees to make more stable and accurate predictions. It is widely used because of its flexibility, simplicity, and often high-quality results.

4.3.1 Model Evaluation

In our report, two-class classification is used to evaluate the prediction performance. As shown in Table 4.5, TP, FP, FN and TN mean true positive, false positive, false negative and true negative, respectively.

                | Predicted Positive | Predicted Negative
Actual Positive | TP                 | FN
Actual Negative | FP                 | TN

Table 4.5: Confusion matrix for two-class classification

To get a more detailed overview of how the model performed, we present a classification report that computes the Accuracy, Specificity, Precision, and Recall.

Accuracy measures the portion of all testing samples classified correctly:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Recall measures the ability of a classifier to correctly identify positive labels:

Recall = TP / (TP + FN)

Specificity measures the classifier's ability to correctly identify negative labels:

Specificity = TN / (TN + FP)

Precision measures the proportion of correctly identified samples among all samples classified as positive:

Precision = TP / (TP + FP)

4.3.2 Experiment Result

The accuracy_score function calculates the accuracy, defined as the fraction of correct predictions the model makes on the test set. The accuracy of our model is around 70.12%, which is quite good.
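The four metrics above, plus the F1-score introduced in the next subsection, can be computed directly from the confusion-matrix counts. The sketch below uses counts we back-derived to be consistent with Table 4.6 (taking the Top/Btm Shape class as positive); they are an assumption for illustration, not numbers reported elsewhere in this report.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, recall, specificity, precision (Section 4.3.1) and the
    F1-score (harmonic mean of precision and recall) from the four
    confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "recall": recall,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Assumed counts consistent with Table 4.6: 87 of 141 positives and
# 151 of 198 negatives classified correctly.
m = classification_metrics(tp=87, fp=47, fn=54, tn=151)
print(round(m["accuracy"], 4), round(m["f1"], 4))  # -> 0.7021 0.6327
```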
Figure 4.9: Accuracy result*

* Acknowledgement: part of the code was cited and modified from Alex Reed's "sigma_coding_youtube": https://github.com/areed1192/sigma_coding_youtube/blob/master/python/python-data-science/machinelearning/random-forest/random_forest_price_prediction.ipynb

Accuracy is an important evaluation indicator; however, it may not be suitable for an imbalanced dataset [33-34]. Precision is intuitively the ability of the classifier not to label a negative sample as positive. The best value is 1, and the worst value is 0. If the accuracy is high, our model is classifying items correctly; in some cases, however, a model may have low precision or high recall.

              | precision | recall   | f1-score | support
False Alarm   | 0.736585  | 0.762626 | 0.749380 | 198
Top/Btm Shape | 0.649254  | 0.617021 | 0.632727 | 141
accuracy      |           |          | 0.702065 | 339
macro avg     | 0.692920  | 0.689824 | 0.691053 | 339
weighted avg  | 0.700262  | 0.702065 | 0.700861 | 339

Table 4.6: Classification report

It is difficult to compare two models when one has low precision and high recall, or vice versa. To make the results comparable, a metric called the F1-score is applied. It uses the harmonic mean in place of the arithmetic mean, punishing extreme values more, which helps measure recall and precision at the same time. The following confusion matrix visualizes our result.

Figure 4.10: Confusion matrix output

Compared to the validation set, the Out-of-Bag (OOB) error score uses a sample of data that was not necessarily used during the model's analysis. Therefore, the OOB sample set is more random and more varied than the validation set. In other words, the OOB score usually gives a relatively smaller accuracy score. Figure 4.11 shows the Random Forest OOB error score for our model; it is slightly bigger than the accuracy score.
Figure 4.11: Out-of-Bag error score

The chart visualizes the feature importance. In this way, we can see how much each feature contributes to the overall performance.

Figure 4.12: Feature importance chart

Chapter 5: Conclusion

In this report, we presented a stock prediction framework based on a novel de-noising algorithm and deep learning. First, we developed LSA to identify the top/bottom shape patterns. Then, the features of the shape patterns were fed to LSTM and Random Forest, respectively, to identify turning points and false alarms. Although the actual potential performance of LSA on LSTM is still unclear, since the data are insufficient and lower-level data (such as 30-minute bars) are difficult to obtain publicly, the Random Forest experiments show that LSA plays a significant role in improving stock trend prediction performance.

Compared to other mathematical approaches, such as the Fourier Transform, the Wavelet Transform and Principal Component Analysis, LSA is a de-noising method built from a financial perspective. It not only efficiently eliminates the noise but also greatly shrinks the size of the data. Furthermore, this framework provides an alternative way to analyze stocks, which might benefit traditional prediction models. It is also applicable to other time series, such as futures, options, temperature, etc.

Certain parameters of the algorithm affect the classification accuracy. For example, the values of strength and duration have great impact on the shape patterns. Meanwhile, up trends and down trends are actually inconsistent in strength and duration; instead of using the same strength and duration criteria for both, it would be better to use two separate sets of values. Therefore, in future work, the optimization of the LSA parameters can be considered as the starting point for further improving the performance of the algorithm.

Bibliography

[1] Ji, Xuan, Jiachen Wang, and Zhijun Yan.
"A stock price prediction method based on deep learning technology." International Journal of Crowd Science 5.1 (2021): 55-72.

[2] Shynkevich, Yauheniya, T. Martin McGinnity, Sonya A. Coleman, Ammar Belatreche, and Yuhua Li. "Forecasting price movements using technical indicators: Investigating the impact of varying input window length." Neurocomputing 264 (2017): 71-88.

[3] Bao, Wei, Jun Yue, and Yulei Rao. "A deep learning framework for financial time series using stacked autoencoders and long-short term memory." PloS one 12.7 (2017): e0180944.

[4] Cervelló-Royo, Roberto, Francisco Guijarro, and Karolina Michniuk. "Stock market trading rule based on pattern recognition and technical analysis: Forecasting the DJIA index with intraday data." Expert Systems with Applications 42.14 (2015): 5963-5975.

[5] Sezer, Omer Berat, and Ahmet Murat Ozbayoglu. "Algorithmic financial trading with deep convolutional neural networks: Time series to image conversion approach." Applied Soft Computing 70 (2018): 525-538.

[6] Hoseinzade, Ehsan, and Saman Haratizadeh. "CNNpred: CNN-based stock market prediction using a diverse set of variables." Expert Systems with Applications 129 (2019): 273-285.

[7] Ma, Yilin, Ruizhu Han, and Xiaoling Fu. "Stock prediction based on random forest and LSTM neural network." 2019 19th International Conference on Control, Automation and Systems (ICCAS). IEEE, (2019): 126-130.

[8] Booth, G. Geoffrey, Teppo Martikainen, Salil K. Sarkar, Ilkka Virtanen, and Paavo Yli-Olli. "Nonlinear dependence in Finnish stock returns." European Journal of Operational Research 74.2 (1994): 273-283.

[9] Pai, Ping-Feng, and Chih-Sheng Lin. "A hybrid ARIMA and support vector machines model in stock price forecasting." Omega 33.6 (2005): 497-505.

[10] Adebiyi, Ayodele A., Aderemi O. Adewumi, and Charles K. Ayo. "Comparison of ARIMA and artificial neural networks models for stock price prediction." Journal of Applied Mathematics (2014): 1-7.

[11] Le, Linh, and Ying Xie.
"Recurrent embedding kernel for predicting stock daily direction." 2018 IEEE/ACM 5th International Conference on Big Data Computing Applications and Technologies (BDCAT). IEEE, (2018): 160-166.

[12] Lin, Yuling, Haixiang Guo, and Jinglu Hu. "An SVM-based approach for stock market trend prediction." The 2013 International Joint Conference on Neural Networks (IJCNN). IEEE, (2013): 1-7.

[13] Kara, Yakup, Melek Acar Boyacioglu, and Ömer Kaan Baykan. "Predicting direction of stock price index movement using artificial neural networks and support vector machines: The sample of the Istanbul Stock Exchange." Expert Systems with Applications 38.5 (2011): 5311-5319.

[14] Niu, Tong, Jianzhou Wang, Haiyan Lu, Wendong Yang, and Pei Du. "Developing a deep learning framework with two-stage feature selection for multivariate financial time series forecasting." Expert Systems with Applications 148 (2020): 113237.

[15] Göçken, Mustafa, Mehmet Özçalıcı, Aslı Boru, and Ayşe Tuğba Dosdoğru. "Integrating metaheuristics and artificial neural networks for improved stock price prediction." Expert Systems with Applications 44 (2016): 320-331.

[16] Han, Min, Shuhui Zhang, Meiling Xu, Tie Qiu, and Ning Wang. "Multivariate chaotic time series online prediction based on improved kernel recursive least squares algorithm." IEEE Transactions on Cybernetics 49.4 (2018): 1160-1172.

[17] Siripurapu, Ashwin. "Convolutional networks for stock trading." Stanford Univ Dep Comput Sci 1.2 (2014): 1-6.

[18] Patel, Jigar, Sahil Shah, Priyank Thakkar, and Ketan Kotecha. "Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques." Expert Systems with Applications 42.1 (2015): 259-268.

[19] Khaidem, Luckyson, Snehanshu Saha, and Sudeepa Roy Dey. "Predicting the direction of stock market prices using random forest." Computing Research Repository 3 (2016): 1605-1624.

[20] Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory."
Neural Computation 9.8 (1997): 1735-1780.

[21] Chung, Junyoung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. "Empirical evaluation of gated recurrent neural networks on sequence modeling." arXiv preprint arXiv:1412.3555 (2014).

[22] Fischer, Thomas, and Christopher Krauss. "Deep learning with long short-term memory networks for financial market predictions." European Journal of Operational Research 270.2 (2018): 654-669.

[23] Sirignano, Justin, and Rama Cont. "Universal features of price formation in financial markets: perspectives from deep learning." Quantitative Finance 19.9 (2019): 1449-1459.

[24] Luo, Linkai, and Xi Chen. "Integrating piecewise linear representation and weighted support vector machine for stock trading signal prediction." Applied Soft Computing 13.2 (2013): 806-816.

[25] Chang, Pei-Chann, Chin-Yuan Fan, and Chen-Hao Liu. "Integrating a piecewise linear representation method and a neural network model for stock trading points prediction." IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 39.1 (2008): 80-92.

[26] Guo, Yanhui, Siming Han, Chuanhe Shen, Ying Li, Xijie Yin, and Yu Bai. "An adaptive SVR for high-frequency stock price forecasting." IEEE Access 6 (2018): 11397-11404.

[27] Cao, Jiasheng, and Jinghan Wang. "Stock price forecasting model based on modified convolution neural network and financial time series analysis." International Journal of Communication Systems 32.12 (2019): e3987.

[28] Wang, Baohua, Hejiao Huang, and Xiaolong Wang. "A novel text mining approach to financial time series forecasting." Neurocomputing 83 (2012): 136-145.

[29] Hermans, Michiel, and Benjamin Schrauwen. "Training and analysing deep recurrent neural networks." Advances in Neural Information Processing Systems 26 (2013): 190-198.

[30] Pascanu, Razvan, Caglar Gulcehre, Kyunghyun Cho, and Yoshua Bengio. "How to construct deep recurrent neural networks." arXiv preprint arXiv:1312.6026 (2013).
[31] Hao, Yaping, and Qiang Gao. "Predicting the trend of stock market index using the hybrid neural network based on multiple time scale feature learning." Applied Sciences 10.11 (2020): 3961.

[32] Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." Advances in Neural Information Processing Systems (2014): 3104-3112.

[33] Kara, Yakup, Melek Acar Boyacioglu, and Ömer Kaan Baykan. "Predicting direction of stock price index movement using artificial neural networks and support vector machines: The sample of the Istanbul Stock Exchange." Expert Systems with Applications 38.5 (2011): 5311-5319.

[34] Shi, Lukui, Zhijiao Qin, and Huiqiang Yan. "Stock turning point prediction method based on minimum variance." Application Research of Computers 34.11 (2017): 3373-3378.

Appendix

How strength affects LSA: the figure above uses S = 1.5%; the figure below uses S = 1.0%.

Test Data: 39, Accuracy: 69.23%
Test Data: 66, Accuracy: 65.15%
Test Data: 80, Accuracy: 63.75%