When the topic of time series prediction comes up, the reader (the listener, the viewer…) starts thinking about predicting stock prices, hoping that this will help decide when to sell and when to buy more. Sometimes we see papers that describe how one can do this. Paper [1] is an example, and the authors even provide some results. However, the book “Deep Learning with Python” by Chollet emphasizes that one should not try to use time series prediction techniques to predict stock prices. Chollet explains that, in the case of a stock market, data about the previous state is not a good basis for estimating the future state. In paper [3] the authors even conclude that the stock price is a martingale and, therefore, the best estimate of the future price (in terms of estimation error) is the current price.
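The martingale claim can be illustrated with a small simulation. The sketch below is a toy under a strong assumption, not a model of a real market: it generates a Gaussian random walk (a textbook martingale) and compares two forecasts of the next value, the current value and the mean of the last five values. The names random_walk, naive_mse and window_mse are made up for this illustration, and sliding_window_view requires NumPy 1.20 or newer.

```python
import numpy as np

# Toy assumption: prices follow a Gaussian random walk, which is a martingale.
rng = np.random.default_rng(0)
steps = rng.normal(loc=0.0, scale=1.0, size=100_000)
random_walk = np.cumsum(steps)

window = 5
# Forecast 1: the next value equals the current value.
naive_forecast = random_walk[window - 1:-1]
# Forecast 2: the next value equals the mean of the last `window` values.
windows = np.lib.stride_tricks.sliding_window_view(random_walk, window)[:-1]
mean_forecast = windows.mean(axis=1)

# The values both forecasts try to predict.
actual = random_walk[window:]
naive_mse = np.mean((actual - naive_forecast) ** 2)
window_mse = np.mean((actual - mean_forecast) ** 2)

print('current-value forecast MSE:', naive_mse)
print('5-value mean forecast MSE: ', window_mse)
```

On such a series the current-value forecast should come out with the smaller error, which is the behaviour the martingale argument describes.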
So, is it possible to use a neural network to predict stock prices?
Disclaimer: this theoretical overview reflects my own knowledge of the subject, so it may use incorrect terms, be entirely incorrect and so on. So if you know more than me, you may die laughing. I have warned you.
What is a share? A share is a document that certifies the holder’s right to claim a part of the company’s profits. This implies that the share’s price should depend on the company’s profits. More precisely, the price depends not on the company’s actual profits but on its expected profits. This means that the share’s price represents the market traders’ opinion about future profits. And opinions may be wrong. We all remember the stories of startups that were valued highly but eventually turned out to offer nothing revolutionary and then lost almost all of their market value. Therefore we can conclude that stock prices depend on the subjective opinion of the market traders.
Consider the figure below. The figure shows the share price of Maersk. As one can see, one share cost 7718 DKK on the 2nd of April, 2019. The next day the price was 7750 DKK per share. What was the reason? We can see a small capital letter D at the bottom of the plot. This letter means that the company pays dividends on that day and, apparently, the dividends were large enough to boost demand. So an upcoming event can cause the price to grow.
Now consider another plot. This plot shows share prices for Yandex on the days when rumours spread that one of the banks was going to take over Yandex. Usually, amid such rumours share prices grow, since this means the buyer is going to buy shares from the market, thus increasing demand. This time investors decided that this was not good news.
We can make a simple conclusion here: share price depends mostly on the opinion of traders about the company’s future, and not on the previous price itself. Therefore there is no sense in predicting future stock prices using previous values.
We should predict something using values the target depends on or, at least, correlates with. In the case of stock prices, one has to take into account events that are external to the market. It would probably not be possible to predict such events using a neural network. The fact that more traders have gone bankrupt than have become billionaires tells us that a human is not often able to tell the future. To learn more about predicting the unpredictable, read “The Black Swan” by Nassim Nicholas Taleb.
In theory, the theory and the practice are highly interconnected, but in practice, they are not. Here we are going to try predicting something and see what happens.
We are going to train a neural network that predicts the (n+1)-th price using n known values (previous prices). We assume that the time between two subsequent price measurements is constant. First of all, we need the dataset. We can take stock prices from Yahoo Finance.
We will predict daily prices, which means that a day is represented in the dataset with a single value. We will predict the close price using close prices for several previous days. We will use Maersk as the test company.
We will get the data using the yfinance Python package. We should take into account that Yahoo may change their API, so the package may stop working unexpectedly. This has already happened at least once, so we have to be prepared for other changes. So, let’s install the package:
pip install yfinance
For more info on how to use the package see here. Now, let’s contact the market:
import yfinance as yf

# create the object that represents Maersk stock data
# MAERSK-B.CO is Maersk's ticker on Yahoo Finance
maersk = yf.Ticker('MAERSK-B.CO')
We haven’t downloaded any data yet; we have only created the object that we can use to request the data. Yahoo Finance provides dividend information for Maersk and, as we have already seen, dividends affect stock prices. Therefore we want the neural network to take dividends into account when it predicts the prices. This means that when we tell the network to predict the close price for a particular day using a set of prices for the previous days, we also need to provide it with a marker that tells whether dividends are paid that day. To get the dates when the dividends are paid, check the maersk.dividends property. To get the share prices we call the history method. The method takes several arguments, and we are especially interested in period and interval.
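As a rough illustration of the dividend marker, the sketch below uses a tiny hand-made DataFrame in place of real Yahoo Finance data. The assumption is that the frame has Close and Dividends columns, with Dividends equal to zero on days without a payment; the dividend amount and the dates are invented for the example.

```python
import pandas as pd

# Hand-made stand-in for downloaded data; the values are invented.
sample = pd.DataFrame(
    {
        'Close': [7700.0, 7718.0, 7750.0, 7690.0],
        'Dividends': [0.0, 0.0, 150.0, 0.0],
    },
    index=pd.to_datetime(['2019-04-01', '2019-04-02',
                          '2019-04-03', '2019-04-04']),
)

# 1.0 on days when a dividend is paid, 0.0 otherwise.
has_dividends = (sample['Dividends'] > 0).astype(float)
# The dates on which a dividend is paid.
dividend_days = sample.index[sample['Dividends'] > 0]

print(has_dividends.tolist())
print(dividend_days)
```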
The period argument defines the period for which we request the data. It supports several predefined string values, and we will use one of them: the string 'max', which asks for all available data, from the first day the shares were traded on the market until today. With the start and end arguments one can define an exact period instead. However, since we want all available data, we will use the period argument and pass 'max'.
The interval parameter tells the method the interval between two subsequent values. It takes one of several predefined values, and we will pass '1d' since we are going to use daily prices.
You can read more on the history method and its arguments here.
So, it’s time to get some data!
history = maersk.history(period='max', interval='1d')
Now the history variable holds a pandas DataFrame with the prices. Let’s look at them:
It’s time to prepare the data. When designing a neural network to predict time series, one should decide how many inputs the network will have. In our case, we have to choose the number of prices fed into the network to predict the next one. Since we do not know this number yet, it is better to be able to generate datasets with different numbers of inputs. Fortunately, the Keras developers have already thought about that, and Keras provides a generator for time series that can generate datasets with different numbers of inputs. In the case of time series prediction, both input and target values are drawn from the same series. This means that we use a sliding window of size j, where j is the number of values we use to predict the (j+1)-th value. In other words, we take j subsequent elements ({x₁, x₂, ... xⱼ}) of the time series, then we take the (j+1)-th element (x₍ⱼ₊₁₎) and set it as the target value. The pair (j values, (j+1)-th value) makes a single training example. To make another training example, we move the sliding window by one and use {x₂, x₃, ... x₍ⱼ₊₁₎} as inputs and x₍ⱼ₊₂₎ as the target value.
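The sliding window described above can be sketched with plain NumPy, independently of Keras. This is only an illustration: make_windows is a helper name introduced here, the series is invented, and sliding_window_view requires NumPy 1.20 or newer.

```python
import numpy as np

def make_windows(series, j):
    """Return (inputs, targets): each row of inputs holds j consecutive
    values, and the matching target is the value that follows them."""
    series = np.asarray(series, dtype=float)
    # All windows of length j; the last one has no following value, so drop it.
    inputs = np.lib.stride_tricks.sliding_window_view(series, j)[:-1]
    targets = series[j:]
    return inputs, targets

demo_inputs, demo_targets = make_windows([10, 11, 12, 13, 14, 15], j=3)
print(demo_inputs)   # rows: [10 11 12], [11 12 13], [12 13 14]
print(demo_targets)  # 13, 14, 15
```

Each row of demo_inputs together with the element of demo_targets at the same position forms one training example.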
Keras provides us with the TimeseriesGenerator class, and we will use this class to generate the training set. The only difficulty here is that we also want the network to take dividends into account. Therefore we have to write a function that uses the TimeseriesGenerator class to generate the training set and then enriches the generator’s output with the information about the dividends.
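import numpy as np
# TimeseriesGenerator ships with Keras 2.x (keras.preprocessing.sequence)
from keras.preprocessing.sequence import TimeseriesGenerator

def generate_series(data, value_num):
    # plain NumPy arrays, so that the integer indices below are positional
    close = data['Close'].to_numpy()
    dividends = data['Dividends'].to_numpy()
    # a single batch that holds every sliding window of value_num prices
    tsg = TimeseriesGenerator(close, close,
                              length=value_num,
                              batch_size=len(close))
    global_index = value_num
    i, t = tsg[0]
    has_dividends = np.zeros(len(i))
    for b_row in range(len(t)):
        # the target of window b_row is the price at position global_index
        assert(abs(t[b_row] - close[global_index]) <= 0.001)
        # mark whether dividends are paid on the day we predict
        has_dividends[b_row] = dividends[global_index] > 0
        global_index += 1
    return np.concatenate((i, np.transpose([has_dividends])),
                          axis=1), t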
The function takes two arguments: the dataset we want it to process (the data argument) and the number of input values the series should have (the value_num argument).
As you know, neural networks are trained using gradient descent, which relies on the gradient of the cost function. The simplest approach is to compute the gradient over the entire dataset. However, there are downsides. Firstly, the dataset may be extremely large, which makes computing the gradient very time-consuming. Secondly, if the dataset is extremely large, the gradient summed over all examples can also become extremely large, so large that it no longer fits into machine precision. The second issue, of course, usually matters only in extreme cases (slight pun intended). Some smart people have pointed out that we do not actually need the exact gradient value [4]. We only need an estimate that tells us in which direction to move to reduce the cost function. Therefore we can estimate the gradient using a small subset of the training examples. Of course, we will eventually walk through the entire dataset, but there is no need to compute the gradient for the entire dataset at once. We can divide the dataset into several subsets called batches and process a single batch at a time, updating the network’s weights with the gradient computed for that batch. Once we have processed all batches, we have completed a single training epoch. A training session may include more than one epoch; the exact number depends on the task. The same smart people emphasize that the training examples must be shuffled [4]. That means the examples are split into batches in random order, so that a batch is not simply a run of neighbouring examples from the dataset.
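To make the batch and epoch vocabulary concrete, here is a minimal sketch of mini-batch gradient descent for a one-variable linear model. It is not what Keras does internally (Keras is used below with the adam optimizer); the data, the learning rate and all names are invented for the illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data for a linear model y = 3x + 1 plus a little noise (invented).
x = rng.uniform(-1.0, 1.0, size=1000)
y = 3.0 * x + 1.0 + rng.normal(scale=0.01, size=x.shape)

w, b = 0.0, 0.0          # model parameters
learning_rate = 0.1
batch_size = 32

for epoch in range(50):
    # Shuffle the examples at the start of every epoch.
    order = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        batch = order[start:start + batch_size]
        error = w * x[batch] + b - y[batch]
        # Gradient of the mean squared error, estimated on this batch only.
        grad_w = 2.0 * np.mean(error * x[batch])
        grad_b = 2.0 * np.mean(error)
        # One weight update per batch.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

print(w, b)
```

Every pass of the outer loop is one epoch, and every pass of the inner loop updates the parameters from a single batch; after training, w and b should end up close to 3 and 1.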
Let’s test the function and generate a dataset that uses four input values.
inputs, targets = generate_series(history, 4)
Let’s look at a single example.
# print(inputs[3818])
array([1.246046e+04, 1.232848e+04, 1.244496e+04, 1.274000e+04,
       1.000000e+00])
As we can see, a training example is a vector with four prices and an additional fifth value that indicates whether dividends are paid that day. Note that the values are relatively large. Indeed, the close price ranges from 767.7 to 12740.0. Neural networks do not work well with such ranges, so we have to normalize the data. We will use the simplest normalization strategy, MinMax normalization.
h_min = history.min()
normalized_h = (history - h_min) / (history.max() - h_min)
Since we have modified the initial data, we have to re-generate the dataset.
inputs, targets = generate_series(normalized_h, 4)
Let’s look at the normalized data.
# print(inputs[3818])
array([0.9766511 , 0.96562732, 0.97535645, 1.        , 1.        ])
As we can see, the values now range from 0 to 1. That makes the task easier. However, we now have to keep history.min() and history.max() so that we can normalize the network inputs when we predict prices and denormalize its output to get the actual price.
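The round trip between prices and normalized values can be sketched with two small helpers. The names normalize and denormalize are introduced here for illustration; the minimum and maximum are the close price range quoted above, and the third sample price is the first value of the training example printed earlier.

```python
import numpy as np

def normalize(values, v_min, v_max):
    """Map values from [v_min, v_max] to [0, 1]."""
    return (np.asarray(values, dtype=float) - v_min) / (v_max - v_min)

def denormalize(values, v_min, v_max):
    """Invert normalize(): map values from [0, 1] back to [v_min, v_max]."""
    return np.asarray(values, dtype=float) * (v_max - v_min) + v_min

# The close price range quoted above; use the values from your own data.
close_min, close_max = 767.7, 12740.0

scaled = normalize([767.7, 12740.0, 12460.46], close_min, close_max)
restored = denormalize(scaled, close_min, close_max)
print(scaled)
print(restored)
```

The same pair of functions would be applied to network inputs before prediction and to network outputs afterwards.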
Finally, it’s time for neural networks. The network will have (n+1) inputs, n for prices and one for the dividend indicator, and one output. We still need to determine n. For this, we will write a function that creates a neural network with a specified number of inputs. We use the input_shape=(n+1,) expression to include the dividend indicator.
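from keras import models, layers

def create_model(n):
    m = models.Sequential()
    # n prices plus one dividend indicator
    m.add(layers.Dense(64, activation='relu', input_shape=(n+1,)))
    m.add(layers.Dense(64, activation='relu'))
    # a single linear output: the predicted (normalized) close price
    m.add(layers.Dense(1))
    return m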
Before training a network, we divide the dataset into two parts: train and test sets. We are going to use the training set to train the network and the test set to test the network performance on the unknown data. We will never use examples of the test set while training the network.
train_inputs = inputs[:-1000]
val_inputs = inputs[-1000:]
train_targets = targets[:-1000]
val_targets = targets[-1000:]
Let’s write one more function. This function will help us decide how many inputs the network should have. It takes the range of input counts to check and the number of epochs to train for. For each input count, the function creates a network, prepares data for it, trains the network and evaluates its performance on the test set.
def select_inputs(data, start, end, epochs):
    # maps the number of price inputs to the trained model and its history
    trained = {}
    for inputs in range(start, end+1):
        print('Using {} inputs'.format(inputs))
        model_inputs, targets = generate_series(data, inputs)
        # the last 1000 examples are held out for testing
        train_inputs = model_inputs[:-1000]
        val_inputs = model_inputs[-1000:]
        train_targets = targets[:-1000]
        val_targets = targets[-1000:]
        m = create_model(inputs)
        print('Training')
        m.compile(optimizer='adam', loss='mse')
        h = m.fit(train_inputs, train_targets,
                  epochs=epochs,
                  batch_size=32,
                  validation_data=(val_inputs, val_targets))
        model_info = {'model': m, 'history': h.history}
        trained[inputs] = model_info
    return trained
Now, let’s train networks with 2 to 10 inputs for 20 epochs:
trained_models = select_inputs(normalized_h, 2, 10, 20)
When the training is done, we can get a short summary with the following code:
model_stats = {}
for k, v in trained_models.items():
    train_history = v['history']
    # losses after the last epoch
    loss = train_history['loss'][-1]
    val_loss = train_history['val_loss'][-1]
    model_stats[k] = {'inputs': k, 'loss': loss, 'val_loss': val_loss}
Printing the model_stats values, we can see the summary:
{2: {'inputs': 2,
'loss': 6.159038594863468e-05,
'val_loss': 0.0006709674960002303},
3: {'inputs': 3,
'loss': 7.425233190960614e-05,
'val_loss': 0.00021176348975859583},
4: {'inputs': 4,
'loss': 7.471898652647588e-05,
'val_loss': 0.00022580388654023408},
5: {'inputs': 5,
'loss': 8.866131339595126e-05,
'val_loss': 0.00027424713294021784},
6: {'inputs': 6,
'loss': 7.322355930846842e-05,
'val_loss': 0.0003323734663426876},
7: {'inputs': 7,
'loss': 8.709070955596233e-05,
'val_loss': 0.0004295352199114859},
8: {'inputs': 8,
'loss': 8.170129280188121e-05,
'val_loss': 0.00024587249546311797},
9: {'inputs': 9,
'loss': 7.327485314296024e-05,
'val_loss': 0.0003118165017804131},
10: {'inputs': 10,
'loss': 8.064566193526276e-05,
'val_loss': 0.0003668071269057691}}
As we can see, the error computed on the test set is always greater than the error computed on the training set: in this run it is roughly 3 to 11 times greater. This means that the network handles known data (training examples) better than unknown data (test examples).
We can now plot the test error against the number of network inputs.
import matplotlib.pyplot as plt

val_loss = []
indices = []
for k, v in model_stats.items():
    indices.append(k)
    val_loss.append(v['val_loss'])

plt.plot(indices, val_loss)
With the plot, we can see which network has shown the lowest test error. The exact result may change with time depending on the amount of historical data available through Yahoo Finance.
There is one interesting observation. If one runs this script twice, one should expect to get different results: the lowest test error may be produced by a different network each time. Since the only difference between the networks is the number of inputs, we can conclude that the test error does not depend much on the number of inputs. This, in turn, supports the initial speculation that we won’t be able to predict stock prices with a neural network. Apparently, the network learns to ignore some of the inputs, concluding that the output does not depend on them.
Remember that we have normalized the data. Now let’s compute the exact error for the networks.
close_min = history['Close'].min()
close_max = history['Close'].max()
for k in model_stats:
    # applies the min-max inverse directly to the loss value
    e = ((close_max - close_min) * model_stats[k]['val_loss'] + close_min)
    print(k, e)
Output:
2 771.0400773414451
3 770.341964375037
4 771.6538168560887
5 771.9637314503287
6 770.3164239349957
7 771.5147973106168
8 778.0784490537151
9 779.7546236891968
10 770.8432766947052
Wow! The errors are very large. Even for the network that has shown the lowest test error, the exact error is very large. Honestly speaking, I would not trust a network with such errors when deciding which shares to buy. Neither would I recommend that others trust it.
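One caveat about the conversion above: val_loss is a mean squared error measured on normalized prices, while the min-max inverse is meant for prices themselves. A hedged alternative, sketched below, is to take the square root of the loss and multiply it by the price range, which gives a root mean squared error in DKK. The sketch uses three val_loss values from the summary printed earlier and the close price range of 767.7 to 12740.0 quoted earlier; rmse_in_price_units is a helper name introduced here.

```python
import math

def rmse_in_price_units(mse_normalized, v_min, v_max):
    """Convert an MSE on min-max normalized values to an RMSE in original units."""
    return math.sqrt(mse_normalized) * (v_max - v_min)

# Close price range quoted earlier in the text.
close_min, close_max = 767.7, 12740.0

# val_loss values copied from the summary above (2, 3 and 10 inputs).
val_losses = {
    2: 0.0006709674960002303,
    3: 0.00021176348975859583,
    10: 0.0003668071269057691,
}

rmse = {k: rmse_in_price_units(v, close_min, close_max)
        for k, v in val_losses.items()}
for k, e in rmse.items():
    print(k, round(e, 1))
```

For the losses in that summary this works out to roughly 170 to 310 DKK per prediction.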
Now we can draw a plot that compares the actual prices with the predicted ones.
As one can see, the two graphs do not match very often.