Can We Predict Value of a Stock using Machine Learning?
Abhijit Roy
Stock price prediction is a widely studied task in machine learning, and several approaches have been proposed to solve it. Most are based on time-series analysis of a stock's price; others use news sentiment to predict it. The problem is especially interesting because so many things influence a price: people, sentiment, company performance, news, regulators, and market-making financial bodies like banks. All of these factors are active at the same time and each contributes to price movement, which makes it hard to attribute a price change to any single one.
I have previously tried an approach, proposed in this article. In my current work, I propose a more robust method. Some of the ideas are carried forward from my previous work, with several modifications to overcome its flaws. You may want to go through my previous work first for a better understanding.
This approach is purely for research purposes. Please do not invest based on this algorithm.
The Market Indicators
In my current idea, I have tried to design an algorithm that takes into account how actual market indicators work. Most of us think of the stock market as very uncertain, and it surely is, yet people study the markets and discover strategies. So the question is: how do actual traders study the markets? Thanks to sites like Investopedia and Zerodha, I learned about a few market indicators such as support, resistance, and volatility indexes. One thing to keep in mind is that how these indicators work depends on the perception of the trader.
Let’s see some definitions:
Support: Support is a price level that acts as a floor, preventing the price from falling further. At that level, there are buyers who step in and push the price back up.
Resistance: Similar to support, resistance is a price level that acts as a ceiling, preventing the price from rising further. At that level, sellers step in and push the stock price down.
If support is broken, it becomes resistance, and vice versa.
Volatility indexes/Standard Deviations: Everything here is about money, so traders naturally want to limit their risk. But to take any risk, we must have some expectation of where the price will go. Volatility indexes reflect those expectations. Two volatility indexes define a band within which the stock price is likely to fall in the next time frame, based on its behavior in the current time frame. Normally, the volatility indexes are defined with respect to the moving average. In this work, the upper index is the current moving average plus 2% of the moving average, and the lower index is the current moving average minus 2% of the moving average. (Standard indicators such as Bollinger Bands use a multiple of the rolling standard deviation, typically two, instead of a fixed percentage.) If the price breaks out of this band, it is usually taken as a sign that large-volume transactions are taking place.
There are two types of support and resistance: long and short. Which one applies depends on the time window under study.
So far, we have talked about a few live market indicators. One thing to keep in mind is that all these indicators are calculated to make an assumption about the future price at time t+1, based on the data we have from the last t units. This point is important for understanding the modeling concept used.
So, the idea is: if real traders use these indicators for their predictions, why not use them in the model?
The Economic Indicators
Stock prices are affected when the whole market is affected. To capture this, we can use economic indicators such as the Dow Jones Industrial Average and the S&P 500 index. They are standard indicators of how the overall market is doing. These indicators can be downloaded from here.
The Trends
The stock market depends on public sentiment. Suppose a company has made a deal or launched a new product; this may push up the price of that company's stock, simply because more people are paying attention to the company. We can observe this attention through the company's search trends. Google offers these trends, which may serve as an indicator of the price. If people are talking about something more than usual, it is either very good or very bad. In either case, the price changes. The direction of the change can be correlated with news sentiment.
News Sentiments
As mentioned earlier, news plays a huge role in stock price variations because it creates sentiment. If there is positive news about a company, its price tends to rise. Financial news about a stock is available on sites like Finviz and Financial Times. If we can extract sentiment from this news, we can use it as a feature.
Correlated Stocks
For any stock we try to predict, there are several other stocks whose prices are highly correlated with it. If we take the top 5 or 10 stocks with the highest correlations, they may serve as good indicators.
Now that we have covered these points, let’s get into some more detail:
Stock Prediction as a Time-Series data
Traders operate in the stock market on different time scales. Some work on an hourly basis, some on a 5-minute basis, and some on a daily basis. In this work, I have worked on a daily basis. So, we have to predict the adjusted closing price for day t+1. For day t, we have the stock's opening price, closing price, adjusted close, volume, high, and low. Since we have the values for the t-th unit and want to predict the (t+1)-th unit, we use time-series analysis, and for that we use Recurrent Neural Networks (RNNs).
Incorporating Features
All the features we talked about above, such as the indicators, trends, and sentiment, are available for day t+1 in advance; the only exception is the correlated stock prices. The trends, economic indicators, and news can be scraped and prepared even before the market opens on day t+1. For the correlated stocks, we can run time-series analysis on each of them individually to generate their prices for day t+1.
With that, all these features are ready for day t+1. They act as external features that affect the value predicted by the time-series model. So, they can simply be fed into a regression model together with the time-series output for the target stock.
In this manner, our neural network can take into account risk management, the presence of buyers and sellers, news sentiment, and economic conditions when predicting the target price.
Let’s look into the working of the approach.
Working
The initial work is explained in my previous article. You can refer to it for a clearer understanding. I have tried to predict the Amazon stock price, using the top 300 tickers of the S&P 500 companies.
The above diagrams give correlations among the top 300 tickers. I have considered the top 10 tickers that have the highest correlations with our target ticker Amazon (AMZN).
The above image shows the tickers and their correlations.
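The selection of the most correlated tickers can be sketched roughly as follows. This is a minimal illustration, not the code used in this project: it assumes a pandas DataFrame with one column of adjusted close prices per ticker, the function name top_correlated is hypothetical, and the data (including the tickers AAA, BBB, and CCC) is synthetic.

```python
import numpy as np
import pandas as pd

def top_correlated(prices: pd.DataFrame, target: str, n: int = 10) -> pd.Series:
    """Return the n tickers with the highest Pearson correlation to `target`."""
    # Correlate every column with the target, then drop the target itself
    corr = prices.corr()[target].drop(target)
    return corr.sort_values(ascending=False).head(n)

# Synthetic prices for illustration only (hypothetical tickers AAA, BBB, CCC)
rng = np.random.default_rng(0)
base = rng.normal(size=250).cumsum() + 100
prices = pd.DataFrame({
    "AMZN": base,
    "AAA": 0.5 * base + rng.normal(scale=0.01, size=250),  # moves with AMZN
    "BBB": -base + rng.normal(scale=0.01, size=250),       # moves against AMZN
    "CCC": rng.normal(size=250).cumsum() + 50,             # unrelated random walk
})
top = top_correlated(prices, "AMZN", n=2)
print(top)
```

Correlating raw price levels, as assumed here, can overstate the relationship between two trending series; correlating daily returns is a common alternative.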
Next, we take the adjusted close values of these tickers and create an individual RNN model for each, in order to obtain their values for day t+1.
To train each RNN, we take 20 consecutive days of data as input and train the model to predict the value on the 21st day.
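The 20-day windowing described above can be sketched like this. It is a minimal illustration under assumptions: the helper name make_windows is hypothetical, and the trailing feature axis is added only because recurrent layers commonly expect input shaped (samples, timesteps, features).

```python
import numpy as np

def make_windows(series, window=20):
    """Build (X, y) pairs: each X row holds `window` consecutive values,
    and the matching y is the value that immediately follows them."""
    values = np.asarray(series, dtype=float)
    n_samples = len(values) - window
    if n_samples <= 0:
        raise ValueError("series must be longer than the window")
    X = np.stack([values[i:i + window] for i in range(n_samples)])
    y = values[window:]
    # Add a trailing feature axis: (samples, timesteps, 1)
    return X[..., np.newaxis], y

# Toy series 0..24: the first window is days 0-19 and its target is day 20
X, y = make_windows(np.arange(25), window=20)
print(X.shape, y)
```

With 25 data points and a window of 20, this yields five training samples.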
The above architecture has been used for each of the individual RNN models.
Now, let’s talk about our target ticker, Amazon (AMZN). First, let’s look at the definitions of the market indicators.
Moving average: Mean of last 20 days Adjusted close value.
Upper volatility index: Moving average of the t-th day + 2% of the moving average of the t-th day
Lower volatility index: Moving average of the t-th day - 2% of the moving average of the t-th day
Short Resistance: Maximum High value for the last 10 days.
Short Support: Minimum Low value for the last 10 days.
Long Support: Minimum Low value for the last 50 days.
Long Resistance: Maximum High value for the last 50 days.
These definitions are not standard. They are assumptions made for this project. Please do not rely on these definitions; they are purely for research purposes and for conveying the idea.
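The definitions above map fairly directly onto pandas rolling windows. The following is a sketch rather than the project's actual code: the function name add_indicators is hypothetical, the output column names follow the feature lists used later in this article, and it assumes each window includes day t itself.

```python
import numpy as np
import pandas as pd

def add_indicators(df: pd.DataFrame) -> pd.DataFrame:
    """Add the project-specific indicators to a daily price DataFrame
    that has 'High', 'Low', and 'Adj Close' columns."""
    out = df.copy()
    # 20-day moving average of the adjusted close
    out["Moving_av"] = out["Adj Close"].rolling(20).mean()
    # Band of +/- 2% around the moving average
    out["Upper_volatility"] = out["Moving_av"] * 1.02
    out["Lower_volatility"] = out["Moving_av"] * 0.98
    # Short (10-day) and long (50-day) resistance and support
    out["Short_resistance"] = out["High"].rolling(10).max()
    out["Short_support"] = out["Low"].rolling(10).min()
    out["Long_resistance"] = out["High"].rolling(50).max()
    out["Long_support"] = out["Low"].rolling(50).min()
    return out

# Synthetic, steadily rising prices for illustration only
days = np.arange(1, 61, dtype=float)
demo = pd.DataFrame({"High": days + 1, "Low": days - 1, "Adj Close": days})
ind = add_indicators(demo)
print(ind.tail(1).T)
```

Rows that do not yet have a full window are left as NaN, so the first 49 days lack long support and resistance and would need to be dropped before training.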
The above image shows adjusted close, the volatility indexes, and the short support and resistance.
This image shows the moving average, the volatility indexes, and the long support and resistance.
The trends can be downloaded in CSV format from the Google Trends website. Similarly, the economic indicators can be downloaded as CSV files: the Dow Jones from here and the S&P 500 from here.
The above graph shows the variation of moving average with the indicators.
For news sentiment, I have scraped the news about Amazon stock for the last 2 years from here. At present, we can also scrape sites like FINVIZ and Financial Times.
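import nltk
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer
nltk.download('vader_lexicon')  # VADER needs its lexicon; this fetches it if missing
analyzer = SentimentIntensityAnalyzer()
# Score every headline; polarity_scores returns a dict with neg, neu, pos, and compound
scores = df_news['News'].apply(analyzer.polarity_scores).tolist()
scores_df = pd.DataFrame(scores, index=df_news.index)
# Append the score columns to the news DataFrame, aligned on its index
df_news_final = pd.concat([df_news, scores_df], axis=1)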
The above code snippet computes the news sentiment values. We use SentimentIntensityAnalyzer() from the nltk.sentiment.vader module.
The analyzer gives the polarity of each news item. If the news is good or positive, the positive polarity is high, and so on. The resulting columns are neg (negative), neu (neutral), pos (positive), and compound (an overall score).
So, we combine all the features to create our training dataset.
RNN_data_cols=['High','Low','Open','Close','Volume','Adj Close','Moving_av']
As discussed earlier, the above 7 features depend directly on the stock itself, so we pass them through an RNN to obtain the value for day t+1 by time-series analysis.
Regression_data_cols=['Upper_volatility','Lower_volatility','Short_resistance','Short_support','Long_resistance','Long_support','trend_hit','Dow_jones','snp500','compound','neg','neu','pos','BF-B','CTAS','CPRT','AJG','HUM','MKTX','INFO','CHD','A','APD']
The above 23 columns are external features that affect the t+1 value obtained from the RNN. They include features such as news sentiment and the prices of correlated stocks.
To train this RNN, we again create windows of 20 consecutive days and try to predict the 21st day’s adjusted close.
The above model has been used to predict the final stock value. First, the RNN predicts a value using time-series analysis. That value is then passed, together with the external features, into a regression model, which adjusts the prediction for the effect of the external factors present in real market conditions.
We use the individual RNN models for the 10 correlated tickers to produce their predicted values for day t+1 from time-series analysis; these are passed in as external features. For those tickers, we only need their trend or movement in order to capture the correlated movement in our target stock, so time-series analysis alone is enough.
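To make the data flow of the second stage concrete, here is a deliberately simplified sketch. It is not the model used in this project: the regression part of the network is replaced by ordinary least squares in NumPy, the function names are hypothetical, and the stage-one predictions and external features are synthetic numbers.

```python
import numpy as np

def build_design(rnn_pred, external):
    """Stack the stage-one prediction, the external features, and a bias column."""
    return np.column_stack([rnn_pred, external, np.ones(len(rnn_pred))])

def fit_second_stage(rnn_pred, external, target):
    """Least-squares fit of the target on [stage-one prediction, external features, bias]."""
    coef, *_ = np.linalg.lstsq(build_design(rnn_pred, external), target, rcond=None)
    return coef

def predict_second_stage(coef, rnn_pred, external):
    """Apply the fitted coefficients to new stage-one outputs and external features."""
    return build_design(rnn_pred, external) @ coef

# Synthetic data for illustration only
rng = np.random.default_rng(1)
rnn_pred = rng.normal(100, 5, size=200)   # hypothetical RNN outputs for day t+1
external = rng.normal(size=(200, 3))      # hypothetical external features
target = 0.9 * rnn_pred + external @ np.array([1.0, -2.0, 0.5]) + 3.0

coef = fit_second_stage(rnn_pred, external, target)
pred = predict_second_stage(coef, rnn_pred, external)
print(np.round(coef, 3))
```

The point is only the shape of the pipeline: each row joins one time-series prediction with that day's external features, and the second stage learns how much to adjust the prediction.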
Results
The red line in the above graph shows the actual values, and the blue line shows the predicted values. In this graph, the predictions stay very close to the actual values of the stock.
In this article, we have seen an approach to stock price prediction that takes the actual trader’s point of view into consideration and analyzes the market from there. The support and resistance lines help the model simulate a real market scenario with actual buyers and sellers. The news and trends capture the effect of people’s sentiment on the real market.
This project was done purely to present the idea from a research point of view. The definitions and assumptions were made to carry out the project and are specific to it.
The code is available: here.
I hope this helps.