Using Machine Learning To Predict Future Stock Price (2024)

With Python, Scikit-learn, Tensorflow, and Alpha Vantage API

Shiyan

Published in

Shiyan Boxer

Apr 13, 2023

In recent years, the use of machine learning algorithms to predict stock prices has gained significant attention. With the vast amounts of financial data available, machine learning models can potentially identify patterns and predict future stock prices more accurately than traditional time-series analysis methods.

In this tutorial, we'll use Python to retrieve stock data using the Alpha Vantage API and build a machine-learning model to predict future stock prices.

Alpha Vantage provides real-time and historical stock data through an API that you can access with a free key, subject to request limits. To retrieve stock data, you must sign up for an API key on the Alpha Vantage website.

https://www.alphavantage.co/

Once you have an API key, you can use the requests library in Python to retrieve stock data from the API. Here's an example of how to retrieve daily adjusted stock data for the Apple stock symbol (AAPL):

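import requests
import pandas as pd

# replace YOUR_API_KEY with your Alpha Vantage API key
api_key = 'YOUR_API_KEY'
# set ticker symbol and API endpoint
symbol = 'AAPL'
endpoint = f'https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol={symbol}&apikey={api_key}'
# retrieve stock data from API and fail early on an HTTP error
response = requests.get(endpoint, timeout=30)
response.raise_for_status()
payload = response.json()
# when a request is rejected (for example, an invalid key or a rate limit),
# the response carries a message instead of the time series
if 'Time Series (Daily)' not in payload:
    raise RuntimeError(f'Unexpected API response: {payload}')
# store the data in a pandas DataFrame, one row per trading day
df = pd.DataFrame(payload['Time Series (Daily)']).T
df.index = pd.to_datetime(df.index)
df = df.astype(float)
# the API returns the newest day first; sort oldest to newest so that
# pct_change() and shift() later compare each day with the correct neighbor
df = df.sort_index()

In this example, we first set the api_key variable to our API key from Alpha Vantage. We then set the symbol variable to AAPL, which is the stock symbol for Apple. We construct the API endpoint URL from the symbol and api_key variables with an f-string.

We use the requests.get() function to retrieve the stock data from the API, check that the request succeeded, and parse the JSON response using the json() method. We extract the Time Series (Daily) data and store it in a pandas DataFrame, transposed so that each row is one trading day. We convert the index to a pandas DatetimeIndex using the to_datetime() function, convert all columns to the float data type using the astype() method, and sort the rows from oldest to newest using the sort_index() method, because the preprocessing steps that follow assume chronological order.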
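The parsing steps can be tried without an API key. The sketch below runs them on a small hand-written payload; the payload, its prices, the DEMO symbol, and the payload_to_frame helper are all made up for illustration and only mimic the shape of a daily response.

```python
import pandas as pd

# Hypothetical payload that mimics the shape of a daily response.
# The symbol and all prices are invented for this example.
sample_payload = {
    "Meta Data": {"2. Symbol": "DEMO"},
    "Time Series (Daily)": {
        "2023-04-12": {"1. open": "161.2", "4. close": "160.1"},
        "2023-04-11": {"1. open": "162.3", "4. close": "160.8"},
        "2023-04-10": {"1. open": "161.4", "4. close": "162.0"},
    },
}


def payload_to_frame(payload):
    """Turn a daily time-series payload into a float DataFrame, oldest day first."""
    if "Time Series (Daily)" not in payload:
        # Surface the keys that did come back so the problem is easy to diagnose.
        raise ValueError(f"no daily series in response: {list(payload)}")
    frame = pd.DataFrame(payload["Time Series (Daily)"]).T  # one row per day
    frame.index = pd.to_datetime(frame.index)
    return frame.astype(float).sort_index()


demo = payload_to_frame(sample_payload)
print(demo)
```

Wrapping the steps in a function like this also makes it easy to reuse them for other ticker symbols.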

Before using the stock data to train a machine learning model, we need to preprocess the data and extract relevant features. In this tutorial, we'll focus on using the closing price of the stock as the target variable, and use the previous day's closing price as a feature.

Here's an example of how to preprocess the data to calculate the percentage change in closing price and set the target variable to be the next day's closing price:

# calculate percentage change in closing price
df['Close_pct'] = df['4. close'].pct_change()

# set target variable to be next day's closing price
df['Target'] = df['4. close'].shift(-1)

In this example, we use the pct_change() method to calculate the percentage change in closing price and store the result in a new column called Close_pct. We then use the shift() method to shift the 4. close column up by one row, effectively setting the target variable as the next day's closing price.

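The first row has no percentage change and the last row has no next-day target, so both contain missing values. Finally, we drop any rows with missing data using the dropna() method: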

# drop rows with missing data
df.dropna(inplace=True)
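To see what these preprocessing steps do, it can help to run them on a tiny series. The sketch below uses four invented closing prices and hypothetical variable names (toy, clean); only the column names follow the tutorial.

```python
import pandas as pd

# Four made-up closing prices, ordered oldest to newest.
close = pd.Series(
    [100.0, 110.0, 99.0, 99.0],
    index=pd.date_range("2023-01-02", periods=4, freq="D"),
    name="4. close",
)

toy = close.to_frame()
# Change relative to the previous row; the first row has no previous row.
toy["Close_pct"] = toy["4. close"].pct_change()
# Value from the following row; the last row has no following row.
toy["Target"] = toy["4. close"].shift(-1)
print(toy)

# Dropping incomplete rows removes the first and the last day.
clean = toy.dropna()
print(clean)
```

Note that the most recent day disappears from the cleaned frame, because its target is not known yet.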

Now that we have preprocessed the data, we can use it to train a machine-learning model to predict future stock prices. There are many machine learning models that can be used for stock price prediction, such as linear regression, decision trees, random forests, and neural networks. In this tutorial, we’ll use a simple linear regression model to predict the next day’s closing price based on the previous day’s closing price.

We’ll use the scikit-learn library to build our linear regression model. Here’s an example of how to split the data into training and testing sets, and train the linear regression model:

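from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# feature: each day's closing price; target: the next day's closing price
X = df['4. close'].values.reshape(-1, 1)
y = df['Target'].values
# split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# train linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

In this example, we set X to each day's closing price and y to the next day's closing price, so from the point of view of the day being predicted, the feature is the previous day's close. No extra shift is needed, because the Target column is already shifted by one row; shifting X as well would leave a missing value in the first row, which LinearRegression rejects. We then use the train_test_split() function from scikit-learn to split X and y into training and testing sets, with a test size of 20% and a random state of 42. Keep in mind that this split shuffles the rows, so the testing set contains days that fall between training days. For time-ordered data, passing shuffle=False keeps the most recent 20% of days as the testing set and gives a more realistic estimate.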

We initialize a LinearRegression object and use the fit() method to train the model on the training data.
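Because stock prices are ordered in time, one variation worth trying is a chronological split, where the model trains on older days and is tested on the most recent ones. The sketch below is an illustration on synthetic random-walk prices, not real market data; the names prices, X_demo, y_demo, and demo_model are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic random-walk prices stand in for real closing prices.
rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(0, 1, size=200))

X_demo = prices[:-1].reshape(-1, 1)  # each day's close
y_demo = prices[1:]                  # the following day's close

# shuffle=False keeps the original order: the last 20% of rows become the test set.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, shuffle=False
)

demo_model = LinearRegression().fit(X_tr, y_tr)
print(f"training rows: {len(X_tr)}, testing rows: {len(X_te)}")
```

With this split, every testing day comes after every training day, which is closer to how the model would be used in practice.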

After training the model, we need to evaluate its performance on the testing set. One common metric for regression models is the mean squared error (MSE), which measures the average squared difference between the predicted values and the actual values.

Here's an example of how to calculate the MSE for the linear regression model:

from sklearn.metrics import mean_squared_error

# evaluate model on testing set
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f'MSE: {mse:.2f}')

In this example, we use the predict() method to generate predictions for the testing set, and then calculate the MSE using the mean_squared_error() function from scikit-learn. We print the MSE to the console.
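An MSE value is easier to interpret next to a reference point. The sketch below computes the MSE by hand on a few invented prices, takes its square root (RMSE) to get back to price units, and compares it with a naive baseline that predicts tomorrow's close to be today's close. The mse helper and all numbers are hypothetical.

```python
import numpy as np


def mse(actual, predicted):
    """Average squared difference between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((actual - predicted) ** 2))


# Made-up closing prices for four consecutive days.
closes = [100.0, 101.0, 103.0, 102.0]
actual_next = closes[1:]                # what actually happened on days 2 to 4
model_pred = [100.0, 105.0, 102.0]      # invented model predictions for those days
naive_pred = closes[:-1]                # baseline: tomorrow equals today

model_mse = mse(actual_next, model_pred)
naive_mse = mse(actual_next, naive_pred)
model_rmse = model_mse ** 0.5           # back in price units rather than squared units

print(f"model MSE: {model_mse:.2f}, RMSE: {model_rmse:.2f}")
print(f"naive baseline MSE: {naive_mse:.2f}")
```

If a trained model does not beat this baseline on the testing set, it has not learned much beyond "tomorrow looks like today".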

Now that we have trained and evaluated the model, we can use it to make predictions on new data. Here's an example of how to use the model to predict the next day's closing price for Apple stock:

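import numpy as np
# dropna() removed the newest row (it had no Target), so the most recent close
# is the Target of the last remaining row (assumes df is sorted oldest to newest)
last_close = df['Target'].iloc[-1]
# predict next day's closing price
next_close = model.predict(np.array([[last_close]]))[0]
print(f"Predicted next day's closing price: {next_close:.2f}")

In this example, we use the iloc indexer to extract the most recent closing price and then use the trained model to predict the next day's closing price based on this value. The most recent close is read from the Target column because the row for the newest day was dropped during preprocessing, so the last remaining row holds it as its target. We print the predicted closing price to the console.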

Linear regression is not the only option. In the example below, we use the RandomForestRegressor class from scikit-learn instead of the LinearRegression class to train a random forest regression model on the training data. We set the number of trees in the forest to 100, and use a random state of 42 for reproducibility.

We then use the predict() method to generate predictions for the testing set and calculate the MSE. Finally, we use the trained model to predict the next day's closing price, just like before.

from sklearn.ensemble import RandomForestRegressor

# feature: each day's closing price; target: the next day's closing price
# (no extra shift: Target is already shifted, and shift(1) would add a NaN row)
X = df['4. close'].values.reshape(-1, 1)
y = df['Target'].values
# split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# train random forest regression model
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
# evaluate model on testing set
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f'MSE: {mse:.2f}')
# predict next day's closing price from the most recent close, which is the
# Target of the last remaining row (assumes df is sorted oldest to newest)
last_close = df['Target'].iloc[-1]
next_close = model.predict(np.array([[last_close]]))[0]
print(f"Predicted next day's closing price: {next_close:.2f}")

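Random forest regression can capture non-linear relationships that linear regression cannot, so it often performs better on data with complex patterns. One limitation matters for prices in particular: a random forest averages target values seen during training, so it cannot predict a price above or below the range it was trained on. By trying different models and techniques, you can find the best approach for predicting stock prices with your particular data set.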
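One way to compare candidate models on time-ordered data is cross-validation with scikit-learn's TimeSeriesSplit, which always trains on earlier rows and validates on later ones. The sketch below is a minimal illustration on synthetic random-walk prices; the variable names and the choice of five splits are assumptions made for this example, not part of the tutorial's pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Synthetic random-walk prices stand in for real closing prices.
rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(0, 1, size=300))
X_cv = prices[:-1].reshape(-1, 1)  # each day's close
y_cv = prices[1:]                  # the following day's close

candidates = {
    "linear regression": LinearRegression(),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

# Each fold trains on an initial stretch of days and validates on the days after it.
cv = TimeSeriesSplit(n_splits=5)
cv_mse = {}
for name, estimator in candidates.items():
    scores = cross_val_score(
        estimator, X_cv, y_cv, cv=cv, scoring="neg_mean_squared_error"
    )
    cv_mse[name] = -scores.mean()  # scikit-learn reports the negated MSE
    print(f"{name}: mean MSE across folds = {cv_mse[name]:.2f}")
```

The same loop can be pointed at the X and y arrays built from real data to see which model holds up better out of sample.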

In this tutorial, we used Python to retrieve stock data from the Alpha Vantage API, preprocessed the data to extract relevant features, trained a linear regression model and a random forest model to predict future stock prices, and evaluated each model's performance on a testing set. We also made predictions using the trained models.

While this is a simple example, there are many ways to improve and extend the model, such as using additional features, trying different machine-learning algorithms, and tuning hyperparameters. With the right data and model, machine learning can be a powerful tool for predicting stock prices and making informed investment decisions.
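As one possible starting point for additional features, the sketch below derives a daily return, a short moving average, and a lagged close from a series of closing prices. The add_features helper, the column names, and the window length are hypothetical choices; every feature uses only information available on or before the day it describes.

```python
import pandas as pd


def add_features(close, window=3):
    """Build a feature table from a chronologically ordered closing-price series."""
    feats = pd.DataFrame({"close": close})
    feats["return_1d"] = close.pct_change()      # change versus the previous day
    feats["sma"] = close.rolling(window).mean()  # average of the last `window` closes
    feats["lag_1"] = close.shift(1)              # previous day's close
    feats["target"] = close.shift(-1)            # next day's close, the value to predict
    # Rows at the start lack history and the last row lacks a target.
    return feats.dropna()


# Made-up closing prices, ordered oldest to newest.
toy_close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
features = add_features(toy_close)
print(features)
```

The columns other than target can then be passed to a model as X in place of the single closing-price feature.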

It’s worth noting that stock price prediction is a difficult task, and no model can perfectly predict future prices. There are many factors that can influence stock prices, such as economic indicators, company news and performance, and global events. Therefore, it’s important to use caution when making investment decisions based on any predictions made by a machine learning model.

Overall, this tutorial provides a basic framework for using Python and machine learning to predict stock prices. By exploring different models and techniques, and incorporating additional features, you can develop more accurate and robust models for predicting stock prices.