Implementing Time Series Stock Price Prediction with LSTM and yfinance in Python (2024)

Let’s break down the code part by part:

1. Importing Libraries:

  • Use Case: Preparing the necessary tools for financial data analysis and neural network modeling.
  • Application: This section sets up the environment: yfinance for data retrieval, numpy and pandas for data manipulation, MinMaxScaler for scaling the data, and Sequential, LSTM, and Dense for building and training the neural network.
import yfinance as yf
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import LSTM, Dense

Explanation:

  • yfinance: This library is used to download historical stock data from Yahoo Finance.
  • numpy: Provides support for large, multi-dimensional arrays and matrices, which are commonly used in numerical operations.
  • pandas: A powerful library for data manipulation and analysis, particularly useful for handling time series data.
  • MinMaxScaler: Part of scikit-learn, it scales data to a specified range (0 to 1 in this case), which is important for neural network training.
  • Sequential, LSTM, Dense: Components of the Keras library, where Sequential is used to initialize the neural network, LSTM represents the LSTM layer, and Dense is used to add a fully connected layer to the neural network.

2. Downloading Historical Stock Data:

  • Use Case: Fetching historical stock prices for a specified period.
  • Application: Investors, traders, and analysts can use this to gather data for a particular stock, enabling various analyses and predictions.
stock_symbol = 'AAPL'
start_date = '2020-01-01'
end_date = '2021-01-01'
data = yf.download(stock_symbol, start=start_date, end=end_date)

Explanation:

  • stock_symbol: Specifies the stock symbol for which historical data is to be downloaded (in this case, 'AAPL' for Apple).
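  • start_date and end_date: Define the date range for which historical stock data is to be retrieved, in YYYY-MM-DD format. The end date is exclusive, so this range covers the 2020 trading days.
  • yf.download(): Fetches historical price data (open, high, low, close, and volume) from Yahoo Finance and returns it as a pandas DataFrame indexed by date.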

3. Data Preprocessing:

  • Use Case: Preparing the historical stock data for input to a machine learning model.
  • Application: This section extracts and scales the closing prices, a crucial step in preparing the data for the LSTM model, ensuring that it is on a consistent scale for training.
close_prices = data['Close'].values.reshape(-1, 1)
scaler = MinMaxScaler(feature_range=(0, 1))
close_prices_scaled = scaler.fit_transform(close_prices)

Explanation:

  • close_prices: Extracts the closing prices from the downloaded data and reshapes them into a single-column 2D array, the shape that MinMaxScaler expects.
  • scaler: Initializes a MinMaxScaler, which scales data between 0 and 1.
  • close_prices_scaled: Applies Min-Max scaling to normalize the closing prices.
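
As a rough illustration of what the scaler computes with a (0, 1) feature range, the sketch below applies the min-max formula by hand with numpy and then reverses it. The prices are made-up placeholder values, not real market data.

```python
import numpy as np

# Toy closing prices (made-up values, not real market data)
prices = np.array([[100.0], [110.0], [105.0], [120.0]])

# Min-max scaling to [0, 1]: (x - min) / (max - min)
p_min, p_max = prices.min(), prices.max()
scaled = (prices - p_min) / (p_max - p_min)

# Inverse transform: map scaled values back to the original price range
restored = scaled * (p_max - p_min) + p_min

print(scaled.ravel())    # values: 0.0, 0.5, 0.25, 1.0
print(restored.ravel())  # the original prices again
```

The inverse step is what scaler.inverse_transform performs later, when scaled predictions are converted back into prices.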

4. Creating Input Data for LSTM:

  • Use Case: Structuring the data into sequences for training the LSTM model.
  • Application: This function defines the input sequences (X) and corresponding target values (y) for training the LSTM model, which is essential for capturing temporal patterns in the data.
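def create_lstm_data(data, time_steps=1):
    x, y = [], []
    for i in range(len(data) - time_steps):
        # Input: a window of time_steps consecutive values
        x.append(data[i:(i + time_steps), 0])
        # Target: the value that immediately follows the window
        y.append(data[i + time_steps, 0])
    return np.array(x), np.array(y)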

Explanation:

  • create_lstm_data: Defines a function to create input sequences for the LSTM model.
  • The function takes the scaled price array (data) and a number of time_steps. Each input sequence in x holds time_steps consecutive values, and the matching entry in y is the value that immediately follows that sequence.
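
To see the windowing on something small, the sketch below runs the same function on a toy series of six values with time_steps=3. The series is a placeholder chosen so the windows are easy to read.

```python
import numpy as np

def create_lstm_data(data, time_steps=1):
    x, y = [], []
    for i in range(len(data) - time_steps):
        x.append(data[i:(i + time_steps), 0])
        y.append(data[i + time_steps, 0])
    return np.array(x), np.array(y)

# Toy series 0..5 as a column vector (placeholder values)
series = np.arange(6, dtype=float).reshape(-1, 1)
x, y = create_lstm_data(series, time_steps=3)

print(x)  # rows: [0, 1, 2], [1, 2, 3], [2, 3, 4]
print(y)  # targets: 3, 4, 5
```

A series of length n yields n - time_steps samples, so longer windows leave fewer training examples.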

5. Setting Time Steps and Creating Input Data:

  • Use Case: Defining the number of time steps for the LSTM model and reshaping the data.
  • Application: This part sets the time steps, a hyperparameter affecting the memory of the LSTM. Reshaping the data ensures that it conforms to the input format expected by the LSTM layer.
time_steps = 10
x, y = create_lstm_data(close_prices_scaled, time_steps)
x = np.reshape(x, (x.shape[0], x.shape[1], 1))

Explanation:

  • time_steps: Sets the number of time steps for the LSTM model (in this case, 10).
  • x, y: Creates input data (x) and target values (y) using the previously defined function.
  • np.reshape: Reshapes the input data into the 3D format the LSTM layer expects, (samples, time_steps, features), with a single feature (the scaled closing price).
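
The reshape only adds a trailing feature axis and leaves the values untouched. A small sketch with a placeholder array of zeros shows the change in shape:

```python
import numpy as np

# Placeholder 2D input: 4 samples, each a window of 10 scaled prices
x = np.zeros((4, 10))

# (samples, time_steps) -> (samples, time_steps, features)
x_3d = np.reshape(x, (x.shape[0], x.shape[1], 1))

print(x.shape)     # (4, 10)
print(x_3d.shape)  # (4, 10, 1)
```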

6. Building the LSTM Model:

  • Use Case: Constructing a neural network architecture suitable for time series prediction.
  • Application: The model architecture is designed with two LSTM layers and one dense layer. This configuration is suitable for capturing and learning complex patterns in sequential data like stock prices.
model = Sequential()
model.add(LSTM(units=50, return_sequences=True, input_shape=(x.shape[1], 1)))
model.add(LSTM(units=50))
model.add(Dense(units=1))
model.compile(optimizer='adam', loss='mean_squared_error')

Explanation:

  • model: Initializes a sequential neural network model.
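  • Adds two LSTM layers with 50 units each. The first sets return_sequences=True so that it passes its full output sequence, rather than only its final output, to the second LSTM layer. A dense layer with 1 unit then produces the predicted value.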
  • Compiles the model using the Adam optimizer and mean squared error as the loss function.
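
To get a feel for the size of this network, the trainable parameters can be counted by hand. The sketch below assumes standard LSTM and Dense layers with default settings (bias terms enabled); the helper functions lstm_params and dense_params are illustrative names, not part of Keras.

```python
def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # One weight per input-output pair, plus one bias per output unit
    return input_dim * units + units

first_lstm = lstm_params(input_dim=1, units=50)
second_lstm = lstm_params(input_dim=50, units=50)
output = dense_params(input_dim=50, units=1)
total = first_lstm + second_lstm + output

print(first_lstm, second_lstm, output, total)  # 10400 20200 51 30651
```

Under those assumptions, calling model.summary() on the model above should report matching figures.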

7. Training the Model:

  • Use Case: Optimizing the model’s parameters using historical data.
  • Application: This part trains the LSTM model using historical stock prices, allowing the model to learn patterns and relationships within the data.
model.fit(x, y, epochs=50, batch_size=32)

Explanation:

  • Trains the LSTM model using the prepared input data (x) and target values (y).
  • epochs=50: Specifies the number of complete passes over the training data.
  • batch_size=32: Sets the number of samples per gradient update.
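
The two settings together determine how many weight updates training performs. The arithmetic below uses a hypothetical sample count of 243 purely for illustration; the real count depends on how many rows the download returns.

```python
import math

n_samples = 243   # hypothetical sample count, not taken from the real download
batch_size = 32
epochs = 50

# Each epoch processes every sample once, in batches of at most batch_size
steps_per_epoch = math.ceil(n_samples / batch_size)
last_batch = n_samples - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch, total_updates)  # 8 19 400
```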

8. Predicting Future Stock Prices:

  • Use Case: Utilizing the trained model to make predictions for future stock prices.
  • Application: The code predicts future stock prices based on the last available historical data, providing insights for potential market movements.
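n_future = 30
# Weekdays only; market holidays are not excluded
future_dates = pd.bdate_range(start=end_date, periods=n_future)
# Seed the window with the last time_steps scaled closing prices
window = close_prices_scaled[-time_steps:, 0].tolist()
predicted_prices_scaled = []
for _ in range(n_future):
    x_pred = np.array(window[-time_steps:]).reshape(1, time_steps, 1)
    next_scaled = float(model.predict(x_pred, verbose=0)[0, 0])
    predicted_prices_scaled.append(next_scaled)
    window.append(next_scaled)  # feed the prediction back in as the newest input
predicted_prices = scaler.inverse_transform(np.array(predicted_prices_scaled).reshape(-1, 1))

Explanation:

  • n_future and future_dates: Set the forecast horizon to 30 steps and generate 30 weekday dates starting at end_date, one per prediction.
  • window: Starts from the last 10 scaled closing prices, which form the first input sequence.
  • The loop predicts one step at a time. A single call to model.predict returns only the next value, so each prediction is appended to the window and used as input for the following step. Because later steps build on earlier predictions, errors accumulate as the horizon grows.
  • scaler.inverse_transform: Converts the 30 scaled predictions back to the original price scale.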
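
Forecasting more than one step ahead is often done recursively: each prediction is appended to the input window and used for the next step. The sketch below isolates that loop so it can be run without a trained network. The predict_next function is a hypothetical stand-in for the model (it simply averages the window), and the history values are placeholders.

```python
import numpy as np

def predict_next(window):
    # Hypothetical stand-in for the trained model: returns the mean of the window
    return float(np.mean(window))

def recursive_forecast(history, time_steps, n_future):
    window = list(history[-time_steps:])
    forecasts = []
    for _ in range(n_future):
        next_value = predict_next(window[-time_steps:])
        forecasts.append(next_value)
        window.append(next_value)  # the prediction becomes the newest input
    return forecasts

history = [0.2, 0.4, 0.6]  # placeholder scaled prices
forecasts = recursive_forecast(history, time_steps=2, n_future=3)
print(forecasts)
```

After the first step, every input window contains at least one predicted value, which is why recursive forecasts tend to drift as the horizon grows.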

9. Displaying Predictions:

  • Use Case: Visualizing and presenting the predicted future stock prices.
  • Application: The predicted prices are displayed in a structured format, allowing users to assess and make decisions based on the model’s forecasts.
future_data = pd.DataFrame({'Date': future_dates, 'Predicted Price': predicted_prices.flatten()})
print(future_data)

Explanation:

  • Creates a DataFrame (future_data) with future dates and the corresponding predicted prices.
  • Prints the DataFrame, providing a readable format for the predicted future prices.
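
pandas requires every column passed to the DataFrame constructor to have the same length, so the number of dates must match the number of predictions. A small sketch with made-up placeholder values (not model output) checks this before building the table and uses the dates as the index:

```python
import pandas as pd

future_dates = pd.date_range(start="2021-01-01", periods=3)
predicted = [130.0, 131.5, 129.8]  # made-up placeholder values, not model output

# Columns must have equal length, so check before building the table
assert len(future_dates) == len(predicted)

future_data = pd.DataFrame({"Date": future_dates, "Predicted Price": predicted})
future_data = future_data.set_index("Date").round(2)
print(future_data)
```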

Remember that this is a basic example, and real-world stock price prediction involves more sophisticated models, feature engineering, and careful evaluation. The code provided is for educational purposes and may need adjustments for different scenarios.

Author: Aracelis Kilback
