Predicting Cryptocurrency Prices Using Regression Models - KDnuggets (2024)

In this article, we explore how to get started with the prediction of cryptocurrency prices using multiple linear regression. The factors investigated include predictions on various time intervals as well as the use of various features in the models such as opening price, high price, low price and volume.

By Eleonora Shantsila, Full Stack Software Engineer on May 27, 2022 in Data Science

Predicting Cryptocurrency Prices Using Regression Models - KDnuggets (1)
Source: https://unsplash.com/s/photos/cryptocurrency

As of March 2022, the cryptocurrency market stands at over $2 trillion [1] but remains incredibly volatile, having previously hit $3 trillion in November 2021. The same volatility is also seen within the individual cryptocurrencies, within the past month alone Etherium and Bitcoin have decreased by over 18% and 19% respectively. The number of new currencies coming onto the market is also increasing, with over 18,000 cryptocurrencies in existence as of March 2022 [2].

This volatility is what makes long-term cryptocurrency predictions more difficult. In this article, we will go through how to get started with cryptocurrency predictions using linear regression models. We will look at predictions over a number of time intervals whilst using various model features, like opening price, high price, low price and volume. The cryptocurrencies looked at in this article include more established players Bitcoin and Etherium as well as those still in fairly early stages namely Polkadot and Stellar.

In this article, we will be using multiple linear regression. Regression models are used to determine the relationship between variables by fitting a line through the data. Simple linear regression is a model used to predict a dependent variable (for instance the closing price of a cryptocurrency) using one independent variable (such as opening price), whereas multiple linear regression takes into account several independent variables.

The data we will be using comes from CoinCodex [3] and provides daily opening, high, low and closing as well as the volume and the market cap. Various combinations of features were experimented with to produce the models with predictions also being made over daily and weekly intervals. The python sklearn package was used to train the models and the R2 value was used as a metric of the accuracy of the model, with 1 indicating a perfect model.

The data used and the models produced have been uploaded to a Layer project and can be downloaded for further use and investigations. All the results and corresponding graphs can also be found in the Layer project. In this article we will discuss sections of the code used to implement the models.The full code can be accessed in this collab notebook.

In order to initialise the project and access the data and models you can run:

import layerlayer.login()layer.init("predictingCryptoPrices")

The data was uploaded to the Layer project for each of the 4 cryptocurrency datasets (Bitcoing, Ethereum, Polkadot and Stellar) by defining the functions used to read in the data and annotating them with the dataset and resources decorators.

@dataset("bitcoin")@resources(path="./data")def getBitcoinData(): return pd.read_csv("data/bitcoin.csv")layer.run([getBitcoinData])

You can run the following commands to access the datasets and save them to a pandas dataframe

layer.get_dataset("bitcoin").to_pandas()layer.get_dataset("ethereum").to_pandas()layer.get_dataset("polkadot").to_pandas()layer.get_dataset("stellar").to_pandas()

Daily predictions

The first set of multiple regression models were built to predict the prices at daily intervals. The closing price was first predicted using opening, low and high price on that day. Daily prices over the course of 1 year were used with a 75%-25% train-test split. The below function was used to train the model for a given dataset.

def runNoVolumePrediction(dataset): # Defining the parameters # test_size: proportion of data allocated to testing # random_state: ensures the same test-train split each time for reproducibility parameters = { "test_size": 0.25, "random_state": 15, } # Logging the parameters # layer.log(parameters) # Loading the dataset from Layer df = layer.get_dataset(dataset).to_pandas() df.dropna(inplace=True) # Dropping columns we won't be using for the predictions of closing price dfX = df.drop(["Close", "Date", "Volume", "Market Cap"], axis=1) # Getting just the closing price column dfy = df["Close"] # Test train split (with same random state) X_train, X_test, y_train, y_test = train_test_split(dfX, dfy, test_size=parameters["test_size"], random_state=parameters["random_state"]) # Fitting the multiple linear regression model with the training data regressor = LinearRegression() regressor.fit(X_train,y_train) # Making predictions using the testing data predict_y = regressor.predict(X_test) # .score returns the coefficient of determination R² of the prediction layer.log({"Prediction Score :":regressor.score(X_test,y_test)}) # Logging the coefficient corresponding to each variable coeffs = regressor.coef_ layer.log({"Opening price coeff":coeffs[0]}) layer.log({"Low price coeff":coeffs[1]}) layer.log({"High price coeff":coeffs[2]}) # Plotting the predicted values against the actual values plt.plot(y_test,predict_y, "*") plt.ylabel('Predicted closing price') plt.xlabel('Actual closing price') plt.title("{} closing price prediction".format(dataset)) plt.show() #Logging the plot layer.log({"plot":plt}) return regressor

The function was running with the model decorator to save the model to the Layer project as shown below.

@model(name='bitcoin_prediction')def bitcoin_prediction(): return runNoVolumePrediction("bitcoin")layer.run([bitcoin_prediction])

To access all of the models used in this set of predictions you can run the below code.

layer.get_model("bitcoin_prediction")layer.get_model("ethereum_prediction")layer.get_model("polkadot_prediction")layer.get_model("stellar_prediction")

Table 1 below summarises the R2 statistics for the 4 cryptocurrencies using this model.


Predicting Cryptocurrency Prices Using Regression Models - KDnuggets (2)
Table 1: Accuracy of predictions of cryptocurrency closing price using opening, low and high prices on that day

The graphs below show the predicted values against the true values for the best and worst-performing currencies Polkadot and Stellar (Figure 1a and 1b).

Predicting Cryptocurrency Prices Using Regression Models - KDnuggets (3)
Figure 1a: Stellar predicted closing price against the true closing price for the day using opening, low and high prices as features
Predicting Cryptocurrency Prices Using Regression Models - KDnuggets (4)
Figure 1b: Polkadot predicted closing price against the true closing price for the day using opening, low and high prices as features

From the graph and the R2 values, we can see that the linear regression model using opening, low and high price is very accurate for all 4 currencies, regardless of their level of maturity.

Next, we investigate whether adding the volume improves the model. The volume corresponds to the total number of trades taking place that day. Interestingly enough after fitting regression models using opening, low and high prices and the volume the results were largely identical. You can access the bitcoin_prediction_with_volume, ethereum_prediction_with_volume, polkadot_prediction_with_volume and stellar_prediction_with_volume models in the Layer project. Further investigation showed, that the coefficient for the volume variable was 0 across the 4 cryptocurrencies. This indicated the volume of trades of a cryptocurrency is not a good predictor of its price on that day.

Weekly predictions

In the next model, we look at predicting the end of week prices using the data from that weeks opening price on a Monday along with the low and high prices that week. For this model data from the past 5 years was used for Bitcoin, Ethereum and Stellar. For Polkadot, which was only launched in 2020, all of the historical data was used.
After downloading the data the low and high prices for each week were calculated and the opening and closing prices for the week were recorded. These datasets can also be accessed on Layer by running:

layer.get_dataset("bitcoin_5Years").to_pandas()layer.get_dataset("ethereum_5Years").to_pandas()layer.get_dataset("polkadot_5Years").to_pandas()layer.get_dataset("stellar_5Years").to_pandas()

Again, a 75%-25% train-test split was used and a multiple linear regression model was fitted using the opening, high and low prices for the week. The models can also be accessed on the Layer project. The R2 metric was again used for comparison with the results shown in Table 2 below.

Predicting Cryptocurrency Prices Using Regression Models - KDnuggets (5)
Table 2: Accuracy of predictions of cryptocurrency closing price using opening, low and high prices

The accuracy of the models when predicting with data over the course of a week were slightly worse than when predicting over the course of a day however, the accuracy itself was still very high. Even the worst performing model for the cryptocurrency Stellar achieve and R2 score of 0.985.

Overall, linear regression models using the opening, low and high prices as features perform very well on both daily and weekly intervals. A natural extension of these models would be looking into predicting further into the future, for instance using this weeks data to predict the price next week.


[1] https://fortune.com/2022/03/02/crypto-market-cap-2-trillion/#:~:text=As%20of%2010%3A15%20a.m.,when%20it%20topped%20%243%20trillion.

[2] https://www.investopedia.com/tech/most-important-cryptocurrencies-other-than-bitcoin/#:~:text=One%20reason%20for%20this%20is,communities%20of%20backers%20and%20investors.

[3] https://coincodex.com/crypto/stellar/historical-data/



Eleonora Shantsila is a full stack software engineer working at an events start up called Lounge, having previously worked as a full stack engineer in Financial Services. Eleonora has a background in Mathematics (BSc from St Andrews University) and Computational Science (MS from Harvard University) and enjoys working on data science projects in their spare time. Feel free to connect on LinkedIn.


More On This Topic

  • 3 Reasons Why You Should Use Linear Regression Models Instead of…
  • Building Predictive Models: Logistic Regression in Python
  • Write and train your own custom machine learning models using PyCaret
  • Building Multimodal Models: Using the widedeep Pytorch package
  • How to Make Large Language Models Play Nice with Your Software…
  • Using Generative Models for Creativity
Predicting Cryptocurrency Prices Using Regression Models - KDnuggets (2024)

FAQs

What is the best algorithm to predict crypto prices? ›

The best algorithms for predicting the price of a cryptocurrency are XGBoost with Particle Swarm Optimization (PSO), Long Short-Term Memory (LSTM), and Random Forest. These algorithms have been found to provide accurate predictions with low error rates.

What is the best model for crypto prediction? ›

Khedr et al. (2021) concluded that LSTM is considered to be the best method for predicting cryptocurrency price time series due to its ability to recognize long-term time-series associations.

How do you predict the price of cryptocurrency? ›

You can predict cryptocurrency prices by using techniques such as crypto technical analysis, fundamental analysis, on-chain research, and market sentiment evaluation. Technical analysis thrives in crypto due to its high volatility.

Can regression model be used for prediction? ›

Regression analysis is a form of predictive modelling technique which investigates the relationship between a dependent (target) and independent variable (s) (predictor). This technique is used for forecasting, time series modelling and finding the causal effect relationship between the variables.

What is the most reliable source for crypto predictions? ›

Here is the list of the Top 10 Best Crypto Prediction Sites in 2024:
  • Binance.
  • CryptoPredictions.
  • IntoTheBlock.
  • CryptoRating.
  • DigitalCoinPrice.
  • CoinPriceForecast.
  • Changelly.
  • CoinCodex.
Apr 9, 2024

Can AI predict crypto prices? ›

AI can predict crypto market trends by analyzing vast amounts of historical data, identifying patterns, and generating insights to make probabilistic forecasts.

Is there any app to predict cryptocurrency? ›

WalletInvestor is the tool for forecasting cryptocurrency prices, utilizing machine learning algorithms to provide daily and long-term predictions for over 880 digital currencies.

Which crypto will give 1000x? ›

Etuktuk is a new 1000x crypto that has been gaining traction in the market. It is a decentralized platform designed to provide a secure and transparent ecosystem for users to trade, invest, and earn rewards.

What is the crypto price prediction API? ›

The Cryptocurrency Price Prediction API is a powerful tool for predicting the future price of a specified cryptocurrency using machine learning and AI technology. With simple API calls, users can receive accurate price predictions and retrieve a list of available cryptocurrencies.

How to predict cryptocurrency prices using machine learning? ›

According to the authors, researchers generally use several machine learning techniques seeking model accuracy and better forecasts. Recently, Orte et al. (2023) applied the Random Forest technique to forecast Bitcoin price direction. They find that this technique better predicts long positions than short ones.

How to predict crypto prices based on market cap? ›

The most important one is the supply and demand chain. In a scenario where the demand for a crypto is high and the supply is scarce, the price will increase. The price increase will also lead to an increase in market capitalization.

How to use a regression equation to predict? ›

How to Use a Linear Regression Model to Calculate a Predicted Response Value. Step 1: Identify the independent variable . Step 2: Calculate the predicted response value by plugging in the given value into the least-squares linear regression line y ^ ( x ) = a x + b .

When should a regression model not be used to make a prediction? ›

Regression models should not be used to extrapolate beyond the range of the data. The model is valid only for predictor values for which the data has sufficient values (even in the case that sufficient data is available, there could be other problems: see “Testing the Assumptions: Regression Diagnostics”).

What is the formula for regression prediction? ›

The formula for simple linear regression is Y = mX + b, where Y is the response (dependent) variable, X is the predictor (independent) variable, m is the estimated slope, and b is the estimated intercept.

Which machine learning algorithms for Bitcoin price prediction? ›

LSTM is a deep learning algorithm with good predictive performance for time series data. The conclusion on whether it is necessary to refer to the data of multiple periods before when predicting the price of Bitcoin is that the prediction accuracy of the model that only needs the previous period is the best.

Top Articles
Latest Posts
Article information

Author: Edwin Metz

Last Updated:

Views: 5543

Rating: 4.8 / 5 (78 voted)

Reviews: 85% of readers found this page helpful

Author information

Name: Edwin Metz

Birthday: 1997-04-16

Address: 51593 Leanne Light, Kuphalmouth, DE 50012-5183

Phone: +639107620957

Job: Corporate Banking Technician

Hobby: Reading, scrapbook, role-playing games, Fishing, Fishing, Scuba diving, Beekeeping

Introduction: My name is Edwin Metz, I am a fair, energetic, helpful, brave, outstanding, nice, helpful person who loves writing and wants to share my knowledge and understanding with you.