Investing in Mexico: Using a Decision Tree Model to Predict NAFTRAC’s Price Movements
Disclaimer
Constructing the Model
Interpreting the Model
Next Steps and Suggestions

In the world of finance, predicting the movement of stock prices can be a valuable tool for investors and traders. One approach to this problem is to use machine learning algorithms, such as decision trees, to learn patterns from historical data and make predictions about future price movements.

The asset we're focusing on in this article is NAFTRAC, an exchange-traded fund (ETF) that tracks the price and yield performance of the Mexican Stock Exchange's IPC Index. This ETF provides broad exposure to the largest and most liquid stocks listed on the Mexican Stock Exchange, making it a popular choice for investors interested in the Mexican market.

The data used for this analysis was sourced from Yahoo Finance, a reliable and widely used platform for financial information. It provides historical price data for a wide range of assets, including NAFTRAC, which we'll use to train our decision tree model.

In this article, we'll walk through the process of creating a decision tree model to predict whether the price of NAFTRAC will go up or down the next day.

Disclaimer

This article is for informational purposes only and does not constitute financial advice.

Constructing the Model

We'll start by importing the necessary libraries and loading our data:

import pandas as pd 
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the historical price data exported from Yahoo Finance
data = pd.read_csv('data/NAFTRACISHRS.MX.csv')

Next, we'll create some features for our model. In this case, we'll use the previous day's price and the change in price from the previous day:

# Create features (Yahoo Finance CSVs store the closing price in the 'Close' column)
data['Previous Day Price'] = data['Close'].shift(1)
data['Price Change'] = data['Close'] - data['Previous Day Price']
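To see what the shift is doing, here is the same transformation on a made-up four-day price series:

```python
import pandas as pd

# A made-up four-day closing price series
prices = pd.Series([50.0, 52.0, 51.0, 53.0])

# shift(1) moves every value down one row, so row i holds row i-1's price
prev = prices.shift(1)
change = prices - prev

print(prev.tolist())    # [nan, 50.0, 52.0, 51.0]
print(change.tolist())  # [nan, 2.0, -1.0, 2.0]
```

The first row has no previous day, which is why it comes out as NaN and must be dropped before training.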

We'll also need to create a target variable for our model. Since the goal is to predict the next day's movement, this will be a binary variable indicating whether the price goes up (1) or not (0) on the following day:

# Drop the first row, which has no previous day price
data = data.dropna()

# Create target: 1 if the next day's price change is positive, 0 otherwise
data['Target'] = (data['Price Change'].shift(-1) > 0).astype(int)

# Drop the last row, whose next-day movement is unknown
data = data[:-1]

Now we can split our data into a training set and a test set:

# Split data into features (X) and target (y)
X = data[['Previous Day Price', 'Price Change']]
y = data['Target']

# Split data into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Finally, we can create our decision tree model, train it on the training data, and make predictions on the test data:

# Create decision tree (fixed random_state for reproducible results)
clf = DecisionTreeClassifier(random_state=42)

# Train decision tree
clf.fit(X_train, y_train)

# Make predictions on the test set
y_pred = clf.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

print(f'Accuracy: {accuracy}')
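Beyond a single score, it can be useful to see the rules the fitted tree has learned; scikit-learn's export_text prints them as plain text. The sketch below uses synthetic data with the two feature names from above, so the thresholds shown are illustrative only:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for the real features: label is 1 when the two columns sum above 0
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

# A shallow tree keeps the printed rules short and readable
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_demo, y_demo)
rules = export_text(tree, feature_names=['Previous Day Price', 'Price Change'])
print(rules)
```

Each line shows a split on a feature and a threshold, ending in the class predicted at that leaf.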

Interpreting the Model

The accuracy of our model tells us how often it correctly predicted the direction of price movement on the test data. An accuracy of 1.0 means the model was right every time, while an accuracy of 0.5 means it did no better than a coin flip.
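One hedge against over-reading that number is to compare it with the majority-class baseline: the accuracy you would get by always predicting the more common class. A sketch with made-up labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up test labels: 6 up days, 4 down days
y_true = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])

# Always predict the most frequent class
majority = int(np.bincount(y_true).argmax())
baseline = accuracy_score(y_true, np.full_like(y_true, majority))

print(f'Baseline accuracy: {baseline}')  # Baseline accuracy: 0.6
```

A model is only informative if it beats this baseline on held-out data.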

Next Steps and Suggestions

While this model is a good starting point, there are many ways it could be improved. For instance, we could add more features, such as trading volume, the opening and closing prices, and other technical indicators. We could also try different machine learning algorithms, such as random forests or gradient boosting, which may give better results.
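As a sketch combining both ideas (more features plus a random forest), the following runs on synthetic Close, Open, and Volume columns that mirror a typical Yahoo Finance export; with the real CSV you would load the file instead:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the NAFTRAC CSV, with Close, Open, and Volume columns
rng = np.random.default_rng(42)
n = 300
close = 50 + np.cumsum(rng.normal(0, 0.5, n))
data = pd.DataFrame({
    'Close': close,
    'Open': close + rng.normal(0, 0.2, n),
    'Volume': rng.integers(1_000, 10_000, n).astype(float),
})

# Expanded feature set: previous close, daily change, open-to-close gap, volume
data['Previous Day Price'] = data['Close'].shift(1)
data['Price Change'] = data['Close'] - data['Previous Day Price']
data['Open-Close Gap'] = data['Close'] - data['Open']
data['Target'] = (data['Price Change'].shift(-1) > 0).astype(int)
data = data.dropna()[:-1]  # first row has no prior price; last row has no next day

X = data[['Previous Day Price', 'Price Change', 'Open-Close Gap', 'Volume']]
y = data['Target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# An ensemble of trees usually generalizes better than a single decision tree
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
acc = accuracy_score(y_test, rf.predict(X_test))
print(f'Accuracy: {acc:.2f}')
```

On random-walk prices like these, accuracy near 0.5 is expected; the point is the pipeline shape, not the score.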

Another important step is to evaluate the model's performance on out-of-sample data. This means testing the model on new data that it hasn't seen during the training process, to make sure it can generalize well to new situations.
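One caveat worth flagging: train_test_split shuffles rows by default, so the model may train on days that occur after the days it is tested on. For time-ordered data a chronological split is safer; scikit-learn's TimeSeriesSplit always trains on the past and tests on the future:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X_days = np.arange(20).reshape(-1, 1)  # 20 "days" of a single feature

tscv = TimeSeriesSplit(n_splits=3)
for train_idx, test_idx in tscv.split(X_days):
    # every training day comes strictly before every test day
    assert train_idx.max() < test_idx.min()
    print(f'train: days 0-{train_idx.max()}, test: days {test_idx.min()}-{test_idx.max()}')
# train: days 0-4, test: days 5-9
# train: days 0-9, test: days 10-14
# train: days 0-14, test: days 15-19
```

Averaging the model's accuracy across these folds gives a more honest estimate of how it would have performed trading day by day.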
