Use ML Generate Factor in Factor Investing

Zero to One as a Financial ML Developer with scikit-learn

In my previous article on Factor Investing with Machine Learning, we discussed the fundamentals of Factor Investing and how machine learning can be used to enhance its performance. However, there are still some unresolved challenges. One of the weaknesses of traditional stock factor classification is the use of percentile classification, which may be easy to understand but can lead to incorrect groupings. To address this, we can leverage the power of Support Vector Machine (SVM) models to improve our factor classification approach. SVMs are known for their ability to handle complex data distributions and can help us achieve more accurate and robust stock factor classification, leading to better investment decisions.

Labeling

We start by defining Outperform stocks. For this article, I define Outperform stocks as those in the top 20 by performance, and I use “Market Cap” and “Earnings before Tax” as the features for classification. You can change both.
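The labeling step can be sketched as follows. This is only an illustration under stated assumptions: the helper label_top_n is hypothetical, the data is synthetic, "top 20" is read as a count of stocks (it could equally mean a percentile), and the "Alpha" column name is borrowed from the code later in this article.

```python
import numpy as np
import pandas as pd


def label_top_n(df: pd.DataFrame, column: str = "Alpha", top_n: int = 20) -> pd.Series:
    """Return a boolean Series that is True for the top_n rows ranked by `column`."""
    top_index = df[column].nlargest(top_n).index
    return pd.Series(df.index.isin(top_index), index=df.index, name="Outperform")


# Synthetic stand-in for df_rank (assumption: the real data has these columns)
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "Alpha": rng.normal(0.0, 0.2, size=500),
    "Market Cap": rng.lognormal(mean=8.0, sigma=1.0, size=500),
    "Earnings before Tax": rng.normal(100.0, 50.0, size=500),
})

demo["Outperform"] = label_top_n(demo, column="Alpha", top_n=20)
print(demo["Outperform"].value_counts())
```

Changing column or top_n is how you would swap in a different performance measure or cutoff.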

import matplotlib.pyplot as plt

# Draw 200 random stocks from each class so the scatter plot stays readable
outperform = df_rank[df_rank["Outperform"] == True].sample(n=200, random_state=1)
underperform = df_rank[df_rank["Outperform"] == False].sample(n=200, random_state=1)

plt.figure(figsize=(6, 4))  # optional, only sets the figure size
plt.plot(outperform["Market Cap"], outperform["Earnings before Tax"], "r.", label="Outperform", alpha=1)
plt.plot(underperform["Market Cap"], underperform["Earnings before Tax"], "b.", label="Underperform", alpha=1)

# Label the axes and add a grid and a legend
plt.xlabel("Market Cap")
plt.ylabel("Earnings before Tax", rotation=0)
plt.grid()
plt.legend(loc="upper left")

plt.show()

The figure suggests that these two variables are related to Alpha. The big question is: how can we classify Outperform and Underperform stocks?

One method is SVM.

Support Vector Machine

A Support Vector Machine (SVM) is a powerful and versatile machine learning model, capable of performing linear and non-linear classification and even novelty detection, and it works well on small and medium-sized datasets. We will show how to use an SVM for stock factor classification.

Fitting a basic linear SVM takes only a few lines:

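import numpy as np
from sklearn.svm import SVC

# Fill missing values with the mean of each numeric column
df_rank = df_rank.fillna(value=df_rank.mean(numeric_only=True))

# Label: 1 if Alpha is above 0.25, otherwise 0
df_rank["Outperform"] = np.where(df_rank["Alpha"] > 0.25, 1, 0)
y = df_rank["Outperform"]
X = df_rank[["Market Cap", "Earnings before Tax"]]

# SVM classifier with a linear kernel
svm_clf = SVC(kernel="linear")
svm_clf.fit(X, y)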

The fitted weights of the linear decision boundary, one per feature, are stored in coef_:

print(svm_clf.coef_[0])
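To see what these coefficients mean, the sketch below rebuilds the decision score by hand. With a linear kernel, the score for a stock is the dot product of the weights and its features plus the intercept, and the model predicts class 1 when that score is positive. The data here is synthetic (two standardized features standing in for the two factors), so the printed numbers say nothing about real stocks.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in: two standardized features and a linearly separable label
X_demo = rng.normal(size=(200, 2))
y_demo = (X_demo[:, 0] + 0.5 * X_demo[:, 1] > 0).astype(int)

clf = SVC(kernel="linear").fit(X_demo, y_demo)

w = clf.coef_[0]        # one weight per feature
b = clf.intercept_[0]   # bias term

# Linear kernel: score = w . x + b, and class 1 is predicted when the score is positive
manual_scores = X_demo @ w + b
manual_pred = (manual_scores > 0).astype(int)

print("weights:", w, "intercept:", b)
print("matches decision_function:", np.allclose(manual_scores, clf.decision_function(X_demo)))
print("matches predict:", np.array_equal(manual_pred, clf.predict(X_demo)))
```

Note that coef_ is only available for the linear kernel; with other kernels there is no single weight per feature.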

To predict the class of each stock:

svm_clf.predict(X)
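Calling predict on X scores the same rows the model was trained on, so it cannot tell us how the classifier behaves on unseen stocks. A minimal sketch of a more cautious setup follows, assuming synthetic data in place of df_rank: the data is split into training and test sets, and the features are standardized first because SVMs are sensitive to feature scale and "Market Cap" is far larger in magnitude than most other inputs. The label rule used to generate the data is invented purely for the demonstration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(42)
n = 600

# Synthetic features on very different scales (assumption, not real market data)
market_cap = rng.lognormal(mean=8.0, sigma=1.0, size=n)
ebt = rng.normal(100.0, 50.0, size=n)

# Invented label rule: a noisy linear combination of the standardized features
z_cap = (market_cap - market_cap.mean()) / market_cap.std()
z_ebt = (ebt - ebt.mean()) / ebt.std()
outperform = (z_cap + z_ebt + rng.normal(0.0, 0.3, size=n) > 0).astype(int)

X_demo = pd.DataFrame({"Market Cap": market_cap, "Earnings before Tax": ebt})
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, outperform, test_size=0.25, random_state=1, stratify=outperform
)

# The scaler is fit on the training rows only, then reused on the test rows
model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
model.fit(X_train, y_train)

train_accuracy = model.score(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"train accuracy: {train_accuracy:.3f}, test accuracy: {test_accuracy:.3f}")
```

Putting the scaler inside the pipeline keeps test-set statistics from leaking into training.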

Gaussian Radial Basis Function

Sometimes we want to capture non-linear patterns, adapt flexibly to different data distributions, and handle high-dimensional data. In those cases we can use the Gaussian Radial Basis Function (RBF) kernel.

The Gaussian Radial Basis Function (RBF) kernel is a popular kernel used in Support Vector Machines (SVM) for solving classification and regression problems. It is also known as the Gaussian kernel or the radial basis function kernel.

The Gaussian RBF kernel is defined by the formula:

K(x, x') = exp(-gamma * ||x - x'||²)

where x and x' are data points, ||x - x'||² is the squared Euclidean distance between the two data points, and gamma is a hyperparameter that controls the shape of the kernel. The gamma parameter determines the width of the Gaussian distribution, and it plays a crucial role in determining the flexibility of the decision boundary.
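The formula is short enough to compute by hand. The sketch below implements it directly and checks the result against scikit-learn's rbf_kernel; the function name gaussian_rbf and the two example points are illustrative choices, not part of the original notebook.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel


def gaussian_rbf(x, x_prime, gamma):
    """K(x, x') = exp(-gamma * ||x - x'||^2) for two 1-D points."""
    diff = np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


a = np.array([1.0, 2.0])
b = np.array([2.0, 4.0])
gamma = 0.5

# Squared distance is 1 + 4 = 5, so the kernel value is exp(-0.5 * 5) = exp(-2.5)
k_manual = gaussian_rbf(a, b, gamma)
k_sklearn = rbf_kernel(a.reshape(1, -1), b.reshape(1, -1), gamma=gamma)[0, 0]

print("manual:", k_manual)
print("scikit-learn:", k_sklearn)
print("identical points:", gaussian_rbf(a, a, gamma))
```

The kernel equals 1 for identical points and decays toward 0 as the points move apart, which is why it behaves like a similarity measure.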

The Gaussian RBF kernel is popular because it can capture complex non-linear patterns in the data by mapping the data points to a higher-dimensional feature space. It can be used to find non-linear decision boundaries that can separate data points of different classes that are not linearly separable in the original feature space. The kernel trick allows SVM to find a hyperplane that separates the data points in this higher-dimensional feature space, making it a powerful tool for solving complex classification problems.
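A quick way to see this is a toy dataset that no straight line can separate. The sketch below uses scikit-learn's make_circles (one class forms a ring around the other) and compares a linear kernel with the RBF kernel. It is a toy demonstration on synthetic data, not evidence about stock returns.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in the original two features
X_demo, y_demo = make_circles(n_samples=400, factor=0.3, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

linear_acc = SVC(kernel="linear").fit(X_train, y_train).score(X_test, y_test)
rbf_acc = SVC(kernel="rbf").fit(X_train, y_train).score(X_test, y_test)

print(f"linear kernel test accuracy: {linear_acc:.3f}")
print(f"RBF kernel test accuracy:    {rbf_acc:.3f}")
```

The linear model has no line that splits the rings, while the RBF model can draw a closed boundary around the inner class.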

One important aspect of using the Gaussian RBF kernel is tuning the gamma hyperparameter. A smaller value of gamma results in a wider Gaussian, so each training point influences a larger region; this gives a smoother decision boundary and a less flexible model that may underfit. On the other hand, a larger value of gamma results in a narrower Gaussian, which gives a more localized decision boundary and a more flexible model that may overfit. The optimal value of gamma depends on the specific characteristics of the data and the problem at hand, and it often requires careful tuning to achieve the best performance of the SVM model.
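How gamma changes the fit can be checked empirically. The sketch below, on synthetic make_moons data, first fits a very small and a very large gamma to compare training and test accuracy, then runs a cross-validated grid search over gamma and C. The grid values are arbitrary starting points chosen for illustration, not recommendations for factor data.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_demo, y_demo = make_moons(n_samples=400, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

# Compare a very wide kernel (small gamma) with a very narrow one (large gamma)
gamma_scores = {}
for gamma in (0.01, 100.0):
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma=gamma))
    model.fit(X_train, y_train)
    gamma_scores[gamma] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"gamma={gamma}: train={gamma_scores[gamma][0]:.3f}, test={gamma_scores[gamma][1]:.3f}")

# Let cross-validation choose gamma and C instead of guessing
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_grid = {
    "svc__gamma": [0.01, 0.1, 1.0, 10.0, 100.0],
    "svc__C": [0.1, 1.0, 10.0],
}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print("best parameters:", best_params)
print(f"test accuracy with best parameters: {test_accuracy:.3f}")
```

A large gap between training and test accuracy at high gamma is the usual sign of overfitting. For time-ordered financial data, a time-aware splitter would likely be a better choice than the plain 5-fold split used here.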

In summary, the Gaussian RBF kernel is a powerful tool in SVM for capturing non-linear patterns in the data and finding complex decision boundaries. It allows SVM to solve classification and regression problems that are not linearly separable in the original feature space, making it a popular choice in many machine learning applications. However, proper tuning of the gamma hyperparameter is crucial for achieving the best performance of the SVM model.

SVC(kernel="rbf")

Summary

This article discussed how machine learning can be used to enhance factor investing performance, specifically addressing the weakness of traditional percentile-based stock factor classification. It suggested using Support Vector Machine (SVM) models to improve factor classification, demonstrated how to train an SVM classifier with “Market Cap” and “Earnings before Tax” as features, and explained the Gaussian Radial Basis Function (RBF) kernel, a popular SVM kernel for capturing non-linear patterns in data. It also highlighted the importance of tuning the gamma hyperparameter. In summary, SVM with the Gaussian RBF kernel is a powerful tool for factor investing, but proper tuning of hyperparameters is crucial for best results.

Notebook

https://colab.research.google.com/drive/15LW-EcqPl8NMNtSeDe6uR652l7dUbum7?usp=sharing


Use ML Generate Factor in Factor Investing was originally published in DataDrivenInvestor on Medium.