Machine Learning for Quants Series with Python (Part 15)
Introduction
In Part 14, we tuned models using Grid and Random Search. While effective, they are inefficient. If a Grid Search tests a learning_rate of 0.9 and gets terrible results, it doesn’t learn from that failure; it will still blindly test 0.95 in the next iteration.
What if our optimization algorithm was as smart as our machine learning model?
In this tutorial, we explore Heuristic Search Algorithms: specifically Simulated Annealing and Bayesian Optimization. These algorithms use past evaluations to decide which hyperparameters to test next. We will be designing a machine-learning-driven Bitcoin (BTC) Trading Strategy, generating buy/sell signals based on technical indicators, and intelligently tuning it for maximum returns.
Learning Objectives
By the end of this tutorial, you will be able to:
- Explain the physics-inspired concept of Simulated Annealing (Exploration vs. Exploitation).
- Understand the probabilistic mechanics of Bayesian Optimization (Surrogate models and Acquisition functions).
- Engineer financial features (SMA, Volatility, Momentum) for cryptocurrency trading.
- Deploy an advanced optimization pipeline to maximize a trading strategy’s F1-Score.
Prerequisites
- Prior Knowledge: Time Series Feature Engineering, Classification, Cross-Validation.
- Libraries: scikit-learn, pandas, numpy, yfinance, scikit-optimize (or conceptually mapped via scikit-learn).
Core Concepts
1. Simulated Annealing (The Metallurgist’s Approach)
Inspired by metallurgy, where metals are heated and slowly cooled to reduce defects.
- Exploration (High Temperature): At the beginning of the search, the algorithm acts randomly. It is allowed to accept worse hyperparameter settings. Why? To jump out of local minima (remember the loss landscape from Part 12).
- Exploitation (Low Temperature): As the search “cools down”, it becomes greedy. The probability of accepting a worse setting shrinks toward zero, so the search increasingly accepts only changes that improve the model’s score and settles into a good minimum, ideally (though not guaranteed to be) the global one.
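The two phases above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production tuner: the one-dimensional bumpy_loss function, the step size, and the geometric cooling schedule are all assumptions chosen for the example. Worse moves are accepted with probability exp(-delta / T), which is large when the temperature T is high and close to zero once it has cooled.

```python
import math
import random

def simulated_annealing(score_fn, start, step=0.5, t_start=1.0, t_end=1e-3,
                        cooling=0.95, seed=0):
    """Minimize score_fn over one real-valued parameter (toy example)."""
    rng = random.Random(seed)
    current, current_score = start, score_fn(start)
    best, best_score = current, current_score
    temperature = t_start
    while temperature > t_end:
        # Propose a random neighbor of the current setting
        candidate = current + rng.uniform(-step, step)
        candidate_score = score_fn(candidate)
        delta = candidate_score - current_score  # > 0 means the candidate is worse
        # Always accept improvements; accept worse moves with probability exp(-delta / T)
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = candidate, candidate_score
        # Remember the best setting seen so far
        if current_score < best_score:
            best, best_score = current, current_score
        temperature *= cooling  # geometric cooling schedule (an assumption)
    return best, best_score

def bumpy_loss(x):
    # Hypothetical loss surface with several local minima
    return 0.1 * x * x + math.sin(3 * x)

best_x, best_loss = simulated_annealing(bumpy_loss, start=4.0)
print(f"best x = {best_x:.3f}, loss = {best_loss:.3f}")
```

For real hyperparameter tuning, score_fn would wrap a cross-validated model fit, and the "neighbor" step would perturb one or more hyperparameters instead of a single number.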
2. Bayesian Optimization (The Smart Search)
Bayesian Optimization builds a “model of the model” (a surrogate function, often a Gaussian Process).
- It tests a few random hyperparameters.
- It uses Bayes’ Theorem to update its belief about where the best hyperparameters lie.
- It uses an Acquisition Function to choose the next point to test, balancing Exploitation (searching near known good parameters) and Exploration (searching in areas where the surrogate model is highly uncertain).
- The Result: It typically finds a strong hyperparameter combination in far fewer model evaluations than Grid or Random Search, although it is not guaranteed to find the global optimum.
Tip: Bayesian Optimization is like playing the game Battleship. You don’t fire your pegs randomly (Random Search) or row-by-row (Grid Search). When you score a “Hit”, you use that intelligence to fire your next peg nearby (Exploitation), while occasionally firing into open water to find new ships (Exploration).
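The loop described above (random trials, surrogate update, acquisition-driven next trial) can be sketched with NumPy alone. Everything here is an illustrative assumption rather than the API of any Bayesian optimization library: the hand-rolled Gaussian Process with a fixed RBF length scale, the Expected Improvement acquisition function, and the hypothetical toy_score that stands in for a validation score peaking at learning_rate = 0.3.

```python
import math
import numpy as np

def rbf_kernel(a, b, length_scale=0.15):
    """Squared-exponential kernel between two 1-D arrays of points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Posterior mean and std of a zero-mean GP surrogate at x_query."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_tq = rbf_kernel(x_train, x_query)
    solved = np.linalg.solve(k_tt, k_tq)        # K^-1 k(x_train, x_query)
    mean = solved.T @ y_train
    var = 1.0 - np.sum(k_tq * solved, axis=0)   # prior variance is 1 for this kernel
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mean, std, best_y):
    """EI for maximization: expected amount by which each point beats best_y."""
    z = (mean - best_y) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mean - best_y) * cdf + std * pdf

def toy_score(lr):
    # Hypothetical validation score that peaks at learning_rate = 0.3
    return math.exp(-((lr - 0.3) ** 2) / 0.02)

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 201)               # candidate learning rates
x_obs = rng.uniform(0.0, 1.0, size=3)           # a few random initial trials
y_obs = np.array([toy_score(x) for x in x_obs])

for _ in range(10):
    # Center the targets so a zero-mean GP is a reasonable surrogate
    y_mean = y_obs.mean()
    mean, std = gp_posterior(x_obs, y_obs - y_mean, grid)
    ei = expected_improvement(mean, std, y_obs.max() - y_mean)
    x_next = grid[int(np.argmax(ei))]           # acquisition picks the next trial
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, toy_score(x_next))

best_lr = x_obs[int(np.argmax(y_obs))]
print(f"best learning_rate = {best_lr:.3f}, score = {y_obs.max():.3f}")
```

Expected Improvement is large where the surrogate mean is high (Exploitation) or where its standard deviation is high (Exploration), which is the trade-off the Battleship analogy describes.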
The Hands-On Practice
We will build a Bitcoin trading signal model. Due to environment constraints, instead of importing external Bayesian libraries, we will use scikit-learn’s HalvingRandomSearchCV as a stand-in. It is not Bayesian Optimization or Simulated Annealing: its candidates are still sampled at random. What it shares with them is the idea of spending compute where it pays off, because it iteratively throws out weak candidates and gives the survivors more data.
Step 1: Feature Engineering the Bitcoin Strategy
We need to generate our own alpha signals from raw price data.
import numpy as np
import pandas as pd
import yfinance as yf
from sklearn.model_selection import TimeSeriesSplit
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.experimental import enable_halving_search_cv # Required to enable the experimental halving search
from sklearn.model_selection import HalvingRandomSearchCV
# 1. Fetch Bitcoin Data
print("Fetching BTC data...")
btc = yf.download('BTC-USD', start='2020-01-01', end='2024-01-01')
# Recent yfinance versions can return MultiIndex columns even for one ticker; flatten them
if isinstance(btc.columns, pd.MultiIndex):
    btc.columns = btc.columns.get_level_values(0)
# 2. Engineer Technical Features
btc['Returns'] = btc['Close'].pct_change()
btc['SMA_10'] = btc['Close'].rolling(window=10).mean()
btc['SMA_30'] = btc['Close'].rolling(window=30).mean()
btc['Volatility_14'] = btc['Returns'].rolling(window=14).std()
btc['Momentum_5'] = btc['Close'].pct_change(periods=5)
# 3. Create Target: 1 if tomorrow's return is positive, else 0
btc['Target'] = np.where(btc['Returns'].shift(-1) > 0, 1, 0)
# The last row has no "tomorrow", so its label is unknown; drop it
btc = btc.iloc[:-1].copy()
# Drop NaNs generated by rolling windows
btc.dropna(inplace=True)
# Define X and y
features = ['SMA_10', 'SMA_30', 'Volatility_14', 'Momentum_5', 'Returns']
X = btc[features]
y = btc['Target']
Step 2: Intelligent Optimization (Successive Halving)
HalvingRandomSearchCV is an advanced technique. It works like a tournament: it evaluates many hyperparameter candidates on a small amount of data, throws away the bottom half (the “losers”), and evaluates the “winners” on a larger amount of data, saving immense computation time (conceptually similar to the convergence of Simulated Annealing).
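To make the tournament arithmetic concrete, here is a simplified sketch of a halving schedule. The helper halving_schedule and the numbers (16 candidates, 100 to 1600 training rows) are hypothetical, and the logic only approximates what scikit-learn does internally, but it shows why the approach is cheap: each round costs roughly the same, because fewer candidates are trained on more data.

```python
import math

def halving_schedule(n_candidates, min_resources, max_resources, factor=2):
    """List (candidates, samples per candidate) for each tournament round."""
    rounds = []
    resources = min_resources
    while resources <= max_resources:
        rounds.append((n_candidates, resources))
        if n_candidates == 1:
            break  # a single winner remains
        n_candidates = math.ceil(n_candidates / factor)  # keep the top 1/factor
        resources *= factor                              # survivors see more data
    return rounds

# Hypothetical numbers: 16 candidates, 100 to 1600 training rows
schedule = halving_schedule(n_candidates=16, min_resources=100, max_resources=1600)
for round_no, (n_cand, n_rows) in enumerate(schedule):
    print(f"round {round_no}: {n_cand} candidates x {n_rows} rows")

halving_cost = sum(n_cand * n_rows for n_cand, n_rows in schedule)
full_cost = 16 * 1600  # every candidate trained on all rows
print(f"halving: {halving_cost} candidate-rows vs. full search: {full_cost}")
```

In this toy setting the tournament touches 8,000 candidate-rows, against 25,600 if all 16 candidates were trained on the full 1,600 rows.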
# Because this is time-series trading data, we MUST use TimeSeriesSplit to prevent look-ahead bias
tscv = TimeSeriesSplit(n_splits=5)
# We use Gradient Boosting (from Part 9) as our risk engine
gb = GradientBoostingClassifier(random_state=42)
# Define a massive hyperparameter space
param_distributions = {
    'n_estimators': [50, 100, 200, 300],
    'learning_rate': [0.01, 0.05, 0.1, 0.2],
    'max_depth': [2, 3, 5, 7],
    'subsample': [0.7, 0.8, 1.0]
}
# Run the Halving Search (The "Tournament")
print("Starting Intelligent Tournament Search...")
halving_search = HalvingRandomSearchCV(
    estimator=gb,
    param_distributions=param_distributions,
    factor=2,  # Halve the candidates each iteration
    cv=tscv,  # Respect time series chronology
    scoring='f1',  # Optimize the F1-score of the "1" (Buy) class, matching our objective
    random_state=42,
    n_jobs=-1
)
halving_search.fit(X, y)
print(f"Intelligent Search Best Parameters: {halving_search.best_params_}")
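The cv=tscv argument matters because TimeSeriesSplit always trains on earlier rows than it validates on, while standard K-Fold does not. A quick check on a hypothetical index of 12 chronologically ordered rows (the array rows is made up for this illustration) makes the difference visible:

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

rows = np.arange(12)  # hypothetical 12 chronologically ordered observations

# Each TimeSeriesSplit fold trains on the past and validates on the next block
ts_folds = list(TimeSeriesSplit(n_splits=3).split(rows))
for train_idx, test_idx in ts_folds:
    print("TimeSeriesSplit train:", train_idx, "test:", test_idx)

# Plain KFold lets later rows end up in the training set of an earlier test block
kf_folds = list(KFold(n_splits=3).split(rows))
for train_idx, test_idx in kf_folds:
    print("KFold           train:", train_idx, "test:", test_idx)
```

In every TimeSeriesSplit fold the largest training index is smaller than the smallest test index. The first K-Fold fold, by contrast, validates on the earliest rows while training on all the later ones.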
Step 3: Evaluation of the Bitcoin Strategy
Let’s see if our intelligently optimized model can predict Bitcoin direction.
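# Evaluate the winning configuration
# best_estimator_ was refit on ALL of X, so scoring it on any slice of X would be in-sample.
# Instead, refit a fresh copy on the first 80% and test on the last 20%.
from sklearn.base import clone
split_index = int(len(X) * 0.8)
X_train_backtest, y_train_backtest = X.iloc[:split_index], y.iloc[:split_index]
X_test_backtest = X.iloc[split_index:]
y_test_backtest = y.iloc[split_index:]
best_btc_model = clone(halving_search.best_estimator_).fit(X_train_backtest, y_train_backtest)
btc_predictions = best_btc_model.predict(X_test_backtest)
# Caveat: the hyperparameters were still chosen using the full sample, so this is only a
# backtest proxy. For a strict test, run the search on the first 80% only.
print("\n--- Out-of-Sample Bitcoin Trading Strategy Results ---")
print(classification_report(y_test_backtest, btc_predictions))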
Check Your Work:
- Time Series Discipline: Verify that cv=tscv was used. If you accidentally used standard K-Fold cross-validation on Bitcoin data, the optimizer would cheat by training on data from 2023 and validating on data from 2021, letting future information leak into the past and completely invalidating your strategy.
- The Alpha Challenge: The current model uses simple SMAs. Can you replace the SMA features with the RSI (Relative Strength Index) and MACD? Re-run the Halving Search. Do the optimal parameters change? Does the F1-score of the “1” (Buy) class increase?
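As a starting point for the Alpha Challenge, the sketch below computes RSI and MACD with pandas. It is one common formulation, not the only one: the Wilder-style exponential smoothing for RSI and the 12/26/9 MACD spans are conventional defaults assumed here, and the helper name add_rsi_macd and the synthetic price series are hypothetical.

```python
import numpy as np
import pandas as pd

def add_rsi_macd(close, rsi_window=14, fast=12, slow=26, signal=9):
    """Return a DataFrame of RSI and MACD features for a close-price Series."""
    delta = close.diff()
    gain = delta.clip(lower=0)       # positive moves only
    loss = -delta.clip(upper=0)      # negative moves, as positive numbers
    # Wilder-style smoothing expressed as an exponential moving average
    avg_gain = gain.ewm(alpha=1 / rsi_window, min_periods=rsi_window, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1 / rsi_window, min_periods=rsi_window, adjust=False).mean()
    rsi = 100 - 100 / (1 + avg_gain / avg_loss)
    # MACD: fast EMA minus slow EMA, plus its signal line and histogram
    macd = close.ewm(span=fast, adjust=False).mean() - close.ewm(span=slow, adjust=False).mean()
    macd_signal = macd.ewm(span=signal, adjust=False).mean()
    return pd.DataFrame({
        f"RSI_{rsi_window}": rsi,
        "MACD": macd,
        "MACD_Signal": macd_signal,
        "MACD_Hist": macd - macd_signal,
    })

# Synthetic random-walk prices, used only to demonstrate the helper
rng = np.random.default_rng(0)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.02, 200))))
indicator_features = add_rsi_macd(close)
print(indicator_features.tail())
```

With the real data, you would call add_rsi_macd(btc['Close']), join the result onto btc before creating the target and dropping NaNs, and swap the new column names into the features list in place of SMA_10 and SMA_30.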
Conclusion
Over the last 15 parts, you have journeyed from the basics of Linear Regression to the bleeding edge of Deep Learning and Bayesian Optimization.
You have learned that Quantitative Machine Learning is not just about throwing math at the wall and seeing what sticks. It is a rigorous, disciplined pipeline:
- Engineering sound financial features.
- Selecting models that balance Bias and Variance.
- Evaluating risk using robust metrics (AUC, Confusion Matrices).
- Optimizing parameters intelligently without looking into the future.
You are no longer just a coder; you now possess the theoretical foundation and the programmatic toolkit of a modern Quantitative Data Scientist.

