
Beyond GridSearchCV: Advanced Hyperparameter Tuning Strategies for Scikit-learn Models
Introduction
Ever felt like you were searching for a needle in a haystack? That is often what building and optimizing machine learning models feels like, particularly for complex models such as ensembles and neural networks, where several hyperparameters must be set manually before training. Hyperparameters like the learning rate, the number of estimators in an ensemble, or the maximum depth of a decision tree can yield models with widely varying performance depending on the values chosen, and finding the optimal configuration for each of them is no easy task.
Thankfully, Scikit-learn provides several classes that implement hyperparameter tuning strategies based on search algorithms combined with cross-validation. In a previous article, we introduced basic strategies like GridSearchCV. Now, we will venture into three additional strategies and how to implement them in Scikit-learn:
- Randomized search (RandomizedSearchCV)
- Bayes search (BayesSearchCV)
- Successive halving strategies (HalvingGridSearchCV and HalvingRandomSearchCV)
Randomized Search
While grid search exhaustively evaluates every combination in a grid of candidate values we define for each hyperparameter, the RandomizedSearchCV class samples hyperparameter values at random from specified (or default) distributions, trying only a fixed number of combinations. When the number of hyperparameters to tune is large and their ranges vary greatly, this is a more efficient approach.
To see it in action, let's first load the MNIST dataset for image classification, import the necessary Python modules and classes for training a random forest classifier and tuning its hyperparameters, and split the data into training and test sets:
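A minimal sketch of this step might look as follows; note that load_digits is used here as a lightweight stand-in for the full MNIST data (fetch_openml("mnist_784") would download the real 70,000-image dataset), and the variable names are illustrative:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# load_digits is a small MNIST-style dataset of 8x8 digit images,
# used here as a lightweight stand-in for the full MNIST data
X, y = load_digits(return_X_y=True)

# Hold out 20% of the samples as a test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```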
We initialize the random forest classifier — without training it yet — and define a hyperparameter space to sample from:
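A possible sketch of this step; the ranges and distributions below are illustrative assumptions, using scipy.stats.randint so that integer values can be sampled at random:

```python
from scipy.stats import randint
from sklearn.ensemble import RandomForestClassifier

# Initialize the classifier without training it yet
rf = RandomForestClassifier(random_state=42)

# Hypothetical search space: scipy distributions are sampled from at random,
# while plain lists are sampled uniformly
param_dist = {
    "n_estimators": randint(50, 300),     # number of trees in the ensemble
    "max_depth": randint(3, 20),          # maximum depth of each tree
    "min_samples_split": randint(2, 11),  # minimum samples to split a node
    "max_features": ["sqrt", "log2"],     # features considered at each split
}
```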
Now we define the object responsible for the hyperparameter tuning process, passing in the random forest instance, the hyperparameter space we just defined, and specifying the number of random trials to perform (n_iter) as well as the number of training-validation folds for the cross-validation process inherently applied as part of the search. Once defined, the fit() method executes the entire process and yields the best hyperparameter setting found.
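Putting the pieces together, a self-contained sketch of the whole search might look like this (the small n_iter and narrowed ranges are illustrative choices to keep the run quick, and load_digits again stands in for MNIST):

```python
from scipy.stats import randint
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

param_dist = {
    "n_estimators": randint(50, 200),
    "max_depth": randint(3, 20),
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_dist,
    n_iter=5,        # number of random trials to sample
    cv=3,            # 3-fold cross-validation per trial
    random_state=42,
    n_jobs=-1,
)
search.fit(X_train, y_train)  # runs the whole search and refits the best model

print(search.best_params_)
print(search.best_estimator_.score(X_test, y_test))  # accuracy on the test set
```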
My result is a “best” ensemble found with the following hyperparameter settings and an accuracy of nearly 98% on the test data:
Bayes Search
This strategy also samples from a defined search space, but it does so more intelligently: each new candidate is chosen based on the results of previous evaluations, favoring promising points and regions, which can make it even more efficient than random search on challenging problems and datasets. The necessary class is not located in the base Scikit-learn library, but in a separate extension built by the same community for advanced optimization strategies. This "add-on" library is called skopt, short for scikit-optimize (you may need to install it first with pip install scikit-optimize).
Here’s a full example of how it works to optimize another random forest classifier on the same dataset:
As you can observe, the workflow is very similar to that of RandomizedSearchCV.
Successive Halving Strategies
Successive halving employs adaptive resource allocation: it starts with many candidate model configurations evaluated on a small computational budget, then progressively increases the budget while discarding poorly performing configurations, thereby focusing resources on the most promising candidates. This makes the process more efficient than traditional grid or random search.
There are two classes in Scikit-learn to implement this strategy: HalvingGridSearchCV and HalvingRandomSearchCV. The former exhaustively evaluates all parameter combinations but prunes (removes) underperforming ones early, while the latter starts with randomly sampled configurations and applies pruning after sampling.
Implementing either of these requires specifying one hyperparameter as the resource, i.e., the hyperparameter whose value is gradually increased as the pool of candidate configurations is narrowed down.
Visualizing the best model configuration found includes not only the hyperparameters in the search space, but also the one used as the resource — in this case, n_estimators:
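A self-contained sketch using HalvingRandomSearchCV with n_estimators as the resource (the search space and budget values are illustrative assumptions; note that successive halving is experimental in Scikit-learn and must be enabled explicitly):

```python
from scipy.stats import randint
from sklearn.datasets import load_digits  # lightweight stand-in for MNIST
from sklearn.ensemble import RandomForestClassifier
from sklearn.experimental import enable_halving_search_cv  # noqa: F401 (enables the class below)
from sklearn.model_selection import HalvingRandomSearchCV, train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

halving_search = HalvingRandomSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions={          # the resource must NOT appear here
        "max_depth": randint(3, 20),
        "min_samples_split": randint(2, 11),
    },
    resource="n_estimators",       # the budget grows in number of trees
    min_resources=10,              # trees per candidate in the first round
    max_resources=120,             # trees available in the final round
    factor=3,                      # keep roughly the top third each round
    cv=3,
    random_state=42,
    n_jobs=-1,
)
halving_search.fit(X_train, y_train)

# The winning configuration, including the resource value it was trained with
print(halving_search.best_params_)
```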
Wrapping Up
This article showcased three advanced strategies to fine-tune machine learning model hyperparameters in Scikit-learn — randomized search, Bayes search, and successive halving — all of which go beyond the classical grid search approach.
