Artificial Intelligence, Machine Learning and Data Science Hubspot

Wednesday, 8 May 2024

How to Save Gradient Boosting Models with XGBoost in Python

XGBoost can be used to create some of the most performant models for tabular data using the gradient boosting algorithm.

Once a model is trained, it is good practice to save it to file so that you can later use it to make predictions on new test and validation datasets and on entirely new data.

In this post you will discover how to save your XGBoost models to file using the standard Python pickle API and the joblib library.

After completing this tutorial, you will know:

How to save and later load your trained XGBoost model using pickle.

How to save and later load your trained XGBoost model using joblib.

Serialize Your XGBoost Model with Pickle

Pickle is the standard way of serializing objects in Python.

You can use the Python pickle API to serialize your machine learning algorithms and save the serialized format to a file, for example:

# save model to file

pickle.dump(model, open("pima.pickle.dat", "wb"))

Later you can load this file to deserialize your model and use it to make new predictions, for example:

# load model from file

loaded_model = pickle.load(open("pima.pickle.dat", "rb"))
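The one-line calls above leave the file objects returned by open() for Python to close on its own. A minimal sketch of the same save and load steps using context managers follows. The helper names save_model and load_model and the stand-in dictionary are assumptions made for illustration and are not part of the original example; any picklable object, such as a trained XGBClassifier, could take the dictionary's place.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Serialize any picklable object to path and close the file afterwards."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Deserialize an object from path. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical stand-in for a trained model, used so the sketch runs without XGBoost.
stand_in_model = {"n_estimators": 100, "max_depth": 3}

# Write to the system temporary directory to avoid cluttering the working directory.
path = os.path.join(tempfile.gettempdir(), "pima.pickle.dat")
save_model(stand_in_model, path)
restored = load_model(path)

print(restored == stand_in_model)
```

The with statement closes each file as soon as the block ends, even if pickling raises an exception.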

The example below demonstrates how you can train an XGBoost model on the Pima Indians onset of diabetes dataset, save the model to file and later load it to make predictions.

Download the dataset and save it to your current working directory.

The full code listing is provided below for completeness.

# Train XGBoost model, save to file using pickle, load and make predictions

from numpy import loadtxt

import xgboost

import pickle

from sklearn import model_selection

from sklearn.metrics import accuracy_score

# load data

dataset = loadtxt('pima-indians-diabetes.csv', delimiter=",")

# split data into X and y

X = dataset[:,0:8]

Y = dataset[:,8]

# split data into train and test sets

seed = 7

test_size = 0.33

X_train, X_test, y_train, y_test = model_selection.train_test_split(X, Y, test_size=test_size, random_state=seed)

# fit model on training data

model = xgboost.XGBClassifier()

model.fit(X_train, y_train)

# save model to file

with open("pima.pickle.dat", "wb") as f:
    pickle.dump(model, f)

# some time later...

# load model from file

with open("pima.pickle.dat", "rb") as f:
    loaded_model = pickle.load(f)

# make predictions for test data

y_pred = loaded_model.predict(X_test)

predictions = [round(value) for value in y_pred]

# evaluate predictions

accuracy = accuracy_score(y_test, predictions)

print("Accuracy: %.2f%%" % (accuracy * 100.0))

Running this example saves your trained XGBoost model to the pima.pickle.dat pickle file in the current working directory.

pima.pickle.dat

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

After loading the model and making predictions on the test dataset, the accuracy of the model is printed.

Accuracy: 77.95%
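For reference, the accuracy reported here is the fraction of predictions that match the true labels. The sketch below computes it by hand and prints it in the same format as the example. The label lists are made up for illustration and are not taken from the Pima dataset.

```python
# Illustrative labels only (not taken from the Pima dataset).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_hat = [1, 0, 0, 1, 0, 1, 1, 0]

# Accuracy is the number of matching labels divided by the total number of labels.
correct = sum(1 for t, p in zip(y_true, y_hat) if t == p)
accuracy = correct / len(y_true)

print("Accuracy: %.2f%%" % (accuracy * 100.0))
```

Here six of the eight predictions match, so the snippet prints Accuracy: 75.00%.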

Serialize XGBoost Model with joblib

Joblib is part of the SciPy ecosystem and provides utilities for pipelining Python jobs.

The Joblib API provides utilities for efficiently saving and loading Python objects that make use of NumPy data structures. It may be a faster approach for very large models.

The API looks a lot like the pickle API. For example, you may save your trained model as follows:

# save model to file

joblib.dump(model, "pima.joblib.dat")

You can later load the model from file and use it to make predictions as follows:

# load model from file

loaded_model = joblib.load("pima.joblib.dat")
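As a minimal sketch of these two calls, the snippet below round-trips an object through joblib. The dictionary is a hypothetical stand-in for a trained model, used so the sketch runs without XGBoost, and the temporary-directory path is likewise an assumption for illustration. The optional compress argument of joblib.dump trades some speed for a smaller file.

```python
import os
import tempfile

import joblib

# Hypothetical stand-in for a trained model; a fitted XGBClassifier is saved the same way.
stand_in_model = {"weights": [0.1, 0.2, 0.3], "bias": 0.5}

# Write to the system temporary directory to avoid cluttering the working directory.
path = os.path.join(tempfile.gettempdir(), "pima.joblib.dat")

# compress accepts an integer from 0 (no compression) to 9 (smallest file, slowest).
joblib.dump(stand_in_model, path, compress=3)

# Load the object back from disk. As with pickle, only load files you trust.
restored = joblib.load(path)

print(restored == stand_in_model)
```

Unlike pickle.dump, joblib.dump accepts a file name directly, so there is no file handle to manage.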

The example below demonstrates how you can train an XGBoost model for classification on the Pima Indians onset of diabetes dataset, save the model to file using Joblib and load it at a later time in order to make predictions.

# Train XGBoost model, save to file using joblib, load and make predictions

from numpy import loadtxt

from xgboost import XGBClassifier

from joblib import dump

from joblib import load

from sklearn.model_selection import train_test_split

from sklearn.metrics import accuracy_score

# load data

dataset = loadtxt('pima-indians-diabetes.csv', delimiter=",")

# split data into X and y

X = dataset[:,0:8]

Y = dataset[:,8]

# split data into train and test sets

seed = 7

test_size = 0.33

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)

# fit model on training data

model = XGBClassifier()

model.fit(X_train, y_train)

# save model to file

dump(model, "pima.joblib.dat")

print("Saved model to: pima.joblib.dat")

# some time later...

# load model from file

loaded_model = load("pima.joblib.dat")

print("Loaded model from: pima.joblib.dat")

# make predictions for test data

predictions = loaded_model.predict(X_test)

# evaluate predictions

accuracy = accuracy_score(y_test, predictions)

print("Accuracy: %.2f%%" % (accuracy * 100.0))

Running the example saves the model to file as pima.joblib.dat in the current working directory. Older versions of joblib also create one file for each NumPy array within the model (in this case two additional files), as in the listing below; recent versions write a single file.

pima.joblib.dat

pima.joblib.dat_01.npy

pima.joblib.dat_02.npy

After the model is loaded, it is evaluated on the test dataset and the accuracy of the predictions is printed.

Accuracy: 77.95%

Summary

In this post, you discovered how to serialize your trained XGBoost models and later load them in order to make predictions.

Specifically, you learned:

How to serialize and later load your trained XGBoost model using the pickle API.
How to serialize and later load your trained XGBoost model using the joblib API.

Do you have any questions about serializing your XGBoost models or about this post? Ask your questions in the comments and I will do my best to answer.
