
Wednesday, 21 February 2024

Binary Classification Tutorial with the Keras Deep Learning Library

 


Keras is a Python library for deep learning that wraps the efficient numerical libraries TensorFlow and Theano.

Keras allows you to quickly and simply design and train neural networks and deep learning models.

In this post, you will discover how to effectively use the Keras library in your machine learning project by working through a binary classification project step-by-step.

After completing this tutorial, you will know:

  • How to load training data and make it available to Keras
  • How to design and train a neural network for tabular data
  • How to evaluate the performance of a neural network model in Keras on unseen data
  • How to perform data preparation to improve skill when using neural networks
  • How to tune the topology and configuration of neural networks in Keras

1. Description of the Dataset

The dataset you will use in this tutorial is the Sonar dataset.

This is a dataset that describes sonar chirp returns bouncing off different surfaces. The 60 input variables are the strength of the returns at different angles. It is a binary classification problem that requires a model to differentiate rocks from metal cylinders.

You can learn more about this dataset on the UCI Machine Learning repository. You can download the dataset for free and place it in your working directory with the filename sonar.csv.

It is a well-understood dataset. All the variables are continuous and generally in the range of 0 to 1. The output variable is a string “M” for mine and “R” for rock, which will need to be converted to integers 1 and 0.

A benefit of using this dataset is that it is a standard benchmark problem. This means that we have some idea of the expected skill of a good model. Using cross-validation, a neural network should be able to achieve an accuracy of around 84%, with an upper bound on accuracy for custom models at around 88%.

2. Baseline Neural Network Model Performance

Let’s create a baseline model and result for this problem.

You will start by importing all the classes and functions you will need.

Now, you can load the dataset using pandas and split the columns into 60 input variables (X) and one output variable (Y). Use pandas to load the data because it easily handles strings (the output variable), whereas attempting to load the data directly using NumPy would be more difficult.

The output variable contains string values. You must convert them into integer values 0 and 1.

You can do this using the LabelEncoder class from scikit-learn. This class will model the encoding required using the entire dataset via the fit() function, then apply the encoding to create a new output variable using the transform() function.
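The fit-then-transform pattern can be illustrated without scikit-learn. The sketch below is a plain-Python stand-in, not the library's implementation; the class name `TinyLabelEncoder` and the sample labels are made up for illustration. Like scikit-learn's LabelEncoder, it numbers the classes in sorted order, so "M" is encoded as 0 and "R" as 1.

```python
class TinyLabelEncoder:
    """Hypothetical stand-in that mimics the fit()/transform() pattern."""

    def fit(self, labels):
        # Learn the distinct classes and number them in sorted order.
        self.classes_ = sorted(set(labels))
        self._index = {label: i for i, label in enumerate(self.classes_)}
        return self

    def transform(self, labels):
        # Replace each string label with its learned integer code.
        return [self._index[label] for label in labels]


labels = ["R", "M", "M", "R", "M"]  # made-up sample, not real dataset rows
encoder = TinyLabelEncoder().fit(labels)
encoded = encoder.transform(labels)
```

If you need a specific class to be coded as 1, an explicit mapping is one way to make that choice visible rather than relying on sort order.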

You are now ready to create your neural network model using Keras.

You will use scikit-learn to evaluate the model using stratified k-fold cross-validation. This is a resampling technique that provides an estimate of the performance of the model. It does this by splitting the data into k parts and training the model on all parts except one, which is held out as a test set to evaluate the performance of the model. This process is repeated k times, and the average score across all constructed models is used as a robust estimate of performance. It is stratified, meaning that it looks at the output values and attempts to balance the number of instances that belong to each class in the k splits of the data.
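To make the idea of stratification concrete, here is a minimal plain-Python sketch. It is a simplification, not scikit-learn's algorithm: it does no shuffling, and the function names and sample labels are hypothetical. The indices of each class are dealt round-robin across the folds so that every fold keeps roughly the same class ratio.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Assign each sample index to one of k folds, class by class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal the indices of one class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def train_test_splits(labels, k):
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    folds = stratified_folds(labels, k)
    for held_out in range(k):
        test = sorted(folds[held_out])
        train = sorted(
            i for f, fold in enumerate(folds) if f != held_out for i in fold
        )
        yield train, test


labels = ["M"] * 6 + ["R"] * 3  # made-up labels with a 2:1 class ratio
splits = list(train_test_splits(labels, k=3))
```

Each of the three test folds here contains two "M" samples and one "R" sample, matching the 2:1 ratio of the full label list.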

To use Keras models with scikit-learn, you must use the KerasClassifier wrapper from the SciKeras module. This class takes a function that creates and returns our neural network model. It also takes arguments that it will pass along to the call to fit(), such as the number of epochs and the batch size.

Let’s start by defining the function that creates your baseline model. Your model will have a single, fully connected hidden layer with the same number of neurons as input variables. This is a good default starting point when creating neural networks.

The weights are initialized using a small Gaussian random number. The Rectifier activation function is used. The output layer contains a single neuron in order to make predictions. It uses the sigmoid activation function in order to produce a probability output in the range of 0 to 1 that can easily and automatically be converted to crisp class values.
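As a rough illustration of how a sigmoid output becomes a crisp class value, the sketch below applies the sigmoid function and then a cutoff. The 0.5 threshold is an assumption (a common convention, not something stated above), and the activation values are made up.

```python
import math


def sigmoid(z):
    """Squash a real-valued activation into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def to_class(probability, threshold=0.5):
    """Convert a probability into a crisp 0/1 class label."""
    return 1 if probability >= threshold else 0


activations = [-2.0, 0.0, 3.0]  # made-up output-neuron activations
probabilities = [sigmoid(z) for z in activations]
classes = [to_class(p) for p in probabilities]
```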

Finally, you will use the logarithmic loss function (binary_crossentropy) during training, the preferred loss function for binary classification problems. The model also uses the efficient Adam optimization algorithm for gradient descent, and accuracy metrics will be collected when the model is trained.
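The model described above might be written along the following lines. This is a sketch rather than the original listing: the function name `create_baseline` is chosen here for illustration, and `kernel_initializer="random_normal"` is an assumed way of requesting small Gaussian initial weights.

```python
def create_baseline():
    """Build and compile a single-hidden-layer binary classifier."""
    # Imported here so TensorFlow is only loaded when a model is built.
    from tensorflow.keras import Input
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.models import Sequential

    model = Sequential([
        Input(shape=(60,)),  # one input per sonar variable
        # Hidden layer with as many neurons as inputs; "random_normal" is an
        # assumed choice for small Gaussian initial weights.
        Dense(60, activation="relu", kernel_initializer="random_normal"),
        # Single sigmoid output neuron producing a probability in (0, 1).
        Dense(1, activation="sigmoid", kernel_initializer="random_normal"),
    ])
    model.compile(
        loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"]
    )
    return model
```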

Now, it is time to evaluate this model using stratified cross validation in the scikit-learn framework.

Pass the number of training epochs to the KerasClassifier, again using reasonable default values. Verbose output is also turned off, given that the model will be created ten times for the 10-fold cross validation being performed.
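One way the evaluation could be put together is sketched below. Several details are assumptions rather than facts from this post: the helper name `evaluate_model`, the values `epochs=100` and `batch_size=5`, shuffling the folds, and a sonar.csv file with no header row whose last column is the label.

```python
def evaluate_model(build_fn, path="sonar.csv", epochs=100, batch_size=5):
    """Return the 10 per-fold accuracy scores for a Keras model builder."""
    # Imported here so the heavy libraries load only when evaluation runs.
    import pandas as pd
    from scikeras.wrappers import KerasClassifier
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.preprocessing import LabelEncoder

    # Assumed layout: no header row, 60 input columns, then the label.
    data = pd.read_csv(path, header=None).values
    X = data[:, 0:60].astype(float)
    y = LabelEncoder().fit_transform(data[:, 60])

    # verbose=0 keeps the ten training runs quiet.
    estimator = KerasClassifier(
        model=build_fn, epochs=epochs, batch_size=batch_size, verbose=0
    )
    kfold = StratifiedKFold(n_splits=10, shuffle=True)
    return cross_val_score(estimator, X, y, cv=kfold)
```

Here `build_fn` is any function that returns a compiled Keras model. The mean and standard deviation of the returned array (`scores.mean()` and `scores.std()`) summarize the estimated accuracy.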

After tying this together, the complete example is listed below.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

Running this code produces the following output showing the mean and standard deviation of the estimated accuracy of the model on unseen data.

This is an excellent score without doing any hard work.

3. Re-Run the Baseline Model with Data Preparation

It is a good practice to prepare your data before modeling.

Neural network models tend to perform better when their input values are consistent, both in scale and distribution.

Standardization is an effective data preparation scheme for tabular data when building neural network models. This is where the data is rescaled such that the mean value for each attribute is 0, and the standard deviation is 1. This preserves Gaussian and Gaussian-like distributions while normalizing the central tendencies for each attribute.

You can use scikit-learn to perform the standardization of your sonar dataset using the StandardScaler class.

Rather than performing the standardization on the entire dataset, it is good practice to train the standardization procedure on the training data within each pass of a cross-validation run and use the trained standardization to prepare the “unseen” test fold. This makes standardization a step in model preparation in the cross-validation process. It prevents the algorithm from gaining knowledge of the “unseen” data during evaluation, knowledge that could otherwise leak in through the data preparation scheme, such as a more precise estimate of each attribute's distribution.
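The difference between fitting on the training data and merely applying to the test fold can be shown with a small plain-Python sketch. The helper names and the numbers are made up for illustration. Like StandardScaler, it uses the population standard deviation.

```python
from statistics import fmean, pstdev


def fit_standardizer(train_column):
    """Learn the mean and standard deviation from training values only."""
    return fmean(train_column), pstdev(train_column)


def standardize(column, mean, std):
    """Rescale values using statistics learned elsewhere."""
    return [(value - mean) / std for value in column]


train = [0.2, 0.4, 0.6, 0.8]  # made-up training-fold values for one attribute
test = [0.5, 1.0]             # made-up held-out values

mean, std = fit_standardizer(train)  # the test fold is never inspected here
train_scaled = standardize(train, mean, std)
test_scaled = standardize(test, mean, std)
```

The training values end up with mean 0 and standard deviation 1, while the held-out values are simply mapped through the same transformation, whatever their own statistics happen to be.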

You can achieve this in scikit-learn using a Pipeline. The pipeline is a wrapper that executes one or more models within a pass of the cross-validation procedure. Here, you can define a pipeline with the StandardScaler followed by your neural network model.
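A pipeline of this kind might be assembled as in the sketch below. The helper name `make_standardized_classifier`, the step names, and the `epochs` and `batch_size` values are assumptions made for illustration.

```python
def make_standardized_classifier(build_fn, epochs=100, batch_size=5):
    """Chain a StandardScaler and a wrapped Keras model into one estimator."""
    # Imported here so the libraries load only when the pipeline is built.
    from scikeras.wrappers import KerasClassifier
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    return Pipeline([
        # Refit on the training portion of every cross-validation split.
        ("standardize", StandardScaler()),
        ("mlp", KerasClassifier(model=build_fn, epochs=epochs,
                                batch_size=batch_size, verbose=0)),
    ])
```

The returned pipeline can be passed to scikit-learn's `cross_val_score` in place of the bare classifier, so the scaler is fitted only on the training folds of each split.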

After tying this together, the complete example is listed below.

Running this example provides the results below.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

You now see a small but very nice lift in the mean accuracy.

4. Tuning Layers and Number of Neurons in the Model

There are many things to tune on a neural network, such as weight initialization, activation functions, optimization procedure, and so on.

One aspect that may have an outsized effect is the structure of the network itself, called the network topology. In this section, you will look at two experiments on the structure of the network: making it smaller and making it larger.

These are good experiments to perform when tuning a neural network on your problem.

4.1. Evaluate a Smaller Network

Note that there is likely a lot of redundancy in the input variables for this problem.

The data describes the same signal from different angles. Perhaps some of those angles are more relevant than others. So you can force a type of feature extraction by the network by restricting the representational space in the first hidden layer.

In this experiment, you will take your baseline model with 60 neurons in the hidden layer and reduce it by half to 30. This will pressure the network during training to pick out the most important structure in the input data to model.
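A sketch of the smaller model follows. The function name `create_smaller` is hypothetical, and everything other than the hidden layer width is assumed to match the baseline description (rectifier and sigmoid activations, binary cross-entropy, Adam, with `"random_normal"` as an assumed Gaussian initializer).

```python
def create_smaller():
    """Build and compile a network with 30 hidden neurons instead of 60."""
    # Imported here so TensorFlow is only loaded when a model is built.
    from tensorflow.keras import Input
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.models import Sequential

    model = Sequential([
        Input(shape=(60,)),
        # Half the width of the 60-neuron baseline hidden layer.
        Dense(30, activation="relu", kernel_initializer="random_normal"),
        Dense(1, activation="sigmoid", kernel_initializer="random_normal"),
    ])
    model.compile(
        loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"]
    )
    return model
```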

You will also standardize the data as in the previous experiment with data preparation and try to take advantage of the slight lift in performance.

After tying this together, the complete example is listed below.

Running this example provides the following result. You can see that you have a very slight boost in the mean estimated accuracy and an important reduction in the standard deviation (average spread) of the accuracy scores for the model.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

This is a great result because you are doing slightly better with a network half the size, which, in turn, should take less time to train.

4.2. Evaluate a Larger Network

A neural network topology with more layers offers more opportunities for the network to extract key features and recombine them in useful nonlinear ways.

You can easily evaluate whether adding more layers to the network improves the performance by making another small tweak to the function used to create your model. Here, you add one new layer (one line) to the network that introduces another hidden layer with 30 neurons after the first hidden layer.

Your network now has the topology:

60 inputs -> [60 -> 30] -> 1 output

The idea here is that the network is given the opportunity to model all input variables before being bottlenecked and forced to halve the representational capacity, much like you did in the experiment above with the smaller network.

Instead of squeezing the representation of the inputs themselves, you have an additional hidden layer to aid in the process.
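The larger model could be sketched as follows. The function name `create_larger` is hypothetical, and settings other than the extra layer are assumed to match the baseline description (with `"random_normal"` as an assumed Gaussian initializer).

```python
def create_larger():
    """Build and compile a network with hidden layers of 60 and 30 neurons."""
    # Imported here so TensorFlow is only loaded when a model is built.
    from tensorflow.keras import Input
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.models import Sequential

    model = Sequential([
        Input(shape=(60,)),
        # First hidden layer sees all 60 inputs at full width.
        Dense(60, activation="relu", kernel_initializer="random_normal"),
        # The one added line: a second hidden layer that halves the width.
        Dense(30, activation="relu", kernel_initializer="random_normal"),
        Dense(1, activation="sigmoid", kernel_initializer="random_normal"),
    ])
    model.compile(
        loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"]
    )
    return model
```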

After tying this together, the complete example is listed below.

Running this example produces the results below.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

You can see that you do not get a lift in the model performance. This may be statistical noise or a sign that further training is needed.

With further tuning of aspects like the optimization algorithm and the number of training epochs, it is expected that further improvements are possible. What is the best score that you can achieve on this dataset?

Summary

In this post, you discovered the Keras deep learning library in Python.

You learned how you can work through a binary classification problem step-by-step with Keras, specifically:

  • How to load and prepare data for use in Keras
  • How to create a baseline neural network model
  • How to evaluate a Keras model using scikit-learn and stratified k-fold cross validation
  • How data preparation schemes can lift the performance of your models
  • How experiments adjusting the network topology can lift model performance

Do you have any questions about deep learning with Keras or this post? Ask your questions in the comments, and I will do my best to answer.
