Object Classification with CNNs Using the Keras Deep Learning Library

 Keras is a Python library for deep learning that wraps the powerful numerical libraries Theano and TensorFlow.

Object recognition is a difficult problem where traditional neural networks fall short. It is the task of identifying the objects present in an image.

In this post, you will discover how to develop and evaluate deep learning models for object recognition in Keras. After completing this tutorial, you will know:

  • About the CIFAR-10 object classification dataset and how to load and use it in Keras
  • How to create a simple Convolutional Neural Network for object recognition
  • How to lift performance by creating deeper Convolutional Neural Networks

    The CIFAR-10 Problem Description

    The problem of automatically classifying photographs of objects is difficult because of the nearly infinite number of permutations of objects, positions, lighting, and so on. It’s a tough problem.

    This is a well-studied problem in computer vision and, more recently, an important demonstration of the capability of deep learning. A standard computer vision and deep learning dataset for this problem was developed by the Canadian Institute for Advanced Research (CIFAR).

    The CIFAR-10 dataset consists of 60,000 photos divided into 10 classes (hence the name CIFAR-10). Classes include common objects such as airplanes, automobiles, birds, cats, and so on. The dataset is split in a standard way, where 50,000 images are used for training a model and the remaining 10,000 for evaluating its performance.

    The photos are in color with red, green, and blue components but are small, measuring 32 by 32 pixel squares.

    State-of-the-art results are achieved using very large convolutional neural networks. You can learn about state-of-the-art results on CIFAR-10 on Rodrigo Benenson’s webpage. Model performance is reported in classification accuracy, with very good performance above 90%, with human performance on the problem at 94% and state-of-the-art results at 96% at the time of writing.

    There is a Kaggle competition that makes use of the CIFAR-10 dataset. It is a good place to join the discussion of developing new models for the problem and picking up models and scripts as a starting point.

    Loading The CIFAR-10 Dataset in Keras

    The CIFAR-10 dataset can easily be loaded in Keras.

    Keras has the facility to automatically download standard datasets like CIFAR-10 and store them in the ~/.keras/datasets directory using the cifar10.load_data() function. This dataset is large at 163 megabytes, so it may take a few minutes to download.

    Once downloaded, subsequent calls to the function will load the dataset ready for use.

    The dataset is stored as pickled training and test sets, ready for use in Keras. Each image is represented as a three-dimensional array with dimensions for the image height, width, and the red, green, and blue color channels. You can plot images directly using matplotlib.
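
    For example, a minimal sketch that loads the dataset and plots the first nine training images in a 3×3 grid, assuming TensorFlow's bundled Keras (tensorflow.keras) and matplotlib:

```python
# load CIFAR-10 and plot a small sample of training images
from tensorflow.keras.datasets import cifar10
import matplotlib.pyplot as plt

# the first call downloads the dataset (~163 MB) to ~/.keras/datasets
(X_train, y_train), (X_test, y_test) = cifar10.load_data()

# plot the first nine images in a 3x3 grid
for i in range(9):
    plt.subplot(3, 3, i + 1)
    plt.imshow(X_train[i])
plt.show()
```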

    Running the code creates a 3×3 plot of photographs. The images have been scaled up from their small 32×32 size, but you can clearly see trucks, horses, and cars. You can also see some distortion in some images that have been forced to the square aspect ratio.

    Small sample of CIFAR-10 images

    Simple Convolutional Neural Network for CIFAR-10

    The CIFAR-10 problem is best solved using a convolutional neural network (CNN).

    You can quickly start by defining all the classes and functions you will need in this example.
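
    For example, the following imports cover everything used in this section, assuming TensorFlow's bundled Keras (tensorflow.keras):

```python
# classes and functions used throughout this example
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from tensorflow.keras.constraints import MaxNorm
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.utils import to_categorical
```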

    Next, you can load the CIFAR-10 dataset.
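
```python
# load the CIFAR-10 train and test sets
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
```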

    The pixel values range from 0 to 255 for each of the red, green, and blue channels.

    It is good practice to work with normalized data. Because the input values are well understood, you can easily normalize to the range 0 to 1 by dividing each value by the maximum observation, which is 255.

    Note that the data is loaded as integers, so you must cast it to floating point values in order to perform the division.
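
```python
# cast from integers to floats, then normalize pixel values to the range 0-1
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0
```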

    The output variable is defined as a vector of integers from 0 to 9, one class label per image.

    You can use a one-hot encoding to transform them into a binary matrix to best model the classification problem. There are ten classes for this problem, so you can expect the binary matrix to have a width of 10.
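
```python
# one-hot encode the integer labels into a 10-wide binary matrix
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
num_classes = y_test.shape[1]
```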

    Let’s start by defining a simple CNN structure as a baseline and evaluate how well it performs on the problem.

    You will use a structure with two convolutional layers followed by max pooling and a flattening out of the network to fully connected layers to make predictions.

    The baseline network structure can be summarized as follows:

    1. Convolutional input layer, 32 feature maps with a size of 3×3, a rectifier activation function, and a weight constraint of max norm set to 3
    2. Dropout set to 20%
    3. Convolutional layer, 32 feature maps with a size of 3×3, a rectifier activation function, and a weight constraint of max norm set to 3
    4. Max Pool layer with size 2×2
    5. Flatten layer
    6. Fully connected layer with 512 units and a rectifier activation function
    7. Dropout set to 50%
    8. Fully connected output layer with 10 units and a softmax activation function

    A logarithmic loss function is used with the stochastic gradient descent optimization algorithm, configured with a large momentum and weight decay, starting with a learning rate of 0.01.
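
    A sketch of this baseline in Keras, assuming the imports and prepared data above. Note that the exact decay argument differs across Keras versions, so only the momentum is configured here:

```python
# define the baseline CNN described above
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(32, 32, 3), padding='same',
                 activation='relu', kernel_constraint=MaxNorm(3)))
model.add(Dropout(0.2))
model.add(Conv2D(32, (3, 3), padding='same', activation='relu',
                 kernel_constraint=MaxNorm(3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(512, activation='relu', kernel_constraint=MaxNorm(3)))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

# compile with log loss and SGD; weight/learning-rate decay can be added,
# but the argument name varies across Keras versions
sgd = SGD(learning_rate=0.01, momentum=0.9)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
model.summary()
```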

    You can fit this model with 25 epochs and a batch size of 32.

    A small number of epochs was chosen to help keep this tutorial moving. Usually, the number of epochs would be one or two orders of magnitude larger for this problem.

    Once the model is fit, you evaluate it on the test dataset and print out the classification accuracy.
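
```python
# fit the model, tracking test-set performance each epoch, then evaluate
model.fit(X_train, y_train, validation_data=(X_test, y_test),
          epochs=25, batch_size=32)
scores = model.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1] * 100))
```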

    Tying this all together, the complete example is listed below.
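
    A complete, self-contained version of the pieces above, under the same assumptions (tensorflow.keras):

```python
# complete baseline CNN example for CIFAR-10
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from tensorflow.keras.constraints import MaxNorm
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.utils import to_categorical

# load the data
(X_train, y_train), (X_test, y_test) = cifar10.load_data()

# normalize pixel values from 0-255 to 0-1
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0

# one-hot encode the class labels
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
num_classes = y_test.shape[1]

# define the baseline model
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(32, 32, 3), padding='same',
                 activation='relu', kernel_constraint=MaxNorm(3)))
model.add(Dropout(0.2))
model.add(Conv2D(32, (3, 3), padding='same', activation='relu',
                 kernel_constraint=MaxNorm(3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(512, activation='relu', kernel_constraint=MaxNorm(3)))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

# compile with log loss and SGD with a large momentum
sgd = SGD(learning_rate=0.01, momentum=0.9)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
model.summary()

# fit and evaluate
model.fit(X_train, y_train, validation_data=(X_test, y_test),
          epochs=25, batch_size=32)
scores = model.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1] * 100))
```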

    Running this example first prints a summary of the network structure, which confirms the design was implemented correctly.

    The classification accuracy and loss are printed after each epoch on both the training and test datasets.

    Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

    The model is evaluated on the test set and achieves an accuracy of 70.5%, which is not excellent.

    You can improve the accuracy significantly by creating a much deeper network. This is what you will look at in the next section.

    Larger Convolutional Neural Network for CIFAR-10

    You have seen that a simple CNN performs poorly on this complex problem. In this section, you will look at scaling up the size and complexity of your model.

    Let’s design a deep version of the simple CNN above. You can introduce an additional round of convolutions with many more feature maps. You will use the same pattern of Convolutional, Dropout, Convolutional, and Max Pooling layers.

    This pattern will be repeated three times with 32, 64, and 128 feature maps. The effect is an increasing number of feature maps with a smaller and smaller size given the max pooling layers. Finally, an additional and larger Dense layer will be used at the output end of the network in an attempt to better translate the large number of feature maps to class values.

    A summary of the new network architecture is as follows:

    • Convolutional input layer, 32 feature maps with a size of 3×3, and a rectifier activation function
    • Dropout layer at 20%
    • Convolutional layer, 32 feature maps with a size of 3×3, and a rectifier activation function
    • Max Pool layer with size 2×2
    • Convolutional layer, 64 feature maps with a size of 3×3, and a rectifier activation function
    • Dropout layer at 20%
    • Convolutional layer, 64 feature maps with a size of 3×3, and a rectifier activation function
    • Max Pool layer with size 2×2
    • Convolutional layer, 128 feature maps with a size of 3×3, and a rectifier activation function
    • Dropout layer at 20%
    • Convolutional layer, 128 feature maps with a size of 3×3, and a rectifier activation function
    • Max Pool layer with size 2×2
    • Flatten layer
    • Dropout layer at 20%
    • Fully connected layer with 1024 units and a rectifier activation function
    • Dropout layer at 20%
    • Fully connected layer with 512 units and a rectifier activation function
    • Dropout layer at 20%
    • Fully connected output layer with 10 units and a softmax activation function

    You can very easily define this network topology in Keras as follows:
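
    A sketch of that topology, under the same assumptions as the baseline (tensorflow.keras, with the data already loaded and prepared):

```python
# define the larger CNN described above
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(32, 32, 3), activation='relu', padding='same'))
model.add(Dropout(0.2))
model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(Dropout(0.2))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(Dropout(0.2))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dropout(0.2))
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax'))
```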

    You can fit and evaluate this model using the same procedure from above and the same number of epochs but a larger batch size of 64, found through some minor experimentation.

    Tying this all together, the complete example is listed below.
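
    A complete version, mirroring the baseline script with the deeper topology and the larger batch size swapped in (the same tensorflow.keras assumptions apply):

```python
# complete larger CNN example for CIFAR-10
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.utils import to_categorical

# load and prepare the data
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
num_classes = y_test.shape[1]

# define the larger model
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(32, 32, 3), activation='relu', padding='same'))
model.add(Dropout(0.2))
model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(Dropout(0.2))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(Dropout(0.2))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dropout(0.2))
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax'))

# compile with log loss and SGD with a large momentum
sgd = SGD(learning_rate=0.01, momentum=0.9)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
model.summary()

# fit with the larger batch size of 64, then evaluate on the test set
model.fit(X_train, y_train, validation_data=(X_test, y_test),
          epochs=25, batch_size=64)
scores = model.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1] * 100))
```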

    Running this example prints the classification accuracy and loss on the training and test datasets for each epoch.

    Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

    The estimate of classification accuracy for the final model is 79.5%, which is nine points better than the simpler model.

    Extensions to Improve Model Performance

    You have achieved good results on this very difficult problem, but you are still a good way from achieving world-class results.

    Below are some ideas that you can try to extend upon the models and improve model performance.

    • Train for More Epochs. Each model was trained for a very small number of epochs, 25. It is common to train large convolutional neural networks for hundreds or thousands of epochs. You should expect that performance gains can be achieved by significantly increasing the number of training epochs.
    • Image Data Augmentation. The objects in the images vary in their position. Another boost in model performance can likely be achieved by using some data augmentation. Methods such as standardization, random shifts, and horizontal image flips may be beneficial; a sketch is given after this list.
    • Deeper Network Topology. The larger network presented is deep, but larger networks could be designed for the problem. This may involve more feature maps closer to the input and perhaps less aggressive pooling. Additionally, standard convolutional network topologies that have been shown useful may be adopted and evaluated on the problem.
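
    As a sketch of the augmentation idea, here is one way to apply random shifts and horizontal flips using Keras's ImageDataGenerator (deprecated in recent releases in favor of preprocessing layers), reusing the model and variables from the larger example above:

```python
# generate randomly augmented batches from the training data on the fly
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    width_shift_range=0.1,   # random horizontal shifts up to 10%
    height_shift_range=0.1,  # random vertical shifts up to 10%
    horizontal_flip=True)    # random horizontal flips

# fit the model on augmented batches instead of the raw training arrays
model.fit(datagen.flow(X_train, y_train, batch_size=64),
          validation_data=(X_test, y_test), epochs=25)
```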

    Summary

    In this post, you discovered how to create deep learning models in Keras for object recognition in photographs.

    After working through this tutorial, you learned:

    • About the CIFAR-10 dataset and how to load it in Keras and plot ad hoc examples from the dataset
    • How to train and evaluate a simple Convolutional Neural Network on the problem
    • How to expand a simple Convolutional Neural Network into a deep Convolutional Neural Network in order to boost performance on the difficult problem
    • How data augmentation can provide a further boost on the difficult object recognition problem

    Do you have any questions about object recognition or this post? Ask your question in the comments, and I will do my best to answer.
