Artificial Intelligence, Machine Learning and Data Science Hubspot

Saturday, 6 April 2024

How to Train Keras Deep Learning Models on AWS EC2 GPUs (step-by-step)

Keras is a Python deep learning library that provides easy and convenient access to powerful numerical libraries such as TensorFlow.

Large deep learning models require a lot of compute time to run. You can run them on your CPU, but it can take hours or days to get a result. If you have access to a GPU on your desktop, you can drastically speed up the training time of your deep learning models.

In this post, you will discover how you can get access to GPUs to speed up the training of your deep learning models by using the Amazon Web Services (AWS) infrastructure. For a few dollars per hour, and often a lot less, you can use this service from your workstation or laptop.

Tutorial Overview

The process is quite simple because most of the work has already been done for us.

Below is an overview of the process.

1. Set Up Your AWS Account.
2. Launch Your AWS Instance.
3. Log In and Run Your Code.
4. Close Your AWS Instance.

Note that it costs money to use a virtual server instance on Amazon. The cost is low for ad hoc model development (approximately 3 US dollars per hour for the instance used in this tutorial), which is why this is so attractive, but it is not free.
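A rough way to budget a session is to multiply the hourly rate by the number of hours the instance is running. The sketch below assumes the approximate $3 per hour quoted later in this tutorial for the p3.2xlarge instance. The function name and default rate are illustrative only, and actual AWS charges (current rates, billing increments, storage and data transfer) may differ.

```python
def estimate_session_cost(hours, hourly_rate_usd=3.0):
    """Return the approximate on-demand cost in US dollars for one instance.

    hourly_rate_usd defaults to the rough p3.2xlarge figure from this
    tutorial; it is an assumption, not a quoted AWS price.
    """
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return hours * hourly_rate_usd


# Estimate for an overnight run of 8 hours at the assumed rate.
overnight_cost = estimate_session_cost(8)
print("Estimated cost in USD:", overnight_cost)
```

Passing a different `hourly_rate_usd` lets you compare instance types before you launch anything.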

The server instance runs Linux. It is desirable, although not required, that you know how to navigate Linux or a Unix-like environment. We’re just running our Python scripts, so no advanced skills are needed.

1. Set Up Your AWS Account

You need an account on Amazon Web Services.

1. You can create an account by visiting the Amazon Web Services portal and clicking “Sign in to the Console”. From there you can sign in using an existing Amazon account or create a new account.

AWS Sign-in Button

2. You will need to provide your details as well as a valid credit card that Amazon can charge. The process is a lot quicker if you are already an Amazon customer and have your credit card on file.

AWS Sign-In Form

Once you have an account you can log into the Amazon Web Services console.

You will see a range of different services that you can access.

2. Launch Your AWS Instance

Now that you have an AWS account, you want to launch an EC2 virtual server instance on which you can run Keras.

Launching an instance is as easy as selecting the image to load and starting the virtual server. Thankfully, there is already an image available that has almost everything we need. It is called the Deep Learning AMI (Amazon Linux), and it was created and is maintained by Amazon. Let’s launch it as an instance.

1. Log in to your AWS console if you have not already.

AWS Console

2. Click on EC2 to launch a new virtual server.
3. Select “US West Oregon” from the drop-down in the top right-hand corner. This is important; otherwise, you will not be able to find the image we plan to use.
4. Click the “Launch Instance” button.
5. Click “Community AMIs”. An AMI is an Amazon Machine Image. It is a frozen instance of a server that you can select and instantiate on a new virtual server.

Community AMIs

6. Enter “Deep Learning AMI” in the “Search community AMIs” search box and press enter.

Deep Learning AMI

7. Click “Select” to choose the AMI in the search result.
8. Now you need to select the hardware on which to run the image. Scroll down and select the “p3.2xlarge” hardware (I used to recommend g2 or g3 instances and p2 instances, but the p3 instances are newer and faster). This includes a Tesla V100 GPU that we can use to significantly increase the training speed of our models. It also includes 8 CPU cores, 61 GB of RAM and 16 GB of GPU RAM. Note: using this instance will cost approximately $3 USD per hour.

p3.2xlarge EC2 Instance

9. Click “Review and Launch” to finalize the configuration of your server instance.
10. Click the “Launch” button.
11. Select Your Key Pair.
- If you have a key pair because you have used EC2 before, select “Choose an existing key pair” and choose your key pair from the list. Then check “I acknowledge…”.
- If you do not have a key pair, select the option “Create a new key pair” and enter a “Key pair name” such as keras-aws-keypair. Click the “Download Key Pair” button.

Select Your Key Pair

12. Open a Terminal and change directory to where you downloaded your key pair.
13. If you have not already done so, restrict the access permissions on your key pair file. This is required for SSH access to your server. For example:

cd Downloads

chmod 600 keras-aws-keypair.pem

14. Click “Launch Instances”. If this is your first time using AWS, Amazon may have to validate your request and this could take up to 2 hours (often just a few minutes).
15. Click “View Instances” to review the status of your instance.

Deep Learning AMI Status

Your server is now running and ready for you to log in.
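Step 13 above restricts the key file so that only its owner can read and write it. If you prefer to do or confirm this from Python rather than with chmod, a minimal sketch follows. The function name is illustrative, and it assumes a Linux or macOS workstation, since permission bits behave differently on Windows.

```python
import os
import stat


def restrict_key_permissions(key_path):
    """Set the key file to owner read/write only (the equivalent of chmod 600)."""
    os.chmod(key_path, stat.S_IRUSR | stat.S_IWUSR)
    # Read the permission bits back so the caller can confirm the change.
    return stat.S_IMODE(os.stat(key_path).st_mode)
```

For example, `oct(restrict_key_permissions("keras-aws-keypair.pem"))` returns the resulting mode as an octal string, which you can compare against `0o600`.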

3. Log In, Configure and Run

Now that you have launched your server instance, it is time to log in and start using it.

1. Click “View Instances” in your Amazon EC2 console if you have not already.
2. Copy the “Public IP” (down the bottom of the screen in Description) to your clipboard. In this example my IP address is 54.186.97.77. Do not use this IP address; your IP address will be different.
3. Open a Terminal and change directory to where you downloaded your key pair. Log in to your server using SSH, for example:

ssh -i keras-aws-keypair.pem ec2-user@54.186.97.77

4. When prompted, type “yes” and press enter.

You are now logged into your server.

Terminal Login to Deep Learning AMI

The instance will ask what Python environment you wish to use. I recommend using:

TensorFlow(+Keras2) with Python3 (CUDA 9.0 and Intel MKL-DNN)

You can activate this virtual environment by typing:

source activate tensorflow_p36

This will just take a minute.

You are now ready to start training deep learning neural network models.
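Before starting a long training run, it can be worth confirming that TensorFlow actually sees the GPU. The sketch below is one way to check, assuming TensorFlow is importable in the environment you just activated. The function name is illustrative, and `device_lib` is a TensorFlow internal module, so treat this as a quick diagnostic rather than a supported API.

```python
def list_gpu_devices():
    """Return the names of GPUs visible to TensorFlow.

    Returns None when TensorFlow is not installed in the current environment.
    """
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    # Keep only devices that TensorFlow reports as type "GPU".
    return [
        device.name
        for device in device_lib.list_local_devices()
        if device.device_type == "GPU"
    ]
```

An empty list from `list_gpu_devices()` would suggest that training will fall back to the CPU, which is worth investigating before you pay for GPU hours.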

Looking for something to try on your new instance? See this tutorial:

Develop Your First Neural Network in Python With Keras Step-By-Step

4. Close Your AWS Instance

When you are finished with your work you must close your instance.

Remember you are charged by the amount of time that you use the instance. It is cheap, but you do not want to leave an instance on if you are not using it.

1. Log out of your instance at the terminal. For example, you can type:

exit

2. Log in to your AWS account with your web browser.
3. Click EC2.
4. Click “Instances” from the left-hand side menu.

Review Your List of Running Instances

5. Select your running instance from the list (it may already be selected if you only have one running instance).

Select Your Running AWS Instance

6. Click the “Actions” button and select “Instance State” and choose “Terminate”. Confirm that you want to terminate your running instance.

It may take a number of seconds for the instance to close and to be removed from your list of instances.
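The console steps above can also be scripted, which makes it harder to forget a running instance. The sketch below is not part of the original workflow. It assumes you have the boto3 library installed and AWS credentials configured, and the helper name and instance ID are hypothetical.

```python
def terminate_instance(ec2_client, instance_id):
    """Ask EC2 to terminate one instance and return the state it reports."""
    response = ec2_client.terminate_instances(InstanceIds=[instance_id])
    # The response lists each affected instance with its new state.
    return response["TerminatingInstances"][0]["CurrentState"]["Name"]


# Example usage (requires boto3 and configured AWS credentials):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-west-2")  # US West (Oregon)
#   print(terminate_instance(ec2, "i-0123456789abcdef0"))  # hypothetical ID
```

Passing the client in as an argument keeps the helper easy to test without touching a real AWS account.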

Tips and Tricks for Using Keras on AWS

Below are some tips and tricks for getting the most out of using Keras on AWS instances.

Design a suite of experiments to run beforehand. Experiments can take a long time to run and you are paying for the time you use. Make time to design a batch of experiments to run on AWS. Put each in a separate file and call them in turn from another script. This will allow you to answer multiple questions from one long run, perhaps overnight.
Run scripts as a background process. This will allow you to close your terminal and turn off your computer while your experiment is running.

You can do that easily as follows:

nohup /path/to/script >/path/to/script.log 2>&1 < /dev/null &

You can then check the status and results in your script.log file later. Learn more about nohup.

Always close your instance at the end of your experiments. You do not want to be surprised with a very large AWS bill.
Try spot instances for a cheaper but less reliable option. Amazon sells unused time on its hardware at a much cheaper price, but at the cost of potentially having your instance closed at any moment. If you are learning or your experiments are not critical, this might be an ideal option for you. You can access spot instances from the “Spot Instance” option on the left-hand side menu in your EC2 web console.
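The first tip above, putting each experiment in its own file and calling them in turn from another script, could be sketched as follows. The function name and log layout are illustrative assumptions. Each script runs with the same Python interpreter as the runner, and a failing experiment does not stop the ones after it.

```python
import subprocess
import sys
from pathlib import Path


def run_experiments(script_paths, log_dir="logs"):
    """Run each experiment script in turn and return {script name: exit code}."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    results = {}
    for script in map(Path, script_paths):
        log_path = log_dir / (script.stem + ".log")
        with open(log_path, "w") as log_file:
            # Merge stderr into stdout so each log holds the full output.
            completed = subprocess.run(
                [sys.executable, str(script)],
                stdout=log_file,
                stderr=subprocess.STDOUT,
            )
        # Record the exit code and carry on, even if this experiment failed.
        results[script.name] = completed.returncode
    return results
```

A runner script that calls, say, `run_experiments(["experiment1.py", "experiment2.py"])` (hypothetical file names) can itself be started with nohup as shown earlier, leaving one log file per experiment to review afterwards.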

For more help on command-line recipes to use on AWS, see the post:

10 Command Line Recipes for Deep Learning on Amazon Web Services

More Resources For Deep Learning on AWS

Below is a list of resources to learn more about AWS and building deep learning models in the cloud.

An introduction to Amazon Elastic Compute Cloud (EC2) if you are new to all of this
An introduction to Amazon Machine Images (AMI)
Deep Learning AMI (Amazon Linux) on the AMI Marketplace.
P3 EC2 Instances

Summary

In this post, you discovered how you can develop and evaluate your large deep learning models in Keras using GPUs on Amazon Web Services. You learned:

That Amazon Web Services, with its Elastic Compute Cloud, offers an affordable way to run large deep learning models on GPU hardware.
How to set up and launch an EC2 server for deep learning experiments.
How to log in to the server and activate the TensorFlow and Keras environment.
How to run Keras experiments on AWS instances in batch as background tasks.

Do you have any questions about running your models on AWS or about this post? Ask your questions in the comments and I will do my best to answer.