In our previous PyTorch notebook, we learned how to get started quickly with PyTorch 1.2 using Google Colab. In this tutorial, we are going to take a step back and review some of the basic components of building a deep learning model using PyTorch.
This will be a brief tutorial and will avoid using jargon and overcomplicated code. That said, this is perhaps the most basic of models you can build with PyTorch.
In fact, it is so basic that it's ideal for those starting to learn about PyTorch and deep learning. So if you have a friend or colleague who wants to jump in, I highly encourage you to refer them to this tutorial as a starting point. Let's get started!
Author: Elvis Saravia
Complete Code Walkthrough: Blog post
Before getting started, we need to import a few modules that provide the functions we will use to build our deep learning model. The main ones are torch and torchvision. They contain the majority of the functions that you need to get started with PyTorch. However, as this is a deep learning tutorial we will also need torch.nn, torch.nn.functional and torchvision.transforms, which all contain utility functions to build our model. We probably won't use all the modules listed below but they are the typical modules you will be importing when starting your deep learning projects.
    ## The usual imports
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import torchvision
    import torchvision.transforms as transforms

    ## for printing image
    import matplotlib.pyplot as plt
    import numpy as np
Below we check for the PyTorch version just to make sure that you are using the proper version. At the time of this tutorial, we are working with PyTorch 1.2.
    ## print out the pytorch version used
    print(torch.__version__)
Let's get right into it! As with any machine learning project, you need to load your dataset. We are using the MNIST dataset, which is the Hello World of datasets in the machine learning world.
The data consists of images of handwritten digits, each of size 28 X 28. We will discuss the images shortly, but our plan is to load the data in batches of size 32, similar to the figure below.
Here are the complete steps we are performing when importing our data:
- We will import and transform the data into tensors using the transforms module (transforms.ToTensor()).
- We will use DataLoader to build convenient data loaders, which make it easy to efficiently feed data in batches to deep learning models. We will get to the topic of batches in a bit but for now just think of them as subsets of your data.
- As hinted above, we will also create batches of the data by setting the batch_size parameter inside the data loader. Notice we use batches of 32 in this tutorial but you can change it to 64 if you like.
    %%capture
    ## parameter denoting the batch size
    BATCH_SIZE = 32

    ## transformations
    transform = transforms.Compose(
        [transforms.ToTensor()])

    ## download and load training dataset
    trainset = torchvision.datasets.MNIST(root='./data', train=True,
                                          download=True, transform=transform)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=BATCH_SIZE,
                                              shuffle=True, num_workers=2)

    ## download and load testing dataset
    testset = torchvision.datasets.MNIST(root='./data', train=False,
                                         download=True, transform=transform)
    testloader = torch.utils.data.DataLoader(testset, batch_size=BATCH_SIZE,
                                             shuffle=False, num_workers=2)
Let's inspect what the trainset and testset objects contain.
    ## print the trainset and testset
    print(trainset)
    print(testset)

    Dataset MNIST
        Number of datapoints: 60000
        Root location: ./data
        Split: Train
        StandardTransform
    Transform: Compose(
                   ToTensor()
               )
    Dataset MNIST
        Number of datapoints: 10000
        Root location: ./data
        Split: Test
        StandardTransform
    Transform: Compose(
                   ToTensor()
               )
This is a beginner's tutorial so I will break down things a bit here:
- BATCH_SIZE is a parameter that denotes the batch size we will use for our model.
- transform holds the code for whatever transformations you will apply to your data. I will show you an example below to demonstrate exactly what it does.
- trainset and testset contain the actual dataset objects. Notice I use train=True to specify that this corresponds to the training dataset, and I use train=False to specify that this is the remainder of the dataset, which we call the test set. From the output printed above you can see that the split of the data is roughly 86% (60000) / 14% (10000), corresponding to the training set and the testing set, respectively.
- trainloader is what holds the data loader object, which takes care of shuffling the data and constructing the batches.
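To make the idea of batches concrete, here is a minimal plain-Python sketch of how a dataset of a given size splits into batches of 32. The helper `make_batches` is a hypothetical name used only for illustration; it is not how `DataLoader` is implemented, and it ignores shuffling and worker processes. It only mirrors the fact that, by default, a final smaller batch is kept rather than dropped.

```python
def make_batches(num_samples, batch_size):
    """Split sample indices 0..num_samples-1 into consecutive batches."""
    indices = list(range(num_samples))
    return [indices[start:start + batch_size]
            for start in range(0, num_samples, batch_size)]

# Sizes taken from the MNIST printout above
train_batches = make_batches(60000, 32)
test_batches = make_batches(10000, 32)

# Number of batches and size of the last batch in each split
print(len(train_batches), len(train_batches[-1]))
print(len(test_batches), len(test_batches[-1]))
```

The training set divides evenly into batches of 32, while the test set ends with a smaller batch of 16 images, which matters later when per-batch accuracies are averaged.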
Now let's look at that transforms.Compose(...) function and see what it does. We will use a randomized image to demonstrate its use. Let's generate an image.
image = transforms.ToPILImage(mode='L')(torch.randn(1, 96, 96))
And let's render it:
    plt.imshow(image)
Okay, we have our image sample. Now let's apply a dummy transformation to it. We are going to rotate the image by a random angle of up to 45 degrees in either direction. The transformation below takes care of that:
    ## dummy transformation
    dummy_transform = transforms.Compose(
        [transforms.RandomRotation(45, fill=(0,))])

    dummy_result = dummy_transform(image)

    plt.imshow(dummy_result)
Notice you can put the transformations within transforms.Compose(...). You can use the built-in transformations offered by torchvision or you can build your own and compose them as you wish. In fact, you can place as many transformations as you wish in there. Let's try another composition of transformations: rotate + vertical flip.
    ## dummy transform
    dummy2_transform = transforms.Compose(
        [transforms.RandomRotation(45, fill=(0,)),
         transforms.RandomVerticalFlip()])

    dummy2_result = dummy2_transform(image)

    plt.imshow(dummy2_result)
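Conceptually, composing transformations just means applying each one in order to the result of the previous one. The following plain-Python sketch illustrates that idea on a tiny nested-list "image". The names `compose`, `flip_vertical` and `invert` are hypothetical and exist only for this illustration; this is not torchvision's implementation.

```python
def compose(functions):
    """Return a function that applies each given function in order."""
    def apply_all(value):
        for function in functions:
            value = function(value)
        return value
    return apply_all

# Two toy "transforms" operating on an image stored as a list of rows
def flip_vertical(image):
    return image[::-1]  # reverse the order of the rows

def invert(image):
    return [[255 - pixel for pixel in row] for row in image]

pipeline = compose([flip_vertical, invert])

toy_image = [[0, 50],
             [100, 255]]
result = pipeline(toy_image)
print(result)
```

Because the functions run in list order, swapping the two entries changes the order of operations, which matters whenever the transformations do not commute.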
That's pretty cool, right? Keep trying other transform methods. On the topic of exploring our data further, let's take a look at our image dataset.
As a practitioner and researcher, I always spend a bit of time and effort exploring and understanding my datasets. It's fun, and it is good practice to ensure that everything is in order.
Let's check what the train and test datasets contain. I will use matplotlib to display some of the images from our dataset. With a bit of NumPy I can convert the image tensors into NumPy arrays and plot them. Below I display an entire batch.
    ## functions to show an image
    def imshow(img):
        #img = img / 2 + 0.5     # unnormalize
        npimg = img.numpy()
        plt.imshow(np.transpose(npimg, (1, 2, 0)))

    ## get some random training images
    dataiter = iter(trainloader)
    images, labels = next(dataiter)

    ## show images
    imshow(torchvision.utils.make_grid(images))
The dimensions of our batches are as follows:
    for images, labels in trainloader:
        print("Image batch dimensions:", images.shape)
        print("Image label dimensions:", labels.shape)
        break

    Image batch dimensions: torch.Size([32, 1, 28, 28])
    Image label dimensions: torch.Size([32])
Now it's time to build the deep learning model that will be used to perform the image classification. We will keep things simple and stack a few dense layers and a dropout layer to build our model.
Let's discuss a bit about the model:
- First of all, the following structure involving a class is standard code that's used to build the neural network model in PyTorch:
    class MyModel(nn.Module):
        def __init__(self):
            super(MyModel, self).__init__()

            # layers go here

        def forward(self, x):

            # computations go here
            return x
- The layers are defined inside __init__(); super(...).__init__() is just there to stick things together. For our model, we stack a hidden layer (self.d1) followed by a dropout layer (self.dropout), which is then followed by an output layer (self.d2).
- nn.Linear(...) defines the dense layer and it requires the input and output dimensions (in_features and out_features), which correspond to the size of the input feature and output feature of that layer, respectively.
- nn.Dropout(...) is used to define a dropout layer. Dropout is a regularization technique that helps the model avoid overfitting on the images it has seen while training. We want this because we need a model that generalizes well to unseen examples -- in our case, the testing dataset. During training, dropout randomly zeroes units of the neural network layer, each with probability p=0.2. Read more about the dropout layer here.
- The entry point of the model, i.e. where the data enters, is placed under the forward(...) function. Typically, we also place other transformations we perform on the data while training inside this function.
- In the forward() function we perform a series of computations on the input data:
  - we first flatten each image, converting it from 2D (28 X 28) to 1D (1 X 784)
  - then we feed the batches of those 1D images into the first hidden layer
  - the output of that hidden layer is passed through a non-linear activation function called ReLU. It's not so important to know exactly what F.relu() does at the moment; in short, it sets negative values to zero, and in practice it allows faster and more effective training of neural architectures on large datasets
  - as explained above, the dropout layer helps the model generalize by reducing overfitting on the training data
  - we then feed the output of that dropout layer into the output layer (self.d2)
  - the result is then fed to the softmax function, which normalizes the output into a probability distribution over the 10 digit classes. These probabilities are the predictions used to calculate the accuracy of the model, and they are the final output of the model
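The individual computations listed above can be sketched in plain Python on small lists. This is only an illustration of what each step does to the numbers, not PyTorch's implementation; the helper names `flatten`, `relu` and `softmax` are defined here for the example, and the input values are made up.

```python
import math

def flatten(image):
    """Turn a 2D image (list of rows) into one flat list of pixels."""
    return [pixel for row in image for pixel in row]

def relu(values):
    """Replace negative values with zero, keep the rest unchanged."""
    return [max(0.0, v) for v in values]

def softmax(values):
    """Normalize scores into probabilities that sum to 1."""
    largest = max(values)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

image = [[0.0] * 28 for _ in range(28)]  # a blank 28 X 28 image
flat = flatten(image)
print(len(flat))  # number of input features seen by the first dense layer

activated = relu([-1.5, 0.0, 2.0])
probs = softmax([2.0, 1.0, 0.1])
print(activated, probs, sum(probs))
```

The flattened length is what fixes the input size of the first dense layer, and the softmax output keeps the ordering of the raw scores while making them sum to one.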
    ## the model
    class MyModel(nn.Module):
        def __init__(self):
            super(MyModel, self).__init__()
            self.d1 = nn.Linear(28 * 28, 128)
            self.dropout = nn.Dropout(p=0.2)
            self.d2 = nn.Linear(128, 10)

        def forward(self, x):
            x = x.flatten(start_dim = 1)
            x = self.d1(x)
            x = F.relu(x)
            x = self.dropout(x)
            logits = self.d2(x)
            out = F.softmax(logits, dim=1)
            return out
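To give a feel for what the dropout layer does, here is a rough plain-Python sketch. It assumes the commonly used "inverted dropout" behaviour, which is also what nn.Dropout does: during training each value is zeroed with probability p and the surviving values are scaled by 1/(1-p), while in evaluation mode the input passes through unchanged. The function name `dropout` and its `training` and `rng` arguments are illustrative choices, not PyTorch's API.

```python
import random

def dropout(values, p=0.2, training=True, rng=random):
    """Zero each value with probability p during training; identity otherwise."""
    if not training:
        return list(values)
    scale = 1.0 / (1.0 - p)  # keeps the expected sum of activations unchanged
    return [0.0 if rng.random() < p else v * scale for v in values]

rng = random.Random(0)        # fixed seed so the sketch is repeatable
activations = [1.0] * 10000   # made-up activations, for illustration only

dropped = dropout(activations, p=0.2, training=True, rng=rng)
zero_fraction = dropped.count(0.0) / len(dropped)
print(zero_fraction)          # close to p

unchanged = dropout(activations, p=0.2, training=False)
print(unchanged == activations)
```

This is also why the mode of the model matters: the same layer behaves differently depending on whether the model is training or being evaluated.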
Visually, the following is a diagram of the model we have built. Just keep in mind that the hidden layer is much bigger than shown in the diagram; due to space constraints, the diagram is just an approximation of the actual model.
As I have done in my previous tutorials, I always encourage you to test the model with 1 batch to ensure that the output dimensions are what we expect. Notice how we are iterating over the dataloader, which conveniently yields the images and labels of each batch. out contains the output of the model, which are the logits passed through a softmax layer to produce the predictions.
    ## test the model with 1 batch
    model = MyModel()
    for images, labels in trainloader:
        print("batch size:", images.shape)
        out = model(images)
        print(out.shape)
        break

    batch size: torch.Size([32, 1, 28, 28])
    torch.Size([32, 10])
We can clearly see that for each of the 32 images in the batch we get back 10 output values, one per digit class. These are used to compute the performance of the model.
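As a small illustration of how those 10 values are turned into a prediction, the sketch below picks the index of the largest probability in one output row. The numbers in `row` are made up for the example, and `predict_class` is a hypothetical helper name.

```python
def predict_class(probabilities):
    """Return the index of the largest probability (the predicted digit)."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])

# One hypothetical output row: 10 probabilities, one per digit 0-9
row = [0.01, 0.02, 0.05, 0.70, 0.02, 0.05, 0.05, 0.04, 0.03, 0.03]

predicted_digit = predict_class(row)
print(predicted_digit)
```

The accuracy function defined later does the same thing for a whole batch at once and compares the predicted digits with the labels.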
Now we are ready to train the model, but before that we are going to set up a loss function, an optimizer and a function to compute the accuracy of the model.
- learning_rate controls the size of each update the optimizer makes to the weights; it is a hyperparameter that we choose rather than one the model learns.
- num_epochs is the number of epochs, i.e., the number of full passes over the training dataset.
- device determines what hardware we will use to train the model. If a gpu is present, then that will be used, otherwise it defaults to the cpu.
- model is just the model instance.
- model.to(device) is in charge of moving the model to the actual device that will be used for training.
- criterion is the loss function (here cross-entropy), which measures how far the predictions of the model are from the target labels. This is the quantity the model minimizes while it trains.
- optimizer is the optimization technique (here Adam) used to update the weights after backward propagation. Notice that it requires the learning_rate and the model parameters, which are part of the calculation to optimize the weights.
    learning_rate = 0.001
    num_epochs = 5

    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model = MyModel()
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
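One detail worth knowing about this setup: nn.CrossEntropyLoss applies a log-softmax to its input internally, so it expects raw scores (logits). Since the model above already returns softmax probabilities, the softmax is effectively applied twice. The plain-Python sketch below, with made-up scores and helper names defined only for this example, shows the consequence: with 10 classes, the loss computed on probabilities cannot drop below roughly 1.46, which is consistent with the loss values of about 1.5 reported later in this tutorial. Training still works, but the loss values are harder to interpret.

```python
import math

def softmax(values):
    """Normalize scores into probabilities that sum to 1."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, target):
    """Negative log of the softmax probability assigned to the target class."""
    return -math.log(softmax(scores)[target])

# Hypothetical raw scores for a confident and correct prediction of class 0
logits = [10.0] + [0.0] * 9

loss_from_logits = cross_entropy(logits, target=0)
loss_from_probs = cross_entropy(softmax(logits), target=0)

# Best case when probabilities are fed in: a perfect one-hot vector
floor = cross_entropy([1.0] + [0.0] * 9, target=0)

print(loss_from_logits, loss_from_probs, floor)
```

A common alternative, not used in this tutorial, is to return the logits from forward() and apply the softmax only when probabilities are needed for reporting.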
The utility function below helps to calculate the accuracy of the model. For now, it's not important to understand how it's calculated, but basically it compares the outputs of the model (predictions) with the actual target values (i.e., the labels of the dataset) and computes the percentage of correct predictions in the batch.
    ## utility function to compute accuracy
    def get_accuracy(output, target, batch_size):
        ''' Obtain accuracy for training round '''
        ## torch.max returns (values, indices); index [1] holds the predicted classes
        corrects = (torch.max(output, 1)[1].view(target.size()).data == target.data).sum()
        accuracy = 100.0 * corrects/batch_size
        return accuracy.item()
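The same calculation can be written in plain Python, which may make the logic easier to follow. This is a sketch with made-up outputs for a tiny batch of four samples and two classes; `argmax` and `batch_accuracy` are hypothetical helper names, not part of the tutorial's code.

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=lambda i: row[i])

def batch_accuracy(outputs, targets):
    """Percentage of rows whose highest-scoring class matches the label."""
    predictions = [argmax(row) for row in outputs]
    corrects = sum(1 for p, t in zip(predictions, targets) if p == t)
    return 100.0 * corrects / len(targets)

# Made-up model outputs (4 samples, 2 classes) and their labels
outputs = [[0.1, 0.9],
           [0.8, 0.2],
           [0.3, 0.7],
           [0.6, 0.4]]
targets = [1, 0, 0, 0]

accuracy = batch_accuracy(outputs, targets)
print(accuracy)
```

Note that this sketch divides by the actual number of samples in the batch, whereas the function above divides by the batch_size passed in by the caller.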
Now it's time to train the model. The code portion that follows can be described in the following steps:
- The first thing in training a neural network model is defining the training loop, which is achieved by:
    for epoch in range(num_epochs):
        ...
- We define two variables, train_running_loss and train_acc, that will help us monitor the running loss and accuracy of the model while it trains over the different batches.
- model.train() puts the model in training mode, which among other things enables the dropout layer.
- Notice how we are iterating over the dataloader, which conveniently gives us the batches in image-label pairs.
- That second for loop means that in every epoch we iterate over all the batches and train the model on them.
- We feed the model the images via model(images) and the output is the predictions of the model.
- The predictions together with the target labels are used to compute the loss using the loss function we defined earlier.
Before we update our weights for the next round of training, we perform the following steps:
- we use the optimizer object to reset all the gradients for the variables it will update (optimizer.zero_grad()). PyTorch accumulates gradients in a buffer on every loss.backward() call, so clearing them first ensures that each update uses only the gradients of the current batch
- loss.backward() simply computes the gradient of the loss w.r.t. the model parameters
- optimizer.step() then ensures that the model parameters are updated
Then we gather and accumulate the loss and accuracy, which is what we will use to tell us if the model is learning properly
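To see why the gradients are reset before each update, here is a toy plain-Python sketch. It minimizes (w - 3)^2 for a single weight with plain gradient descent, once with the gradient buffer cleared every step and once without. The `Parameter` class and the `backward` and `train` functions are hypothetical stand-ins for the roles played by optimizer.zero_grad(), loss.backward() and optimizer.step(); this is not how PyTorch or Adam is implemented.

```python
class Parameter:
    """A single trainable number with an accumulating gradient buffer."""
    def __init__(self, value):
        self.value = value
        self.grad = 0.0

def backward(param, target):
    # Gradient of (w - target)^2 is 2 * (w - target); it is ADDED to the buffer
    param.grad += 2.0 * (param.value - target)

def train(steps, lr, reset_grad):
    w = Parameter(0.0)
    for _ in range(steps):
        if reset_grad:
            w.grad = 0.0         # role of optimizer.zero_grad()
        backward(w, target=3.0)  # role of loss.backward()
        w.value -= lr * w.grad   # role of optimizer.step()
    return w.value

with_reset = train(steps=50, lr=0.1, reset_grad=True)
without_reset = train(steps=50, lr=0.1, reset_grad=False)
print(with_reset, without_reset)
```

With the reset, the weight settles very close to the target of 3. Without it, stale gradients from earlier steps keep being added in, and the weight overshoots and oscillates instead of converging.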
    ## train the model
    for epoch in range(num_epochs):
        train_running_loss = 0.0
        train_acc = 0.0

        ## commence training
        model = model.train()

        ## training step
        for i, (images, labels) in enumerate(trainloader):

            images = images.to(device)
            labels = labels.to(device)

            ## forward + backprop + loss
            predictions = model(images)
            loss = criterion(predictions, labels)
            optimizer.zero_grad()
            loss.backward()

            ## update model params
            optimizer.step()

            train_running_loss += loss.detach().item()
            train_acc += get_accuracy(predictions, labels, BATCH_SIZE)

        model.eval()
        print('Epoch: %d | Loss: %.4f | Train Accuracy: %.2f' \
              %(epoch, train_running_loss / i, train_acc/i))

    Epoch: 0 | Loss: 1.5956 | Train Accuracy: 88.89
    Epoch: 1 | Loss: 1.5311 | Train Accuracy: 93.71
    Epoch: 2 | Loss: 1.5156 | Train Accuracy: 95.17
    Epoch: 3 | Loss: 1.5072 | Train Accuracy: 95.87
    Epoch: 4 | Loss: 1.5019 | Train Accuracy: 96.34
After all the training steps are over, we can clearly see that the loss keeps decreasing while the training accuracy of the model keeps rising, which is a good sign that the model is effectively learning to classify images.
We can verify that by computing the accuracy on the testing dataset to see how well the model performs on the image classification task. As you can see below, our basic fully connected model is performing very well on the MNIST classification task.
    test_acc = 0.0
    for i, (images, labels) in enumerate(testloader, 0):
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        test_acc += get_accuracy(outputs, labels, BATCH_SIZE)

    print('Test Accuracy: %.2f'%( test_acc/i))
Test Accuracy: 97.03
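A small caveat on how this number is averaged: the loop sums per-batch percentages computed with BATCH_SIZE and divides by the last batch index i, while the 10,000 test images split into 312 full batches plus one batch of 16. The plain-Python sketch below compares that averaging with a sample-weighted accuracy. The per-batch counts of correct predictions are hypothetical and chosen only for illustration, and both function names are made up for this example.

```python
def mean_of_batch_accuracies(correct_per_batch, nominal_batch_size):
    """Mirror the loop above: sum per-batch percentages, divide by the last index."""
    total = 0.0
    last_index = 0
    for last_index, correct in enumerate(correct_per_batch):
        total += 100.0 * correct / nominal_batch_size
    return total / last_index  # assumes more than one batch

def sample_weighted_accuracy(correct_per_batch, batch_sizes):
    """Total correct predictions divided by total number of samples."""
    return 100.0 * sum(correct_per_batch) / sum(batch_sizes)

batch_sizes = [32] * 312 + [16]        # 10,000 test images in batches of 32
correct_per_batch = [30] * 312 + [16]  # hypothetical counts, illustration only

loop_estimate = mean_of_batch_accuracies(correct_per_batch, 32)
exact = sample_weighted_accuracy(correct_per_batch, batch_sizes)
print(loop_estimate, exact)
```

Under these assumptions the loop-style average comes out slightly higher than the sample-weighted value (by a factor of 10000/9984), so the figures printed in this tutorial are best read as close approximations.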
Congratulations! You have made it to the end of this tutorial. This tutorial aims to give a very basic introduction to the fundamentals of image classification using neural networks and PyTorch.
This tutorial was heavily inspired by this TensorFlow tutorial. We thank the authors of the corresponding reference for their valuable work.