In our previous PyTorch notebook, we learned about how to get started quickly with PyTorch 1.2 using Google Colab. In this tutorial, we are going to take a step back and review some of the basic components of building a deep learning model using PyTorch.

This will be a brief tutorial and will avoid using jargon and overcomplicated code. That said, this is perhaps the most basic of models you can build with PyTorch.

In fact, it is so basic that it's ideal for those starting to learn about PyTorch and deep learning. So if you have a friend or colleague who wants to jump in, I highly encourage you to refer them to this tutorial as a starting point. Let's get started!

Author: Elvis Saravia

Complete Code Walkthrough: Blog post

Getting Started

Before getting started, we need to import a few modules that provide the functions we will use to build our deep learning model. The main ones are torch and torchvision, which contain the majority of the functions you need to get started with PyTorch. Since this is a deep learning tutorial, we will also need torch.nn, torch.nn.functional, and torchvision.transforms, all of which contain utility functions for building our model. We probably won't use every module listed below, but they are the ones you will typically import when starting a deep learning project.

## The usual imports
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms

## for printing image
import matplotlib.pyplot as plt
import numpy as np

Below we check for the PyTorch version just to make sure that you are using the proper version. At the time of this tutorial, we are working with PyTorch 1.2.

## print out the pytorch version used
print(torch.__version__)

Loading the Data

Let's get right into it! As with any machine learning project, you need to load your dataset. We are using the MNIST dataset, which is the Hello World of datasets in the machine learning world.

The data consists of images of handwritten digits, each of size 28 x 28 pixels. We will discuss the images shortly, but our plan is to load the data in batches of size 32, similar to the figure below.

[Figure: loading the data into batches of size 32]

Here are the complete steps we are performing when importing our data:

  • We will import and transform the data into tensors using the transforms module
  • We will use DataLoader to build convenient data loaders, which makes it easy to efficiently feed data in batches to deep learning models. We will get to the topic of batches in a bit but for now just think of them as subsets of your data.
  • As hinted above, we will also create batches of the data by setting the batch parameter inside the data loader. Notice we use batches of 32 in this tutorial but you can change it to 64 if you like.
## parameter denoting the batch size
BATCH_SIZE = 32

## transformations: convert each image to a tensor
transform = transforms.Compose(
    [transforms.ToTensor()])

## download and load training dataset
trainset = torchvision.datasets.MNIST(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=BATCH_SIZE,
                                          shuffle=True, num_workers=2)

## download and load testing dataset
testset = torchvision.datasets.MNIST(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=BATCH_SIZE,
                                         shuffle=False, num_workers=2)

Let's inspect what the trainset and testset objects contain.

## print the trainset and testset
print(trainset)
print(testset)
Dataset MNIST
    Number of datapoints: 60000
    Root location: ./data
    Split: Train
Transform: Compose(
Dataset MNIST
    Number of datapoints: 10000
    Root location: ./data
    Split: Test
Transform: Compose(

This is a beginner's tutorial so I will break down things a bit here:

  • BATCH_SIZE is a parameter that denotes the batch size we will use for our model
  • transform holds the transformations you will apply to your data. I will show you an example below to demonstrate exactly what it does and shed more light on its use
  • trainset and testset contain the actual dataset objects. Notice I use train=True to specify that this corresponds to the training dataset, and I use train=False to specify that this is the remainder of the dataset, which we call the testset. From the output printed above you can see that the data is split into 60000 training samples and 10000 testing samples, which is roughly 86% / 14%.
  • trainloader is what holds the data loader object which takes care of shuffling the data and constructing the batches.
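
To make the idea of batches more concrete, here is a minimal sketch that uses synthetic tensors instead of MNIST, so nothing needs to be downloaded. The names fake_images, fake_labels, fake_dataset, and fake_loader are placeholders introduced only for this illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for MNIST: 100 fake grayscale "images" of size 28 x 28
# with random labels in [0, 9]. These are placeholders, not real data.
fake_images = torch.rand(100, 1, 28, 28)
fake_labels = torch.randint(0, 10, (100,))
fake_dataset = TensorDataset(fake_images, fake_labels)

# Same batch size as in this tutorial; shuffle=True reorders the samples
# at the start of every epoch.
fake_loader = DataLoader(fake_dataset, batch_size=32, shuffle=True)

# Collect the shape of every batch the loader produces in one pass.
batch_shapes = [tuple(images.shape) for images, labels in fake_loader]
print(len(fake_loader))  # number of batches per epoch
print(batch_shapes)
```

With 100 samples and a batch size of 32, the loader yields three full batches and one final batch holding the remaining 4 samples. The final batch can be smaller than the others whenever the dataset size is not a multiple of the batch size.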

Now let's look at that transforms.Compose(...) function and see what it does. We will use a randomized image to demonstrate its use. Let's generate an image.

image = transforms.ToPILImage(mode='L')(torch.randn(1, 96, 96))

And let's render it:

plt.imshow(image)

Okay, we have our image sample. Now let's apply a dummy transformation to it. We are going to rotate the image by a random angle of up to 45 degrees in either direction. The transformation below takes care of that:

## dummy transformation
dummy_transform = transforms.Compose(
    [transforms.RandomRotation(45, fill=(0,))])

dummy_result = dummy_transform(image)

plt.imshow(dummy_result)

Notice you can put the transformations within transforms.Compose(...). You can use the built-in transformations offered by torchvision or you can build your own and compose them as you wish. In fact, you can place as many transformations as you wish in there. Let's try another composition of transformations: rotate + vertical flip.

## dummy transform 
dummy2_transform = transforms.Compose(
    [transforms.RandomRotation(45, fill=(0,)), transforms.RandomVerticalFlip()])

dummy2_result = dummy2_transform(image)

plt.imshow(dummy2_result)

That's pretty cool, right? Keep trying other transform methods. On the topic of exploring our data further, let's take a look at our image dataset.

Exploring the Data

As a practitioner and researcher, I always spend a bit of time and effort exploring and understanding my datasets. It's fun, and it is good practice to ensure that everything is in order.

Let's check what the train and test datasets contain. I will use matplotlib to display some of the images from our dataset. With a bit of numpy I can convert the image tensors into NumPy arrays and plot them. Below I display an entire batch.

## functions to show an image
def imshow(img):
    #img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))

## get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

## show images
imshow(torchvision.utils.make_grid(images))

The dimensions of our batches are as follows:

for images, labels in trainloader:
    print("Image batch dimensions:", images.shape)
    print("Image label dimensions:", labels.shape)
    break
Image batch dimensions: torch.Size([32, 1, 28, 28])
Image label dimensions: torch.Size([32])

The Model

Now it's time to build the deep learning model that will be used to perform the image classification. We will keep things simple and stack a few dense layers and a dropout layer to build our model.

Let's discuss a bit about the model:

  • First of all the following structure involving a class is standard code that's used to build the neural network model in PyTorch:
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()

        # layers go here

    def forward(self, x):

        # computations go here
  • The layers are defined inside def __init__(). super(...).__init__() is just there to stick things together. For our model, we stack a hidden layer (self.d1) followed by a dropout layer (self.dropout), which is then followed by an output layer (self.d2).
  • nn.Linear(...) defines the dense layer and it requires the in and out dimensions, which correspond to the size of the input feature and output feature of that layer, respectively.
  • nn.Dropout(...) is used to define a dropout layer. Dropout is an approach in deep learning that helps a model to avoid overfitting. This means that dropout acts as a regularization technique that helps the model to not overfit on the images it has seen while training. We want this because we need a model that generalizes well to unseen examples -- in our case, the testing dataset. Dropout randomly zeroes some of the units of the neural network layer with probability of p=0.2. Read more about the dropout layer here.
  • The entry point of the model, i.e. where the data enters, is placed under the forward(...) function. Typically, we also place other transformations we perform on the data while training inside this function.
  • In the forward() function we are performing a series of computations on the input data
    • we flatten the images first, converting each one from 2D (28 x 28) to 1D (1 x 784).
    • then we feed the batches of those 1D images into the first hidden layer
    • a non-linear activation function called ReLU is then applied to the output of that hidden layer. It's not so important to know the details of F.relu() at the moment; it simply sets negative values to zero, and in practice it tends to allow faster and more effective training of neural architectures on large datasets
    • as explained above, the dropout layer helps the model generalize better by reducing overfitting on the training data
    • we then feed the output of that dropout layer into the output layer (d2)
    • the result of that is then fed to the softmax function, which normalizes the output into a probability distribution over the 10 classes. These probabilities are the predictions used to calculate the accuracy of the model; this will be the final output of the model
## the model
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.d1 = nn.Linear(28 * 28, 128)
        self.dropout = nn.Dropout(p=0.2)
        self.d2 = nn.Linear(128, 10)
    def forward(self, x):
        x = x.flatten(start_dim = 1)
        x = self.d1(x)
        x = F.relu(x)
        x = self.dropout(x)
        logits = self.d2(x)
        out = F.softmax(logits, dim=1)
        return out
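
To make the flatten and softmax steps more concrete, here is a small sketch on random tensors. The tensors batch and fake_logits are placeholders for this illustration and are not produced by the model above.

```python
import torch
import torch.nn.functional as F

# A fake batch with the same shape as an MNIST batch: [batch, channel, height, width]
batch = torch.rand(32, 1, 28, 28)

# flatten(start_dim=1) keeps the batch dimension and merges the rest: 1 * 28 * 28 = 784
flat = batch.flatten(start_dim=1)
print(flat.shape)

# Random scores standing in for the output of the last linear layer
fake_logits = torch.randn(32, 10)

# softmax over dim=1 turns each row of scores into a probability distribution
probs = F.softmax(fake_logits, dim=1)
row_sums = probs.sum(dim=1)
print(probs.shape)
print(row_sums[:3])  # each entry should be very close to 1
```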

Visually, the following is a diagram of the model we have built. Just keep in mind that the hidden layer is much bigger than shown in the diagram; due to space constraints, the diagram is just an approximation of the actual model.

[Figure: diagram of the model]

As I have done in my previous tutorials, I always encourage you to test the model with 1 batch to ensure that the output dimensions are what we expect. Notice how we are iterating over the dataloader, which conveniently provides the image and label pairs. out contains the output of the model, which is the logits passed through a softmax layer to produce the predictions.

## test the model with 1 batch
model = MyModel()
for images, labels in trainloader:
    print("batch size:", images.shape)
    out = model(images)
    print(out.shape)
    break
batch size: torch.Size([32, 1, 28, 28])
torch.Size([32, 10])

We can clearly see that for each image in the batch we get back 10 output values, one per class. These are used to compute the performance of the model.
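
As a side note on the dropout layer used in the model, the sketch below illustrates how nn.Dropout behaves differently in training and evaluation mode. It is a standalone illustration on a tensor of ones; the variable names are placeholders and the seed is fixed only to make the run repeatable.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

dropout = nn.Dropout(p=0.2)
x = torch.ones(1, 1000)

# Training mode: each unit is zeroed with probability p, and the surviving
# units are scaled by 1 / (1 - p) = 1.25 so the expected sum is unchanged.
dropout.train()
train_out = dropout(x)
zero_fraction = (train_out == 0).float().mean().item()
print(zero_fraction)  # roughly 0.2

# Evaluation mode: dropout is a no-op and the input passes through unchanged.
dropout.eval()
eval_out = dropout(x)
print(torch.equal(eval_out, x))
```

This is why switching between model.train() and model.eval() matters for models that contain dropout layers.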

Training the Model

Now we are ready to train the model, but before that we are going to set up a loss function, an optimizer, and a function to compute the accuracy of the model.

  • The learning_rate is the step size the optimizer uses when updating the model's weights. It is a hyperparameter that you choose rather than something the model learns.
  • num_epochs is the number of epochs, i.e., full passes over the training set.
  • device determines what hardware we will use to train the model. If a gpu is present, then that will be used, otherwise it defaults to the cpu.
  • model is just the model instance.
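  • model.to(device) is in charge of moving the model to the actual device that will be used for training
  • criterion is the loss function used to measure how far the model's predictions are from the target labels; training adjusts the weights to reduce it. Note that nn.CrossEntropyLoss applies log-softmax internally and therefore expects raw logits. Because this model already applies softmax in forward(), the loss cannot drop much below about 1.46 with 10 classes, which is why the values reported below stay around 1.5 even as the accuracy improves.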
  • optimizer is the optimization technique used to modify the weights in the backward propagation. Notice that it requires the learning_rate and the model parameters which are part of the calculation to optimize weights.
learning_rate = 0.001
num_epochs = 5

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = MyModel()
model = model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
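
The following sketch illustrates how nn.CrossEntropyLoss reacts to raw logits versus softmax outputs. The scores are made up for this illustration, and demo_criterion, confident_logits, and targets are placeholder names, not part of the tutorial's training code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

demo_criterion = nn.CrossEntropyLoss()
targets = torch.tensor([0, 1])

# Made-up, very confident and correct raw scores for 2 samples and 10 classes
confident_logits = torch.zeros(2, 10)
confident_logits[0, 0] = 20.0
confident_logits[1, 1] = 20.0

# Passing raw logits: the loss is close to zero for confident, correct scores
loss_from_logits = demo_criterion(confident_logits, targets).item()

# Passing softmax outputs: the values are squashed into [0, 1] before the
# loss applies its own log-softmax, so the loss stays near log(1 + 9 / e)
probs = F.softmax(confident_logits, dim=1)
loss_from_probs = demo_criterion(probs, targets).item()

print(loss_from_logits)
print(loss_from_probs)  # about 1.46
```

Returning the logits from forward() and applying softmax only when probabilities are needed is a common alternative design. The predicted class is the same either way, because softmax does not change which score is largest.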

The utility function below calculates the accuracy of the model. For now, it's not important to understand every detail, but basically it compares the outputs of the model (predictions) with the actual target values (i.e., the labels of the dataset) and computes the percentage of correct predictions in the batch.

## utility function to compute accuracy
def get_accuracy(output, target, batch_size):
    ''' Obtain accuracy for training round '''
    corrects = (torch.max(output, 1)[1].view(target.size()).data == target.data).sum()
    accuracy = 100.0 * corrects/batch_size
    return accuracy.item()
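
For readers who do want to see how the accuracy is computed, here is a small worked example on hand-made values. The tensors outputs and targets are invented for this illustration.

```python
import torch

# Made-up model outputs for 4 samples and 3 classes (each row sums to 1)
outputs = torch.tensor([[0.1, 0.7, 0.2],
                        [0.8, 0.1, 0.1],
                        [0.3, 0.3, 0.4],
                        [0.2, 0.5, 0.3]])
targets = torch.tensor([1, 0, 1, 1])

# The predicted class is the index of the largest value in each row
predicted = outputs.argmax(dim=1)                    # tensor([1, 0, 2, 1])

# Count the matches and convert to a percentage of the batch
num_correct = (predicted == targets).sum().item()    # 3
accuracy = 100.0 * num_correct / targets.size(0)
print(accuracy)  # 75.0
```

Three of the four predictions match their targets, so the accuracy for this batch is 75%.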

Training the Model

Now it's time to train the model. The code portion that follows can be described in the following steps:

  • The first thing in training a neural network model is defining the training loop, which is achieved by:
for epoch in range(num_epochs):
  • We define two variables, train_running_loss and train_acc, that will help us monitor the running loss and accuracy of the model while it trains over the different batches.
  • model.train() puts the model in training mode, which enables layers such as dropout that behave differently during training and evaluation.
  • Notice how we are iterating over the dataloader, which conveniently gives us the batches in image-label pairs.
  • That second for loop means that in every epoch we will iterate over all the batches and train the model on them.
  • We feed the model the images via model(images) and the output are the predictions of the model.
  • The predictions together with the target labels are used to compute the loss using the loss function we defined earlier.
  • Before we update our weights for the next round of training, we perform the following steps:

    • we use the optimizer object (optimizer.zero_grad()) to reset all the gradients for the variables it will update. This step is needed because PyTorch accumulates gradients in buffers, rather than overwriting them, each time loss.backward() is called
    • loss.backward() simply computes the gradient of the loss w.r.t. the model parameters
    • optimizer.step() then ensures that the model parameters are updated
  • Then we gather and accumulate the loss and accuracy, which is what we will use to tell us if the model is learning properly
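
The gradient reset described in the steps above can be illustrated with a single scalar parameter. This is a toy sketch, with w and the loss 3 * w chosen only for illustration.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

# First backward pass: d(3 * w)/dw = 3
loss = 3.0 * w
loss.backward()
first_grad = w.grad.item()         # 3.0

# Second backward pass without resetting: the new gradient is added
# to the one already stored in w.grad
loss = 3.0 * w
loss.backward()
accumulated_grad = w.grad.item()   # 6.0

# Resetting the gradient first gives the expected value again
w.grad.zero_()
loss = 3.0 * w
loss.backward()
reset_grad = w.grad.item()         # 3.0

print(first_grad, accumulated_grad, reset_grad)
```

In the training loop, optimizer.zero_grad() plays the role of w.grad.zero_() for every parameter the optimizer manages.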

## train the model
for epoch in range(num_epochs):
    train_running_loss = 0.0
    train_acc = 0.0

    ## commence training
    model = model.train()

    ## training step
    for i, (images, labels) in enumerate(trainloader):
        images = images.to(device)
        labels = labels.to(device)

        ## forward + backprop + loss
        predictions = model(images)
        loss = criterion(predictions, labels)

        ## reset gradients, backpropagate, and update model params
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        train_running_loss += loss.detach().item()
        train_acc += get_accuracy(predictions, labels, BATCH_SIZE)
    print('Epoch: %d | Loss: %.4f | Train Accuracy: %.2f' \
          %(epoch, train_running_loss / i, train_acc/i)) 
Epoch: 0 | Loss: 1.5956 | Train Accuracy: 88.89
Epoch: 1 | Loss: 1.5311 | Train Accuracy: 93.71
Epoch: 2 | Loss: 1.5156 | Train Accuracy: 95.17
Epoch: 3 | Loss: 1.5072 | Train Accuracy: 95.87
Epoch: 4 | Loss: 1.5019 | Train Accuracy: 96.34

After all the training steps are over, we can clearly see that the loss keeps decreasing while the training accuracy of the model keeps rising, which is a good sign that the model is effectively learning to classify images.

We can verify that by computing the accuracy on the testing dataset to see how well the model performs on the image classification task. As you can see below, our basic fully connected model performs very well on the MNIST classification task.

test_acc = 0.0
for i, (images, labels) in enumerate(testloader, 0):
    images = images.to(device)
    labels = labels.to(device)
    outputs = model(images)
    test_acc += get_accuracy(outputs, labels, BATCH_SIZE)
print('Test Accuracy: %.2f'%( test_acc/i))
Test Accuracy: 97.03
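
As a possible refinement, evaluation is often run with the model in evaluation mode and with gradient tracking turned off, and the accuracy is computed from the actual number of samples seen. The sketch below shows that pattern on a tiny stand-in model and synthetic data; toy_model, toy_data, toy_loader, and evaluate are hypothetical names, and this is not the code that produced the result above.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Tiny stand-in model and synthetic data (placeholders, not the MNIST setup)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
toy_data = TensorDataset(torch.rand(50, 1, 28, 28), torch.randint(0, 10, (50,)))
toy_loader = DataLoader(toy_data, batch_size=32, shuffle=False)

def evaluate(model, loader, device="cpu"):
    """Return the accuracy (in percent) over every sample in the loader."""
    model.eval()                      # disables dropout during evaluation
    correct, total = 0, 0
    with torch.no_grad():             # no gradients are needed for prediction
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            predicted = model(images).argmax(dim=1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)   # handles a smaller final batch correctly
    return 100.0 * correct / total

toy_acc = evaluate(toy_model, toy_loader)
print('Toy accuracy: %.2f' % toy_acc)
```

Dividing by the number of samples seen, rather than by a fixed batch size, keeps the result exact when the last batch is smaller than the others, as happens with 10000 test images and batches of 32.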

Final Words

Congratulations! You have made it to the end of this tutorial. It aims to give a very basic introduction to the fundamentals of image classification using neural networks and PyTorch.

This tutorial was heavily inspired by this TensorFlow tutorial. We thank the authors of the corresponding reference for their valuable work.