FastAI – Image Classification – [Chapter 5]


DataLoaders – a class that simply stores the DataLoader objects you pass to it and makes them available as train and valid.

To turn downloaded data into a DataLoaders object we need at least 4 things –

  1. What kind of data we are working with
  2. How to get the list of items
  3. How to label these items
  4. How to create the validation set

To create DataLoaders, fastai provides the data block API. With this API you can fully customize every stage of the creation of your DataLoaders.

Example of DataLoaders

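from fastai.vision.all import *
pets = DataBlock(blocks = (ImageBlock, CategoryBlock),   # kind of data
                 get_items=get_image_files,              # how to get the list of items
                 splitter=RandomSplitter(seed=42),       # how to create the validation set
                 get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'),  # how to label
                 item_tfms=Resize(460),
                 batch_tfms=aug_transforms(size=224, min_scale=0.75))
dls = pets.dataloaders(path/"images")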

First we provide a tuple specifying the types we want for the independent and dependent variables

blocks = (ImageBlock, CategoryBlock)

The independent variable is the thing we use to make predictions, and the dependent variable is the target.

The get_image_files function takes a path and returns a list of all the images in that path (recursively, by default).

Signature: get_image_files(path, recurse=True, folders=None)
def get_image_files(path, recurse=True, folders=None):
    "Get image files in `path` recursively, only in `folders`, if specified."
    return get_files(path, extensions=image_extensions, recurse=recurse, folders=folders)
File:      ~/SageMaker/.env/fastai/lib/python3.6/site-packages/fastai/data/
Type:      function

get_y – creates the labels in our dataset. Here we use a regular expression to extract the label from each file name: everything before the trailing underscore and number.
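To see what the pattern captures, here is a minimal sketch that applies the same regular expression with Python's standard re module. The helper name label_from_name and the sample file names are made up for illustration; this only approximates what RegexLabeller does with a file name.

```python
import re

# Same pattern as in the DataBlock: capture everything before "_<digits>.jpg".
PATTERN = re.compile(r'(.+)_\d+.jpg$')

def label_from_name(fname):
    """Return the captured label for a file name (hypothetical helper)."""
    match = PATTERN.search(fname)
    if match is None:
        raise ValueError(f"{fname!r} does not match the pattern")
    return match.group(1)

# Sample file names are illustrative only.
for name in ["great_pyrenees_173.jpg", "Bengal_12.jpg"]:
    print(name, "->", label_from_name(name))
```

Because (.+) is greedy, labels that themselves contain underscores are kept whole; only the final number is stripped.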

item_tfms – Usually the images in a dataset are of different sizes. They need to be the same size to be grouped into a tensor, so we add a transform that resizes them all to the same size. Item transforms are pieces of code that run on each individual item. By default, Resize crops the image to fit a square of the requested size, using the full width or height. You can instead ask fastai to pad the images with zeros (black) or to squish/stretch them.

item_tfms=Resize(460, ResizeMethod.Squish)

item_tfms=Resize(460, ResizeMethod.Pad, pad_mode='zeros')

Each of these approaches has a drawback. Squishing or stretching the images produces unrealistic shapes, leading to a model that learns that things look different from how they actually are, which lowers accuracy; padding adds empty space that wastes computation. In practice, it is better to randomly select part of the image and crop to just that part. On each epoch (one complete pass through all of the images in the data) we randomly select a different part of each image. To do this, we can use RandomResizedCrop. The parameter min_scale determines the minimum fraction of the image to select each time.

item_tfms=RandomResizedCrop(460, min_scale=0.3)
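The role of min_scale can be sketched in plain Python. This is a simplified illustration, not fastai's implementation: the function random_crop_box is hypothetical, it keeps the original aspect ratio, and it ignores details such as aspect-ratio changes and the final resize to the requested size.

```python
import random

def random_crop_box(width, height, min_scale=0.3, rng=random):
    """Pick a random crop covering at least ~min_scale of the image area."""
    scale = rng.uniform(min_scale, 1.0)      # fraction of the area to keep
    side = scale ** 0.5                      # scale each side by sqrt so the area scales by `scale`
    crop_w = max(1, round(width * side))
    crop_h = max(1, round(height * side))
    left = rng.randint(0, width - crop_w)    # random position inside the image
    top = rng.randint(0, height - crop_h)
    return left, top, crop_w, crop_h

# A different box is drawn on every call, i.e. for every image on every epoch.
rng = random.Random(0)
for _ in range(3):
    print(random_crop_box(500, 375, min_scale=0.3, rng=rng))
```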

RandomResizedCrop is an example of data augmentation. Data augmentation refers to creating random variations of the input data such that they appear different but do not change the meaning of the data. Some common examples of data augmentation for images are rotation, flipping, perspective warping, brightness changes and contrast changes.

Once the images are all the same size, we can apply the augmentations to an entire batch of images on the GPU, which saves a lot of time. To tell fastai that we want to use transforms on a batch, we use the batch_tfms parameter.

In the pets DataBlock we have used –

batch_tfms=aug_transforms(size=224, min_scale=0.75)

Together with an item transform that first resizes each image to a larger size, this data augmentation strategy is called presizing.

Presizing –

Presizing adopts two strategies

  1. Resize images to relatively “large” dimensions
  2. Compose all of the common augmentation operations into one, and perform the combined operation on the GPU only once at the end of processing.

This picture shows the two steps:

  1. Crop full width or height: This is in item_tfms, so it’s applied to each individual image before it is copied to the GPU. It’s used to ensure all images are the same size. On the training set, the crop area is chosen randomly. On the validation set, the center square of the image is always chosen.
  2. Random crop and augment: This is in batch_tfms, so it’s applied to a batch all at once on the GPU, which means it’s fast. On the validation set, only the resize to the final size needed for the model is done here. On the training set, the random crop and any other augmentations are done first.

Once the DataBlock is ready, the next step is to train a simple model.

learn = cnn_learner(dls, resnet34, metrics=error_rate)

In the above block, we have not specified a loss function. fastai will generally try to select an appropriate loss function based on the kind of data and model. In this case, since the dataset contains images and has a categorical outcome, fastai defaults to cross-entropy loss.

Cross-Entropy Loss = softmax + log + negative log likelihood

Cross Entropy loss
  • Cross-entropy loss works even when the dependent variable has more than 2 categories.
  • Softmax is the multi-category equivalent of sigmoid.
  • In the case of 2 categories, softmax returns for the first column the same value as sigmoid applied to the difference between the two activations, and that value subtracted from 1 for the second column, so each row sums to 1.
import torch
def softmax(x):
    # x: (batch, categories); each row of the result sums to 1
    return torch.exp(x) / torch.exp(x).sum(dim=1, keepdim=True)
           output    exp     softmax
teddy       0.02     1.02    1.02/4.60 = 0.22
grizzly    -2.49     0.08    0.08/4.60 = 0.02
brown       1.25     3.49    3.49/4.60 = 0.76
sum                  4.60    1.00

  • The exponential in softmax ensures that
    • All our numbers are positive; dividing by the sum then makes them add up to 1.
    • If one number in our activations is slightly bigger than the others, the exponential will amplify it.
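As a quick check of the numbers in the table, here is a small sketch that computes softmax with Python's math module on plain lists rather than tensors. It also checks the two-category case against sigmoid; the values of a and b are arbitrary illustrative activations.

```python
import math

def softmax(activations):
    """Softmax over a plain list of activations."""
    exps = [math.exp(a) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Activations from the teddy / grizzly / brown example.
acts = {"teddy": 0.02, "grizzly": -2.49, "brown": 1.25}
probs = softmax(list(acts.values()))
for name, p in zip(acts, probs):
    print(f"{name:8s} {p:.2f}")

# Two categories: the first softmax output equals sigmoid of the difference.
a, b = 0.7, -0.4
two_class = softmax([a, b])
print(two_class[0], sigmoid(a - b))
```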

Why log?

A probability lies between 0 and 1, so on that scale the model barely distinguishes a prediction of 0.99 from one of 0.999. Those numbers are very close, but 0.999 is 10 times more confident than 0.99 (the chance of being wrong drops from 1% to 0.1%). So we want to transform our numbers from between 0 and 1 to between negative infinity and 0, and the logarithm (torch.log in PyTorch) does exactly that.
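A few lines of Python make the effect concrete. This sketch only uses the standard math module; the probabilities are the ones from the paragraph above plus 0.9 for comparison.

```python
import math

# Negative log of the probability: small differences near 1 become visible.
for p in (0.9, 0.99, 0.999):
    print(p, -math.log(p))

# The penalty for 0.99 is roughly ten times the penalty for 0.999.
ratio = math.log(0.99) / math.log(0.999)
print(ratio)
```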

Taking the mean of the negative log of the probabilities that the model assigns to the correct classes gives the negative log likelihood.

In PyTorch nll_loss assumes that you already took the log of the softmax. It does not take the log.

PyTorch has a function called log_softmax that combines log and softmax. So, nll_loss should be used after log_softmax.

In PyTorch the combination of log_softmax + nll_loss is available as nn.CrossEntropyLoss (or, in functional form, F.cross_entropy). By default, PyTorch loss functions take the mean of the loss over all items. You can pass reduction='none' to disable this and get one loss per item.

nn.CrossEntropyLoss(reduction='none')(acts, targ) or F.cross_entropy(acts, targ, reduction='none')
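The decomposition softmax + log + negative log likelihood can be sketched without PyTorch. The functions below are plain-Python stand-ins that share names with the PyTorch functions only for readability; they work on lists, and the second row of acts is invented for illustration.

```python
import math

def log_softmax(row):
    # Subtract the max first for numerical stability; the result is unchanged.
    m = max(row)
    log_sum = m + math.log(sum(math.exp(a - m) for a in row))
    return [a - log_sum for a in row]

def nll_loss(log_probs, targets, reduction="mean"):
    # Pick the log-probability of the correct class for each item and negate it.
    losses = [-row[t] for row, t in zip(log_probs, targets)]
    return sum(losses) / len(losses) if reduction == "mean" else losses

def cross_entropy(acts, targets, reduction="mean"):
    # Cross-entropy is nll_loss applied to the output of log_softmax.
    return nll_loss([log_softmax(row) for row in acts], targets, reduction)

acts = [[0.02, -2.49, 1.25], [2.0, 0.5, -1.0]]   # one row of activations per item
targ = [2, 0]                                     # index of the correct class per item
per_item = cross_entropy(acts, targ, reduction="none")
mean_loss = cross_entropy(acts, targ)
print(per_item, mean_loss)
```

With reduction="none" you get one loss per item, which is handy for finding the examples the model gets most wrong; the default returns their mean.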

Improving Model

  1. Learning Rate Finder

As discussed in the previous blog post, the learning rate is an important parameter for training a model efficiently, so it is important to pick the right value. fastai provides a tool for this, the Learning Rate Finder – lr_find(). The idea is to start with a very small learning rate, use it for one mini-batch and record the loss. Then increase the learning rate by a certain percentage, apply it to another mini-batch, and keep increasing until the loss gets worse instead of better.
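The procedure can be sketched on a toy problem. This is not fastai's implementation: lr_range_test is a hypothetical function, the "model" is a single weight with loss w*w, and the stopping rule (loss more than 4 times the best loss seen) is an assumption chosen for the illustration.

```python
def lr_range_test(loss_fn, grad_fn, w, start_lr=1e-7, end_lr=10.0, num_steps=100):
    """Toy learning-rate range test on a single weight `w`."""
    mult = (end_lr / start_lr) ** (1 / (num_steps - 1))  # constant growth factor
    lr, best, history = start_lr, float("inf"), []
    for _ in range(num_steps):
        loss = loss_fn(w)                 # loss on the current "mini-batch"
        history.append((lr, loss))
        best = min(best, loss)
        if loss > 4 * best:               # loss is clearly getting worse: stop
            break
        w -= lr * grad_fn(w)              # one SGD step at this learning rate
        lr *= mult                        # raise the learning rate for the next step
    return history

# Quadratic toy loss: loss = w^2, gradient = 2w.
history = lr_range_test(lambda w: w * w, lambda w: 2 * w, w=5.0)
print(len(history), history[-1])
```

Plotting the recorded loss against the learning rate (on a log scale) shows a flat region, a steep descent and then a blow-up; a good learning rate usually lies in the steep part, well before the loss starts to rise.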

  2. Unfreezing and Transfer Learning

The earlier layers can only find things like edges, diagonals or other very basic patterns; it is not until the later layers that the network starts to find complex patterns. The final layer of a pre-trained model such as resnet18 is specifically designed to classify the categories in the original pre-training dataset, so it is unlikely to be of any use in your model. When we do transfer learning we therefore remove the final layer and replace it with a new linear layer that has the correct number of outputs for our task. This newly added linear layer has entirely random weights. So when fine-tuning, we tell the optimizer to update the weights only in those randomly initialized final layers and not to change the weights in the rest of the neural network. This is called freezing the pre-trained layers.

fastai automatically freezes all the pre-trained layers. When we call the fine_tune method, fastai does two things –

  • Trains the randomly added layers for one epoch, with all the other layers frozen
  • Unfreezes all the layers, and trains them for the number of epochs requested.

To unfreeze the layers manually you can use – learn.unfreeze()
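The effect of freezing and unfreezing can be illustrated with a toy optimizer step. Everything here is hypothetical (the Param class, the fixed gradients, the two-phase schedule); it is a conceptual sketch of the idea, not how fastai or PyTorch implement it.

```python
from dataclasses import dataclass

@dataclass
class Param:
    value: float
    grad: float = 0.0        # fixed, made-up gradient for the illustration
    trainable: bool = True

def sgd_step(params, lr):
    """Update only the parameters that are not frozen."""
    for p in params:
        if p.trainable:
            p.value -= lr * p.grad

def set_trainable(params, flag):
    for p in params:
        p.trainable = flag

# Toy "model": pre-trained body weights plus a newly added head.
body = [Param(0.8, grad=0.1), Param(-0.3, grad=-0.2)]
head = [Param(0.05, grad=0.5)]

set_trainable(body, False)             # freeze the pre-trained layers
sgd_step(body + head, lr=0.1)          # phase 1: only the head moves
frozen_body = [p.value for p in body]
head_after_phase1 = head[0].value

set_trainable(body, True)              # unfreeze everything
sgd_step(body + head, lr=0.1)          # phase 2: all parameters move
print(frozen_body, [p.value for p in body])
```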



