This week in the fast.ai course we got more into the details of getting data for image classification models, playing around with the different training parameters, and running them on sample data.
This blog post is focused on some of the main code steps in the lesson 2 Jupyter notebook, and how I used it to build my own penguin image classifier!
How to scrape images from Google Images
Coming from a web development background, it's kind of funny to paste raw JavaScript like this into the browser console - but it works. It opens a CSV of the URLs of all the photos loaded on the results page.
urls = Array.from(document.querySelectorAll('.rg_di .rg_meta'))
  .map(el=>JSON.parse(el.textContent).ou);
window.open('data:text/csv;charset=utf-8,' +
  escape(urls.join('\n')));
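Once that file of URLs is saved, every URL still has to end up as an image file in the folder for its class. The lesson notebook has its own helper for that step; the sketch below only shows the bookkeeping part as I understand it (drop blank lines and duplicates, give each image a zero-padded filename). The function name `plan_downloads`, the example URLs and the destination folder are all made up for illustration, and nothing is actually downloaded.

```python
from pathlib import Path
from urllib.parse import urlparse

def plan_downloads(url_text, dest, default_suffix=".jpg"):
    """Pair each unique URL with a zero-padded destination filename."""
    seen = set()
    plan = []
    for line in url_text.splitlines():
        url = line.strip()
        if not url or url in seen:
            continue  # skip blank lines and duplicate URLs
        seen.add(url)
        # Keep the URL's file extension when it has one, else fall back to the default
        suffix = Path(urlparse(url).path).suffix or default_suffix
        plan.append((url, Path(dest) / f"{len(plan):08d}{suffix}"))
    return plan

# Made-up URLs standing in for the contents of the exported file
sample = (
    "https://example.com/a/emperor.jpeg\n"
    "\n"
    "https://example.com/b/photo\n"
    "https://example.com/a/emperor.jpeg\n"
)
plan = plan_downloads(sample, "data/penguins/emperor")
for url, target in plan:
    print(url, "->", target)
```

Numbering the files instead of reusing the names from the URLs avoids clashes when two sites both call their image something like `photo.jpg`.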
View Data
After we've gotten all the photos downloaded into the folders (half the battle), it's important to look at the data and make sure everything's good. The command used in the notebook is:
data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.2,
ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)
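ImageDataBunch is a class in the fast.ai library that loads your photos from a folder and packages them up for the neural net, using the folder names as labels. This is nice because models need both a training set and a validation set, and datasets often come in a directory structure that separates the two. Kaggle, for example, provides data in this format for their competitions. Our images aren't split up yet, so train="." tells it to read everything from the current path, and valid_pct=0.2 randomly holds out 20% of the images as the validation set.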
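To make the `valid_pct=0.2` argument a bit more concrete, here's a minimal sketch of what a random 80/20 split looks like. This is not fast.ai's actual implementation, just the idea; `split_by_pct`, the seed and the filenames are made up for the example.

```python
import random

def split_by_pct(items, valid_pct=0.2, seed=42):
    """Shuffle a copy of items and hold out valid_pct of them for validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_valid = int(len(shuffled) * valid_pct)
    return shuffled[n_valid:], shuffled[:n_valid]

# Hypothetical filenames; the real ones come from the image folders
files = [f"{i:08d}.jpg" for i in range(50)]
train, valid = split_by_pct(files)
print(len(train), len(valid))  # 40 10
```

Fixing the seed matters more than it looks: if the split changes on every run, you can't compare error rates between runs because the validation images keep changing.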
Then we display a sample of the images in the dataset:
data.show_batch()
And we now have some penguins!
Train model
Now that we know we have the data in the right format, we can actually create a model. The fast.ai library does this via the create_cnn function, which takes in a databunch object, a pretrained CNN architecture, and a metrics parameter.
learn = create_cnn(data, models.resnet34, metrics=error_rate)
Resnet, short for "residual network," is a type of deep convolutional neural network. Pretrained versions ship with torchvision, the PyTorch project's vision package, and resnet34 is the 34-layer version.
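The "residual" part refers to a shortcut that adds a block's input back onto its output, so each block only has to learn a correction to what it was given. Here's a toy sketch of that one idea using plain Python lists. The `halve` and `zeros` functions are made-up stand-ins for the convolutional layers in a real block, so don't read this as how resnet34 is actually built.

```python
def residual_block(x, layer):
    """Return layer(x) + x: the layer's output plus the untouched input."""
    return [out + inp for out, inp in zip(layer(x), x)]

def halve(x):
    # Hypothetical stand-in for the convolutional layers inside a real block
    return [0.5 * v for v in x]

def zeros(x):
    # A layer that has learned nothing at all
    return [0.0 for _ in x]

features = [1.0, 2.0, 4.0]
print(residual_block(features, halve))  # [1.5, 3.0, 6.0]
print(residual_block(features, zeros))  # [1.0, 2.0, 4.0]
```

The second call is the interesting one: even when the layer contributes nothing, the input passes through unchanged.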
Now, let's see how well our model does out of the box.
learn.fit_one_cycle(4)
epoch | train_loss | valid_loss | error_rate |
---|---|---|---|
1 | 1.211088 | 0.636326 | 0.245614 |
2 | 0.773919 | 0.332470 | 0.140351 |
3 | 0.547051 | 0.278447 | 0.087719 |
4 | 0.434100 | 0.278498 | 0.070175 |
Running four epochs at the default learning rate gives us an error rate of about 7%, which is pretty decent.
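The error_rate column is just the fraction of validation images the model got wrong. The values in the table are consistent with a validation set of 57 images (0.070175 is 4/57), though that count is my inference from the numbers rather than something the notebook printed. A minimal sketch, with made-up labels:

```python
def error_rate(predictions, targets):
    """Fraction of predictions that do not match the true labels."""
    wrong = sum(p != t for p, t in zip(predictions, targets))
    return wrong / len(targets)

# Hypothetical labels: 57 validation images, 4 of them misclassified
targets = ["emperor"] * 57
predictions = ["stuffed"] * 4 + ["emperor"] * 53
print(f"{error_rate(predictions, targets):.6f}")  # 0.070175
```

With a validation set this small, one extra mistake moves the error rate by almost two percentage points, so small differences between runs don't mean much.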
Understanding learning rates
Using the learning rate finder, we're looking for the steepest downward slope that persists for a while. In the following plot, that looks like the slope between 1e-05 and 1e-03.
learn.lr_find()
learn.recorder.plot()
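Reading the plot by eye works fine, but the "steepest downward slope" idea can also be written down. The sketch below takes (learning rate, loss) readings and returns the pair of neighbouring learning rates where the loss drops fastest per tenfold increase. The numbers are made up to look like a typical finder curve, `steepest_descent_lr` is a name I invented, and it only finds the single steepest segment rather than checking that the slope lasts "for a while".

```python
import math

def steepest_descent_lr(lrs, losses):
    """Return the (start, end) learning rates of the segment where loss falls fastest per decade."""
    best_slope, best_pair = 0.0, None
    for i in range(len(lrs) - 1):
        # The finder plot uses a log scale, so measure the slope per decade of learning rate
        decades = math.log10(lrs[i + 1]) - math.log10(lrs[i])
        slope = (losses[i + 1] - losses[i]) / decades
        if slope < best_slope:
            best_slope, best_pair = slope, (lrs[i], lrs[i + 1])
    return best_pair

# Made-up (learning rate, loss) readings shaped like a typical finder curve
lrs = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
losses = [1.60, 1.58, 1.20, 0.90, 1.10, 4.00]
print(steepest_descent_lr(lrs, losses))
```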
But what happens if your learning rate is too high?
learn.fit_one_cycle(1, max_lr=0.1)
epoch | train_loss | valid_loss | error_rate |
---|---|---|---|
1 | 12.220007 | 11567788032.000000 | 0.701754 |
A validation loss that blows up like this (over 11 billion here) is a sign that the learning rate is too high.
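Why does a high learning rate make the loss explode? A toy example helps me here. This is plain gradient descent on f(w) = w², not what fit_one_cycle does internally, and the learning rates are picked just to show the effect: with a small step each update moves closer to the minimum, with a big one each update overshoots further than the last.

```python
def descend(lr, steps=20, w=1.0):
    """Minimise f(w) = w**2 with plain gradient descent and return the final loss."""
    for _ in range(steps):
        grad = 2 * w        # derivative of w**2
        w = w - lr * grad   # one gradient step
    return w ** 2

print(descend(lr=0.1))  # shrinks towards 0
print(descend(lr=1.1))  # every step overshoots further, so the loss grows
```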
What about the learning rate being very low?
learn.fit_one_cycle(4, max_lr=1e-5)
epoch | train_loss | valid_loss | error_rate |
---|---|---|---|
1 | 1.650528 | 1.357501 | 0.719298 |
2 | 1.562972 | 1.340124 | 0.754386 |
3 | 1.539678 | 1.332040 | 0.754386 |
4 | 1.536463 | 1.332700 | 0.754386 |
If the training loss is higher than the validation loss and the error rate is still very high, it's a sign that the learning rate is too low or that you haven't trained for enough epochs.
Low learning rates also run the risk of overfitting, because the model needs many more passes over the same images to get anywhere. Overfitting is when your model starts learning your specific images rather than generalizing to any input data.
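The rule about training loss and error rate can be turned into a quick check. This is only a rough heuristic sketch: `looks_undertrained` and the 0.5 error threshold are my own made-up choices, not anything from fast.ai. The rows are the final epochs from the two tables in this post.

```python
def looks_undertrained(train_loss, valid_loss, error_rate, max_error=0.5):
    """Heuristic: training loss above validation loss plus a very high error rate."""
    return train_loss > valid_loss and error_rate > max_error

# Final-epoch rows copied from the tables in this post
default_run = (0.434100, 0.278498, 0.070175)
low_lr_run = (1.536463, 1.332700, 0.754386)

print(looks_undertrained(*default_run))  # False
print(looks_undertrained(*low_lr_run))   # True
```

Notice that the training loss is above the validation loss in both runs, so it's really the error rate that separates the healthy run from the one with the learning rate set too low.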
Cleaning up our data
FileDeleter is a widget that runs in Jupyter notebooks and shows us the images the neural net struggled with most (the ones with the highest loss) - and then gives us the option to delete them if they actually aren't what we're trying to classify.
As you can see, there are some images in the dataset that it doesn't make sense to use when training our model, like the map.
This combination of using the neural net plus human feedback works really well.
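As I understand it, the "which images should a human look at" part comes down to ranking images by their loss. Here's a minimal sketch of that ranking using cross-entropy loss. The filenames, class names and probabilities are all made up, and `top_losses` is my own helper, not the fast.ai API.

```python
import math

def top_losses(samples, k=3):
    """Return the k samples with the highest cross-entropy loss, worst first."""
    scored = []
    for name, probs, true_label in samples:
        # Low predicted probability on the true label means high loss
        loss = -math.log(probs[true_label])
        scored.append((loss, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]

# Hypothetical files and predicted probabilities per class
samples = [
    ("00000001.jpg", {"emperor": 0.95, "stuffed": 0.05}, "emperor"),
    ("map.png", {"emperor": 0.10, "stuffed": 0.90}, "emperor"),
    ("00000007.jpg", {"emperor": 0.40, "stuffed": 0.60}, "stuffed"),
]
print(top_losses(samples, k=2))  # ['map.png', '00000007.jpg']
```

An image that doesn't belong in its folder tends to get a low probability for its supposed label, so it floats to the top of this list, which is exactly where you want a human to look first.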
Productionizing
When deploying a model to production, Jeremy Howard recommends running it on a CPU instead of a GPU. Why? A GPU is faster, but usually only by about an order of magnitude for a single prediction: roughly 0.01 seconds on a GPU vs. 0.1 seconds on a CPU. And while GPUs are pretty much necessary to train your model, with CPUs it's much easier to scale up to handle more requests.
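To get a feel for what those latencies mean for scaling, here's some back-of-the-envelope arithmetic. It assumes each worker handles one request at a time and uses the rough 0.1 s and 0.01 s figures from above. The traffic levels and the `workers_needed` helper are made up for illustration.

```python
def workers_needed(requests_per_sec, latency_ms):
    """Workers required if each handles one request at a time (ceiling division)."""
    return -(-requests_per_sec * latency_ms // 1000)

CPU_LATENCY_MS = 100  # the rough 0.1 s figure from the post
GPU_LATENCY_MS = 10   # the rough 0.01 s figure from the post

for rate in (5, 30, 200):  # hypothetical traffic levels, in requests per second
    cpu = workers_needed(rate, CPU_LATENCY_MS)
    gpu = workers_needed(rate, GPU_LATENCY_MS)
    print(f"{rate} req/s: {cpu} CPU workers or {gpu} GPU workers")
```

At a few requests per second a single CPU worker keeps up, and adding more CPU workers is a simple way to grow from there.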
Let's test the model!
Instead of training, we're going to do inference: using the already-trained model to make predictions.
img = open_image(path/'stuffed'/'00000004.jpeg')
This is an image of a very cute stuffed penguin:
Now, when we get the prediction:
pred_class,pred_idx,outputs = learn.predict(img)
pred_class
It gives us the output Category stuffed. Which is correct! Pretty awesome.
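The three values that come back from predict fit together in a simple way: outputs holds one score per class, pred_idx is the position of the highest score, and pred_class is the class name at that position. A minimal sketch of that relationship follows. The class list and scores are made up (only 'stuffed' comes from my actual dataset), and `predict_from_outputs` is a hypothetical helper, not part of fast.ai.

```python
def predict_from_outputs(outputs, classes):
    """Pick the class with the highest score, returning (class name, index, scores)."""
    pred_idx = max(range(len(outputs)), key=lambda i: outputs[i])
    return classes[pred_idx], pred_idx, outputs

# Hypothetical class list and scores; only 'stuffed' is taken from the post
classes = ["emperor", "gentoo", "stuffed"]
outputs = [0.03, 0.02, 0.95]

pred_class, pred_idx, scores = predict_from_outputs(outputs, classes)
print(pred_class, pred_idx)  # stuffed 2
```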