Fast AI - Week One

Posted on Jan 11, 2019

During my time at the Recurse Center, I’m taking a machine learning course via Jeremy Howard’s Fast.ai - “Practical Deep Learning for Coders.” What attracted me to this course over others (like Andrew Ng’s Coursera course) was the top-down learning approach: the structure is hands-on, building models first and diving into the theory later.

Most of the computer science classes I’ve taken in the past have had the opposite structure, to my frustration. It’s often only after I get my hands dirty building something that I get excited to understand how that thing works underneath the surface, and why it was built that way.

The first week of the course was difficult, but not for the reasons I would have expected. I’d say 90% of the struggle was setting up the course on a GPU-enabled server and getting the Jupyter notebooks set up. Jupyter notebooks are a way of hosting code that is popular with the scientific community. Each notebook is basically a web application that allows you to run separate pieces of code in “cells” in a browser.

In order to run my machine learning code, I wanted to take advantage of RC’s powerful community cluster because a) it’s a great resource and b) it’s free, whereas many of the suggested cloud solutions charge a monthly fee as well as an hourly rate for GPU usage. The tradeoff here, unsurprisingly, was cost vs. time. Some of the cloud providers had pre-made environments for the course, whereas on the cluster I had to configure my server’s environment myself, which took a significant amount of time.

I wrote a quick guide about how I set up the course on RC’s cluster.

The first week of the course is mostly about getting set up and familiar with the notebooks, but I was also introduced to some important concepts:

Learning rate: the learning rate controls how quickly the parameters of the model are adjusted during training - that is, how big a step each update takes. The fast.ai library has some handy functions to help find a good learning rate:

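# `learn` is assumed to be an already-created fast.ai learner object
learn.lr_find()        # run the learning rate finder
learn.sched.plot_lr()  # plot the learning rate over the iterations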
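To get a feel for why the learning rate matters, here is a minimal sketch in plain Python. It is not how the fast.ai library works internally. The function `gradient_descent` and the toy objective f(w) = w² are my own illustrative assumptions. It shows the same update rule run with three different learning rates.

```python
def gradient_descent(lr, steps=20, start=5.0):
    """Minimize the toy function f(w) = w**2 and return the final w."""
    w = start
    for _ in range(steps):
        grad = 2 * w        # derivative of w**2 with respect to w
        w = w - lr * grad   # the learning rate scales the size of each update
    return w

# Hypothetical learning rates, chosen only to show three different behaviours.
too_small = gradient_descent(lr=0.01)  # creeps toward the minimum at 0
reasonable = gradient_descent(lr=0.3)  # gets very close to 0 in 20 steps
too_large = gradient_descent(lr=1.1)   # each step overshoots, so w grows

print(too_small, reasonable, too_large)
```

A rate that is too small makes slow progress, and one that is too large overshoots the minimum and drifts further away. Tools for finding a learning rate help you pick a value between those extremes.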

Epoch: an epoch is one full pass of the model through the training data, so the number of epochs is the number of times the model looks through the data. When playing around with different numbers of epochs, I noticed that increasing the number of epochs does not necessarily increase the accuracy of the model.
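As a rough illustration of what an epoch is, here is a tiny training loop in plain Python. The data, the one-parameter model `w * x`, and the `train` function are all made up for this sketch and have nothing to do with the fast.ai API.

```python
# Made-up (x, y) pairs that follow y = 2x exactly.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

def train(epochs, lr=0.05):
    """Fit y = w * x with per-example gradient steps; return w and per-epoch loss."""
    w = 0.0
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, y in data:            # one epoch = one full pass over the data
            error = w * x - y
            total += error ** 2      # squared error before this update
            w -= lr * 2 * error * x  # gradient step on the squared error
        losses.append(total / len(data))
    return w, losses

w_5, losses_5 = train(epochs=5)
w_50, losses_50 = train(epochs=50)
print(w_5, w_50)
```

In this toy case the model is already very close to `w = 2` after a handful of epochs, so the extra epochs add almost nothing. On real data, too many epochs can also lead to overfitting, which is one reason more epochs do not always mean better accuracy.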

Training dataset: the training set is a labeled set of data that is used to fit the model - i.e. to adjust the weights of the connections between neurons. It should be the “gold standard” pairing of input data with the correct expected output, and thus shouldn’t contain invalid data, like photos that are actually just text.

Validation dataset: the validation dataset is labeled data that is held back from training and used to test the model - the results of the model against the validation set show us how well it does on data it hasn’t been fit on.
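The relationship between the two datasets can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: the file names, labels, `split_dataset`, `accuracy`, and the always-“cat” model are all hypothetical, and the fast.ai library has its own way of handling the split.

```python
import random

def split_dataset(items, valid_fraction=0.2, seed=42):
    """Shuffle labeled items and hold back a fraction as the validation set."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed so the split is repeatable
    n_valid = int(len(items) * valid_fraction)
    return items[n_valid:], items[:n_valid]  # (training set, validation set)

def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(1 for x, label in examples if model(x) == label)
    return correct / len(examples)

# Hypothetical labeled data: (file name, label) pairs.
labeled = [(f"img_{i}.jpg", "cat" if i % 2 == 0 else "dog") for i in range(10)]
train_set, valid_set = split_dataset(labeled)

def always_cat(file_name):
    """Stand-in for a trained model: it predicts "cat" for every input."""
    return "cat"

print(len(train_set), len(valid_set), accuracy(always_cat, valid_set))
```

The model would be fit only on `train_set`, while `valid_set` stays untouched until it is time to measure how well the model does.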