Machine learning in production with Flask, Twilio, Docker, and Google Cloud

March 08, 2019

Making machine learning models work in the wild, and what I learned along the way.


The code for this project - excluding the model and training data, both of which are too big to upload - can be found on my GitHub.

It's common to hear how powerful machine learning models are, but courses and tutorials usually stop short of showing you how to use them in real life. Here, I want to talk about how I trained and exported an image classification model and set it up in a small web app using Flask, Twilio, and Docker, so that anyone can text an image to the model and get a classification back.

My mom works as a native plant horticulturist in the Bay Area, and I thought it would be fun to build something she would actually use - so I built an app that receives photos of local native plants and texts back a classification.

[Image: app example]

Training a model to classify images

When developing machine learning models, there are two main stages: training and inference. For my image classification model, training is the process of teaching the model to recognize certain types of plants. To do so, I followed a set of steps very similar to those described in this blog post, where I used the fast.ai library to build a penguin image classifier. The notebook code can be found here.

I used a resnet34 architecture, a convolutional neural network that comes pretrained on ImageNet, so it already recognizes many of the common types of things found in photos on the internet.

Note: all the code snippets in this blog post are taken from the repo linked above.

learn = create_cnn(data, models.resnet34, metrics=error_rate)
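
The create_cnn call takes a data object holding the labeled images. As a point of reference, here's a minimal sketch of how that object might be built with fastai v1's ImageDataBunch, assuming the training images are organized into one folder per class (the path, validation split, and image size are my own placeholder choices):

from fastai.vision import *

# build a DataBunch from a folder-per-class layout, holding out 20% for validation
data = ImageDataBunch.from_folder('plants/', valid_pct=0.2,
                                  ds_tfms=get_transforms(), size=224)

# normalize with the same statistics the pretrained model expects
data.normalize(imagenet_stats)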

Then I trained the model:

learn.fit_one_cycle(4)
epoch  train_loss  valid_loss  error_rate
1      1.252663    0.858639    0.372727
2      0.915213    0.544021    0.227273
3      0.748020    0.436374    0.136364
4      0.641182    0.420302    0.118182

Next, I used the learning rate finder to pick a good learning rate range, then trained again with that range (fastai applies the lower end of the slice to earlier layers and the higher end to later ones):

learn.lr_find()
learn.recorder.plot()

[Image: learning rate finder plot]

learn.fit_one_cycle(2, max_lr=slice(1e-4,1e-3))

Training a model to recognize images is pretty compute-heavy, and effectively requires a GPU - a chip that can run many more operations in parallel than a typical CPU. I trained my model on the Recurse Center's community cluster, two of whose machines have MSI GeForce GTX 1080 Ti ARMOR 11G graphics cards. Fancy.

Inference

Inference is the process by which an already-trained machine learning model takes in input and returns a response. Unlike training, it can run comfortably on a CPU, as I specify in my app:

# run inference on the CPU (fastai v1)
from fastai.vision import *
defaults.device = torch.device('cpu')

With the fast.ai library, the classification can be done as simply as calling learn.predict().

img = open_image('photo.png')

# run the classification for the downloaded image
outputs = learn.predict(img)

That gives us a tuple of useful information: the first element is the predicted category, the second is the index of that category, and the third is a tensor of per-class probabilities - the prediction corresponds to the highest probability.

(Category baccharis, tensor(1), tensor([0.3732, 0.4415, 0.1854]))
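
To turn that into a human-readable reply, the tuple can be unpacked like so - a minimal sketch, where the variable names and percentage rounding are my own choices:

# unpack the prediction tuple from learn.predict()
category, idx, probs = outputs
classification = str(category)                   # e.g. 'baccharis'
accuracy = str(round(float(probs[idx]) * 100))   # e.g. '44'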

Building a web app with Flask

Cool, so now we have a model that does things! But how do we use the web to expose its functionality to the outside world?

Before this project I had heard of the web "micro-framework" Flask, but had never done any web development in Python. After this (very brief) exposure to it, it seems similar to Ruby's Sinatra, the first web framework I built something in. Both are pretty lightweight and unopinionated - all you need to get started with your first endpoint is a few lines of code.

# app.py
from flask import Flask, request

app = Flask(__name__)

@app.route("/sms", methods=['POST'])
def sms_reply():
    # main app logic here
    return ""

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=8080)

Adding interactivity with Twilio

Twilio Programmable SMS is an API that allows you to create webhooks to send and receive text messages in your web app.

I followed their "Python Quickstart" tutorial which walks you through the steps of getting a Twilio phone number, setting up a webhook with ngrok, and creating an endpoint that responds to incoming messages.

[Image: Twilio webhook configuration]

In my app, all I'm doing is getting the photo that the user sent to my Twilio number and, after it's classified by my model, sending a response message telling the user what classification the model returned.

Now, whenever a user texts my Twilio number, the Twilio API will trigger the following route:

import requests
from twilio.twiml.messaging_response import MessagingResponse

@app.route("/sms", methods=['POST'])
def sms_reply():
    resp = MessagingResponse()

    # get the URL of the media that Twilio sends
    photoURL = request.values.get('MediaUrl0', None)
    if photoURL is None:
        return str(resp)

    # download the URL contents in binary format
    r = requests.get(photoURL)

    # machine learning classification happens here, producing
    # the `classification` and `accuracy` strings used below

    resStr = "I'm " + accuracy + "% sure that's a " + classification

    # return a message to the user telling them what kind of plant is in the photo!
    resp.message(resStr)
    return str(resp)
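
For the classification step in the middle, the downloaded bytes can be saved to disk and handed to the model from earlier - a minimal sketch reusing the open_image/predict snippets above (the temporary filename is my own choice):

# save the downloaded image so fastai can open it
with open('photo.png', 'wb') as f:
    f.write(r.content)

# classify it, then unpack the prediction as shown earlier
category, idx, probs = learn.predict(open_image('photo.png'))
classification = str(category)
accuracy = str(round(float(probs[idx]) * 100))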

Dockerize the app

Using Docker is by no means mandatory for this project, but because I wanted to deploy this app from my laptop, which doesn't have a GPU, I thought packaging it in Docker would be a good idea.

Here's the Dockerfile I used to build my image:

# use a Python base image
FROM python:3.6 as base

# builder stage: install the Python dependencies
FROM base as builder

COPY requirements.txt /
RUN pip install --trusted-host pypi.python.org -r requirements.txt

# final stage: reuse the installed packages, then add the app code
FROM base

COPY --from=builder /usr/local /usr/local
COPY . /app
WORKDIR /app

EXPOSE 8080

# serve the app with gunicorn on port 8080
CMD [ "gunicorn", "-b", ":8080", "wsgi:app" ]

Thanks to my friend Karthik, and this blog post, this image takes advantage of layer caching and multistage builds.

Here's a visualization of how Docker uses layers:

[Image: Docker image layers]

When you run an image and generate a container, you add a new writable layer (the "container layer") on top of the underlying layers. When you make any changes to a container - writing, modifying, or deleting files - those changes are written to that thin writable top layer.

Multi-stage builds let you split a Dockerfile into separate stages and copy only what you need from one stage into the next, so Docker can reuse cached layers for stages that haven't changed - in this case, the dependency install. You can read more here.
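
To build and run the container locally (the image name here is my own placeholder):

docker build -t plant-classifier .
docker run -p 8080:8080 plant-classifier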

Deploy to Google App Engine

To figure out how to deploy my app using Google Cloud and Docker, I followed this blog post. The first step is to create a project - projects are abstractions in Google Cloud Platform that let you organize resources and billing for different apps.
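
Project creation can be done in the web console or from the CLI; a quick sketch, with a placeholder project ID:

gcloud projects create plant-classifier-demo
gcloud config set project plant-classifier-demo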

Then, create a YAML file called app.yaml - the Google Cloud CLI looks for this file to deploy the app.

# app.yaml
runtime: custom
env: flex

runtime: custom tells Google App Engine to build your environment entirely from your Docker config, and env: flex selects the flexible environment, which is what supports custom containers.

Finally, deploy the thing!

gcloud app deploy

Now get some coffee, because the deploy takes a few minutes - sometimes longer.

I added a hello world route to my app just to check that the server is running:

[Image: hello world response]
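
That route is just a couple of lines - a sketch, since the exact path and message in my app may differ:

# simple health-check route to confirm the server is up
@app.route("/")
def hello():
    return "Hello, world!"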

And if the Twilio webhook is configured correctly - pointed at the deployed app's /sms endpoint - you should now be able to text photos to the classification model.

I find great pleasure in being able to interact with machine learning in the real world. Hopefully this inspires someone to do the same, and use their model in practice!