The code for this project - excluding the model and training data, both of which are too big to upload - can be found on my GitHub.
It’s common to hear how powerful machine learning models are, but courses and tutorials usually stop short of showing you how to use them in real life. Here, I want to talk about how I trained and exported an image classification model and set it up in a small web app using Flask, Twilio, and Docker, so that anyone can text images to the model and get a classification back.
My mom works as a native plant horticulturist in the Bay Area, and I thought it would be fun to build something she would actually use - so I built an app that receives photos of local native plants and texts back a classification.
When developing machine learning models, there are two main stages: training and inference. For my image classifier, training is the process of teaching the model to recognize certain types of plants. To do so, I followed a set of steps very similar to those described in this blog post, where I used the fast.ai library to build a penguin image classifier. The notebook code can be found here.
I used a resnet34 architecture, a convolutional neural network that comes pretrained on ImageNet, so it already recognizes many of the common objects found in photos on the internet.
Note: all the code snippets in this blog post are taken from the repo linked above.
```python
learn = create_cnn(data, models.resnet34, metrics=error_rate)
```
Then I trained the model, used the learning rate finder to pick a good learning rate, and trained again with it.
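The corresponding training snippets aren't reproduced here, but in fast.ai v1, starting from the `learn` object created above, the steps look roughly like this (the epoch counts and learning-rate range are illustrative, not the exact values I used):

```python
# initial training of the new head on top of the pretrained backbone
learn.fit_one_cycle(4)

# unfreeze the whole network and find a good learning rate
learn.unfreeze()
learn.lr_find()
learn.recorder.plot()  # pick a rate from the steepest part of the loss curve

# train again with the chosen learning rate range
learn.fit_one_cycle(2, max_lr=slice(1e-6, 1e-4))
```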
Training a model to recognize images is pretty compute-heavy, and is best done on a GPU - a chip that can run many more operations in parallel than the CPUs in our laptops. I trained my model on the Recurse Center’s community cluster, two of whose machines have MSI GeForce GTX 1080 Ti ARMOR 11G graphics cards. Fancy.
Inference is the process by which an already-trained machine learning model takes in input and returns a response. This can be done on a CPU, as I specify in my app:
```python
defaults.device = torch.device('cpu')
```
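The app also needs to load the trained model at startup. Assuming the model was saved with fast.ai's `learn.export()` after training (the `'models'` path here is my own assumption), loading it looks roughly like:

```python
from fastai.vision import load_learner

# load the learner exported with learn.export() after training;
# 'models' is an assumed directory containing export.pkl
learn = load_learner('models')
```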
With the fast.ai library, the classification can be done as simply as calling
```python
img = open_image('photo.png')

# run the classification for the downloaded image and create a response string
outputs = learn.predict(img)
```
That gives us a tuple, from which we can extract useful information: the first element is the predicted category, the second is the index of the winning class, and the third holds the predicted probability for each class - the prediction is simply the class with the highest probability.
```
(Category baccharis, tensor(1), tensor([0.3732, 0.4415, 0.1854]))
```
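In my app, that triple gets turned into a human-readable response string. Here's a sketch of the parsing, using plain Python values in place of the fastai/torch objects (the exact code in my app differs slightly):

```python
# stand-in for the (category, class index, probabilities) triple
# that learn.predict(img) returns
outputs = ("baccharis", 1, [0.3732, 0.4415, 0.1854])

category, class_idx, probs = outputs

# the winning class index corresponds to the max of the probabilities
assert class_idx == probs.index(max(probs))

confidence = probs[class_idx]
result = f"I'm {confidence * 100:.0f}% sure that's a {category}"
print(result)  # I'm 44% sure that's a baccharis
```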
Cool, so now we have a model that does things! But how do we use the web to expose its functionality to the outside world?
Before this project I had heard of the web “micro-framework” Flask, but had never done any web development in Python. After this (very brief) exposure to it, it seems similar to Ruby’s Sinatra, the first web framework I built something in. Both are pretty lightweight and unopinionated - all you need to get started with your first endpoint is a few lines of code.
```python
# app.py
from flask import Flask, request

app = Flask(__name__)

@app.route("/sms", methods=['POST'])
def sms_reply():
    # main app logic here
    pass

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=8080)
```
Twilio Programmable SMS is an API that allows you to create webhooks to send and receive text messages in your web app.
I followed their “Python Quickstart” tutorial which walks you through the steps of getting a Twilio phone number, setting up a webhook with ngrok, and creating an endpoint that responds to incoming messages.
In my app, all I’m doing is getting the photo that the user sent to my Twilio number and, after my model classifies it, sending back a response message telling the user what classification it received.
Now, whenever a user texts my Twilio number, the Twilio API will trigger the following route:
```python
import requests
from twilio.twiml.messaging_response import MessagingResponse

@app.route("/sms", methods=['POST'])
def sms_reply():
    # get the URL of the media that Twilio sends
    photoURL = request.values.get('MediaUrl0', None)
    if photoURL is None:
        return ''

    # download the URL contents in binary format
    r = requests.get(photoURL)

    # machine learning classification happens here,
    # producing `classification` and `accuracy`
    resStr = "I'm " + accuracy + "% sure that's a " + classification

    # return a message to the user telling them
    # what kind of plant is in the photo!
    resp = MessagingResponse()
    resp.message(resStr)
    return str(resp)
```
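The response the route returns is TwiML, a small XML dialect Twilio reads to decide what to do next. The `MessagingResponse` helper generates it for you, but it's simple enough to sketch by hand with just the standard library (this helper is my own, for illustration):

```python
from xml.sax.saxutils import escape

def twiml_message(body: str) -> str:
    # Twilio expects a TwiML <Response> document in reply to the webhook;
    # a <Message> element becomes an outgoing SMS back to the sender
    return ("<?xml version='1.0' encoding='UTF-8'?>"
            f"<Response><Message>{escape(body)}</Message></Response>")
```

In practice the library helper is the better choice, since it handles escaping and message attributes for you.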
Using Docker is by no means mandatory for this project, but because I wanted to deploy this app locally from my laptop that doesn’t have a GPU, I thought packaging it in Docker would be a good idea.
Here’s the Dockerfile I used to build my image:
```dockerfile
# use a Python base image
FROM python:3.6 as base

FROM base as builder
COPY requirements.txt /
RUN pip install --trusted-host pypi.python.org -r requirements.txt

FROM base
COPY --from=builder /usr/local /usr/local
COPY . /app
WORKDIR /app
EXPOSE 8080
CMD [ "gunicorn", "-b", ":8080", "wsgi:app" ]
```
A quick aside on how Docker uses layers:
When you run an image and generate a container, you add a new writable layer (the “container layer”) on top of the underlying layers. When you make any changes to a container - writing, modifying, or deleting files - those changes are written to the top level thin writable container layer.
Multi-stage builds let you copy just the artifacts you need from one build stage into another, so the final image stays small and reuses work that has already been done. You can read more here.
To figure out how to deploy my app using Google Cloud and Docker, I followed this blog post. The first step is to create a project - projects are the top-level abstraction in Google Cloud Platform for organizing resources and billing across different apps.
Then, create a YAML file called app.yaml - the Google Cloud CLI looks for this file to deploy the app.
```yaml
# app.yaml
runtime: custom
env: flex
```
`runtime: custom` tells Google App Engine to configure your environment entirely from your Docker config.
Finally, deploy the thing!
```shell
gcloud app deploy
```
Now get some coffee, because the deploy takes a few minutes - sometimes longer.
I added a hello world route to my app just to check that the server is running:
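The route itself isn't shown in the post, but a minimal version might look like this (the path and message here are my own choices):

```python
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    # quick sanity check that the server is up and responding
    return "Hello, World!"
```

Hitting the deployed app's root URL in a browser should then confirm the server is alive.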
If the Twilio webhook is configured correctly, you should now be able to text photos to the classification model and get replies back.
I find great pleasure in being able to interact with machine learning in the real world. Hopefully this inspires someone to do the same, and use their model in practice!