PyTorch Inference Server on GKE

Sanchit Ahuja
Jan 18, 2022

Introduction

PyTorch is an open-source machine learning library used for applications such as computer vision and natural language processing. TorchServe, in turn, is a performant, flexible, and easy-to-use tool for serving PyTorch eager-mode and TorchScripted models.

However, setting up a PyTorch environment gets tricky at times. That’s where containerization comes into play: a container is a standard unit of software that packages up code and all of its dependencies so the application runs quickly and reliably from one computing environment to another.

In this article, we will go over the steps to set up a TorchServe container that can later be deployed to your Kubernetes environment (GKE in our case). We will use Docker to build the images and Google Container Registry to store them.

Project Setup

The project structure is as follows:

inference_server
- config.properties (TorchServe configuration)
- Dockerfile (To build the Docker image)
- requirements.txt (Additional dependencies)
- handler.py (Custom TorchServe handler)
- utils.py (Optional utilities)
- config.json (Additional config requirements for your handler)
- model.bin (Stored model)
- version (Image version)

Create Handler
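A minimal sketch of what handler.py can look like follows. It assumes model.bin is a TorchScript archive; the class name and the config.json keys are illustrative.

# handler.py — a minimal custom TorchServe handler (a sketch; the class
# name and the TorchScript assumption for model.bin are illustrative).
import json
import os

import torch
from ts.torch_handler.base_handler import BaseHandler


class InferenceHandler(BaseHandler):
    def initialize(self, context):
        # TorchServe passes a context whose system properties include the
        # extracted model directory and the assigned GPU id.
        properties = context.system_properties
        model_dir = properties.get("model_dir")
        self.device = torch.device(
            "cuda:" + str(properties.get("gpu_id"))
            if torch.cuda.is_available()
            else "cpu"
        )

        # Load the optional config.json shipped alongside the model.
        self.config = {}
        config_path = os.path.join(model_dir, "config.json")
        if os.path.isfile(config_path):
            with open(config_path) as f:
                self.config = json.load(f)

        # model.bin is assumed here to be a TorchScript archive.
        self.model = torch.jit.load(
            os.path.join(model_dir, "model.bin"), map_location=self.device
        )
        self.model.eval()
        self.initialized = True

    def preprocess(self, data):
        # Each request arrives as a dict with a "data" or "body" field.
        inputs = [row.get("data") or row.get("body") for row in data]
        return torch.tensor(inputs).to(self.device)

    def inference(self, inputs):
        with torch.no_grad():
            return self.model(inputs)

    def postprocess(self, outputs):
        # TorchServe expects one JSON-serializable entry per request.
        return outputs.cpu().tolist()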

config.properties
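A representative config.properties is shown below. The ports are TorchServe’s defaults, and binding to 0.0.0.0 lets traffic from the Kubernetes Service reach the container; number_of_gpu and the model-store path are assumptions to adjust for your setup.

inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
number_of_gpu=1
load_models=all
model_store=/home/model-server/model-store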

config.json (Optional)
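config.json carries whatever extra settings your handler reads at initialization time; the keys below are purely hypothetical.

{
  "max_length": 128,
  "labels": ["negative", "positive"]
}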

Dockerfile
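A Dockerfile along these lines packages the artifacts into a .mar archive at build time and starts TorchServe. The base-image tag, the paths, and the model name (inference_model) are assumptions.

# Dockerfile — a sketch; base-image tag, paths, and model name are assumptions.
FROM pytorch/torchserve:latest-gpu

# Install any additional dependencies. Depending on the base image's user,
# you may need USER root or pip install --user here.
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy the configuration and model artifacts into the image.
COPY config.properties /home/model-server/config.properties
COPY handler.py utils.py config.json model.bin /home/model-server/

# Package everything into a .mar archive that TorchServe can load.
RUN mkdir -p /home/model-server/model-store && \
    torch-model-archiver \
    --model-name inference_model \
    --version 1.0 \
    --serialized-file /home/model-server/model.bin \
    --handler /home/model-server/handler.py \
    --extra-files /home/model-server/config.json,/home/model-server/utils.py \
    --export-path /home/model-server/model-store

# --foreground keeps TorchServe as the container's main process.
CMD ["torchserve", "--start", "--foreground", \
     "--model-store", "/home/model-server/model-store", \
     "--models", "all", \
     "--ts-config", "/home/model-server/config.properties"]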

Skip utils.py from the --extra-files list if you are not adding utils.py to your project.

version

1.0

Building Docker Image

version="$(cat version)"
sudo docker build --tag <tag> .
sudo docker tag <tag> gcr.io/<project_id>/<tag>:$version

Before pushing the image to Container Registry, make sure gcloud’s Docker credential helper is configured. If it is not, configure it with the following command.

gcloud auth configure-docker

Pushing Docker Image to Container Registry

version="$(cat version)"
sudo docker push gcr.io/<project_id>/<tag>:$version

Deploying TorchServe Container on GKE (Kubernetes)

In order to deploy the custom TorchServe container on GKE (Google Kubernetes Engine), make sure the node pool has GPUs attached to it.

The deployment file should look something like the following; you can modify the resource and GPU settings as per your requirements.

Create Deployment File
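Here is a sketch of deployment.yaml; the names, image path, and resource limits are placeholders to adapt.

# deployment.yaml — a sketch; names, image path, and limits are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: torchserve
  labels:
    app: torchserve
spec:
  replicas: 1
  selector:
    matchLabels:
      app: torchserve
  template:
    metadata:
      labels:
        app: torchserve
    spec:
      containers:
        - name: torchserve
          image: gcr.io/<project_id>/<tag>:<version>
          ports:
            - containerPort: 8080   # inference API
            - containerPort: 8081   # management API
          resources:
            limits:
              nvidia.com/gpu: 1     # schedules the pod onto a GPU node
              memory: "4Gi"
              cpu: "2"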

To make the deployment accessible to every VM in the project and to load-balance requests across multiple pods, we need to create a Service. For more information on internal load balancing, refer to the GKE documentation.

Create Service for the deployment
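A sketch of service.yaml follows. The networking.gke.io/load-balancer-type annotation keeps the load balancer internal to your VPC; the Service name and exposed ports are assumptions mirroring TorchServe’s defaults.

# service.yaml — a sketch; Service name and ports are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: torchserve
  annotations:
    networking.gke.io/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  selector:
    app: torchserve
  ports:
    - name: inference
      port: 8080
      targetPort: 8080
    - name: management
      port: 8081
      targetPort: 8081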

Configure Google Cloud Kubectl on your System

gcloud components install kubectl
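Then point kubectl at your cluster by fetching its credentials (the cluster name and zone below are placeholders):

gcloud container clusters get-credentials <cluster_name> --zone <zone> --project <project_id>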

Apply deployment.yaml and service.yaml

kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
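Once the pods are running, you can verify the server from any VM in the same VPC; inference_model matches the model name assumed in the Dockerfile’s archiver step, and sample_input.json is a placeholder request body.

kubectl get service torchserve   # note the internal LoadBalancer IP
curl http://<internal_ip>:8080/ping
curl -X POST http://<internal_ip>:8080/predictions/inference_model -d @sample_input.json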
