PyTorch Inference Server on GKE
Introduction
PyTorch is an open-source machine learning library used for applications such as computer vision and natural language processing. TorchServe is a performant, flexible, and easy-to-use tool for serving PyTorch eager-mode and TorchScript models.
However, setting up a PyTorch environment can get tricky at times. That's where containerization comes into play. A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably.
In this article, we will go over the steps to set up a TorchServe container that can later be deployed to your Kubernetes environment (GKE in our case). We will use Docker to build the images and Google Container Registry to store them.
Project Setup
The project structure is as follows:
inference_server
- config.properties (TorchServe configuration)
- Dockerfile (To build the Docker image)
- requirements.txt (Additional dependencies)
- handler.py (Custom TorchServe handler)
- utils.py (Optional utilities)
- config.json (Additional config for your handler)
- model.bin (Stored model)
- version (Image version used for tagging)
Create Handler
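TorchServe routes every request through a handler. Below is a minimal sketch of what handler.py might look like for this project; the request shape ({"input": [...feature values...]}), the torch-scripted model.bin, and the config.json fields are assumptions you should adapt to your own model.

# handler.py -- a minimal custom TorchServe handler (sketch).
# Assumes a torch-scripted classifier saved as model.bin and requests
# shaped like {"input": [...feature values...]}; adapt to your model.
import json
import os

import torch
from ts.torch_handler.base_handler import BaseHandler


class InferenceHandler(BaseHandler):

    def initialize(self, context):
        # model_dir holds everything packaged into the .mar archive
        model_dir = context.system_properties.get("model_dir")
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.model = torch.jit.load(
            os.path.join(model_dir, "model.bin"), map_location=self.device
        )
        self.model.eval()
        # Optional handler-specific settings shipped as config.json
        with open(os.path.join(model_dir, "config.json")) as f:
            self.config = json.load(f)
        self.initialized = True

    def preprocess(self, data):
        # One row per request in the (possibly batched) call
        inputs = [
            json.loads(row.get("data") or row.get("body"))["input"]
            for row in data
        ]
        return torch.tensor(inputs, dtype=torch.float32, device=self.device)

    def inference(self, inputs):
        with torch.no_grad():
            return self.model(inputs)

    def postprocess(self, outputs):
        # Return one prediction per request; "labels" is a hypothetical
        # config.json field mapping class indices to names
        labels = self.config.get("labels")
        return [labels[i] if labels else i for i in outputs.argmax(dim=1).tolist()]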
config.properties
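TorchServe reads its runtime settings from config.properties. A minimal version, using TorchServe's default ports and the model-store path baked into the official image, might look like this:

inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
model_store=/home/model-server/model-store
load_models=all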
config.json (Optional)
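The contents of config.json are entirely up to your handler. For the handler sketch above it could hold, for example, the label names (hypothetical values):

{
  "labels": ["negative", "positive"]
}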
Dockerfile
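A sketch of the Dockerfile, based on the official pytorch/torchserve GPU image. The model name "inference_model" and the paths are placeholders, and packaging via torch-model-archiver at build time is one common pattern, not the only one.

# Dockerfile (sketch) -- based on the official TorchServe GPU image.
# The model name "inference_model" and paths are placeholders.
FROM pytorch/torchserve:latest-gpu

# The base image runs as the model-server user; switch to root to
# install the extra handler dependencies, then switch back.
USER root
COPY requirements.txt /home/model-server/
RUN pip install --no-cache-dir -r /home/model-server/requirements.txt
USER model-server

COPY config.properties /home/model-server/config.properties
COPY handler.py utils.py config.json model.bin /home/model-server/

# Package the model and handler into a .mar archive TorchServe can load
RUN torch-model-archiver \
      --model-name inference_model \
      --version 1.0 \
      --serialized-file /home/model-server/model.bin \
      --handler /home/model-server/handler.py \
      --extra-files /home/model-server/config.json,/home/model-server/utils.py \
      --export-path /home/model-server/model-store

# The base image's entrypoint ("serve") starts TorchServe with
# /home/model-server/config.properties and keeps the container running
CMD ["serve"]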
version
1.0
Building Docker Image
version="$(cat version)"
sudo docker build --tag <tag> .
sudo docker tag <tag> gcr.io/<project_id>/<tag>:$version
Before pushing the image to Container Registry, make sure Docker is configured to authenticate with gcloud. If not, configure it using the following command.
gcloud auth configure-docker
Pushing Docker Image to Container Registry
version="$(cat version)"
sudo docker push gcr.io/<project_id>/<tag>:$version
Deploying TorchServe Container on GKE (Kubernetes)
To deploy the custom TorchServe container on GKE (Google Kubernetes Engine), make sure the cluster's nodes have GPUs attached.
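Note that GPU nodes also need the NVIDIA device drivers. On GKE the documented way to install them is Google's driver-installer DaemonSet, applied once kubectl is configured (see below):

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml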
The deployment file should look something like the following; you can modify the resource and GPU settings as per your requirements.
Create Deployment File
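A sketch of deployment.yaml under these assumptions: the image built above, two replicas, and one GPU per pod. The names and resource numbers are placeholders.

# deployment.yaml (sketch) -- names, image path, and resource
# numbers are placeholders; adjust to your project and model.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: torchserve
  labels:
    app: torchserve
spec:
  replicas: 2
  selector:
    matchLabels:
      app: torchserve
  template:
    metadata:
      labels:
        app: torchserve
    spec:
      containers:
        - name: torchserve
          image: gcr.io/<project_id>/<tag>:<version>
          ports:
            - containerPort: 8080   # inference API
            - containerPort: 8081   # management API
          resources:
            requests:
              cpu: "2"
              memory: 4Gi
            limits:
              nvidia.com/gpu: 1     # schedules the pod on a GPU node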
To make the deployment accessible to every VM in the project and to load-balance requests across multiple pods, we need to create a Service. For more information, see the GKE documentation on internal load balancing.
Create Service for the deployment
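A sketch of service.yaml using GKE's internal load balancer annotation; the port numbers match the TorchServe defaults from config.properties above.

# service.yaml (sketch) -- an internal TCP load balancer so the
# endpoint is reachable from VMs in the VPC, not the internet.
apiVersion: v1
kind: Service
metadata:
  name: torchserve
  annotations:
    networking.gke.io/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  selector:
    app: torchserve
  ports:
    - name: inference
      port: 8080
      targetPort: 8080
    - name: management
      port: 8081
      targetPort: 8081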
Configure Google Cloud kubectl on your System
gcloud components install kubectl
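Then fetch your cluster's credentials so kubectl points at your GKE cluster (the cluster name and zone are placeholders):

gcloud container clusters get-credentials <cluster_name> --zone <zone>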
Apply deployment.yaml and service.yaml
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml