PyTorch Inference Server on GKE
Introduction
PyTorch is an open-source machine learning library used for applications such as computer vision and natural language processing. TorchServe is a performant, flexible, and easy-to-use tool for serving PyTorch eager-mode and TorchScript models.
However, setting up a PyTorch environment can get tricky at times. That's where containerization comes into play. A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably.
In this article, we will go over the steps to set up a TorchServe container that can later be deployed to your Kubernetes environment (GKE in our case). We will use Docker to build the images and Google Container Registry to store them.
Project Setup
The project structure is as follows:
inference_server
- config.properties (TorchServe configuration)
- Dockerfile (To build the Docker image)
- requirements.txt (Additional dependencies)
- handler.py (Custom TorchServe handler)
- utils.py (Optional utilities)
- config.json (Additional config for your handler)
- model.bin (Stored model)
- version (Image version used for tagging)
Create Handler
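TorchServe routes every request through a handler. Below is a minimal sketch of what handler.py might look like for this project; the request shape ({"input": [...feature values...]}), the torch-scripted model.bin, and the config.json fields are assumptions you should adapt to your own model.

# handler.py -- a minimal custom TorchServe handler (sketch).
# Assumes a torch-scripted classifier saved as model.bin and requests
# shaped like {"input": [...feature values...]}; adapt to your model.
import json
import os

import torch
from ts.torch_handler.base_handler import BaseHandler


class InferenceHandler(BaseHandler):

    def initialize(self, context):
        # model_dir holds everything packaged into the .mar archive
        model_dir = context.system_properties.get("model_dir")
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.model = torch.jit.load(
            os.path.join(model_dir, "model.bin"), map_location=self.device
        )
        self.model.eval()
        # Optional handler-specific settings shipped as config.json
        with open(os.path.join(model_dir, "config.json")) as f:
            self.config = json.load(f)
        self.initialized = True

    def preprocess(self, data):
        # One row per request in the (possibly batched) call
        inputs = [
            json.loads(row.get("data") or row.get("body"))["input"]
            for row in data
        ]
        return torch.tensor(inputs, dtype=torch.float32, device=self.device)

    def inference(self, inputs):
        with torch.no_grad():
            return self.model(inputs)

    def postprocess(self, outputs):
        # Return one prediction per request; "labels" is a hypothetical
        # config.json field mapping class indices to names
        labels = self.config.get("labels")
        return [labels[i] if labels else i for i in outputs.argmax(dim=1).tolist()]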
config.properties
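TorchServe reads its runtime settings from config.properties. A minimal version, using TorchServe's default ports and the model-store path baked into the official image, might look like this:

inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
model_store=/home/model-server/model-store
load_models=all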
config.json (Optional)
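The contents of config.json are entirely up to your handler. For the handler sketch above it could hold, for example, the label names (hypothetical values):

{
  "labels": ["negative", "positive"]
}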
Dockerfile
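A sketch of the Dockerfile, based on the official pytorch/torchserve GPU image. The model name "inference_model" and the paths are placeholders, and packaging via torch-model-archiver at build time is one common pattern, not the only one.

# Dockerfile (sketch) -- based on the official TorchServe GPU image.
# The model name "inference_model" and paths are placeholders.
FROM pytorch/torchserve:latest-gpu

# The base image runs as the model-server user; switch to root to
# install the extra handler dependencies, then switch back.
USER root
COPY requirements.txt /home/model-server/
RUN pip install --no-cache-dir -r /home/model-server/requirements.txt
USER model-server

COPY config.properties /home/model-server/config.properties
COPY handler.py utils.py config.json model.bin /home/model-server/

# Package the model and handler into a .mar archive TorchServe can load
RUN torch-model-archiver \
      --model-name inference_model \
      --version 1.0 \
      --serialized-file /home/model-server/model.bin \
      --handler /home/model-server/handler.py \
      --extra-files /home/model-server/config.json,/home/model-server/utils.py \
      --export-path /home/model-server/model-store

# The base image's entrypoint ("serve") starts TorchServe with
# /home/model-server/config.properties and keeps the container running
CMD ["serve"]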
version
1.0
Building Docker Image
version="$(cat version)"
sudo docker build --tag <tag> .
sudo docker tag <tag> gcr.io/<project_id>/<tag>:$version
Before pushing the image to Container Registry, make sure Docker is configured to authenticate with gcloud. If not, configure it using the following command.
gcloud auth configure-docker
Pushing Docker Image to Container Registry
version="$(cat version)"
sudo docker push gcr.io/<project_id>/<tag>:$version
Deploying TorchServe Container on GKE (Kubernetes)
To deploy the custom TorchServe container on GKE (Google Kubernetes Engine), make sure the cluster's nodes have GPUs attached.
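Note that GPU nodes also need the NVIDIA device drivers. On GKE the documented way to install them is Google's driver-installer DaemonSet, applied once kubectl is configured (see below):

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml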
The deployment file should look something like the following; you can modify the resource and GPU settings as per your requirements.
Create Deployment File
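A sketch of deployment.yaml under these assumptions: the image built above, two replicas, and one GPU per pod. The names and resource numbers are placeholders.

# deployment.yaml (sketch) -- names, image path, and resource
# numbers are placeholders; adjust to your project and model.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: torchserve
  labels:
    app: torchserve
spec:
  replicas: 2
  selector:
    matchLabels:
      app: torchserve
  template:
    metadata:
      labels:
        app: torchserve
    spec:
      containers:
        - name: torchserve
          image: gcr.io/<project_id>/<tag>:<version>
          ports:
            - containerPort: 8080   # inference API
            - containerPort: 8081   # management API
          resources:
            requests:
              cpu: "2"
              memory: 4Gi
            limits:
              nvidia.com/gpu: 1     # schedules the pod on a GPU node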
To make the deployment accessible to every VM in the project and to load-balance requests across multiple pods, we need to create a Service. For more information, see the GKE documentation on internal load balancing.
Create Service for the deployment
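A sketch of service.yaml using GKE's internal load balancer annotation; the port numbers match the TorchServe defaults from config.properties above.

# service.yaml (sketch) -- an internal TCP load balancer so the
# endpoint is reachable from VMs in the VPC, not the internet.
apiVersion: v1
kind: Service
metadata:
  name: torchserve
  annotations:
    networking.gke.io/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  selector:
    app: torchserve
  ports:
    - name: inference
      port: 8080
      targetPort: 8080
    - name: management
      port: 8081
      targetPort: 8081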
Configure Google Cloud kubectl on your System
gcloud components install kubectl
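Then fetch your cluster's credentials so kubectl points at your GKE cluster (the cluster name and zone are placeholders):

gcloud container clusters get-credentials <cluster_name> --zone <zone>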
Apply deployment.yaml and service.yaml
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml