Serving ML Models with TorchServe


Image by creator

This post will walk you through the process of serving a deep learning Torch model with the TorchServe framework.

There are quite a few articles on this topic. However, they typically focus either on deploying TorchServe itself or on writing custom handlers and getting the end result. That motivated me to write this post: it covers both parts and provides an end-to-end example.
Image classification was chosen as the example task. By the end you will be able to deploy a TorchServe server, serve a model, send it any random picture of a piece of clothing and get back the predicted clothing class label. I think that is what people expect from an ML model served as an API endpoint for classification.

Say your data science team designed a wonderful DL model. That is a great accomplishment, no doubt. However, to create value from it the model somehow needs to be exposed to the outside world (unless it is a Kaggle competition). This is called model serving. In this post I will not touch serving patterns for batch operations, nor streaming patterns built purely on streaming frameworks. I will focus on one option: serving a model as an API (regardless of whether this API is called by a streaming framework or by a custom service). More precisely, that option is the TorchServe framework.
So, when you decide to serve your model as an API you have at least the following options:

  • web frameworks such as Flask, Django, FastAPI, etc.
  • cloud services like AWS SageMaker endpoints
  • dedicated serving frameworks like TensorFlow Serving, Nvidia Triton and TorchServe

Each has its pros and cons, and the choice is not always straightforward. Let's practically explore the TorchServe option.

The first part will briefly describe how the model was trained. This is not essential for TorchServe itself, but I think it helps to follow the end-to-end process. Then a custom handler will be explained.
The second part will focus on deployment of the TorchServe framework.
Source code for this post is located here: git repo

For this toy example I chose an image classification task based on the FashionMNIST dataset. If you are not familiar with it, the dataset consists of 70k grayscale 28×28 images of various pieces of clothing, split into 10 classes. So a DL classification model will return 10 logit values. For the sake of simplicity the model is based on the TinyVGG architecture (in case you want to visualize it with CNN explainer): just a few convolution and max pooling layers with ReLU activations. The notebook model_creation_notebook in the repo shows the whole process of training and saving the model.
Briefly, the notebook downloads the data, defines the model architecture, trains the model and saves the state dict with torch.save. Two artifacts are relevant to TorchServe: the class defining the model architecture and the saved model weights (.pth file).

Two modules need to be prepared: a model file and a custom handler.

Model file
As per the documentation, "A model file should contain the model architecture. This file is mandatory in case of eager mode models. This file should contain a single class that inherits from torch.nn.Module."

So let's just copy the class definition from the model training notebook and save it as a separate Python file (any name you prefer):

TorchServe offers some default handlers (e.g. image_classifier), but I doubt they can be used as is for real cases. So most likely you will need to create a custom handler for your task. The handler defines how to preprocess data from the HTTP request, how to feed it into the model, how to postprocess the model's output and what to return as the final result in the response.
There are two options: a module-level entry point and a class-level entry point. See the official documentation here.
I will implement the class-level option. It basically means that I need to create a custom Python class and define two mandatory functions: initialize and handle.
First of all, to make things easier, let's inherit from the BaseHandler class. The initialize function defines how to load the model. Since we don't have any specific requirements here, let's just use the definition from the super class.

The handle function defines how to process the data. In the simplest case the flow is: preprocess >> inference >> postprocess. In real applications you will likely need to define custom preprocess and postprocess functions. For the inference function in this example I will use the default definition from the super class:
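As a standalone sketch of that flow (the real handler inherits from ts.torch_handler.base_handler.BaseHandler, which already supplies initialize and inference; the class and method bodies here are illustrative):

```python
import torch


class FashionMNISTHandler:
    """Standalone sketch of a class-level TorchServe handler.

    In the real module this class inherits from
    ts.torch_handler.base_handler.BaseHandler, which provides
    initialize() (model loading) and inference().
    """

    def __init__(self, model):
        self.model = model.eval()

    def inference(self, batch):
        # Mirrors the default behaviour: a forward pass with
        # gradients disabled.
        with torch.no_grad():
            return self.model(batch)

    def handle(self, data, context=None):
        # The core flow: preprocess >> inference >> postprocess.
        batch = self.preprocess(data)
        logits = self.inference(batch)
        return self.postprocess(logits)

    def preprocess(self, data):
        raise NotImplementedError  # defined in the next section

    def postprocess(self, logits):
        raise NotImplementedError  # defined in the next section
```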

Preprocess function

Say you built an app for image classification. The app sends a request to TorchServe with an image as payload. It is unlikely that this image always complies with the format used for model training. Also, you probably trained your model on batches of samples, so the tensor dimensions need to be adjusted. So let's write a simple preprocess function: resize the image to the required shape, convert it to grayscale, transform it to a Torch tensor and make it a one-sample batch.

Postprocess function

A multiclass classification model returns a list of logits or softmax probabilities. But in a real scenario you would rather need the predicted class, the predicted class with its probability, or maybe the top-N predicted labels. Of course, you could do this somewhere in the main app or another service, but that would couple the logic of your app to the ML training process. So let's return the predicted class directly in the response.
(For the sake of simplicity the list of labels is hardcoded here; in the GitHub version the handler reads it from a config.)
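A minimal version, with the FashionMNIST label list hardcoded for illustration as mentioned above:

```python
import torch

# The 10 FashionMNIST classes, in label-index order.
LABELS = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
          "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]


def postprocess(logits):
    # logits: [batch_size, 10]. TorchServe expects a list with one
    # entry per request in the batch.
    predicted = torch.softmax(logits, dim=1).argmax(dim=1)
    return [LABELS[i] for i in predicted.tolist()]
```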

Okay, the model file and the handler are ready. Now let's deploy the TorchServe server. The code above assumes you have already installed PyTorch. Another prerequisite is JDK 11 (note: just a JRE is not enough, you need the JDK).
For TorchServe you need to install two packages: torchserve and torch-model-archiver.
After successful installation, the first step is to prepare a .mar file, an archive with the model artifacts. The torch-model-archiver CLI does exactly this. Type in a terminal:

torch-model-archiver --model-name fashion_mnist --version 1.0 --model-file path/ --serialized-file path/fashion_mnist_model.pth --handler path/

The arguments are the following:
model-name: the name you want to give to the model
version: semantic version for versioning
model-file: the file with the class definition of the model architecture
serialized-file: the .pth file saved with torch.save
handler: the Python module with the handler

As a result, a .mar file named after the model (in this example fashion_mnist.mar) will be generated in the directory where the CLI command is executed. So it is better to cd into your project directory before calling the command.

The next step, finally, is to start the server. Type in a terminal:

torchserve --start --model-store path --models fmnist=/path/fashion_mnist.mar 

model-store: the directory where the .mar files are located
models: name(s) of the model(s) and path to the corresponding .mar file

Note that the model name in the archiver defines how your .mar file will be named, while the model name in torchserve defines the API endpoint name used to invoke the model. Those names can be the same or different; it is up to you.

After those two commands the server will be up and running. By default TorchServe uses three ports: 8080, 8081 and 8082 for inference, management and metrics respectively. Go to your browser/curl/Postman and send a request to http://localhost:8080/ping
If TorchServe works correctly you should see {"status": "Healthy"}


A few hints for possible issues:
1. If after the torchserve --start command you see errors in the log mentioning "No module named captum", install it manually. I encountered this error with torchserve 0.7.1.

2. It may happen that some port is already busy with another process. In that case you will likely see a "Partially healthy" status and some errors in the log.
To check which process uses a port on a Mac, type (for example for 8081):

sudo lsof -i :8081

One option is to kill that process to free the port. But that may not always be a good idea if the process is somehow important.
Instead, it is possible to specify any new port for TorchServe in a simple config file. Say you have an application already listening on port 8081. Let's change the default port for the TorchServe management API by creating a torch_config file with just one line:

management_address=http://127.0.0.1:8445

(management_address is the TorchServe config property for the management API; you can choose any free port)

Next we need to let TorchServe know about the config. First, stop the unhealthy server with

torchserve --stop

Then restart it as

torchserve --start --model-store path --models fmnist=/path/fashion_mnist.mar --ts-config path/torch_config

At this step it is assumed the server is up and running correctly. Let's pass a random clothing image to the inference API and get the predicted label.
The endpoint for inference is http://host:port/predictions/model_name

In this example it is http://localhost:8080/predictions/fmnist
Let's curl it and pass an image:

curl -X POST http://localhost:8080/predictions/fmnist -T /path_to_image/image_file

For example, with the sample image from the repo:

curl -X POST http://localhost:8080/predictions/fmnist -T tshirt4.jpg

(the -X flag specifies the HTTP method, POST; the -T flag transfers a file)

In the response we should see the predicted label.

By following along with this blog post we were able to create a REST API endpoint to which we can send an image and get back its predicted label. By repeating the same procedure on a server instead of a local machine, one can create an endpoint for a user-facing app, for other services, or for instance for a streaming ML application (see this interesting paper for a reason why you likely shouldn't do that).

Stay tuned: in the next part I will expand the example by mocking up a Flask app for business logic that invokes an ML model served via TorchServe (and deploying everything with Kubernetes).
A simple use case: a user-facing app with lots of business logic and many different features. Say one feature is uploading an image to apply a desired style to it with a style transfer ML model. That model can be served with TorchServe, so the ML part is completely decoupled from the business logic and other features of the main app.

