The Essential Library to Construct Segmentation Models

MartinThoma, CC0, via Wikimedia Commons (edited)

Neural network models have proven highly effective at solving segmentation problems, achieving state-of-the-art accuracy. They have led to significant improvements in many applications, including medical image analysis, autonomous driving, robotics, satellite imagery, video surveillance, and many more. However, building these models usually takes a long time. After reading this guide, you will be able to build one with just a few lines of code.

Table of contents

  1. Introduction
  2. Building blocks
  3. Construct a model
  4. Train the model

Introduction

Segmentation is the task of dividing an image into multiple segments or regions based on certain characteristics or properties. A segmentation model takes an image as input and returns a segmentation mask:

(Left) An input image | (Right) Its segmentation mask. Both images by PyTorch.

Segmentation neural network models consist of two parts:

  • An encoder: takes an input image and extracts features. Examples of encoders are ResNet, EfficientNet, and ViT.
  • A decoder: takes the extracted features and generates a segmentation mask. The decoder depends on the architecture. Examples of architectures are U-Net, FPN, and DeepLab.

Therefore, when building a segmentation model for a specific application, you need to choose an architecture and an encoder. However, it is difficult to choose the best combination without testing several. This usually takes a long time because changing the model requires writing a lot of boilerplate code. The Segmentation Models library solves this problem. It allows you to create a model in a single line by specifying the architecture and the encoder. After that, you only need to modify that line to change either of them.

To install the latest version of Segmentation Models from PyPI, use:

pip install segmentation-models-pytorch

Building blocks

The library provides a class for most segmentation architectures, and each of them can be used with any of the available encoders. In the next section, you will see that to build a model you need to instantiate the class of the chosen architecture and pass the name of the chosen encoder as a string parameter. The figure below shows the class name of each architecture provided by the library:

Class names of all the architectures provided by the library.
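For reference, the same mapping can be sketched as a small lookup table. The class names on the right are the ones exposed by `segmentation_models_pytorch` for the architectures cited in references [1] to [9]. The names `ARCHITECTURES` and `get_architecture_class` are hypothetical helpers introduced here for illustration, not part of the library.

```python
# Paper-style architecture name -> class name in segmentation_models_pytorch.
ARCHITECTURES = {
    "U-Net": "Unet",
    "UNet++": "UnetPlusPlus",
    "DeepLabV3": "DeepLabV3",
    "DeepLabV3+": "DeepLabV3Plus",
    "MA-Net": "MAnet",
    "LinkNet": "Linknet",
    "FPN": "FPN",
    "PSPNet": "PSPNet",
    "PAN": "PAN",
}


def get_architecture_class(name):
    """Return the library class for a paper-style architecture name."""
    # Imported lazily so the table can be inspected without the library installed.
    import segmentation_models_pytorch as smp

    return getattr(smp, ARCHITECTURES[name])
```

With the library installed, `get_architecture_class("UNet++")` would return the `smp.UnetPlusPlus` class, ready to be instantiated with an encoder name.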

The figure below shows the names of the most common encoders provided by the library:

Names of the most common encoders provided by the library.

There are over 400 encoders, so it is impossible to show all of them, but you can find a comprehensive list in the official documentation.

Construct a model

Once the architecture and the encoder have been chosen from the figures above, building the model is very easy:


  • encoder_name is the name of the chosen encoder (e.g. resnet50, efficientnet-b7, mit_b5).
  • encoder_weights is the dataset used for pre-training. If encoder_weights is "imagenet", the encoder is initialized with weights pre-trained on ImageNet. All encoders have at least one set of pre-trained weights, and a comprehensive list is available in the official documentation.
  • in_channels is the number of channels of the input image (3 if RGB).
    Even when in_channels is not 3, ImageNet pre-trained weights can be used: the first layer is initialized by reusing the weights from the pre-trained first convolutional layer (the procedure is described in the official documentation).
  • out_classes is the number of classes in the dataset. It is passed to the model through the classes argument of the constructor.
  • activation is the activation function for the output layer. The options include None (default), sigmoid and softmax.
    Note: when using a loss function that expects logits as input, the activation must be None. For example, when using the CrossEntropyLoss function, activation must be None.
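Putting these parameters together, a model can be built as sketched below. This is a minimal sketch, assuming the `segmentation_models_pytorch` package is installed; `MODEL_CONFIG` and `build_model` are hypothetical names introduced here, and the values (a `resnet50` encoder, one output class) are only examples. Note that the number of classes goes into the constructor's `classes` argument.

```python
# Example values only; change them to match your dataset and loss function.
MODEL_CONFIG = {
    "encoder_name": "resnet50",     # any encoder name supported by the library
    "encoder_weights": "imagenet",  # downloaded on first use; None for random init
    "in_channels": 3,               # RGB input
    "classes": 1,                   # number of output classes (binary mask here)
    "activation": None,             # keep None when the loss expects logits
}


def build_model(architecture="Unet", **overrides):
    """Instantiate an architecture class by name with the shared config."""
    # Imported lazily so the config can be inspected without the library installed.
    import segmentation_models_pytorch as smp

    config = {**MODEL_CONFIG, **overrides}
    # Equivalent to e.g. smp.Unet(encoder_name="resnet50", ..., activation=None).
    return getattr(smp, architecture)(**config)
```

Switching to a different combination is then a one-argument change, for example `build_model("FPN", encoder_name="efficientnet-b7")`.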

Train the model

This section shows all the code required to perform training. However, this library does not change the usual pipeline for training and validating a model. To simplify the process, the library provides implementations of many loss functions, such as Jaccard Loss, Dice Loss, Dice Cross-Entropy Loss, and Focal Loss, and of metrics such as Accuracy, Precision, Recall, F1Score, and IOUScore. For a complete list of them and their parameters, see the Losses and Metrics sections of the documentation.
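To make the idea behind one of these losses concrete, here is a minimal pure-Python sketch of a soft Dice loss for a binary mask, computed directly from logits. It illustrates the formula only and is not the library's implementation; `soft_dice_loss` and its `eps` parameter are names chosen here. In practice you would use the library's tensor-based classes, such as `smp.losses.DiceLoss(mode="binary")`.

```python
import math


def soft_dice_loss(logits, targets, eps=1e-7):
    """Soft Dice loss for flattened binary masks.

    logits:  raw model outputs (no activation applied), one per pixel
    targets: ground-truth labels, 0 or 1, one per pixel
    """
    # Convert logits to probabilities with a sigmoid.
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    intersection = sum(p * t for p, t in zip(probs, targets))
    cardinality = sum(probs) + sum(targets)
    # Dice = 2|A∩B| / (|A| + |B|); eps avoids division by zero on empty masks.
    dice = 2.0 * intersection / (cardinality + eps)
    return 1.0 - dice
```

A confident, correct prediction such as `soft_dice_loss([20.0, -20.0], [1, 0])` gives a loss close to 0, while the inverted prediction gives a loss close to 1.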

The proposed training example is a binary segmentation task using the Oxford-IIIT Pet Dataset (it is downloaded by the code). These are two samples from the dataset:

Finally, these are all the steps to perform this kind of segmentation task:

1. Construct the model.

Set the activation function of the last layer according to the loss function you are going to use.

2. Define the parameters.

Remember that when using pre-trained weights, the input must be normalized using the mean and standard deviation of the data on which those weights were trained.
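As an illustration of that normalization, the sketch below applies the commonly used ImageNet channel statistics to a single RGB pixel. The function name `normalize_pixel` is hypothetical, and the input is assumed to be scaled to the range [0, 1]. On real image tensors the same operation is typically done with `torchvision.transforms.Normalize(mean, std)`.

```python
# Widely used ImageNet per-channel statistics for inputs scaled to [0, 1].
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Normalize one (R, G, B) pixel: subtract the mean, divide by the std."""
    return tuple((c - m) / s for c, m, s in zip(rgb, mean, std))
```

A pixel equal to the dataset mean maps to zero in every channel, so `normalize_pixel(IMAGENET_MEAN)` returns `(0.0, 0.0, 0.0)`.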

3. Define the train function.

Nothing changes here from the train function you would have written to train a model without using the library.
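A minimal sketch of such a train function is shown below. It assumes that a PyTorch-style model, data loader, loss function and optimizer are passed in, and that the loader yields (images, masks) pairs. The name `train_one_epoch` and its signature are choices made here, not part of the library.

```python
def train_one_epoch(model, loader, loss_fn, optimizer, device):
    """Run one pass over the training data and return the mean batch loss."""
    model.train()  # enable training behaviour (dropout, batch-norm updates)
    total_loss, n_batches = 0.0, 0
    for images, masks in loader:
        images, masks = images.to(device), masks.to(device)
        optimizer.zero_grad()          # clear gradients from the previous step
        logits = model(images)         # forward pass (raw outputs, no activation)
        loss = loss_fn(logits, masks)
        loss.backward()                # backpropagate
        optimizer.step()               # update the weights
        total_loss += loss.item()
        n_batches += 1
    return total_loss / max(n_batches, 1)
```

The function only relies on the standard PyTorch training interface, so it works unchanged with a model created by the library or with any other `torch.nn.Module`.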

4. Define the validation function.

True positives, false positives, false negatives and true negatives from all batches are summed together, and the metrics are calculated only once, after the last batch. Note that logits must be converted to classes before metrics can be calculated. Call the train function to start training.
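The accumulation described above can be sketched in pure Python as follows. This is an illustration on flattened lists rather than tensors, and all function names are hypothetical. The library offers tensor-based equivalents in `smp.metrics` (for example `get_stats` and `iou_score`).

```python
import math


def logits_to_classes(logits, threshold=0.5):
    """Apply a sigmoid to each logit and threshold the resulting probability."""
    return [1 if 1.0 / (1.0 + math.exp(-z)) > threshold else 0 for z in logits]


def confusion_counts(preds, targets):
    """Return (tp, fp, fn, tn) for one batch of binary predictions."""
    pairs = list(zip(preds, targets))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)
    tn = sum(1 for p, t in pairs if p == 0 and t == 0)
    return tp, fp, fn, tn


def _safe_div(num, den):
    # Assumption made here: report 0.0 when a metric is undefined.
    return num / den if den else 0.0


def validate(batches):
    """batches: iterable of (logits, targets) pairs, each flattened to a list."""
    totals = [0, 0, 0, 0]
    for logits, targets in batches:
        counts = confusion_counts(logits_to_classes(logits), targets)
        totals = [a + b for a, b in zip(totals, counts)]  # sum over batches
    tp, fp, fn, tn = totals
    # Metrics are computed once, from the summed counts.
    return {
        "iou": _safe_div(tp, tp + fp + fn),
        "f1": _safe_div(2 * tp, 2 * tp + fp + fn),
        "accuracy": _safe_div(tp + tn, tp + fp + fn + tn),
    }
```

Computing the metrics from the summed counts weights every pixel equally, whereas averaging per-batch metrics would give small batches the same weight as large ones.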

5. Use the model.

These are some segmentations:

Concluding remarks

This library has everything you need to experiment with segmentation. It is very easy to build a model and apply changes, and most loss functions and metrics are provided. In addition, using this library does not change the pipeline we are used to. See the official documentation for more information. I have also included some of the most common encoders and architectures in the references.

The Oxford-IIIT Pet Dataset is available to download for commercial/research purposes under a Creative Commons Attribution-ShareAlike 4.0 International License. The copyright remains with the original owners of the images.

All images, unless otherwise noted, are by the author. Thanks for reading. I hope you have found this useful.

[1] O. Ronneberger, P. Fischer and T. Brox, U-Net: Convolutional Networks for Biomedical Image Segmentation (2015)

[2] Z. Zhou, Md. M. R. Siddiquee, N. Tajbakhsh and J. Liang, UNet++: A Nested U-Net Architecture for Medical Image Segmentation (2018)

[3] L. Chen, G. Papandreou, F. Schroff, H. Adam, Rethinking Atrous Convolution for Semantic Image Segmentation (2017)

[4] L. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation (2018)

[5] R. Li, S. Zheng, C. Duan, C. Zhang, J. Su, P.M. Atkinson, Multi-Attention-Network for Semantic Segmentation of Fine Resolution Remote Sensing Images (2020)

[6] A. Chaurasia, E. Culurciello, LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation (2017)

[7] T. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, S. Belongie, Feature Pyramid Networks for Object Detection (2017)

[8] H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia, Pyramid Scene Parsing Network (2016)

[9] H. Li, P. Xiong, J. An, L. Wang, Pyramid Attention Network for Semantic Segmentation (2018)

[10] K. Simonyan, A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition (2014)

[11] K. He, X. Zhang, S. Ren, J. Sun, Deep Residual Learning for Image Recognition (2015)

[12] S. Xie, R. Girshick, P. Dollár, Z. Tu, K. He, Aggregated Residual Transformations for Deep Neural Networks (2016)

[13] J. Hu, L. Shen, S. Albanie, G. Sun, E. Wu, Squeeze-and-Excitation Networks (2017)

[14] G. Huang, Z. Liu, L. van der Maaten, K. Q. Weinberger, Densely Connected Convolutional Networks (2016)

[15] M. Tan, Q. V. Le, EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (2019)

[16] E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, P. Luo, SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers (2021)


