Essential guide to transfer learning
The rapid progress in computer vision, and in image classification use cases in particular, has been further accelerated by the advent of transfer learning. Training a computer vision neural network on a large dataset of images takes a lot of computational resources and time.
Luckily, this time and these resources can be reduced by using pre-trained models. The technique of leveraging feature representations from a pre-trained model is known as transfer learning. Pre-trained models are generally trained on massive datasets using high-end computational resources.
Pre-trained models can be used in several ways:
- Using the pre-trained weights and directly making predictions on the test data
- Using the pre-trained weights for initialization and training the model using the custom dataset
- Using only the architecture of the pre-trained network, and training it from scratch on the custom dataset
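The three options above can be sketched with the Keras applications API. This is a minimal illustration rather than a recommended recipe: MobileNetV2 is picked only because it is small, and the helper names (build_predictor, build_fine_tune_model, build_from_scratch) and the single Dense head are assumptions made for this example.

```python
import tensorflow as tf

IMG_SHAPE = (224, 224, 3)

def build_predictor(weights="imagenet"):
    # Option 1: full network with its ImageNet classifier head, used as-is.
    return tf.keras.applications.MobileNetV2(
        input_shape=IMG_SHAPE, include_top=True, weights=weights
    )

def build_fine_tune_model(num_classes, weights="imagenet"):
    # Option 2: pre-trained backbone as initialization, plus a new head
    # that is trained on the custom dataset.
    backbone = tf.keras.applications.MobileNetV2(
        input_shape=IMG_SHAPE, include_top=False, weights=weights, pooling="avg"
    )
    backbone.trainable = False  # freeze first; unfreeze later to fine-tune
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(
        backbone.output
    )
    return tf.keras.Model(backbone.input, outputs)

def build_from_scratch(num_classes):
    # Option 3: architecture only, with randomly initialized weights.
    return tf.keras.applications.MobileNetV2(
        input_shape=IMG_SHAPE, include_top=True, weights=None, classes=num_classes
    )
```

Passing weights=None to the second helper skips the weight download, which is handy for checking shapes before committing to a full training run.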
This article walks through the top 10 state-of-the-art pre-trained models that can be used to get image embeddings. All of these pre-trained models can be loaded as Keras models using the keras.applications API.
CNN architectures discussed in this article:
1) VGG
2) Xception
3) ResNet
4) InceptionV3
5) InceptionResNet
6) MobileNet
7) DenseNet
8) NasNet
9) EfficientNet
10) ConvNeXt
The VGG-16/19 networks were introduced at the ILSVRC 2014 conference and are among the most popular pre-trained models. They were developed by the Visual Geometry Group at the University of Oxford.
There are two variants of the VGG model, a 16-layer network (VGG-16) and a 19-layer network (VGG-19), with VGG-19 being an improvement on VGG-16.
Architecture:
The VGG network is simple and sequential in nature and uses a large number of filters. At each stage, small (3*3) filters are used to reduce the number of parameters.
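To see why small filters save parameters, compare three stacked 3*3 convolutions with a single 7*7 convolution, which covers the same receptive field. The sketch below counts weights only (no biases), and the channel count C is an illustrative value, not one taken from this article.

```python
def conv_weights(kernel_size, channels):
    """Weights in one conv layer with `channels` inputs and outputs (no bias)."""
    return kernel_size * kernel_size * channels * channels

C = 256  # illustrative channel count

stacked_3x3 = 3 * conv_weights(3, C)  # three 3x3 layers, 7x7 receptive field
single_7x7 = conv_weights(7, C)       # one 7x7 layer, same receptive field

print(stacked_3x3, single_7x7)
print(f"ratio: {stacked_3x3 / single_7x7:.2f}")
```

The stack needs 27·C² weights against 49·C² for the single large filter, and it also applies three non-linearities instead of one.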
The VGG-16 network has the following:
- Convolutional Layers = 13
- Pooling Layers = 5
- Fully Connected Dense Layers = 3
Input: Image of dimensions (224, 224, 3)
Output: Image embedding of 1000-dimension
Other Details for VGG-16/19:
- Paper Link: https://arxiv.org/pdf/1409.1556.pdf
- GitHub: VGG
- Published On: April 2015
- Performance on ImageNet Dataset: 71% (Top 1 Accuracy), 90% (Top 5 Accuracy)
- Number of Parameters: ~140M
- Number of Layers: 16/19
- Size on Disk: ~530MB
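The layer counts and the parameter figure above can be cross-checked with a short calculation. The layer layout used here is an assumption not spelled out in this article: it is the 16-layer configuration from the VGG paper, with 3*3 convolutions, 2*2 max pooling, and dense layers of 4096, 4096, and 1000 units.

```python
# Numbers are output channels of 3x3 conv layers; "M" marks a max pooling layer.
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]
VGG16_DENSE = [4096, 4096, 1000]

def vgg16_summary(input_size=224, in_channels=3):
    params, convs, pools = 0, 0, 0
    size, channels = input_size, in_channels
    for item in VGG16_CONV:
        if item == "M":
            pools += 1
            size //= 2  # 2x2 max pooling halves height and width
        else:
            convs += 1
            params += 3 * 3 * channels * item + item  # weights + biases
            channels = item
    features = size * size * channels  # flattened input to the first dense layer
    for units in VGG16_DENSE:
        params += features * units + units
        features = units
    return convs, pools, len(VGG16_DENSE), params

print(vgg16_summary())  # (13, 5, 3, 138357544)
```

Most of the roughly 138M parameters sit in the first dense layer, which is why dropping the classifier head shrinks the model so much.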
Implementation:
import tensorflow as tf

# Instantiate VGG-16 with ImageNet weights and the 1000-class classifier head.
model = tf.keras.applications.VGG16(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)
The code above instantiates VGG-16. Keras offers a similar API for VGG-19; for more details, refer to the Keras documentation.
Xception is a deep CNN architecture built on depthwise separable convolutions. A depthwise separable convolution can be understood as an Inception module with a maximally large number of towers.
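A depthwise separable convolution splits a standard convolution into a per-channel spatial filter (depthwise) followed by a 1*1 convolution that mixes channels (pointwise). The sketch below compares weight counts without biases; the kernel size and channel counts are illustrative assumptions.

```python
def standard_conv_weights(k, c_in, c_out):
    # Every output channel has its own k x k filter over every input channel.
    return k * k * c_in * c_out

def separable_conv_weights(k, c_in, c_out):
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1x1 convolution that mixes channels
    return depthwise + pointwise

k, c_in, c_out = 3, 128, 256  # illustrative sizes
standard = standard_conv_weights(k, c_in, c_out)
separable = separable_conv_weights(k, c_in, c_out)
print(standard, separable, f"{standard / separable:.1f}x fewer weights")
```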
Architecture:
Input: Image of dimensions (299, 299, 3)
Output: Image embedding of 1000-dimension
Other Details for Xception:
- Paper Link: https://arxiv.org/pdf/1610.02357.pdf
- GitHub: Xception
- Published On: April 2017
- Performance on ImageNet Dataset: 79% (Top 1 Accuracy), 94.5% (Top 5 Accuracy)
- Number of Parameters: ~30M
- Depth: 81
- Size on Disk: 88MB
Implementation:
- Instantiate the Xception model using the code below:
tf.keras.applications.Xception(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)
The code above instantiates Xception; for more details, refer to the Keras documentation.
Earlier CNN architectures were not designed to scale to many convolutional layers. Adding new layers to an existing architecture led to the vanishing gradient problem and limited performance.
The ResNet architecture uses skip connections to address the vanishing gradient problem.
Architecture:
The ResNet model uses a 34-layer network architecture inspired by the VGG-19 model, to which shortcut connections are added. These shortcut connections convert the architecture into a residual network.
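A toy calculation shows why shortcut connections help gradients survive. It assumes a deliberately simplified model in which each layer is a scalar multiplication by w; real networks use weight matrices and non-linearities, so this conveys only the intuition.

```python
def plain_gradient(w, depth):
    """d(output)/d(input) for `depth` stacked scalar layers y = w * x."""
    grad = 1.0
    for _ in range(depth):
        grad *= w
    return grad

def residual_gradient(w, depth):
    """Same stack, but each layer is y = x + w * x (identity shortcut)."""
    grad = 1.0
    for _ in range(depth):
        grad *= 1.0 + w  # the shortcut contributes the constant 1
    return grad

w, depth = 0.1, 34  # illustrative weight; depth echoes the 34-layer network
print(plain_gradient(w, depth))     # shrinks toward zero
print(residual_gradient(w, depth))  # stays well away from zero
```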
There are several versions of ResNet architecture:
- ResNet50
- ResNet50V2
- ResNet101
- ResNet101V2
- ResNet152
- ResNet152V2
Input: Image of dimensions (224, 224, 3)
Output: Image embedding of 1000-dimension
Other Details for ResNet models:
- Paper Link: https://arxiv.org/pdf/1512.03385.pdf
- GitHub: ResNet
- Published On: Dec 2015
- Performance on ImageNet Dataset: 75–78% (Top 1 Accuracy), 92–93% (Top 5 Accuracy)
- Number of Parameters: 25–60M
- Depth: 107–307
- Size on Disk: ~100–230MB
Implementation:
- Instantiate the ResNet50 model using the code below:
tf.keras.applications.ResNet50(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
)
The code above instantiates ResNet50. Keras offers a similar API for the other ResNet architectures; for more details, refer to the Keras documentation.
Stacking many deep convolutional layers resulted in overfitting of the data. To avoid overfitting, the Inception model uses parallel layers, that is, multiple filters of different sizes at the same level, making the model wider rather than deeper. The Inception V1 module is made of 4 parallel layers: (1*1), (3*3), and (5*5) convolutions, and (3*3) max pooling.
Inception (V1/V2/V3) is a family of deep CNNs developed by a team at Google. InceptionV3 is an advanced and optimized version of the InceptionV1 and V2 models.
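Parallel branches can only be merged if they produce feature maps of the same height and width, which is why each branch pads its input. The sketch below checks this for the four branch types and adds up the channels of the concatenated output. The per-branch channel counts and the 28*28 input size are hypothetical values chosen for illustration.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

BRANCH_KERNELS = {"conv1x1": 1, "conv3x3": 3, "conv5x5": 5, "maxpool3x3": 3}

def inception_module_output(size, branch_channels):
    # Pad each branch by (k - 1) // 2 so that stride 1 preserves the input size.
    sizes = {
        name: conv_output_size(size, k, padding=(k - 1) // 2)
        for name, k in BRANCH_KERNELS.items()
    }
    if len(set(sizes.values())) != 1:
        raise ValueError("branches must share a spatial size to be concatenated")
    out_size = next(iter(sizes.values()))
    out_channels = sum(branch_channels[name] for name in BRANCH_KERNELS)
    return out_size, out_channels

# Hypothetical channel counts per branch.
example_channels = {"conv1x1": 64, "conv3x3": 128, "conv5x5": 32, "maxpool3x3": 32}
print(inception_module_output(28, example_channels))  # (28, 256)
```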
Architecture:
The InceptionV3 model is made up of 42 layers. Its architecture is built up step by step from:
- Factorized Convolutions
- Smaller Convolutions
- Asymmetric Convolutions
- Auxiliary Classifiers
- Grid Size Reduction
All these concepts are consolidated into the final architecture.
Input: Image of dimensions (299, 299, 3)
Output: Image embedding of 1000-dimension
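The "smaller" and "asymmetric" convolution ideas listed above both replace one large kernel with a sequence of cheaper ones. A minimal sketch of the weight counts per input/output channel pair, assuming the usual factorizations (a 5*5 kernel into two 3*3 kernels, and a 3*3 kernel into a 1*3 kernel followed by a 3*1 kernel):

```python
def kernel_weights(height, width):
    return height * width

# Smaller convolutions: one 5x5 kernel vs. two stacked 3x3 kernels.
five_by_five = kernel_weights(5, 5)
two_three_by_three = 2 * kernel_weights(3, 3)

# Asymmetric convolutions: one 3x3 kernel vs. a 1x3 followed by a 3x1.
three_by_three = kernel_weights(3, 3)
asymmetric = kernel_weights(1, 3) + kernel_weights(3, 1)

print(f"5x5 -> two 3x3: {1 - two_three_by_three / five_by_five:.0%} fewer weights")
print(f"3x3 -> 1x3 + 3x1: {1 - asymmetric / three_by_three:.0%} fewer weights")
```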
Implementation:
- Instantiate the InceptionV3 model using the code below:
tf.keras.applications.InceptionV3(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)
The code above instantiates InceptionV3; for more details, refer to the Keras documentation.
Inception-ResNet-v2 is a CNN model developed by researchers at Google. The goal of this model was to reduce the complexity of InceptionV3 and explore the potential of using residual connections in the Inception model.
Architecture:
Input: Image of dimensions (299, 299, 3)
Output: Image embedding of 1000-dimension
Implementation:
- Instantiate the Inception-ResNet-V2 model using the code below:
tf.keras.applications.InceptionResNetV2(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)
The code above instantiates Inception-ResNet-V2; for more details, refer to the Keras documentation.
MobileNet is a streamlined architecture that uses depthwise separable convolutions to construct deep convolutional neural networks and provides an efficient model for mobile and embedded vision applications.
Architecture:
Input: Image of dimensions (224, 224, 3)
Output: Image embedding of 1000-dimension
Implementation:
- Instantiate the MobileNet model using the code below:
tf.keras.applications.MobileNet(
    input_shape=None,
    alpha=1.0,
    depth_multiplier=1,
    dropout=0.001,
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)
The code above instantiates MobileNet. Keras offers a similar API for the other MobileNet architectures (MobileNet-V2, MobileNet-V3); for more details, refer to the Keras documentation.
DenseNet is a CNN model developed to recover the accuracy lost to the vanishing gradient problem in very deep neural networks, where the long path between the input and output layers causes information to vanish before it reaches its destination.
Architecture:
The DenseNet architecture illustrated in the original paper has 3 dense blocks; the ImageNet variants such as DenseNet-121 use 4. The layers between two adjacent blocks are known as transition layers and change feature-map sizes via convolution and pooling.
Input: Image of dimensions (224, 224, 3)
Output: Image embedding of 1000-dimension
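Inside a dense block, every layer appends its feature maps to everything produced before it, so the channel count grows linearly until a transition layer shrinks it again. The sketch below traces that growth. The defaults (four blocks of 6, 12, 24, and 16 layers, a growth rate of 32, 64 initial channels, and transitions that halve the channels) are assumptions taken from the DenseNet-121 configuration in the paper, not from this article.

```python
def densenet_channels(block_layers, growth_rate=32, initial=64, compression=0.5):
    """Channel count at the end of each dense block."""
    channels = initial
    trace = []
    for i, layers in enumerate(block_layers):
        channels += layers * growth_rate  # each layer appends growth_rate maps
        trace.append(channels)
        if i < len(block_layers) - 1:     # transition layer between blocks
            channels = int(channels * compression)
    return trace

print(densenet_channels([6, 12, 24, 16]))  # [256, 512, 1024, 1024]
```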
Implementation:
- Instantiate the DenseNet121 model using the code below:
tf.keras.applications.DenseNet121(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)
The code above instantiates DenseNet121. Keras offers a similar API for the other DenseNet architectures (DenseNet-169, DenseNet-201); for more details, refer to the Keras documentation.
Google researchers designed the NASNet model by framing the search for the best CNN architecture as a reinforcement learning problem. The idea is to search for the best combination of parameters within a given search space of number of layers, filter sizes, strides, output channels, etc.
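To get a feel for what a search space looks like, the sketch below enumerates every combination in a tiny, made-up space. It is plain exhaustive enumeration, not the reinforcement learning controller used for NASNet, and the option values are hypothetical. Realistic spaces are far too large to enumerate, which is what motivates a learned search strategy.

```python
import itertools

# Hypothetical search space; real ones are far larger.
search_space = {
    "num_layers": [2, 4, 6],
    "filter_size": [1, 3, 5, 7],
    "stride": [1, 2],
    "output_channels": [32, 64, 128],
}

def enumerate_architectures(space):
    """Yield one dict per combination of options in the search space."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

candidates = list(enumerate_architectures(search_space))
print(len(candidates))  # 72
print(candidates[0])
```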
Input: Image of dimensions (331, 331, 3)
Other Details for NasNet models:
- Paper Link: https://arxiv.org/pdf/1707.07012.pdf
- Published On: Apr 2018
- Performance on ImageNet Dataset: 75–83% (Top 1 Accuracy), 92–96% (Top 5 Accuracy)
- Number of Parameters: 5–90M
- Depth: 389–533
- Size on Disk: 23–343MB
Implementation:
- Instantiate the NASNetLarge model using the code below:
tf.keras.applications.NASNetLarge(
    input_shape=None,
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)
The code above instantiates NASNetLarge. Keras offers a similar API for NASNetMobile; for more details, refer to the Keras documentation.
EfficientNet is a CNN architecture from researchers at Google that achieves higher performance through a scaling method called compound scaling. This method uniformly scales the depth, width, and resolution dimensions by a fixed amount set by a compound coefficient.
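Compound scaling can be written as depth = α^φ, width = β^φ, resolution = γ^φ, where φ is the compound coefficient. The sketch below uses the base values reported in the EfficientNet paper (α = 1.2, β = 1.1, γ = 1.15), which are assumptions not stated in this article. The released B1 to B7 models use hand-tuned multipliers that do not follow these powers exactly.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution bases

def compound_scale(phi):
    """Multipliers relative to the baseline network for compound coefficient phi."""
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    # Cost grows roughly with depth * width^2 * resolution^2.
    flops = (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi
    return depth, width, resolution, flops

for phi in range(4):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, "
          f"resolution x{r:.2f}, FLOPs x{f:.2f}")
```

Because α·β²·γ² is close to 2, each increment of φ roughly doubles the computational cost.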
Implementation:
- Instantiate the EfficientNet-B0 model using the code below:
tf.keras.applications.EfficientNetB0(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)
The code above instantiates EfficientNet-B0. Keras offers a similar API for the other EfficientNet architectures (EfficientNet-B1 to B7, EfficientNet-V2-B0 to B3); for more details, refer to the Keras documentation for EfficientNet and EfficientNetV2.
The ConvNeXt CNN model was proposed as a pure convolutional model (ConvNet), inspired by the design of Vision Transformers, that claims to outperform them.
Implementation:
- Instantiate the ConvNeXt-Tiny model using the code below:
tf.keras.applications.ConvNeXtTiny(
    model_name="convnext_tiny",
    include_top=True,
    include_preprocessing=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)
The code above instantiates ConvNeXt-Tiny. Keras offers a similar API for the other ConvNeXt architectures (ConvNeXt-Small, ConvNeXt-Base, ConvNeXt-Large, ConvNeXt-XLarge); for more details, refer to the Keras documentation.


