Complete Guide on Deep Learning Architectures Part 2: Autoencoders
Autoencoder: Basic Ideas
Keras Implementation
Sparse Autoencoder
Denoising Autoencoder
Stacked Autoencoder

Photo by Daniele Levis Pelusi on Unsplash

An autoencoder is a type of neural network that reconstructs its input at the output. The fundamental idea is that we take our inputs and compress them in such a way that the compressed representation carries features powerful enough to reconstruct the original.

As humans, if we're asked to draw a tree with the fewest possible strokes (given that we've seen so many trees in our lifetime), we draw a line for the trunk and a couple of branches on top, producing an abstraction of what trees look like. That is essentially what an autoencoder does.

A typical autoencoder looks like this:

Let's take a concrete case: image reconstruction.

Now we have our input layer with 784 units (assuming we feed in 28×28 images); we can simply stack a hidden layer with 28 units on top, and our output layer has 784 units again.

The first part is called the "encoder": it encodes our inputs as latent variables. The second part is called the "decoder": it reconstructs our inputs from the latent variables.

A hidden layer with a smaller number of units is enough to do the compression and give us the latent variables. This is called an "undercomplete autoencoder" (there are other types of autoencoders as well, which we'll also go over, but this one conveys the main idea). So in brief, it's just another feed-forward neural network with the following characteristics:

  • an input layer, a hidden layer with fewer units, and an output layer
  • it's unsupervised: we pass our inputs, get the output, and compare it against the input again
  • the loss function compares the input to its compressed and then reconstructed version, which tells us whether the model is successful or not
  • an autoencoder whose decoder is a linear layer essentially does the same thing as principal component analysis (even though its training objective is to copy the input)
  • another core concept of autoencoders is weight tying: the decoder weights are tied to the encoder weights, so transposing the encoder's weight matrix gives you the decoder's weights. Tying the decoder weights to the encoder weights is common practice; it saves memory (fewer parameters) and reduces overfitting. I tried to explain my intuition below in a few graphics. Let's have a look.

For the autoencoder below:

The weight matrices of the encoder and decoder look like this (you can skip this if you already know what a transpose is):
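
Here's a minimal sketch of how weight tying could be done in Keras with a custom layer that borrows the encoder's kernel. The DenseTranspose name and its details are my own illustration, not part of the network we'll build below:

import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical helper layer (my naming): a decoder layer whose kernel is the
# transpose of a given encoder Dense layer's kernel, so the weights are shared.
class DenseTranspose(layers.Layer):
    def __init__(self, dense, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.dense = dense  # the encoder layer we tie to (must already be built when this layer runs)
        self.activation = tf.keras.activations.get(activation)

    def build(self, batch_input_shape):
        # only the bias is a new parameter; the kernel is borrowed from the encoder
        self.bias = self.add_weight(name="bias",
                                    shape=[self.dense.kernel.shape[0]],
                                    initializer="zeros")
        super().build(batch_input_shape)

    def call(self, inputs):
        # compute x · Wᵀ + b, where W is the encoder's kernel
        z = tf.matmul(inputs, self.dense.kernel, transpose_b=True)
        return self.activation(z + self.bias)

# usage sketch: a 784 -> 128 encoder layer tied to a 128 -> 784 decoder layer
encoder_layer = layers.Dense(128, activation="relu")
decoder_layer = DenseTranspose(encoder_layer, activation="sigmoid")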

Let's implement the network above using the Keras Subclassing API. I left comments on each layer to walk you through creating it.


import tensorflow as tf
from tensorflow.keras import layers, losses
from tensorflow.keras.models import Model


class Autoencoder(Model):

    def __init__(self, latent_dim):
        super(Autoencoder, self).__init__()
        self.latent_dim = latent_dim

        # define our encoder and decoder with the Sequential API
        # flatten the image and pass it to the latent layer to encode
        self.encoder = tf.keras.Sequential([
            layers.Flatten(),
            layers.Dense(latent_dim, activation='relu'),
        ])

        # reconstruct the latent outputs with another dense layer
        # and reshape back to the image size
        self.decoder = tf.keras.Sequential([
            layers.Dense(784, activation='sigmoid'),
            layers.Reshape((28, 28))
        ])

    # give the input to the encoder and pass the encoder outputs (latents) to the decoder
    def call(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded

# initialize the model with a latent dimension of 128
autoencoder = Autoencoder(128)

# we can use a simple MSE loss to compare the input with the reconstruction
autoencoder.compile(optimizer='adam', loss=losses.MeanSquaredError())

# we don't pass a y_train, given that we want the output to be the same as the input :)
autoencoder.fit(x_train, x_train,
                epochs=10,
                shuffle=True,
                validation_data=(x_test, x_test))
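
Note that the snippet above assumes x_train and x_test already exist as 28×28 images scaled to [0, 1]. As an illustration (I'm assuming MNIST here; the post only says 28×28 images), they could be prepared like this:

import tensorflow as tf

# load 28x28 grayscale digits and scale pixel values to [0, 1]
(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0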

In contrast to the undercomplete autoencoder, a complete autoencoder has as many units in the latent dimension as in the input and output, and an overcomplete autoencoder has more. This causes the model to learn nothing useful and simply overfit. In undercomplete autoencoders, on the other hand, the encoder and decoder can be overloaded with information if the hidden layer is too small. To avoid over-engineering this and to add more functionality to autoencoders, regularized autoencoders were introduced. These models have different loss functions that don't just copy the input to the output, but also make the model more robust to noisy, sparse, or missing data. Two types of regularized autoencoders are the denoising autoencoder and the sparse autoencoder. We won't go through them in depth in this post since their implementation doesn't differ much from the normal autoencoder.

Sparse autoencoders are autoencoders whose loss function adds a sparsity penalty on the latent dimension (applied to the encoder output) on top of the reconstruction loss. These sparse features can then be used to make the problem supervised, where the outputs depend on those features. This way, autoencoders can be used for problems like classification.
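
As a minimal sketch of the idea (my own variation, not code from the model above), the penalty can be added in Keras with an L1 activity regularizer on the latent layer; the 1e-5 coefficient is just an illustrative value:

from tensorflow.keras import layers, regularizers

# latent layer with an L1 penalty on its activations: the training loss becomes
# reconstruction loss + 1e-5 * sum(|latent activations|)
sparse_latent = layers.Dense(
    128,
    activation="relu",
    activity_regularizer=regularizers.l1(1e-5),
)

Swapping this layer in for the plain Dense latent layer in the encoder above would turn the model into a sparse autoencoder.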

Denoising autoencoders are autoencoders that remove noise from a given input. To do this, we simply train the autoencoder on a corrupted version of the input (the original plus some noise) and ask the model to output the original version of the input without the noise. Comparing the loss functions: instead of minimizing L(x, g(f(x))), a denoising autoencoder minimizes L(x, g(f(x̃))), where x̃ is the corrupted copy of x. The implementation is the same as for the normal autoencoder, except for the input.
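
Here's a minimal sketch of that setup, assuming the same autoencoder and data as above and an arbitrary Gaussian noise level:

import tensorflow as tf

noise_factor = 0.2  # arbitrary noise strength, just for illustration

# corrupt the inputs with Gaussian noise, keeping pixel values in [0, 1]
x_train_noisy = tf.clip_by_value(
    x_train + noise_factor * tf.random.normal(shape=x_train.shape), 0.0, 1.0)
x_test_noisy = tf.clip_by_value(
    x_test + noise_factor * tf.random.normal(shape=x_test.shape), 0.0, 1.0)

# noisy images in, clean images as the target
autoencoder.fit(x_train_noisy, x_train,
                epochs=10,
                shuffle=True,
                validation_data=(x_test_noisy, x_test))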

Before ReLU existed, vanishing gradients made it nearly impossible to train deep neural networks. Stacked autoencoders were created as a somewhat hacky workaround: one autoencoder was trained to learn the features of the training data, then its decoder was cut off, another encoder was added on top, and the new network was trained. At the end, a softmax layer was added to use these features for classification. This could be seen as one of the early techniques for transfer learning.
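
Here's a rough sketch of that greedy, layer-wise procedure with arbitrary layer sizes (my own illustration, assuming x_train and digit labels y_train are available):

import tensorflow as tf
from tensorflow.keras import layers, models

# --- stage 1: train a first autoencoder on the raw images (784 -> 256 -> 784)
enc1 = layers.Dense(256, activation="relu")
ae1 = models.Sequential([layers.Flatten(),
                         enc1,
                         layers.Dense(784, activation="sigmoid"),
                         layers.Reshape((28, 28))])
ae1.compile(optimizer="adam", loss="mse")
ae1.fit(x_train, x_train, epochs=5, shuffle=True)

# --- stage 2: cut the decoder off, train a second autoencoder on the codes (256 -> 64 -> 256)
codes = models.Sequential([layers.Flatten(), enc1]).predict(x_train)
enc2 = layers.Dense(64, activation="relu")
ae2 = models.Sequential([enc2, layers.Dense(256, activation="relu")])
ae2.compile(optimizer="adam", loss="mse")
ae2.fit(codes, codes, epochs=5, shuffle=True)

# --- stage 3: stack the pretrained encoders and add a softmax layer for classification
classifier = models.Sequential([layers.Flatten(),
                                enc1,
                                enc2,
                                layers.Dense(10, activation="softmax")])
classifier.compile(optimizer="adam",
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])
classifier.fit(x_train, y_train, epochs=5)  # y_train: digit labels, assumed available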

There are also variational autoencoders and other variants heavily used in generative AI. I'll go through them in another blog post. Thanks a lot if you've read this far, and let me know if there's anything I can improve 🙂
