
Autoencoder and Variational Autoencoders (VAEs)

Variational Autoencoders (VAEs)


A Variational Autoencoder (VAE) is a type of generative model in machine learning that combines principles from probabilistic modeling and deep learning. It learns efficient representations of data and can generate new samples that resemble the training data. VAEs are particularly popular in unsupervised learning tasks such as image generation, anomaly detection, and representation learning.

Key Concepts of a Variational Autoencoder:

  • Autoencoder Structure:
    • A VAE is based on the architecture of a traditional autoencoder, which consists of an encoder and a decoder.
    • The encoder compresses the input data into a lower-dimensional latent space representation (encoding).
    • The decoder reconstructs the data from the latent space back to the original input space.
  • Probabilistic Latent Space:
    • Unlike a standard autoencoder, which maps inputs to fixed points in the latent space, a VAE maps inputs to probability distributions (typically Gaussian distributions) in the latent space.
    • The encoder outputs the parameters of these distributions (mean and variance) rather than a single point.
  • Latent Variable Model:
    • VAEs are based on the idea of latent variable models, where the observed data is assumed to be generated from some underlying latent variables.
    • The goal is to learn the distribution of these latent variables and use them to generate new data.
  • Variational Inference:
    • VAEs use variational inference to approximate the posterior distribution of the latent variables given the input data.
    • This is done by minimizing the Kullback-Leibler (KL) divergence between the approximate posterior (output by the encoder) and the true posterior.
  • Reparameterization Trick:
    • To enable backpropagation through the stochastic sampling process, VAEs use the reparameterization trick.
    • Instead of sampling directly from the latent distribution, the model samples from a standard normal distribution and then scales and shifts the samples using the mean and variance output by the encoder.
  • Loss Function:
    • The VAE loss function consists of two terms:
      • Reconstruction Loss: Measures how well the decoder reconstructs the input data from the latent representation (e.g., mean squared error or cross-entropy).
      • KL Divergence: Regularizes the latent space by encouraging the learned distribution to be close to a prior distribution (usually a standard normal distribution).
    The total loss is: Loss = Reconstruction Loss + KL Divergence. (A minimal sketch of these two terms appears right after this list.)
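Here is a minimal sketch of those two loss terms and the reparameterization step, written in TensorFlow. The names x, x_hat, z_mean, and z_log_var are illustrative placeholders for flattened inputs, reconstructions, and encoder outputs; this is a sketch of the idea, not the implementation used later in the lesson.

import tensorflow as tf

def vae_loss_terms(x, x_hat, z_mean, z_log_var):
    # Reconstruction term: binary cross-entropy summed over pixels
    x_hat = tf.clip_by_value(x_hat, 1e-7, 1 - 1e-7)  # avoid log(0)
    recon = -tf.reduce_sum(x * tf.math.log(x_hat)
                           + (1 - x) * tf.math.log(1 - x_hat), axis=-1)
    # KL term: closed form for a diagonal Gaussian against the N(0, I) prior
    kl = -0.5 * tf.reduce_sum(1 + z_log_var - tf.square(z_mean)
                              - tf.exp(z_log_var), axis=-1)
    return tf.reduce_mean(recon + kl)

def reparameterize(z_mean, z_log_var):
    # z = mu + sigma * eps keeps the sample differentiable w.r.t. mu and sigma
    eps = tf.random.normal(tf.shape(z_mean))
    return z_mean + tf.exp(0.5 * z_log_var) * eps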

How a VAE Works:

  1. The input data is passed through the encoder, which outputs the parameters (mean and variance) of the latent distribution.
  2. A sample is drawn from the latent distribution using the reparameterization trick.
  3. The sample is passed through the decoder to reconstruct the input data.
  4. The model is trained by minimizing the combined reconstruction and KL divergence loss.
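The four steps above map directly onto a single gradient update. Below is a hedged sketch using TensorFlow's GradientTape, assuming encoder and decoder models where the encoder returns the pair (z_mean, z_log_var); these names are illustrative and differ from the Keras add_loss approach used in the implementation later in this lesson.

import tensorflow as tf

optimizer = tf.keras.optimizers.Adam()

def train_step(x, encoder, decoder):
    # x: a batch of images scaled to [0, 1], shape (batch, 28, 28, 1)
    with tf.GradientTape() as tape:
        z_mean, z_log_var = encoder(x)                  # 1. encode to distribution parameters
        eps = tf.random.normal(tf.shape(z_mean))
        z = z_mean + tf.exp(0.5 * z_log_var) * eps      # 2. reparameterized sample
        x_hat = decoder(z)                              # 3. decode / reconstruct
        recon = tf.reduce_sum(
            tf.keras.losses.binary_crossentropy(x, x_hat), axis=[1, 2])
        kl = -0.5 * tf.reduce_sum(
            1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=-1)
        loss = tf.reduce_mean(recon + kl)               # 4. combined objective
    variables = encoder.trainable_variables + decoder.trainable_variables
    grads = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(grads, variables))
    return loss

Looping this step over mini-batches of training images is all that step 4 amounts to in practice.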

Applications of VAEs:

  • Data Generation: Generating new data samples (e.g., images, text, or audio) that resemble the training data.
  • Dimensionality Reduction: Learning compact representations of high-dimensional data.
  • Anomaly Detection: Identifying outliers by scoring how poorly the model reconstructs an input relative to the data it was trained on (a short sketch follows this list).
  • Image Denoising: Reconstructing clean images from noisy inputs.
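A hedged sketch of that reconstruction-error approach to anomaly detection, assuming a trained autoencoding model named vae (the name and the threshold choice are illustrative):

import numpy as np

def anomaly_scores(vae, x):
    # Mean squared reconstruction error per example; high scores suggest outliers
    x_hat = vae.predict(x, verbose=0)
    return np.mean(np.square(x - x_hat), axis=tuple(range(1, x.ndim)))

# Example: flag inputs whose error exceeds the 99th percentile of training scores
# threshold = np.percentile(anomaly_scores(vae, x_train), 99)
# outliers = x_new[anomaly_scores(vae, x_new) > threshold]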

Advantages of VAEs:

  • They provide a probabilistic framework for learning latent representations.
  • They can generate diverse and realistic data samples.
  • They are typically more stable and cheaper to train than adversarial generative models such as GANs (Generative Adversarial Networks).

Limitations of VAEs:

  • The generated samples tend to be blurrier than those from GANs.
  • The KL divergence term can lead to over-regularization, making the latent space less expressive.

Let’s walk through a real-life example of using a Variational Autoencoder (VAE) to generate handwritten digits (e.g., from the MNIST dataset). We’ll break this into three parts:

  1. Problem Statement
  2. Mathematical Formulation
  3. Code Implementation

1. Problem Statement

We want to train a VAE to generate new handwritten digits (0–9) that resemble the MNIST dataset. The MNIST dataset consists of 28×28 grayscale images of digits, and our goal is to learn a latent representation of these images and use it to generate new ones.


2. Mathematical Formulation

Key Components:

  1. Encoder: Maps input data x (an image) to a latent-space distribution q(z|x), parameterized by a mean μ and a log-variance log σ².
  2. Latent Space: A Gaussian distribution z ∼ N(μ, σ²).
  3. Decoder: Maps a latent vector z back to the data space p(x|z), reconstructing the input.

Loss Function:

The VAE loss consists of two terms:

  1. Reconstruction Loss: Measures how well the decoder reconstructs the input x from the latent vector z. For binary data (e.g., MNIST pixels), we use binary cross-entropy: L_recon = −Σ_{i=1}^{n} [x_i log(x̂_i) + (1 − x_i) log(1 − x̂_i)], where x̂ is the reconstructed image.
  2. KL Divergence: Regularizes the latent space by encouraging q(z|x) to be close to a standard normal prior p(z) = N(0, I): L_KL = −(1/2) Σ_{j=1}^{J} (1 + log(σ_j²) − μ_j² − σ_j²), where J is the dimensionality of the latent space.

The total loss is: L = L_recon + L_KL.
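As a quick sanity check on the KL term: if the encoder outputs μ_j = 0 and log(σ_j²) = 0 (so σ_j² = 1) for every dimension j, each summand is 1 + 0 − 0 − 1 = 0 and L_KL = 0, which is exactly what we expect when q(z|x) already matches the prior N(0, I).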

Reparameterization Trick:

To sample z from q(z|x), we use: z = μ + σ · ε, where ε ∼ N(0, I).

This allows backpropagation through the sampling process.

3. Code Implementation

from tensorflow.keras.layers import (Input, Dense, Lambda, Conv2D, MaxPooling2D,
                                     Flatten, Reshape, Conv2DTranspose, UpSampling2D)
from tensorflow.keras.models import Model
from tensorflow.keras.losses import binary_crossentropy
import tensorflow.keras.backend as K
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import mnist

# Load the dataset
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))

# Define the VAE model
input_img = Input(shape=(28, 28, 1))
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Flatten()(x)
x = Dense(16, activation='relu')(x)

# Latent variables
z_mean = Dense(2)(x)
z_log_var = Dense(2)(x)

# Reparameterization trick
def sampling(args):
    z_mean, z_log_var = args
    batch = K.shape(z_mean)[0]
    dim = K.int_shape(z_mean)[1]
    epsilon = K.random_normal(shape=(batch, dim))
    return z_mean + K.exp(0.5 * z_log_var) * epsilon

z = Lambda(sampling, output_shape=(2,))([z_mean, z_log_var])

# Decoder
decoder_hid = Dense(128, activation='relu')
decoder_upsample = Dense(7 * 7 * 32, activation='relu')
decoder_reshape = Reshape((7, 7, 32))
decoder_deconv1 = Conv2DTranspose(32, (3, 3), padding='same', activation='relu')
decoder_upsample1 = UpSampling2D((2, 2))
decoder_deconv2 = Conv2DTranspose(32, (3, 3), padding='same', activation='relu')
decoder_upsample2 = UpSampling2D((2, 2))
decoder_mean = Conv2D(1, (3, 3), padding='same', activation='sigmoid')

hid_decoded = decoder_hid(z)
upsample_decoded = decoder_upsample(hid_decoded)
reshape_decoded = decoder_reshape(upsample_decoded)
deconv1_decoded = decoder_deconv1(reshape_decoded)
upsample1_decoded = decoder_upsample1(deconv1_decoded)
deconv2_decoded = decoder_deconv2(upsample1_decoded)
upsample2_decoded = decoder_upsample2(deconv2_decoded)
x_decoded_mean = decoder_mean(upsample2_decoded)

# Define the VAE model
vae = Model(input_img, x_decoded_mean)

# Standalone decoder model (reusing the decoder layers above) so we can
# generate digits directly from latent vectors after training
decoder_input = Input(shape=(2,))
_h = decoder_hid(decoder_input)
_h = decoder_upsample(_h)
_h = decoder_reshape(_h)
_h = decoder_deconv1(_h)
_h = decoder_upsample1(_h)
_h = decoder_deconv2(_h)
_h = decoder_upsample2(_h)
decoder = Model(decoder_input, decoder_mean(_h))

# Loss function: the reconstruction term is rescaled from a per-pixel mean to a
# per-image sum over all 28*28 pixels, and the KL term is summed over the
# latent dimensions, so both match the formulas given above
xent_loss = 28 * 28 * binary_crossentropy(K.flatten(input_img), K.flatten(x_decoded_mean))
kl_loss = -0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
vae_loss = K.mean(xent_loss + kl_loss)

vae.add_loss(vae_loss)
vae.compile(optimizer='rmsprop')
vae.summary()

# Train the VAE
vae.fit(x_train, epochs=50, batch_size=256, validation_data=(x_test, None))

# Generate new images
n = 15
digit_size = 28
figure = np.zeros((digit_size * n, digit_size * n))

# Linearly spaced coordinates corresponding to the 2D plot of digit classes in the latent space
grid_x = np.linspace(-3, 3, n)
grid_y = np.linspace(-3, 3, n)

for i, yi in enumerate(grid_x):
    for j, xi in enumerate(grid_y):
        z_sample = np.array([[xi, yi]])
        x_decoded = decoder.predict(z_sample, verbose=0)
        digit = x_decoded[0].reshape(digit_size, digit_size)
        figure[i * digit_size: (i + 1) * digit_size,
               j * digit_size: (j + 1) * digit_size] = digit

plt.figure(figsize=(10, 10))
plt.imshow(figure, cmap='Greys_r')
plt.show()
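To complement the digit grid above, it also helps to see how the encoder arranges the test set in the 2-D latent space. The sketch below reuses the input_img and z_mean tensors defined earlier; note that the labels have to be reloaded, because the earlier load discarded them with _.

# Visualize where the encoder places each test digit in the 2-D latent space
encoder = Model(input_img, z_mean)

# Reload the labels (discarded when the data was first loaded)
(_, _), (_, y_test) = mnist.load_data()

z_test = encoder.predict(x_test, verbose=0)
plt.figure(figsize=(8, 8))
plt.scatter(z_test[:, 0], z_test[:, 1], c=y_test, cmap='tab10', s=2)
plt.colorbar(label='digit class')
plt.xlabel('z[0]')
plt.ylabel('z[1]')
plt.show()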