Generative adversarial networks (GANs), and Deep Convolutional GANs (DCGANs) in particular, have been a very active playground for Deep Learning practitioners lately. The field was established by Ian Goodfellow and his colleagues from the University of Montreal in their article Generative Adversarial Nets. Since then, new variants of the original model have kept appearing and research keeps moving forward.

The main goal of adversarial networks is to estimate generative models through an adversarial process. This process involves training two models at the same time, one pitted against the other:

  • The Generative model (typically denoted as G) is trained to capture the data distribution and generalize over data patterns, so that it can ultimately produce samples that are indistinguishable from the original ones;
  • The Discriminator model (denoted as D) tries to spot the fake samples coming from the Generative model. The Discriminator estimates the probability that a given sample is original rather than generated.

The training process sets opposing goals for the two models:

  • The Generative model is trained to outsmart the Discriminator by generating ever better fakes.
  • The Discriminator is trained to correctly tell the real data from the fake.

The overall equilibrium is attained when the Generator creates perfect fakes and the Discriminator is left with 50% confidence when guessing if the output is real or fake.

Different approaches to adversarial networks

Goodfellow's paper laid the foundation for the core mechanics of adversarial networks, but GANs are only one of several families of generative models. Other approaches that have been proposed and tested include the following:

Fully Visible Belief Networks

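These networks predate GANs: they go back to work by Brendan Frey and colleagues in the mid-1990s. (The models introduced in 2006 by Geoff Hinton, which are used to recognize, cluster, and generate images, video sequences, and motion-capture data, are deep belief networks, a different family.)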

They are a class of explicit density models. They use the chain rule of probability to decompose the probability distribution over a vector: the joint distribution is written as a product of conditional distributions, one for each element of the vector given the elements that come before it.
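In symbols, for a vector x = (x_1, ..., x_n), the chain-rule decomposition can be written as follows. The names p_model and n are our own notation, chosen for illustration:

```latex
p_{\text{model}}(\mathbf{x}) = \prod_{i=1}^{n} p_{\text{model}}\left(x_i \mid x_1, \ldots, x_{i-1}\right)
```

Each factor is a one-dimensional conditional distribution, which is why sampling from such a model proceeds one element at a time.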

The most popular model in this family is an autoregressive generative model called PixelCNN.

Variational Autoencoder

Autoencoders take data as input and discover some latent state representation of that data. Typically, the input vector is converted into an encoding vector where each dimension represents some learned attribute about the data.

Variational autoencoders (VAE) provide a probabilistic way to describe an observation in latent space. Rather than building an encoder that outputs a single value for each latent attribute of the data, we formulate the encoder so that it describes a probability distribution for each latent attribute.

A fairly simplistic example that would illustrate the difference between single discrete values and probability distributions for latent attributes in the data is shown in the image below:

As you can see, it’s better to represent latent attributes in the data in probabilistic terms so we can assess a whole range of values. 
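To make the idea of "a distribution per latent attribute" concrete, here is a minimal pure-Python sketch. It assumes the encoder outputs a mean and a log-variance for every attribute (a common VAE convention, not something spelled out in this article) and draws a latent vector from them. The function name and the numbers are made up for illustration.

```python
import math
import random

def sample_latent(mu, log_var, rng):
    """Draw one latent vector z ~ N(mu, sigma^2), attribute by attribute.

    mu and log_var stand in for hypothetical encoder outputs:
    one mean and one log-variance per latent attribute.
    """
    z = []
    for m, lv in zip(mu, log_var):
        sigma = math.exp(0.5 * lv)   # standard deviation from the log-variance
        eps = rng.gauss(0.0, 1.0)    # noise drawn from N(0, 1)
        z.append(m + sigma * eps)    # z = mu + sigma * eps
    return z

rng = random.Random(0)
mu = [0.8, -1.2]         # two made-up latent attributes
log_var = [-2.0, 0.0]    # the first is narrow, the second is wide
samples = [sample_latent(mu, log_var, rng) for _ in range(5)]
for z in samples:
    print(z)
```

Every call returns a slightly different latent vector for the same input, which is what lets the decoder explore a whole range of values around each attribute.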

Alec Radford used a Variational Autoencoder to generate fictional celebrity faces.

Variational Autoencoders to generate fake faces | Credit: Alec Radford

Boltzmann machines

Boltzmann machines are networks of symmetrically connected units that make stochastic decisions about whether to be active or not. They have simple learning algorithms that enable them to discover interesting features in datasets composed of binary vectors.

They can also be seen as energy-based models: an energy function assigns an energy to each state of the network, and this energy in turn determines the probability of that state.
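For reference, the standard textbook form of this energy function and of the probability it induces over states is given below. It is written for binary units s_i with symmetric weights w_ij and biases b_i; these symbols are our notation rather than the article's.

```latex
\begin{aligned}
E(\mathbf{s}) &= -\sum_{i<j} w_{ij}\, s_i s_j \;-\; \sum_{i} b_i s_i \\
P(\mathbf{s}) &= \frac{e^{-E(\mathbf{s})}}{Z}, \qquad Z = \sum_{\mathbf{s}'} e^{-E(\mathbf{s}')}
\end{aligned}
```

Low-energy states are therefore the most probable ones, and learning amounts to shaping the energy so that the training vectors end up with low energy.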

Deep Convolutional GANs

Deep Convolutional Adversarial Networks are a particular kind of GAN. The main layers in the network architecture of the Generator (G) and Discriminator (D) are, respectively, transpose-convolutional and convolutional layers.

These architectures were first introduced in the paper Unsupervised Representation Learning With Deep Convolutional Generative Adversarial Networks. The authors, Radford et al., presented an implementation whose Discriminator consists of strided convolutional layers, batch norm layers, and LeakyReLU activations. The Generator, conversely, is built mostly from transpose-convolutional layers, and its activations are plain ReLU layers.

The Discriminator input is a 3x64x64 color image and the output is a scalar probability indicating how confident the Discriminator is that the input comes from the real data distribution rather than from the Generator.

On the other hand, the input for the Generator consists of a latent vector drawn from a standard normal distribution, and the corresponding output yields a 3x64x64 image.

Let’s get into some mathematical notation to help clarify terms that we’ll be using later in the article. 

The Discriminator network is denoted as D(x), which outputs the scalar probability that x came from the training data rather than the Generator.

For the Generator, z is the latent space vector sampled from a standard normal distribution. Therefore, G(z) represents the function that maps the latent vector z to data-space.

As such, D(G(z)) represents the probability that the output of the Generator G is a real image. In line with the competition between the two models described earlier, D tries to maximize the probability that it correctly classifies reals and fakes, which can be written as log(D(x)) + log(1-D(G(z))). G, on the contrary, tries to minimize the probability that its fake output gets spotted by the Discriminator, which is written as log(1-D(G(z))).

The overall GAN loss function as described in the official paper looks like this:
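Written out in LaTeX, the minimax objective from the Generative Adversarial Nets paper is the following, where p_data is the data distribution and p_z is the prior over the latent vector z (a standard normal in our case):

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

The first term rewards D for scoring real samples close to 1, and the second rewards it for scoring generated samples close to 0. G can only influence the second term.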

As previously explained, the theoretical solution of this minimax game is reached when Pg=Pdata, at which point the Discriminator can only guess at random whether its inputs are real or fake.
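The equilibrium can be checked numerically on a toy example. The sketch below uses small discrete distributions (made-up numbers) and the optimal Discriminator for a fixed Generator, D*(x) = Pdata(x) / (Pdata(x) + Pg(x)), a result from the original GAN paper. When Pg equals Pdata, D* is 0.5 everywhere and the value function equals -log 4.

```python
import math

def optimal_discriminator(p_data, p_g):
    """D*(x) = p_data(x) / (p_data(x) + p_g(x)) for each outcome x."""
    return [pd / (pd + pg) for pd, pg in zip(p_data, p_g)]

def value_function(p_data, p_g, d):
    """V(D, G) = E_{x~p_data}[log D(x)] + E_{x~p_g}[log(1 - D(x))]."""
    real_term = sum(pd * math.log(dx) for pd, dx in zip(p_data, d))
    fake_term = sum(pg * math.log(1.0 - dx) for pg, dx in zip(p_g, d))
    return real_term + fake_term

p_data = [0.1, 0.2, 0.3, 0.4]      # toy data distribution over 4 outcomes
p_g_early = [0.4, 0.3, 0.2, 0.1]   # a Generator that is still off
p_g_final = list(p_data)           # a Generator that matches the data

d_early = optimal_discriminator(p_data, p_g_early)
d_final = optimal_discriminator(p_data, p_g_final)
v_early = value_function(p_data, p_g_early, d_early)
v_final = value_function(p_data, p_g_final, d_final)

print("D* early:", d_early)
print("D* final:", d_final)
print("V early:", v_early, "V final:", v_final, "-log 4:", -math.log(4))
```

While the Generator is off, the optimal Discriminator can still tell the two distributions apart and the value stays above -log 4. Once the distributions match, it has nothing left to exploit.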

Now that you’ve come across the general concepts and you have a better foundation, we can purposefully dive into more practical concerns. 

We’ll build a DCGAN trained on face images of famous celebrities. We’ll break the work down into steps: building the model, initializing the weights, training, and evaluating the final results. To follow along, start your Neptune experiment and connect your API token to your notebook.

Celeb-A Faces Dataset

CelebFaces Attributes (CelebA) is a large-scale open-source dataset of celebrity face images, each annotated with 40 attributes. Image quality is good, and the dataset offers a rich variety of poses and background clutter, making it a good fit for our task.

Download Link: Large-scale CelebFaces Attributes (CelebA) Dataset

Create a directory inside your notebook root path and extract the folder into it. It should look like this:

  CelebA Dataset after extracting the folder

Now we can start the preprocessing. We transform our data and initialize the Torch DataLoader class, which takes care of shuffling and loading the data batches during training.

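import torch
import torchvision.datasets as datasets
import torchvision.transforms as transforms


def data_preprocessing(root_dir, batch_size=128, image_size=64, num_workers=2):
  # ImageFolder expects the images to sit in at least one subfolder of root_dir
  dataset = datasets.ImageFolder(root=root_dir,
                                 transform=transforms.Compose([
                                     transforms.Resize(image_size),
                                     transforms.CenterCrop(image_size),
                                     transforms.ToTensor(),
                                     # Scale each channel from [0, 1] to [-1, 1]
                                     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
                                 ]))

  # Shuffle and batch the images for training
  dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size,
                                           shuffle=True, num_workers=num_workers)
  return dataloader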
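The Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) step computes (x - mean) / std per channel, which maps pixel values from [0, 1] to [-1, 1], the same range as the Tanh output of the Generator defined later in the article. Below is a small pure-Python sketch of that arithmetic and of its inverse, which is handy when displaying generated images. The helper names are ours, not part of torchvision.

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Same arithmetic as transforms.Normalize for a single channel value."""
    return (pixel - mean) / std

def denormalize(value, mean=0.5, std=0.5):
    """Inverse mapping, back to the [0, 1] range used for display."""
    return value * std + mean

for pixel in (0.0, 0.25, 0.5, 1.0):
    scaled = normalize(pixel)
    print(pixel, "->", scaled, "->", denormalize(scaled))
```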

Log all the dataset details to your Neptune Run so that you can keep track of your dataset info and the corresponding metadata.

Follow the instructions here to set up your own Neptune account to track these runs.
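The snippets in this article read their settings from a dictionary called hparams. A minimal sketch of what it could contain is shown below. The key names are the ones used in the code that follows; image_size and num_channels follow from the 3x64x64 images, and num_epochs matches the 5-epoch run discussed later. The latent size of 100 and the 64 feature maps are assumptions borrowed from common DCGAN setups, not values stated in this article.

```python
# Hypothetical hyperparameter dictionary; adjust the values to your setup.
hparams = {
    "image_size": 64,                       # images are resized and cropped to 64x64
    "num_channels": 3,                      # RGB
    "size_latent_z_vector": 100,            # assumption: length of the latent vector z
    "size_feature_maps_generator": 64,      # assumption: base width of G
    "size_feature_maps_discriminator": 64,  # assumption: base width of D
    "num_epochs": 5,                        # the article compares 5 and 10 epochs
}

# Shape of one latent batch fed to the Generator: (batch, nz, 1, 1)
latent_batch_shape = (128, hparams["size_latent_z_vector"], 1, 1)
print(latent_batch_shape)
```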

Start your experiment:

run = neptune.init(project='aymane.hachcham/DCGAN', api_token='ANONYMOUS') 
run['config/dataset/path'] = 'Documents/DCGAN/dataset'
run['config/dataset/size'] = 202599
run['config/dataset/transforms'] = {
    'train': transforms.Compose([
                                  transforms.Resize(hparams['image_size']),
                                  transforms.CenterCrop(hparams['image_size']),
                                  transforms.ToTensor(),
                                  transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
}

The dataset config in your Neptune dashboard

Model building

Once the dataset is ready and logged, we can start building the actual model. As I’ve explained earlier, we’ll try to tackle this with a step-by-step approach. We need to start with the weight initialization strategy.

Weight initialization is about the specific criteria that the model weights should meet. The official paper recommends randomly initializing the weights from a normal distribution with mean=0 and stdev=0.02.

We’ll create a function that takes a general model as input and reinitializes the convolutional, transpose-convolutional and batch normalization layers to meet these criteria.

Note: You could follow along the tutorial by taking a look at the complete colab notebook, here -> Colab Notebook

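import torch.nn as nn


def weights_init(model):
  # Called once per submodule by model.apply(weights_init)
  classname = model.__class__.__name__

  # Conv2d and ConvTranspose2d: weights drawn from N(0, 0.02)
  if classname.find('Conv') != -1:
        nn.init.normal_(model.weight.data, 0.0, 0.02)

  # BatchNorm2d: scale drawn around 1.0 (as in the PyTorch DCGAN tutorial), bias set to 0
  elif classname.find('BatchNorm') != -1:
        nn.init.normal_(model.weight.data, 1.0, 0.02)
        nn.init.constant_(model.bias.data, 0)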

Since the model argument will be either a Generator or a Discriminator, it will have Conv and BatchNorm layers. Passing the function to the model's apply() method runs it on every submodule, so each of these layers gets a random weight initialization with stdev=0.02.

The Generator

The role of Generator G is to map the latent vector z to data-space. In our case, this translates to ultimately creating RGB images with the same size and dimensions as the original ones in the data. This is accomplished by stacking a series of transpose-convolutional, Batch Norm, and ReLU layers that work together to produce a 3x64x64 output that eventually looks like a human face.

It’s worth noting that the Batch Norm layers added right after the transpose-convolutions help the flow of gradients during training, so they play an important part in the overall training performance.

class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.main = nn.Sequential(
            
            nn.ConvTranspose2d(hparams["size_latent_z_vector"], 
                               hparams["size_feature_maps_generator"] * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(hparams["size_feature_maps_generator"] * 8),
            nn.ReLU(True),
            
            nn.ConvTranspose2d(hparams["size_feature_maps_generator"] * 8, 
                               hparams["size_feature_maps_generator"] * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(hparams["size_feature_maps_generator"] * 4),
            nn.ReLU(True),
            
            nn.ConvTranspose2d( hparams["size_feature_maps_generator"] * 4, 
                               hparams["size_feature_maps_generator"] * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(hparams["size_feature_maps_generator"] * 2),
            nn.ReLU(True),
            
            nn.ConvTranspose2d(hparams["size_feature_maps_generator"] * 2, 
                               hparams["size_feature_maps_generator"], 4, 2, 1, bias=False),
            nn.BatchNorm2d(hparams["size_feature_maps_generator"]),
            nn.ReLU(True),
            
            nn.ConvTranspose2d(hparams["size_feature_maps_generator"], 
                               hparams["num_channels"], 4, 2, 1, bias=False),
            nn.Tanh()
            
        )

    def forward(self, input):
        return self.main(input)

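The number of feature maps shrinks from layer to layer (from size_feature_maps_generator * 8 down to num_channels), while every transpose-convolution after the first doubles the spatial resolution. The size of the latent vector, the number of feature maps, and the number of channels are read from hparams, so these settings shape the whole architecture.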
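To see how the Generator reaches a 64x64 output, we can trace the spatial size through its five ConvTranspose2d layers. The sketch below uses the output-size formula for a transposed convolution without dilation or output padding, out = (in - 1) * stride - 2 * padding + kernel, together with the (kernel, stride, padding) values from the class above. The helper name is ours.

```python
def conv_transpose_out(size, kernel, stride, padding):
    """Output height/width of a transposed convolution (no dilation, no output_padding)."""
    return (size - 1) * stride - 2 * padding + kernel

# (kernel, stride, padding) of the five ConvTranspose2d layers in the Generator
generator_layers = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]

size = 1                       # the latent vector enters as a 1x1 "image"
generator_sizes = [size]
for kernel, stride, padding in generator_layers:
    size = conv_transpose_out(size, kernel, stride, padding)
    generator_sizes.append(size)

print(generator_sizes)  # [1, 4, 8, 16, 32, 64]
```

The first layer expands the 1x1 latent input to 4x4, and each of the four remaining layers doubles the resolution.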

Let’s instantiate the Generator and apply the weight initialization:

model_name = "Generator"
device = "cuda"
generator = Generator().to(device)

generator.apply(weights_init)

Now we can write the Generator architecture to a text file and upload it to the Neptune artifacts folder:

with open(f"./{model_name}_arch.txt", "w") as f:
  f.write(str(generator))


run[f"io_files/artifacts/{model_name}_arch"].upload(f"./{model_name}_arch.txt")

The Discriminator

The Discriminator D acts as a binary classification model: the input is a 3x64x64 image and the output is a probability that indicates how confident the model is that the image is real rather than fake. The image is processed through a series of Conv2d, BatchNorm, and LeakyReLU layers, and the final probability is produced by a Sigmoid.

The official paper claims that it’s a better practice to use strided convolutions over pooling in order to downsample, because it helps the network learn its own pooling function. In addition, LeakyReLU activations promote healthy gradient flow. Check this article for more info about the Dying ReLU problem and how leaky ReLU activations help overcome this issue.
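The downsampling performed by the strided convolutions can be traced with the usual output-size formula for a convolution without dilation, out = floor((in + 2 * padding - kernel) / stride) + 1. The sketch below applies it to the (kernel, stride, padding) values of the five Conv2d layers in the Discriminator class that follows. The helper name is ours.

```python
def conv_out(size, kernel, stride, padding):
    """Output height/width of a convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

# (kernel, stride, padding) of the five Conv2d layers in the Discriminator
discriminator_layers = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 1, 0)]

size = 64                      # input images are 64x64
discriminator_sizes = [size]
for kernel, stride, padding in discriminator_layers:
    size = conv_out(size, kernel, stride, padding)
    discriminator_sizes.append(size)

print(discriminator_sizes)  # [64, 32, 16, 8, 4, 1]
```

Four stride-2 layers halve the resolution down to 4x4, and the last layer collapses it to a single value that the Sigmoid turns into a probability.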

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.main = nn.Sequential(
            
            nn.Conv2d(hparams["num_channels"], 
                      hparams["size_feature_maps_discriminator"], 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            
            nn.Conv2d(hparams["size_feature_maps_discriminator"], 
                      hparams["size_feature_maps_discriminator"] * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(hparams["size_feature_maps_discriminator"] * 2),
            nn.LeakyReLU(0.2, inplace=True),
            
            nn.Conv2d(hparams["size_feature_maps_discriminator"] * 2, 
                      hparams["size_feature_maps_discriminator"] * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(hparams["size_feature_maps_discriminator"] * 4),
            nn.LeakyReLU(0.2, inplace=True),
            
            nn.Conv2d(hparams["size_feature_maps_discriminator"] * 4, 
                      hparams["size_feature_maps_discriminator"] * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(hparams["size_feature_maps_discriminator"] * 8),
            nn.LeakyReLU(0.2, inplace=True),
            
            nn.Conv2d(hparams["size_feature_maps_discriminator"] * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid()
        )

    def forward(self, input):
        return self.main(input)

Next, let’s instantiate the Discriminator, apply the weight initialization, and log the architecture to the artifacts folder:

disc_name = "Discriminator"
device = "cuda"
discriminator = Discriminator().to(device)

# Initialize the Discriminator's own weights (not the Generator's)
discriminator.apply(weights_init)


with open(f"./{disc_name}_arch.txt", "w") as f:
  f.write(str(discriminator))


run[f"io_files/artifacts/{disc_name}_arch"].upload(f"./{disc_name}_arch.txt")

Now we have both model architectures logged into our Dashboard:

Model training and debugging

Before actually starting the training process, we’ll take some time to discuss the loss functions and optimizers that we’ll be using. 

As recommended by the paper, the loss function to use is Binary Cross Entropy, or BCELoss as defined in PyTorch. The convenient part of BCELoss is that it provides the calculation of both log components in the objective function, i.e. log D(x) and log(1-D(G(z))).
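A short pure-Python sketch shows why one loss covers both log components. For a single sample, binary cross entropy is -(y * log(p) + (1 - y) * log(1 - p)), so the label y selects which term survives. The Discriminator outputs used here are made-up values, and the helper is our own, not a PyTorch function.

```python
import math

def bce(prediction, label):
    """Binary cross entropy for one sample."""
    return -(label * math.log(prediction) + (1.0 - label) * math.log(1.0 - prediction))

d_x = 0.9     # D's output on a real image (made-up value)
d_g_z = 0.2   # D's output on a generated image (made-up value)

loss_real = bce(d_x, 1.0)     # label 1 keeps only -log D(x)
loss_fake = bce(d_g_z, 0.0)   # label 0 keeps only -log(1 - D(G(z)))

print(loss_real, -math.log(d_x))
print(loss_fake, -math.log(1.0 - d_g_z))
```

Minimizing these two losses is therefore the same as maximizing log D(x) + log(1-D(G(z))), which is what the Discriminator is after.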

Another convention used in the original paper is the use of real and fake labels, set here to 1 and 0 respectively. They’re used when calculating the D and G losses.
Finally, we set up two different optimizers for G and D. According to the specifications in the paper, both optimizers are Adam with learning rate 0.0002 and Beta1 = 0.5. We also generate a fixed batch of latent vectors drawn from a Gaussian distribution, which lets us compare the Generator's output on the same inputs as training progresses.

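import torch
import torch.nn as nn
import torch.optim as optim

# Binary cross entropy between D's output and the real/fake labels
criterion = nn.BCELoss()

# Fixed latent batch, reused to visualize the Generator's progress
fixed_noise = torch.randn(64, hparams["size_latent_z_vector"], 1, 1, device=device)

# Label convention used when computing the D and G losses
real_label = 1.
fake_label = 0.

# Adam settings given above: learning rate 0.0002, Beta1 = 0.5
lr = 0.0002
beta1 = 0.5
optimizerD = optim.Adam(discriminator.parameters(), lr=lr, betas=(beta1, 0.999))
optimizerG = optim.Adam(generator.parameters(), lr=lr, betas=(beta1, 0.999))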

The training phase

Now that we have all parts defined, we can start training. To perform training we need to meticulously follow the algorithm presented in Goodfellow’s paper. Specifically, we’ll be constructing different mini-batches for real and fake images while adjusting the objective function of the Generator to maximize logD(G(z)).

The training loop consists of two segmented parts. The first part deals with the Discriminator and the second one with the Generator.

Discriminator training 

As stated in the official paper, the goal for training the Discriminator is to “update the discriminator by ascending its stochastic gradient”. In practice, what we want to achieve is to maximize the probability of correctly classifying a given input as real or fake. Therefore, we need to construct a batch of real samples from the dataset, forward pass it through D, calculate the loss, and then calculate the gradients in a backward pass. Then, we repeat the same procedure for the batch of fake samples, let the gradients of both passes accumulate, and update D with a single optimizer step.

Generator training  

What we want from the Generator is clear: we aim to train it so that it learns to generate better fakes. In his paper, Goodfellow points out that minimizing log(1-D(G(z))) may not provide sufficient gradients, especially early in the learning process, so the Generator is trained to maximize log D(G(z)) instead. To accomplish this in practice, we classify the Generator output from the Discriminator training part with the freshly updated Discriminator, compute the Generator loss using real labels as the target, compute the gradients in a backward pass, and finally update G’s parameters with the corresponding optimizer step.
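The gradient argument can be illustrated with a few lines of pure Python. The sketch below assumes that the Discriminator output is a sigmoid of some pre-activation a (as it is in the Discriminator class above) and estimates, by finite differences, how steep each generator objective is with respect to a. A very negative a corresponds to an early stage where D confidently rejects the fakes. The function names are ours.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def saturating_loss(a):
    """Original generator objective to minimize: log(1 - D(G(z)))."""
    return math.log(1.0 - sigmoid(a))

def non_saturating_loss(a):
    """Alternative used here: minimize -log D(G(z)), i.e. BCE against real labels."""
    return -math.log(sigmoid(a))

def slope(loss, a, eps=1e-6):
    """Central finite-difference estimate of d(loss)/da."""
    return (loss(a + eps) - loss(a - eps)) / (2.0 * eps)

for a in (-5.0, 0.0, 5.0):
    print("D(G(z)) = %.4f  saturating slope = %.4f  non-saturating slope = %.4f"
          % (sigmoid(a), slope(saturating_loss, a), slope(non_saturating_loss, a)))
```

When D(G(z)) is close to 0, the first objective is almost flat while the second still has a slope close to 1 in magnitude, which is why the second is preferred in practice.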

Create the data loader:

dataloader = data_preprocessing(data_dir)

Build the training loop:

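  • Discriminator training part:

def discriminator_training(data):
    # (1) Update D: maximize log(D(x)) + log(1 - D(G(z)))
    discriminator.zero_grad()

    # Real batch: forward pass through D, loss against real labels, backward pass
    real_cpu = data[0].to(device)
    b_size = real_cpu.size(0)
    label = torch.full((b_size,), real_label, dtype=torch.float, device=device)
    output = discriminator(real_cpu).view(-1)
    errD_real = criterion(output, label)
    errD_real.backward()
    D_x = output.mean().item()

    # Fake batch: generate images from noise and classify them with D
    noise = torch.randn(b_size, hparams["size_latent_z_vector"], 1, 1, device=device)
    fake = generator(noise)
    label.fill_(fake_label)
    # detach() keeps this backward pass from reaching the Generator
    output = discriminator(fake.detach()).view(-1)
    errD_fake = criterion(output, label)
    errD_fake.backward()
    D_G_z1 = output.mean().item()

    # Gradients of both batches have accumulated; take one optimizer step
    errD = errD_real + errD_fake
    optimizerD.step()
    return fake, errD, D_x, D_G_z1

  • Generator training part:

def generator_training(fake):
    # (2) Update G: maximize log(D(G(z)))
    generator.zero_grad()
    # Real labels are the target for the Generator loss
    label = torch.full((fake.size(0),), real_label, dtype=torch.float, device=device)
    # D was just updated, so run the fake batch through it again
    output = discriminator(fake).view(-1)
    errG = criterion(output, label)
    errG.backward()
    D_G_z2 = output.mean().item()
    optimizerG.step()
    return errG, D_G_z2
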
  • Call both parts inside the training loop:

import torchvision.utils as vutils

img_list = []
G_losses = []
D_losses = []
iters = 0

for epoch in range(hparams["num_epochs"]):
    for i, data in enumerate(dataloader, 0):
        # Each step function wraps one of the two parts above and returns
        # the losses and statistics it computes
        fake, errD, D_x, D_G_z1 = discriminator_training(data)
        errG, D_G_z2 = generator_training(fake)

        # Print the training statistics
        print('[%d/%d][%d/%d]\tLoss_D: %.4f\tLoss_G: %.4f\tD(x): %.4f\tD(G(z)): %.4f / %.4f'
              % (epoch, hparams["num_epochs"], i, len(dataloader),
                 errD.item(), errG.item(), D_x, D_G_z1, D_G_z2))

        # Keep the losses for plotting later
        G_losses.append(errG.item())
        D_losses.append(errD.item())

        # Log the batch losses to Neptune
        run["training/batch/Gloss"].log(errG.item())
        run["training/batch/Dloss"].log(errD.item())

        # Periodically save G's output on fixed_noise to track its progress
        if (iters % 500 == 0) or ((epoch == hparams["num_epochs"]-1) and (i == len(dataloader)-1)):
            with torch.no_grad():
                fake = generator(fixed_noise).detach().cpu()
            img_list.append(vutils.make_grid(fake, padding=2, normalize=True))

        iters += 1

Check the G and D losses:

Both losses decrease and stabilize towards the end of training. We ran several training sessions with different numbers of epochs, and the changes are not big: the losses improve only slightly when we increase the number of epochs, as the following comparison shows:

Left chart: D and G losses with 10 epochs, Right chart: D and G losses with 5 epochs | See in Neptune

We can also visualize the Generator and Discriminator losses overlapping each other:

D and G losses overlapping

Final results of the generator progression

Finally, we can look at real images side by side with the fake images created by the Generator.

Fake images versus real images

Concluding thoughts

We have managed to build a Deep Convolutional GAN from the ground up, explaining all its different parts and components and training it on a human face dataset. The quality of the model can always be improved by augmenting the training data and carefully tuning the hyperparameters.

Generative adversarial networks hold great promise across several application domains, although there are still major challenges in both hardware and frameworks. GANs have an enormous range of applications, such as Image-to-Image Translation, Semantic-Image-to-Photo Translation, 3D Object generation, autonomous driving, and Human pose generation.

As always, here are some good resources if you want to keep learning about the topic:

Feel free to email me with any questions at: hachcham.ayman@gmail.com

Don’t hesitate to check the Google Colab Notebook: Fake Faces with DCGAN

