GANs: "the most interesting idea in the last 10 years in machine learning" - Yann LeCun


GANs are a beautiful concept that, alongside diffusion models, has served as the base model for most generative AI applications in the industry.


First developed by Ian Goodfellow and his colleagues in June 2014, the GAN is a prominent framework for generative AI. Generative AI is a term used for a class of algorithms that make it possible for machines to learn to generate new content (e.g., images, text) from different input types (e.g., text, images, music).

What is the essence of GANs (Generative Adversarial Networks)?

This algorithm has two components: a) the Discriminator and b) the Generator. Let's consider an analogy to make the explanation easier to understand.

  1. Consider a Detective (the Discriminator) and a Fraud (the Generator). The Fraud has only one goal in life: to fool our famous Detective by generating hyper-realistic fake documents to get away with a crime. The Detective's job, naturally, is to be skilled enough to tell which documents are fake and which are real, so that no criminal gets away!
  2. Let the Discriminator be denoted as D and the Generator as G from here on, to save me some typing.
  3. So in essence, the Detective's goal is to maximize its probability (ability) of classifying real data as real and fake data as fake.
  4. Likewise, to counter the Detective's goal, the Fraud's goal is to minimize the probability of the Detective being able to distinguish correctly between what's real and what's fake.
  5. As with any other machine learning algorithm, we have an objective function. The objective function, also known as the cost function or loss function in machine learning, is the function that an algorithm optimizes during training. It measures how well the algorithm is performing; in other words, it measures the "error" or "loss" of the model's predictions compared to the actual data. For example, in our previous logistic_regression implementation, we chose Binary Cross-Entropy as the objective function.
  6. In a GAN, the Fraud (G) tries to minimize the objective function, and the Detective (D) tries to maximize it.
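Since Binary Cross-Entropy came up above, here is a minimal sketch of what it computes; the function name and sample values are illustrative, not from the original implementation:

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean BCE between true labels (0/1) and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 1])             # ground-truth labels
y_pred = np.array([0.9, 0.1, 0.8, 0.7])     # model's predicted probabilities
loss = binary_cross_entropy(y_true, y_pred)
print(round(loss, 4))                       # low loss: predictions match labels well
```

The loss shrinks toward 0 as the predicted probabilities approach the true labels, which is exactly the "error measure" behavior described above.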

The objective function is the probability of D correctly distinguishing between what's real and what's fake.

As seen in the diagram above, this fight between D and G goes on until the point where D can no longer maximize and G can no longer minimize the objective function. So while D strives to learn the conditional probability of the output given the input, G learns the probability distribution of the input data.
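This tug-of-war is usually written as a single minimax objective; this is the standard formulation from Goodfellow et al.'s 2014 paper:

```latex
\min_G \max_D V(D, G) =
\mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)]
+ \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
```

D pushes V up by assigning high probability to real samples and low probability to generated ones, while G pushes V down by making D(G(z)) large. At the theoretical optimum, D outputs 1/2 everywhere, since it can no longer tell real from fake.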


So, now we know we need two components to implement our first GAN:

  1. A neural network to generate data (The G)
  2. A neural network to distinguish the fake data from the real data (The D)
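As a rough sketch of these two components, here are two tiny one-hidden-layer networks in NumPy. All names, sizes, and the `TinyNet` class are illustrative choices, not the architecture of any particular GAN:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-np.clip(x, -30, 30)))

class TinyNet:
    """One-hidden-layer MLP: in_dim -> hidden -> out_dim (returns raw outputs)."""
    def __init__(self, in_dim, hidden, out_dim):
        self.w1 = rng.normal(0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0, 0.1, (hidden, out_dim))
        self.b2 = np.zeros(out_dim)

    def forward(self, x):
        h = np.tanh(x @ self.w1 + self.b1)
        return h @ self.w2 + self.b2

latent_dim, data_dim = 8, 2
G = TinyNet(latent_dim, 16, data_dim)   # generator: noise vector -> fake sample
D = TinyNet(data_dim, 16, 1)            # discriminator: sample -> realness logit

z = rng.normal(size=(5, latent_dim))    # 5 random latent vectors
fake = G.forward(z)                     # 5 "fake documents"
p_real = sigmoid(D.forward(fake))       # D's probability that each one is real
print(fake.shape, p_real.shape)
```

Note the shapes: G maps from the latent space to the data space, and D maps from the data space to a single probability, which is exactly the division of labor described above.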

The following are the steps involved in this implementation:

  1. First, we train D on what a "Real Document" looks like. We do this by feeding D enough images of real documents.
  2. Then, once it is trained, we show it images of "Fake Documents" to see whether D can distinguish them efficiently.
  3. All this while, when D was training to be good, our G was sitting idle.
  4. Now it's time to hone the skills of G. G takes in a random input vector and tries to create a "Fake Document" out of it. These random samples are drawn from a "latent space": a space of noise vectors (typically sampled from a simple distribution such as a Gaussian) that G learns to map to realistic-looking samples, so each latent vector acts as a compressed representation of the sample generated from it.
  5. This “fake Document”, we then send to our D. Let’s see what decision D will make.
  6. This game always has a winner: either D wins by identifying the "fake" as fake, or G wins by fooling D.
  7. The result of every such game is revealed to both G and D. The winner remains unchanged, but the loser updates itself to become better. For example, if G loses, D stays the same and G updates itself to generate more realistic-looking Documents. But if D loses, G does not change and D updates itself to become better.
  8. This game goes on for many iterations, until we reach a point where the G gets so good at forgery, that D can no longer beat it!
  9. Thus, we have created a "Master of Forgery"! This battle between two neural networks is why the framework has the term "adversarial" in its name.
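The alternating game in the steps above can be sketched end to end on a deliberately tiny problem: 1-D "real documents" drawn from a Gaussian around 4.0, a linear generator G(z) = a·z + b, and a logistic discriminator D(x) = sigmoid(w·x + c). Everything here (the data distribution, the parameter names, the learning rate) is a toy assumption for illustration, with the gradients worked out by hand instead of an autodiff library:

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-np.clip(x, -30, 30)))

real_mean, real_std = 4.0, 0.5   # "real documents" live around 4.0
a, b = 1.0, 0.0                  # generator params: G(z) = a*z + b
w, c = 0.0, 0.0                  # discriminator params: D(x) = sigmoid(w*x + c)
lr, batch = 0.1, 64

for step in range(3000):
    z = rng.normal(size=batch)
    x_real = rng.normal(real_mean, real_std, size=batch)
    x_fake = a * z + b

    # --- D's turn: push D(real) -> 1 and D(fake) -> 0 ---
    d_real, d_fake = sigmoid(w * x_real + c), sigmoid(w * x_fake + c)
    grad_w = np.mean((d_real - 1) * x_real) + np.mean(d_fake * x_fake)
    grad_c = np.mean(d_real - 1) + np.mean(d_fake)
    w -= lr * grad_w
    c -= lr * grad_c

    # --- G's turn: push D(fake) -> 1 (non-saturating generator loss) ---
    d_fake = sigmoid(w * (a * z + b) + c)
    grad_a = np.mean((d_fake - 1) * w * z)
    grad_b = np.mean((d_fake - 1) * w)
    a -= lr * grad_a
    b -= lr * grad_b

fake_mean = np.mean(a * rng.normal(size=1000) + b)
print(round(fake_mean, 2))  # should drift toward the real mean of 4.0
```

Real GAN training updates both players every step rather than freezing the round's winner, but the dynamic is the same: G's generated distribution is dragged toward the real one precisely because D keeps punishing anything that looks fake.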

Now, GANs are not the only generative AI technique out there; we have also had Boltzmann machines, HMMs, GPT, variational autoencoders, etc. To see one cool application of GANs, visit https://this-person-does-not-exist.com/en . This application uses StyleGAN to generate images of non-existent humans, and most of them are damn amazing. You might also be interested in CycleGAN, which is used for style transfer on images. When dealing with image data, the neural networks are most often implemented as convolutional neural networks. We will implement a basic GAN in Python in the next blog.