Generative adversarial networks (GANs) are a modern technology with a lot of potential across many use cases, from generating aged versions of your photos and augmenting your voice to applications in medicine and other industries.
This advanced technology can help you shape your products and services. It can also be used to improve image quality to preserve memories.
While GANs are a boon for many, some find them concerning.
But what is this technology exactly?
In this article, I’ll discuss what a GAN is, how it works, and its applications.
So, let’s dive right in!
What Is a Generative Adversarial Network?
A Generative Adversarial Network (GAN) is a machine learning framework in which two neural networks compete with each other to generate new, realistic data such as pictures, music, drawings, and so on.
GANs were introduced in 2014 by computer scientist Ian Goodfellow and his colleagues. They are deep neural networks capable of generating new data similar to the data they are trained on. The two networks compete in a zero-sum game, in which one network's gain is the other's loss.
GANs were originally proposed as generative models for unsupervised learning, but they have also proven useful for semi-supervised learning, fully supervised learning, and reinforcement learning.
The two blocks in competition in a GAN are:
- The generator: A neural network that produces artificial outputs resembling the real data. In image GANs, it is often a deconvolutional (transposed convolutional) network.
- The discriminator: A neural network that tries to identify which outputs are artificially created. In image GANs, it is often a convolutional network.
To understand the concept of GAN better, let’s quickly understand some important related concepts.
Machine learning (ML)
Machine learning is a branch of artificial intelligence (AI) in which models learn from data to improve their performance and accuracy at tasks such as making decisions or predictions.
ML algorithms create models based on training data, improving with continuous learning. They are used in multiple fields, including computer vision, automated decision-making, email filtering, medicine, banking, data quality, cybersecurity, speech recognition, recommendation systems, and more.
In machine learning and deep learning, a discriminative model works as a classifier that distinguishes between two or more classes (labels).
For example, it might distinguish between different fruits or animals.
Generative models, by contrast, take random samples as input and use them to create new, realistic pictures. They learn from real images of objects or living things and then generate realistic imitations of their own. Two common types of these models are:
Variational autoencoders (VAEs): These use an encoder and a decoder, which are separate neural networks. A real image passes through the encoder, which represents it as a vector in a latent space.
The decoder then takes these latent representations and reconstructs realistic copies of the images. Image quality may be low at first, but it improves as training progresses. Once the decoder is fully trained, you can set the encoder aside.
Generative adversarial networks (GANs): As discussed above, a GAN is a deep learning framework that generates new data similar to the data it is trained on. It falls under unsupervised machine learning, one of the types of machine learning discussed below.
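To make the VAE description more concrete, here is a minimal NumPy sketch of one forward pass. It assumes tiny linear encoder and decoder weights (`W_mu`, `W_logvar`, and `W_dec` are hypothetical, randomly initialized, and untrained). The encoder outputs the mean and log-variance of a Gaussian in latent space, a latent vector is sampled with the reparameterization trick, and the decoder maps it back to data space. Training, which would minimize reconstruction error plus the KL term, is left out.

```python
import numpy as np

rng = np.random.default_rng(0)

DATA_DIM, LATENT_DIM = 8, 2  # illustrative sizes (assumed)

# Hypothetical linear encoder/decoder weights; a real VAE would use deeper networks
W_mu = rng.normal(0, 0.1, (DATA_DIM, LATENT_DIM))
W_logvar = rng.normal(0, 0.1, (DATA_DIM, LATENT_DIM))
W_dec = rng.normal(0, 0.1, (LATENT_DIM, DATA_DIM))

def encode(x):
    """Map inputs to the mean and log-variance of a Gaussian in latent space."""
    return x @ W_mu, x @ W_logvar

def sample_latent(mu, logvar):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z):
    """Map latent vectors back to data space."""
    return z @ W_dec

def kl_to_standard_normal(mu, logvar):
    """KL divergence from N(mu, sigma^2) to N(0, I), summed over latent dimensions."""
    return -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=-1)

x = rng.standard_normal((5, DATA_DIM))   # a batch of 5 "real" inputs
mu, logvar = encode(x)
z = sample_latent(mu, logvar)
x_recon = decode(z)                      # reconstructions, shape (5, DATA_DIM)

# After training, new data would be generated by decoding fresh samples from N(0, I)
new_samples = decode(rng.standard_normal((3, LATENT_DIM)))
```

The final line reflects the point made above: once the decoder is trained, generating new data only needs the decoder and random latent vectors.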
In supervised learning, a model is trained on labeled data, meaning each training example is already tagged with the correct answer. The supervised learning algorithm analyzes these labeled examples and learns to produce accurate results from them.
Unsupervised learning involves training a model on data that is neither labeled nor classified, so the algorithm works on the data without guidance. Its task is to group unsorted data based on patterns, similarities, and differences, without any labeled examples to learn from.
GANs are therefore usually associated with unsupervised learning. A GAN has two models that automatically discover and learn the patterns in input data: the generator and the discriminator.
Let’s understand them a bit more.
Parts of a GAN
The term “adversarial” is part of the name because a GAN has two parts, a generator and a discriminator, that compete against each other. This competition lets the GAN capture, scrutinize, and replicate the variations in a dataset. Let’s take a closer look at these two parts.
A generator is a neural network that learns to generate fake data points, such as images and audio, that look realistic. It is trained alongside the discriminator and improves as training continues.
The data produced by the generator serves as negative (fake) examples for the other part, the discriminator, which we’ll look at next. The generator takes a random, fixed-length vector as input and produces a sample as output. This output is then passed to the discriminator, which classifies it as real or fake.
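The step of turning a random, fixed-length vector into a sample can be sketched as a tiny two-layer network in NumPy. The layer sizes, the ReLU hidden layer, and the `tanh` output (which scales values to [-1, 1], like normalized pixels) are assumptions for illustration, and the weights are random and untrained, so the output is noise-shaped rather than realistic.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 16   # length of the random input vector (assumed)
HIDDEN_DIM = 32   # hidden layer width (assumed)
OUTPUT_DIM = 64   # e.g. a flattened 8x8 grayscale image (assumed)

# Randomly initialized weights for a hypothetical two-layer generator
W1 = rng.normal(0, 0.1, (LATENT_DIM, HIDDEN_DIM))
b1 = np.zeros(HIDDEN_DIM)
W2 = rng.normal(0, 0.1, (HIDDEN_DIM, OUTPUT_DIM))
b2 = np.zeros(OUTPUT_DIM)

def generator(z):
    """Map a batch of latent vectors z, shape (batch, LATENT_DIM), to fake samples."""
    h = np.maximum(0, z @ W1 + b1)   # ReLU hidden layer
    return np.tanh(h @ W2 + b2)      # outputs squashed into [-1, 1]

z = rng.standard_normal((4, LATENT_DIM))  # four fixed-length random input vectors
fake = generator(z)
print(fake.shape)  # (4, 64)
```

Training would adjust `W1`, `b1`, `W2`, and `b2` so that these outputs become harder for the discriminator to reject.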
The generator is trained with these components:
- Noisy input vectors
- A generator network to transform a random input into a data instance
- A discriminator network to classify the generated data
- A generator loss that penalizes the generator when it fails to fool the discriminator
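The generator loss in the list above can be sketched as a function of `d_fake`, the discriminator's probabilities that the generated samples are real. Two common forms are shown: the original minimax form from the 2014 GAN paper, and the widely used "non-saturating" alternative, which gives the generator stronger gradients early in training. Both are low when the discriminator is fooled (`d_fake` near 1) and high when it is not. The small `eps` is an assumption added to avoid `log(0)`.

```python
import numpy as np

def generator_loss_minimax(d_fake, eps=1e-12):
    """Original minimax form: the generator minimizes log(1 - D(G(z)))."""
    return np.mean(np.log(1.0 - d_fake + eps))

def generator_loss_non_saturating(d_fake, eps=1e-12):
    """Common alternative: the generator minimizes -log D(G(z))."""
    return -np.mean(np.log(d_fake + eps))

# The loss drops as the discriminator is fooled more often
print(generator_loss_non_saturating(np.array([0.1])))  # large penalty
print(generator_loss_non_saturating(np.array([0.9])))  # small penalty
```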
The generator works like a thief, creating realistic data to fool the discriminator and get past its checks. Although it may fail badly at first, it keeps improving until it consistently generates realistic, high-quality data that passes those checks. Once it reaches this point, you can use the generator on its own, without the discriminator.
A discriminator is also a neural network, one that learns to tell real images (or other data types) from fake ones. Like the generator, it plays a vital role during training.
It acts like the police trying to catch the thief (the generator’s fake data), looking for fake images and other anomalies in each data instance.
As discussed earlier, the generator keeps improving until it produces data good enough that the discriminator can no longer tell it apart from real data. At that point, you are good to go with just the generator.
How Does a GAN Work?
A generative adversarial network (GAN) involves three things:
- A generative model to describe the way data is generated.
- An adversarial setting where a model is trained.
- Deep neural networks as AI algorithms for training.
The GAN’s two neural networks, the generator and the discriminator, play an adversarial game. The generator produces new data instances, such as images or audio clips, that resemble the training data, while the discriminator evaluates their authenticity and decides whether each instance it reviews is real or fake.
For example, suppose the goal is to produce images that cannot be told apart from real ones. The generator takes its input (typically random noise) and creates new, synthetic images as output.
The generator’s aim is for every image it produces to be judged authentic, even though it is fake. It wants to create passable fakes that avoid being caught.
Next, these generated images go to the discriminator along with a set of images from the real dataset, and the discriminator tries to detect which ones are authentic. It works against the generator: however well the generator mimics real data, the discriminator tries to tell real data from fake.
The discriminator takes both fake and real data and returns a probability between 0 and 1, where values close to 1 indicate the input is likely real and values close to 0 indicate it is likely fake.
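A minimal sketch of how a discriminator turns a score into a probability between 0 and 1: here it is assumed to be a single logistic layer (a real image discriminator would be a deep convolutional network), and the weights `w` and `c` are hypothetical and untrained. The loss is binary cross-entropy with real samples labeled 1 and fake samples labeled 0; `eps` is an assumption that guards against `log(0)`.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def discriminator(x, w, c):
    """Toy logistic discriminator: returns P(x is real), a value between 0 and 1."""
    return sigmoid(x @ w + c)

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy with real samples labeled 1 and fake samples labeled 0."""
    return -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))

rng = np.random.default_rng(0)
w, c = rng.normal(0, 0.1, 64), 0.0              # hypothetical weights for 64-dim inputs
probs = discriminator(rng.standard_normal((4, 64)), w, c)
print(probs)  # four probabilities, each strictly between 0 and 1
```

A discriminator that outputs 0.5 for everything is maximally unsure; its loss is then 2 log 2, about 1.386.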
There are two feedback loops in this process:
- The generator is in a feedback loop with the discriminator
- The discriminator is in a feedback loop with the ground truth, a set of real images
GAN training works because the generator and discriminator are trained together. The generator keeps learning from the discriminator’s verdicts on its fakes, while the discriminator keeps learning to improve its detection. Both networks are dynamic, so each one is trying to beat a moving target.
The discriminator is a convolutional network that classifies the images supplied to it. It works as a binary classifier, labeling images as real or fake.
The generator, on the other hand, works like an inverse convolutional network: it takes random noise samples and upsamples them into images. The discriminator does the opposite, shrinking its input with downsampling techniques such as max-pooling.
In this adversarial game, the two networks optimize different, opposing loss (objective) functions, so each network’s progress pushes the other to improve even further.
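The alternating training described above can be sketched with a deliberately tiny, one-dimensional GAN whose gradients are derived by hand. Everything here is an illustrative assumption: "real" data is drawn from N(4, 1.25²), the generator is the line `G(z) = a*z + b`, the discriminator is a logistic classifier `D(x) = sigmoid(w*x + c)`, and the learning rate and step count are arbitrary. Such a simple setup is not guaranteed to converge neatly (GAN training often oscillates); the point is only to show the two networks taking turns.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def train_toy_gan(steps=500, batch=64, lr=0.01, seed=0):
    """Toy 1-D GAN with hand-derived gradients (illustrative only)."""
    rng = np.random.default_rng(seed)
    a, b = 1.0, 0.0   # generator parameters: G(z) = a*z + b
    w, c = 0.1, 0.0   # discriminator parameters: D(x) = sigmoid(w*x + c)
    eps = 1e-12       # avoids log(0) in the recorded losses
    d_losses, g_losses = [], []
    for _ in range(steps):
        # Discriminator step: push D(real) toward 1 and D(fake) toward 0
        x_real = rng.normal(4.0, 1.25, batch)
        x_fake = a * rng.standard_normal(batch) + b
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        grad_w = np.mean((d_real - 1) * x_real) + np.mean(d_fake * x_fake)
        grad_c = np.mean(d_real - 1) + np.mean(d_fake)
        w -= lr * grad_w
        c -= lr * grad_c
        d_losses.append(-np.mean(np.log(d_real + eps)) - np.mean(np.log(1 - d_fake + eps)))

        # Generator step: push D(G(z)) toward 1 (non-saturating loss -log D(G(z)))
        z = rng.standard_normal(batch)
        d_fake = sigmoid(w * (a * z + b) + c)
        grad_a = np.mean((d_fake - 1) * w * z)
        grad_b = np.mean((d_fake - 1) * w)
        a -= lr * grad_a
        b -= lr * grad_b
        g_losses.append(-np.mean(np.log(d_fake + eps)))
    return (a, b), (w, c), d_losses, g_losses

(gen_a, gen_b), (disc_w, disc_c), d_hist, g_hist = train_toy_gan()
print(f"generator: G(z) = {gen_a:.2f}*z + {gen_b:.2f}")
```

Each loop iteration updates the discriminator first and then the generator against the freshly updated discriminator, which is the feedback loop the text describes. In practice, frameworks such as PyTorch or JAX compute these gradients automatically.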
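Max-pooling, one of the downsampling techniques mentioned above, keeps only the largest value in each small block of a feature map. A minimal NumPy sketch of 2x2 max-pooling with stride 2, assuming a single 2D feature map with even height and width:

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max-pooling with stride 2: keep the largest value in each 2x2 block,
    halving height and width. Assumes both dimensions are even."""
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

x = np.arange(16).reshape(4, 4)
print(max_pool_2x2(x).tolist())  # [[5, 7], [13, 15]]
```

A 4x4 input becomes 2x2, so each layer of this kind halves the spatial resolution the discriminator works with.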
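For reference, the opposing objectives can be written as the minimax game from the original 2014 GAN paper. Here $D(x)$ is the discriminator's probability that $x$ is real, $G(z)$ is the generator's output for random noise $z$, $p_{\text{data}}$ is the real-data distribution, and $p_z$ is the noise distribution:

```latex
\min_G \max_D V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator tries to maximize $V$ by assigning high probability to real data and low probability to generated data, while the generator tries to minimize it by making $D(G(z))$ large. In practice, the generator is often trained to minimize $-\log D(G(z))$ instead, which gives it stronger gradients early in training.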
Types of GANs
GANs come in several types, depending on their implementation. Here are the main types in active use:
- Conditional GAN (CGAN): A GAN that conditions generation on extra information, such as a class label. An additional parameter, “y”, is fed to the generator so that it produces data corresponding to that condition. The same labels are also added to the discriminator’s input, so it can verify whether the data is authentic or fake given the label.
- Vanilla GAN: The simplest GAN type, in which the generator and discriminator are basic multi-layer perceptrons. Its algorithm is simple: it optimizes the GAN’s objective equation using stochastic gradient descent.
- Deep convolutional GAN (DCGAN): One of the most popular and successful GAN implementations. DCGAN is built from ConvNets rather than multi-layer perceptrons, and these ConvNets avoid techniques like max-pooling and fully connected layers.
- Super Resolution GAN (SRGAN): A GAN implementation that combines a deep neural network with an adversarial network to produce high-quality images. SRGAN is especially useful for upscaling low-resolution images, enhancing their details while minimizing errors.
- Laplacian Pyramid GAN (LAPGAN): It builds on the Laplacian pyramid, an invertible, linear image representation made up of a set of band-pass images spaced an octave apart plus a low-frequency residual. LAPGAN uses multiple generator and discriminator networks across the levels of the Laplacian pyramid.
LAPGAN is known for producing high-quality images. Images are first downsampled at each level of the pyramid and then upscaled level by level, with noise added at each level, until they reach their original size.
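The Laplacian pyramid that LAPGAN relies on can be sketched in NumPy to show why it is invertible. As a simplifying assumption, 2x2 averaging stands in for the usual blur-and-subsample step and pixel repetition stands in for proper upsampling. Each band stores the detail lost when an image is downsampled, so adding the bands back, from coarse to fine, recovers the original image exactly. In LAPGAN, a generator at each level learns to produce such a band, conditioned on the upsampled coarser image and noise.

```python
import numpy as np

def downsample(img):
    """Halve resolution by averaging each 2x2 block (stand-in for blur + subsample)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double resolution by repeating each pixel (nearest-neighbor)."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def build_laplacian_pyramid(img, levels):
    """Return band-pass images (finest first) and the low-frequency residual."""
    bands, current = [], img
    for _ in range(levels):
        smaller = downsample(current)
        bands.append(current - upsample(smaller))  # detail lost by downsampling
        current = smaller
    return bands, current

def reconstruct(bands, residual):
    """Invert the pyramid: upsample from coarse to fine, adding each band back."""
    img = residual
    for band in reversed(bands):
        img = upsample(img) + band
    return img

image = np.random.default_rng(0).random((16, 16))
bands, residual = build_laplacian_pyramid(image, levels=3)
print([b.shape for b in bands], residual.shape)  # [(16, 16), (8, 8), (4, 4)] (2, 2)
restored = reconstruct(bands, residual)          # matches `image` up to rounding
```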
Applications of GANs
Generative adversarial networks are used in various fields, such as:
GANs can provide an accurate and faster way to model high-energy jet formation in physics experiments. They can also be trained to approximate bottleneck steps in resource-intensive particle physics simulations.
GANs can accelerate simulation and improve simulation fidelity. In addition, GANs can help study dark matter by simulating gravitational lensing and enhancing astronomical images.
The world of video gaming has also leveraged GANs to upscale the low-resolution 2D data used in older video games. Through image training, GANs can recreate such data at 4K or even higher resolutions, and the results can then be downsampled to suit the game’s native resolution.
With proper training, GAN models can produce sharper, clearer 2D images than the native data while retaining the original image’s details, such as colors.
Video games that have leveraged GANs include Resident Evil Remake, Final Fantasy VIII and IX, and more.
Art and Fashion
You can use GANs to generate art, such as creating images of people who never existed, inpainting photographs, producing pictures of fictional fashion models, and much more. GANs are also used in drawing, to generate virtual shadows and sketches.
Using GANs to create your ads can save time and resources. For example, if you want to sell jewelry, you can use a GAN to create an imaginary model who looks like a real person.
This way, the model can wear your jewelry and showcase it to your customers. You avoid hiring and paying a model, along with extra expenses such as transportation, studio rental, photographers, and makeup artists.
This is especially helpful for a growing business that cannot afford to hire a model or maintain infrastructure for ad shoots.
You can create audio files from a set of audio clips with the help of GANs. This is also known as generative audio. Don’t confuse this with Amazon Alexa, Apple Siri, or other AI voices, where voice fragments are stitched together and produced on demand.
Instead, generative audio uses neural networks to study the statistical properties of an audio source and then reproduces those properties directly in a given context. Here, the model captures how speech changes from one millisecond to the next.
Advanced transfer learning research uses GANs to align latent feature spaces, for example in deep reinforcement learning. The embeddings of the source and target tasks are fed to the discriminator, which tries to determine which task they come from. The result is then backpropagated through the encoder, so the model keeps learning.
Other applications of GANs include:
- Diagnose total or partial vision loss by detecting glaucomatous images
- Visualize industrial design, interior design, clothing items, shoes, bags, and more
- Reconstruct the facial features of a deceased person for forensic purposes
- Create 3D models of an item from an image, produce new objects as 3D point clouds, and model motion patterns in video
- Show how a person’s appearance changes with age
- Augment data, for example to improve a DNN classifier
- Inpaint missing features in maps, improve street views, transfer mapping styles, and more
- Produce images, replace an image search system, etc.
- Generate control inputs for a nonlinear dynamical system using a GAN variant
- Analyze the effects of climate change on a house
- Create a person’s face using their voice as input
- Create new molecules for several protein targets in cancer, fibrosis, and inflammation
- Create animated GIFs from a still image
There are many more applications of GANs in various areas, and their use keeps expanding. However, there are also many instances of misuse. GAN-generated human images have been used for sinister purposes, such as producing fake videos and pictures.
GANs can also be used to create realistic photos and social media profiles of people who never existed. Other concerning misuses of GANs include creating fake pornography without the consent of the people featured, distributing counterfeit videos of political candidates, and so on.
Although GANs can be a boon in many fields, their misuse can also be disastrous. Hence, proper guidelines must be enforced for their use.
GANs are a remarkable example of modern technology. They provide a unique and better way of generating data and aid in functions like visual diagnosis, image synthesis, research, data augmentation, arts and science, and much more.