OpenAI, well known for its artificial intelligence research, has scored a new success with DALL-E 2, an AI that can produce stunning images from text descriptions. Building on the first version, released at the beginning of last year, DALL-E 2 has become the focus of attention for its ability to interpret human imagination, thanks to advanced deep learning techniques and artificial neural networks. Let's take a closer look at DALL-E 2 and its highlights.
Born in OpenAI's San Francisco lab, DALL-E 2 owes much to the family of machine learning models known as Generative Adversarial Networks (GANs), so named because of the way their two internal networks contend with each other. This approach has seen tremendous development in recent years; the well-known deepfake videos are one example. This line of generative modelling helped pave the way for systems like DALL-E 2 that create stunning visuals to match a text description. So let's take a brief look at what generative adversarial networks are and how they work.
The GAN was designed in 2014 by Ian Goodfellow, who today works as a machine learning director in Apple's special projects group. It is based on two artificial neural networks, called the generator and the discriminator, that compete with each other. Suppose we want a GAN to generate dog images. Since the AI first has to learn what a dog is, we need to provide it with lots of real dog images. The generator network can then start producing images as it learns the physical structure of dogs. Each generated image is passed to the discriminator network, which compares real images with the fakes produced by the generator and tries to tell them apart. As these processes, which take place in a very short time, continue, the competition between the generator and the discriminator heats up, and both networks grow more capable: the discriminator identifies fake images more and more accurately, while the generator, in turn, produces ever more realistic fakes.
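The generator-versus-discriminator loop described above can be sketched in a few lines of Python. This toy example is purely illustrative, not how DALL-E 2 is built: instead of dog images it learns a simple 1-D number distribution, and both "networks" are reduced to a single affine unit and a single logistic unit so the adversarial dynamics stay easy to follow.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Real data": 1-D samples the GAN should learn to imitate
# (a stand-in for the real dog images in the example above).
def real_samples(n):
    return rng.normal(4.0, 1.25, size=n)

# Generator: an affine map from noise z to a fake sample.
g = {"a": 0.1, "b": 0.0}
# Discriminator: logistic regression, D(x) = sigmoid(w*x + c).
d = {"w": 0.0, "c": 0.0}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr = 0.01
for step in range(2000):
    n = 32
    x_real = real_samples(n)
    z = rng.normal(size=n)
    x_fake = g["a"] * z + g["b"]

    # Discriminator update: push D(real) toward 1 and D(fake) toward 0.
    p_real = sigmoid(d["w"] * x_real + d["c"])
    p_fake = sigmoid(d["w"] * x_fake + d["c"])
    # Gradients of the binary cross-entropy loss w.r.t. w and c.
    gw = np.mean((p_real - 1.0) * x_real) + np.mean(p_fake * x_fake)
    gc = np.mean(p_real - 1.0) + np.mean(p_fake)
    d["w"] -= lr * gw
    d["c"] -= lr * gc

    # Generator update: push D(fake) toward 1, i.e. fool the discriminator.
    x_fake = g["a"] * z + g["b"]
    p_fake = sigmoid(d["w"] * x_fake + d["c"])
    # Chain rule through the discriminator into the generator's parameters.
    upstream = (p_fake - 1.0) * d["w"]
    g["a"] -= lr * np.mean(upstream * z)
    g["b"] -= lr * np.mean(upstream)

# After training, the generator's fakes should cluster near the real mean.
fake = g["a"] * rng.normal(size=1000) + g["b"]
print(round(float(fake.mean()), 2))
```

Note the two opposing gradient steps: the discriminator descends its classification loss while the generator ascends it, which is exactly the "competition" the paragraph above describes.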
Returning to DALL-E 2, it is worth noting that GAN is not the only technology behind this project. The real science behind DALL-E 2 is a pair of advanced deep learning techniques that have drawn a lot of attention in the last few years: CLIP and diffusion models. With the support of these two techniques, DALL-E 2 stays far ahead of its competitors, thanks to the semantic consistency it preserves in the images it creates. For example, the images DALL-E 2 renders from the description "An astronaut on a horse" showcase the model's command of language. It is also striking that even style cues appended to the prompt, such as "charcoal" or "photorealistic", are taken into account.
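To give a feel for the CLIP side of the story, here is a minimal sketch of contrastive image-text matching. Everything in it is an illustrative assumption: the real CLIP uses large learned neural encoders trained on hundreds of millions of image-caption pairs, while here random projections and synthetic feature vectors stand in, just to show how cosine similarity in a shared embedding space lets each image be paired with its best-matching caption.

```python
import numpy as np

rng = np.random.default_rng(42)
D_IN, D_EMB = 8, 4

# Toy stand-ins for CLIP's encoders: fixed random projections instead of
# deep networks. Sharing one projection keeps paired features aligned.
W = rng.normal(size=(D_IN, D_EMB))

def embed(features):
    v = features @ W
    return v / np.linalg.norm(v)  # unit-normalise, as CLIP does

# Pretend raw feature vectors for two captions; each "image" is a slightly
# noisy copy of its matching caption's features.
captions = rng.normal(size=(2, D_IN))
images = captions + 0.05 * rng.normal(size=(2, D_IN))

text_emb = np.stack([embed(c) for c in captions])
img_emb = np.stack([embed(i) for i in images])

# Cosine-similarity matrix: entry [i, j] scores image i against caption j.
sims = img_emb @ text_emb.T
best = sims.argmax(axis=1)  # each image picks its best-matching caption
print(best)
```

Because matching image-caption pairs land close together in the shared space, the similarity matrix's diagonal dominates, and `argmax` recovers the correct pairing. This shared-embedding idea is what lets a system judge how well a generated image fits a prompt like "An astronaut on a horse".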
DALL-E 2, a completely different beast from its first version, is not open to everyone for now, but OpenAI has opened applications for those who want a chance to use it. In the coming days, those who sign up on the waiting list will get to try this artificial intelligence. Don't forget to share your thoughts with us in the comments.