Dall-E 2: why the AI image generator is a revolutionary invention

Artificial intelligence has often taken on humans in creative battles. It can beat chess grandmasters, compose symphonies, pen heartfelt poems and now create detailed art from a simple, short written prompt.

The team at OpenAI recently created powerful software capable of producing a wide range of images in seconds, just from a string of words given to it.

This program is known as Dall-E 2 and was designed to revolutionize the way we use AI with images. We spoke to Aditya Ramesh, one of the lead engineers on Dall-E 2, to better understand what it does, its limits and the future it could hold.

What does the Dall-E 2 do?

In 2021, the AI research and development company OpenAI created a program known as “Dall-E” – a mixture of the names Salvador Dali and Wall-E. This software was able to take a written prompt and create a completely unique AI-generated image.

For example, “a fox in a tree” would bring up an image of a fox sitting in a tree, while “an astronaut with a bagel in his hand” would show… well, you see where that leads.

© OpenAI

While certainly impressive, the images were often blurry, not quite accurate, and took a while to create. Now OpenAI has made vast improvements to the software, creating Dall-E 2 – a powerful new iteration that operates at a much higher level.

Besides a few other new features, the main differences in this second model are a huge improvement in image resolution, lower latency (the time it takes to create an image) and a smarter algorithm for creating images.

The software doesn’t just create images in a single style: you can specify artistic techniques in your request, asking for a pencil drawing, an oil painting, a plasticine model, woolen knitting, a drawing on a cave wall, or even a 1960s movie poster.
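In practice, these styles are simply folded into the text prompt itself. As an illustrative sketch only (the helper name and phrasing below are hypothetical, not part of OpenAI's actual tooling), combining a subject with a style might look like this:

```python
# Hypothetical helper, for illustration: Dall-E 2 has no such function.
# It simply shows how a subject and an art style combine into one prompt.
STYLES = [
    "an oil painting",
    "a plasticine model",
    "woolen knitting",
    "a drawing on a cave wall",
    "a 1960s movie poster",
]

def styled_prompt(subject: str, style: str) -> str:
    """Combine a subject and an art style into a single text prompt."""
    return f"{subject}, in the style of {style}"

# One subject can yield many stylistic variations of the same scene.
for style in STYLES:
    print(styled_prompt("a fox in a tree", style))
```

The model receives only the final string; everything about medium and mood has to be expressed in words.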

“Dall-E is a very useful assistant that amplifies what a person can normally do, but it really depends on the creativity of the person using it. An artist or someone more creative can create really interesting things,” explains Ramesh.

A jack of all trades

In addition to the technology’s ability to produce images from text prompts alone, Dall-E 2 has two other clever features: inpainting and variations. Both work much like the rest of Dall-E, just with a twist.

With inpainting, you can take an existing image and add new features to it or modify parts of it. If you have an image of a living room, you can add a new carpet, put a dog on the sofa, change the picture on the wall or even throw an elephant into the room… because there always is one.

The before and after of OpenAI’s inpainting tool © OpenAI

Variations is another feature that works from an existing image. Feed it a photo, illustration or any other type of image and Dall-E’s variations tool will create hundreds of versions of its own.

You could give it a picture of a Teletubby, and it will replicate it with similar versions. An old painting of a samurai will yield similar images, and you can even take a photo of some graffiti you see and get comparable results.

You can also use this tool to combine two images into one weird collaboration. Mix a dragon and a corgi, or a rainbow and a pot, to generate pots full of color.

(Left) an original image (Right) its variation by Dall-E © OpenAI

Limits of Dall-E 2

While there’s no doubting how awesome this technology is, it’s not without its limits.

One problem you’ll face is confusion over certain words or phrases. For example, when we typed in “a black hole inside a box”, Dall-E 2 returned a literal black-colored hole inside a box, instead of the cosmic body we were looking for.

Dall-E 2 attempt for a black hole in a box © OpenAI

This can often happen when a word has multiple meanings, sentences can be misunderstood, or colloquialisms are used. This is to be expected from an artificial intelligence that takes the literal meaning of your words.

“Another thing to get used to with the system is how prompts and art styles work. When you type something, the initial image may not be correct, and while it technically matches your request, it doesn’t entirely match the feel or idea you had in mind. It can take some getting used to and some minor tweaking,” says Ramesh.

Another area where Dall-E can get confused is “variable mixing”. “If you ask the model to draw a red cube on top of a blue cube, it sometimes gets confused and does the opposite. We can fix this problem quite easily in future iterations of the system, I think,” says Ramesh.

Fighting stereotypes and the role of human input

Like all good things on the internet, it doesn’t take long for a key issue to arise: how could this technology be used unethically? Not to mention the added problem of AI’s history of learning bad behaviors from internet users.

Dall-E creates bowls of soup that are portals to another dimension © OpenAI

When it comes to AI image-creation technology, it seems obvious that it could be misused in many ways: propaganda, fake news and manipulated images come to mind as the obvious ones.

To guard against this, the OpenAI team behind Dall-E has implemented a safety policy for all images on the platform that works in three stages. The first stage is to filter out data that includes serious violations. This includes violence, sexual content and images the team would consider inappropriate.

The second stage is a filter that looks for subtler content that is harder to detect, such as political content or propaganda in one form or another. Finally, in its current form, every image produced by Dall-E is reviewed by a human, but this is not a viable long-term step as the product grows.

Despite this policy, the team is clearly aware of the challenges ahead for this product. They have listed the risks and limitations of Dall-E, detailing the many issues they could face.

These cover a lot of ground. For example, images can often reflect biases or stereotypes, such as the term ‘wedding’ referring primarily to Western weddings, a prompt for ‘a lawyer’ showing mostly older white men, or ‘a nurse’ showing mostly women.

These aren’t new issues, and they are something Google has been dealing with for years. Image generation can often reproduce the biases seen in society.

Astronaut holding a flower © OpenAI

There are also ways to coax Dall-E into producing content that the team seeks to filter. While ‘blood’ would trigger the violence filter, a user could type in “a pool of ketchup”, or something similar, in an attempt to bypass it.
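The ketchup trick above illustrates a general weakness of literal keyword matching. As a toy sketch only (this is not OpenAI's actual filter; the blocklist below is hypothetical), a naive prompt filter might look like this, and fail in exactly the way the article describes:

```python
# Toy sketch, NOT OpenAI's real system: a naive keyword blocklist.
# It shows why matching literal words is easy to sidestep with synonyms
# or visually similar substitutes like "ketchup" for "blood".
BLOCKED_WORDS = {"blood", "gore"}  # hypothetical violence-related terms

def naive_filter(prompt: str) -> bool:
    """Return True if the prompt passes (contains no blocked word)."""
    words = prompt.lower().split()
    return not any(word in BLOCKED_WORDS for word in words)

print(naive_filter("a pool of blood"))    # caught by the blocklist
print(naive_filter("a pool of ketchup"))  # slips straight through
```

This is why the article's second and third stages exist: subtler classifiers and human review are needed to catch what simple word lists miss.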

In addition to the team’s safety policy, they have a clear content policy that users must respect.

Future of Dall-E

So the technology is here and clearly working well, but what’s next for the Dall-E 2 team? Right now, the software is slowly rolling out via a waiting list, with no plans yet to open it to the general public.

By releasing its product slowly, OpenAI can monitor its growth, develop its safety procedures, and prepare the product for the millions of people who will soon be typing in their requests.

“We want to get this research into people’s hands, but right now we’re just interested in getting feedback on how people are using the platform. We are certainly interested in a wider deployment of this technology, but we currently have no commercialization plan,” says Ramesh.
