Midjourney

Midjourney is a generative artificial intelligence program and service created and hosted by the San Francisco–based independent research lab Midjourney, Inc. Midjourney generates images from natural language descriptions, called prompts, similar to OpenAI's DALL-E and Stability AI's Stable Diffusion. It is one of the technologies of the AI boom and is currently in open beta, which it entered on July 12, 2022.[3] The Midjourney team is led by David Holz, who co-founded Leap Motion. Holz told The Register in August 2022 that the company was already profitable. Users create artwork with Midjourney using Discord bot commands.


Midjourney is an example of generative AI that can convert natural language prompts into images. It's only one of many machine learning-based image generators to emerge in recent years, yet it has risen to become one of the biggest names in AI alongside DALL-E and Stable Diffusion.


With Midjourney, you can create high-quality images from simple text-based prompts. You don't need any specialized hardware or software to use Midjourney either, as it works entirely through the Discord chat app. The only downside? You'll have to pay at least a little bit before you can start generating images. That's unlike much of the competition, which generally offers at least a few image generations for free.
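For example, generating an image takes a single slash command in a Midjourney Discord channel. The prompt text and the optional --ar (aspect ratio) parameter below are just example values:

```
/imagine prompt: white cats set in a post-apocalyptic Times Square --ar 16:9
```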


Midjourney runs on closed-source and proprietary code, so nobody outside the company knows how it works its magic. That said, we know enough about the underlying technology to offer a general explanation.


Midjourney relies on two relatively new machine learning technologies: large language models and diffusion models. You may already be familiar with the former if you've used generative AI chatbots like ChatGPT. A large language model first helps Midjourney understand the meaning of the words in your prompt, which is then converted into what is known as a vector, essentially a numerical version of your prompt. Finally, this vector helps guide another complex process known as diffusion.
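Midjourney's encoder is proprietary, so as a stand-in, here is a minimal Python sketch using the open CLIP text encoder from the Hugging Face transformers library, the same kind of encoder used by open models like Stable Diffusion. The model name and library here are illustrative assumptions, not Midjourney's actual stack:

```python
# Turning a prompt into a guidance vector (illustrative only; Midjourney's
# encoder is proprietary, so we use the open CLIP text encoder instead).
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
encoder = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")

prompt = "white cats set in a post-apocalyptic Times Square"
tokens = tokenizer(prompt, padding="max_length", return_tensors="pt")

# The "vector" described above: one embedding per token, which the
# diffusion model later uses to steer denoising toward the prompt.
embeddings = encoder(**tokens).last_hidden_state
print(embeddings.shape)  # e.g. torch.Size([1, 77, 512])
```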


Diffusion has only become popular within the past decade or so, which explains the sudden barrage of AI-generated art. To train a diffusion model, a computer gradually adds random noise to images from its training dataset and learns to recover the originals by reversing that noise, step by step. The idea is that with enough training, such a model can learn how to generate entirely new images.
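In code, that training objective is surprisingly compact. The sketch below follows the standard DDPM recipe (add noise at a random strength, then train the network to predict that noise); the schedule values and the model interface are illustrative assumptions, since Midjourney's own training setup is not public:

```python
import torch

T = 1000                                   # number of noise steps
betas = torch.linspace(1e-4, 0.02, T)      # noise schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)

def training_step(model, x0):
    """One DDPM-style training step. x0: clean images, shape (B, C, H, W)."""
    t = torch.randint(0, T, (x0.shape[0],))      # random timestep per image
    eps = torch.randn_like(x0)                   # the noise we add
    a = alpha_bars[t].view(-1, 1, 1, 1)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps   # noised image
    eps_pred = model(x_t, t)                     # network guesses the noise
    return torch.mean((eps - eps_pred) ** 2)     # learn to reverse the noise

# Smoke test with a dummy "model" that always predicts zero noise:
loss = training_step(lambda x_t, t: torch.zeros_like(x_t),
                     torch.randn(4, 3, 64, 64))
print(loss)
```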


So what does this process look like from the perspective of an AI image generator? When you enter a text prompt like “white cats set in a post-apocalyptic Times Square,” the generator starts off with a field of visual noise. You can think of this first step as the equivalent of television static: the image doesn’t look like anything you’ve asked for at this point. However, a trained AI model then uses latent diffusion to subtract the noise in steps, guided by your prompt, until it yields a picture that resembles objects and ideas in the real world.
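A simplified sampling loop makes that step-by-step denoising concrete. This sketch reuses the DDPM schedule from the training example; the model here is assumed to be a trained noise predictor that also receives the prompt embeddings, which is how the text steers the image:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bars = torch.cumprod(1.0 - betas, dim=0)

@torch.no_grad()
def sample(model, prompt_embeddings, shape=(1, 3, 512, 512)):
    x = torch.randn(shape)                     # start from pure "TV static"
    for t in reversed(range(T)):
        eps_pred = model(x, torch.tensor([t]), prompt_embeddings)
        a_bar, beta = alpha_bars[t], betas[t]
        # Subtract a little of the predicted noise at each step.
        x = (x - beta / (1 - a_bar).sqrt() * eps_pred) / (1.0 - beta).sqrt()
        if t > 0:
            x = x + beta.sqrt() * torch.randn_like(x)  # re-inject a bit of noise
    return x  # a tensor that now resembles the prompt
```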