Ideogram AI, a startup founded by former Google engineers along with members from prestigious institutions such as UC Berkeley, Carnegie Mellon University, and the University of Toronto, has announced the release of the first full version of its eponymous image generator.
“We are excited to release Ideogram 1.0, our most advanced text-to-image model to date,” Ideogram AI said in an official blog post. “Trained from scratch, like all Ideogram models, Ideogram 1.0 delivers state-of-the-art text rendering, unprecedented photorealism, prompt adherence, and a new feature called Magic Prompt that helps you create detailed prompts for beautiful, creative images.”
This release comes with news of an $80 million Series A funding round led by Andreessen Horowitz, with participation from Redpoint Ventures, Pear VC, and SV Angel.
We’re excited to share that Ideogram has raised $80 million in Series A funding to help people become more creative through generative AI! Thanks to @a16z for leading the round and @Redpoint, @pearvc, @IndexVentures, @svangel for your participation!
Ideogram 1.0 will soon be significantly improved!
– Mohammad Norouzi (@mo_norouzi) February 29, 2024
Decrypt was able to test the model, and Ideogram AI’s claims were not greatly exaggerated. You can see a side-by-side comparison below. Ideogram 1.0 is a clear improvement over the previous versions, v0.1 and v0.2. Its prompt adherence, image quality, and text generation capabilities are outstanding.
The model is not open source, so visibility into the pipeline is limited and there are no research papers to evaluate. However, the results obtained with this model demonstrate that it has the potential to be the best model currently available, at least until Stable Diffusion 3 is released publicly.
The new model is the most capable image generator in terms of text rendering, producing longer text strings with fewer errors than either Dall-E 3 or MidJourney. Its current free tier also gives it an edge over those competitors: MidJourney has no free tier, and Microsoft Copilot, which also uses Dall-E 3, produces only square 1:1 images, while Ideogram supports a wider set of aspect ratios.
Ideogram also offers two paid plans, at $7 and $15 per month. These give access to over 400 generations per day along with other benefits such as an image editor, better-quality downloads, img2img (which lets you edit or transform existing images), and private generations. On all lower tiers, generated images are publicly visible.
Introducing Ideogram 1.0: The most advanced text-to-image model now available at https://t.co/Xtv2rRbQXI!
It offers state-of-the-art text rendering, unprecedented photorealism, superior prompt compliance, and a new feature called Magic Prompt to help you with your prompts. pic.twitter.com/VOjjulOAJU
— Ideogram (@ideogram_ai) February 28, 2024
Ideogram understands long prompts and, along with Stable Diffusion 3, outperforms other image generators in this area.
One of Ideogram’s standout features is Magic Prompt, which can be turned on and off. This feature analyzes and enriches prompts to produce better-quality images, essentially giving the model the ability to understand natural language, as Dall-E 3 does. Because the feature is optional, Ideogram is more versatile. In ChatGPT Plus, by contrast, prompt rewriting is always on, which sometimes produces inaccurate results.
Finally, Ideogram is less aggressively censored than MidJourney and Dall-E 3 and can generate images of famous people, company logos, and art styles. It doesn’t go completely NSFW, but it is more permissive when it comes to prompt censorship.
Early testers also seem to prefer Ideogram over other models. “Using the same evaluation protocol as DALL·E 3, we found that human evaluators preferred Ideogram 1.0 over DALL·E 3 and Midjourney V6 in terms of prompt alignment, image consistency, overall likeability, and text rendering quality,” the startup said.
Side by Side Comparison: Ideogram vs MidJourney vs Dall-E 3
Decrypt tested Ideogram’s features and compared it to its top competitors, MidJourney and Dall-E 3. Stable Diffusion 3 and Google’s high-end ImageFX are not evaluated here because SD3 is not yet released and ImageFX is not widely used.
Create long text strings
Prompt: Futuristic Android in Cyberpunk City with sign saying “Don’t be late for the AI trend: Emerge by Decrypt”
Ideogram AI was able to represent both the requested aesthetic and text. However, there was a typo that produced “thee” instead of “the”.
MidJourney was completely unable to produce coherent text and focused instead on creating a detailed futuristic android, which is the main subject of the entire composition. The city is not cyberpunk at all.
Dall-E 3 landed in the middle. It created a futuristic robot and a cyberpunk city, but the sign was missing the word “Emerge.”
Interestingly, Ideogram understood that the robot was in the city and associated with the sign, whereas Dall-E 3 treated the sign as part of the urban landscape.
Long prompts and spatial awareness
Prompt: A surreal and interesting scene with a cat sitting on top of a television next to a sign that says “Emerge.” In the background stands a futuristic android on one side and an astronaut on the other. The walls of the room are decorated with impressive images of molecules and DNA chains.
Ideogram was by far the best generator overall. It understood every part of the prompt, generated the text without typos, and placed each element correctly: a cat on top of a TV with a sign next to it, and an android and an astronaut on either side. The background includes molecules and DNA chains.
MidJourney’s aesthetic was not surreal. It generated the word “Emerge” but displayed it on the TV and did not generate a sign. The cat is not on the TV, but next to it. It didn’t create the android as requested, failed to follow the prompt for the background, and instead favored elements that better fit the aesthetic of the composition, giving more importance to the subject (the cat) than to the overall scene.
Dall-E 3 maintained its distinctive cartoon style and did not fully follow the prompt. It has better spatial understanding and prompt adherence than MidJourney, but significantly less than Ideogram, and it seems to lose in terms of style. It created a cat on top of the TV, but failed to create an “Emerge” sign next to the cat. It didn’t create an android and didn’t follow the prompt when creating the background.
Censorship
Prompt: A hot, sexy girl.
The prompt contains no language that could be construed as hate speech, a slur, or explicitly sexual content. After all, a “hot, sexy girl” can be fully clothed and not aggressively sexualized.
Ideogram AI understood the prompt and created an image that matched the instructions. However, Ideogram has an AI moderator that is triggered when more explicit words are used (e.g., slang for genitals or tags like nude or naked), which immediately causes the generation to be censored.
Meanwhile, both MidJourney and Dall-E 3 refused to generate images and banned the words, even though the prompt did not call for an NSFW generation.
Ideogram appears to be more lenient about censorship, allowing generated images (NSFW or otherwise questionable) to be viewed before they are pulled from the application.
Celebrities and copyrighted images
Prompt: A happy Joe Biden and Vladimir Putin hold hands and stand in front of a wall with the word “Decrypt” written on it.
Ideogram AI generated the image: the text is accurate, the scene is realistic, and the characters are easy to identify (even if not 100% accurate).
Although Dall-E 3 created the image, Biden cannot be easily identified, and Trump can only be identified because of his unique hairstyle. The text is incorrect and the scenery is not realistic but rather cartoonish.
MidJourney declined to produce the image.
Conclusion
Free and widely available, Ideogram may be the best image generator on the market today. It has excellent natural language understanding, strong spatial awareness, and good prompt adherence. It is also the best generator of in-image text currently available.
If aesthetics are your most important consideration (to the point where prompt adherence and text are less important), MidJourney may remain a solid contender for certain use cases. Although it is not particularly powerful and is heavily censored, Dall-E 3 may still make sense as part of a ChatGPT Plus subscription.
Ideogram AI is currently at the top of our image generator toolbox.
Edited by Ryan Ozawa.