I tested the most popular AI image generators to find their biggest strengths and weaknesses.
At Ahrefs, we’ve got a team of extremely talented (and very human) designers, but not everyone has that luxury. I wanted to know: are AI image generators useful for spinning up quick social media posts, creating blog post graphics, or saving a few bucks on expensive stock photos?
So I tested out the most popular cloud-based text-to-image tools: DALL-E 3 (available in ChatGPT), Midjourney, Canva’s Magic Media, Adobe Firefly, and the very new Gemini for Workspace.
All of these tools generate images in a few clicks, without needing to do anything complicated like training custom models or running programs locally on your computer.
The best AI image generator is, in my opinion, Adobe Firefly. All of the models had their own strengths, but Firefly offered the most control over image generation and image editing.
Here are the pros and cons (and many, many images) from my experience with each.
AI image generator | Best for… | Pricing |
---|---|---|
Adobe Firefly | Best for maximum control over images | 25 free credits per month; $4.99 for 100 credits |
Midjourney | Best for beautiful images | From $10/m for 200 generations |
DALL-E 3 / ChatGPT | Best for data visualization | 2 free images per day on the Free plan; full access starts at $20/m on the Plus plan |
Canva Magic Media | Best for generating vector images | 50 images available for Canva Free users; 500 images per month for paid users (from $14.99/m) |
Gemini for Workspace | Best for quick concepting | Available as a Google Workspace add-on from $20/m |
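
To compare the paid tiers on a like-for-like basis, it can help to work out a rough price per unit. The sketch below is only an illustration. It uses the paid prices from the table above and assumes that one Firefly credit, one Midjourney generation, and one Canva image are comparable units, which the table does not claim. The names `PLANS` and `price_per_unit` are made up for this example. ChatGPT and Gemini are left out because the table gives no unit count for them.

```python
# Rough price-per-unit comparison using the paid prices from the table above.
# Assumption: one credit, one generation, and one image are treated as
# comparable "units". The tools may count usage differently in practice.

PLANS = {
    "Adobe Firefly": (4.99, 100),       # $4.99 for 100 credits
    "Midjourney": (10.00, 200),         # from $10/m for 200 generations
    "Canva Magic Media": (14.99, 500),  # from $14.99/m for 500 images
}


def price_per_unit(price_usd: float, units: int) -> float:
    """Return the price of a single unit in US dollars."""
    if units <= 0:
        raise ValueError("units must be positive")
    return price_usd / units


if __name__ == "__main__":
    # Print the plans from cheapest to most expensive per unit.
    for name, (price, units) in sorted(
        PLANS.items(), key=lambda item: price_per_unit(*item[1])
    ):
        print(f"{name}: ${price_per_unit(price, units):.3f} per unit")
```

Treat the numbers as a starting point rather than a verdict: the Canva price covers the whole paid Canva plan, not just image generation, so the per-image figure flatters it.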
I wanted to test each AI image generator in a range of different scenarios, so I created tons of prompts across three main categories:
- Stock photos (e.g. “Stock photo of a beautiful minimalist home office with a view of trees outside”)
- Graphics and illustrations (e.g. “A cartoon character with ginger hair carrying a giant golden key to represent ‘keyword research’”)
- Data visualizations (e.g. “Graph of website traffic data: January 946, February 1071, March…”)
I tested different levels of prompt complexity, but kept my prompts generally simple. The whole point of these text-to-image tools is to describe something that you want and have the AI create it for you, so I purposefully avoided PhD-level prompt engineering or professional design lingo.
Here’s a photo of me running these tests:
I then judged each AI image generator’s output across a few key dimensions:
- Accuracy: how well did the image generator follow my direction?
- Ease of editing: how easy was it to edit and refine the output?
- Uncanniness: did the output look weird or obviously AI-generated?
- Legibility of text: how well did the model handle text generation?
- Consistency: could I reproduce similar images on multiple occasions?
- Usefulness: could I actually use the output in real life?
Here are my findings.
Adobe Firefly has, by far, the best editing controls of the image generators I tested. This isn’t surprising, considering that Adobe makes Photoshop, and Illustrator, and Lightroom, and dozens of other market-leading design tools.
Here’s an example. The prompt “A cartoon character with ginger hair carrying a giant golden key to represent ‘keyword research’” generated a series of okay-but-not-great images. But in a few clicks, I was able to fix the biggest problems and dramatically improve the result.
Here’s the before:
In a few minutes using Firefly, I was able to:
- Change the aspect ratio from 1:1 to 4:3 using generative fill.
- Fix a missing hand by prompting Firefly to regenerate that specific portion of the image.
- Upscale the small, low-quality image to a much more useful 2k resolution.
And here’s the after:
Adobe Firefly also gives you a ton of control over the image-generation process. A big plus: you can use existing images as style and composition references, making it much easier to generate a series of images with a cohesive style.
Here’s the prompt “A cartoon character with ginger hair carrying a giant magnifying glass to represent ‘competitor research’”, but using my previous image generation as a reference:
The style is slightly different, but they feel recognisably similar. You can also specify particular reference styles, compositions, content types (like art versus photo), and even effects (color, lighting, bokeh, camera angles, you name it).
That means you can use the same prompt but get very different results. Here’s the result for the prompt “Beautiful minimalist home office with a view of trees outside” when I’ve specified golden hour lighting and warm tones:
And here I’ve used the same prompt but asked for low lighting and cool tones for a very different vibe:
And because Firefly is made by Adobe, you can import your generated images into other Adobe products to add text or edit further. Pretty useful.
Midjourney is beautiful. I’ve been a paying Midjourney customer for 3 years for the simple reason that everything it generates is gorgeous, and more aesthetically pleasing than any other AI model I’ve tested.
I use Midjourney to illustrate my creative writing, and it excels at fantasy-style illustration. Here’s an image I created for one of my novels, with no editing or manipulation:
It’s also pretty useful for photorealism. Here’s the prompt “Stock photo of a beautiful minimalist home office with a view of trees outside”:
There are a few AI-isms (how many wheels does that chair have?!), but I want to forgive them because the image is so damn beautiful.
Here’s “Stock photo of a thoughtful person in a meeting at a software company”, featuring an AI-generated man so handsome I didn’t want to look in a mirror for the rest of the day:
Even Midjourney’s cartoon illustrations look elegant, and almost good enough to be plucked from the frames of a Pixar film:
Midjourney does have weaknesses. It categorically cannot do data visualization. Feed it even simple data and it will generate nonsense (but it will at least be beautiful nonsense):
Midjourney’s editing workflows are much better than they used to be, but still not very sophisticated. As well as generating four images for every prompt, you have the option to:
- Vary any single image, either strong or subtle (basically regenerate an image that’s similar to the previous one).
- Upscale images you like to a higher resolution.
- Remove parts of the image (but not specify what you’d like to replace them with).
- Change the aspect ratio (square, 4:3, 16:9, etc).
Here’s an example of varying an image. There are small, subtle differences between each image, like the number of wheels on the chair, which is helpful for minimizing any weird AI-isms in images you love:
These options are nowhere near as precise as Adobe Firefly’s editing workflow, but given Midjourney’s ability to make generally beautiful images from simple, single prompts, this workflow creates surprisingly useful images.
(And as a final bonus, you no longer have to rely on a janky Discord server to generate images: Midjourney’s web app works very well.)
Given the popularity of ChatGPT, DALL-E 3 (the image generation model offered as part of ChatGPT) will probably be most people’s first introduction to AI image generators. That’s a shame, because it’s one of the worst.
To make this point, here’s what happened when I asked for a “Stock photo of someone working on their laptop in a New York coffee shop”:
This is pretty representative of DALL-E 3: most of its images look and feel like they’re AI-generated.
Look for a moment and you’ll spot nonsense text, furniture blending into the background, a weird uncanny-valley glow to the main character, straight lines that are never straight… and most of ChatGPT’s images suffer from the same issues.
Here’s ChatGPT trying to gaslight me into believing that this is a photograph of a home office (the trees look like a freaking pointillism painting):
These issues are at least less obvious in cartoon imagery. Here’s our character holding a key again:
Not bad, despite a few AI-isms, like the double-ended key and weird abstract backpack charm. Unfortunately, I couldn’t remove these little quirks, because although ChatGPT recently added the ability to highlight parts of the image to selectively edit, this feature was super unreliable when I tested it.
On one occasion, ChatGPT even decided that, actually, no, it didn’t want me to do any image editing:
Without much control over image generation or editing, DALL-E 3 is a bit of a crapshoot, and it’s pretty much impossible to carry consistent styles across images.
When I tried to make a new image with the same cartoon character, it changed style radically:
You can’t easily upscale your images either, and when I asked ChatGPT to resize a YouTube thumbnail to a 16:9 aspect ratio, it decided to write a Python script to stretch the image to landscape format.
Which, err… didn’t look good:
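
The distortion comes down to simple arithmetic. Here is a minimal sketch, assuming a hypothetical 1024×1024 square source image (the real dimensions aren't given here) and made-up helper names. It is not the script ChatGPT wrote. Stretching rescales width and height by different factors, whereas a centre crop keeps the proportions and discards the overflow.

```python
from __future__ import annotations

from fractions import Fraction

TARGET_RATIO = Fraction(16, 9)


def stretch_factors(width: int, height: int, target_width: int) -> tuple[float, float]:
    """Scale factors applied to each axis when an image is force-stretched
    to a 16:9 canvas of the given width. Unequal factors mean distortion."""
    target_height = int(target_width / TARGET_RATIO)
    return target_width / width, target_height / height


def center_crop_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Largest centred 16:9 region (left, top, right, bottom) inside the image.
    Cropping keeps proportions intact at the cost of losing the edges."""
    if Fraction(width, height) > TARGET_RATIO:
        # Image is wider than 16:9: trim the sides.
        new_w, new_h = int(height * TARGET_RATIO), height
    else:
        # Image is 16:9 or taller: trim top and bottom (if anything).
        new_w, new_h = width, int(width / TARGET_RATIO)
    left = (width - new_w) // 2
    top = (height - new_h) // 2
    return left, top, left + new_w, top + new_h


if __name__ == "__main__":
    # Hypothetical square image stretched to a 1280-pixel-wide 16:9 canvas.
    print(stretch_factors(1024, 1024, 1280))
    print(center_crop_box(1024, 1024))
```

For the assumed square image, the stretch widens everything by 1.25× while squashing it to roughly 0.7× of its height, which is why faces and text look wrong. Cropping avoids that but throws away the top and bottom of the picture.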
When I tried to refine the prompt to reflect Ahrefs’ brand guidelines, it gave me a lecture on designing thumbnails, and didn’t actually make an image.
Generating images with ChatGPT reminds me of playing the video game DOOM on a calculator. It might technically be possible, but you probably shouldn’t do it.
ChatGPT had one big redeeming virtue, where its penchant for Python was extremely useful: data visualization. It was the only AI image generator capable of actually turning a list of data points into an accurate graph:
And it can handle more complex data visualisations too:
This is a different kind of “image generation”, but for someone like me who wrangles data every day, it’s incredibly useful, and a feature I use all the time.
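
A plausible reason for the accuracy is that a script computes every bar directly from the numbers instead of drawing something that merely resembles a chart. The sketch below illustrates that idea with a plain-text bar chart. It is an assumption-laden illustration, not the code ChatGPT produced: the function name `text_bar_chart` is hypothetical, and only the two values quoted in the example prompt are used because the rest of the series is elided.

```python
from __future__ import annotations

# Only the two values quoted in the example prompt are used here; the rest of
# the series is elided in the post, so it is not reconstructed.
TRAFFIC = {"January": 946, "February": 1071}


def text_bar_chart(data: dict[str, int], width: int = 40) -> list[str]:
    """Render each value as a bar whose length is proportional to the value."""
    if not data:
        return []
    largest = max(data.values())
    if largest <= 0:
        raise ValueError("at least one value must be positive")
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        # The bar length is derived from the data, so it cannot drift from it.
        bar = "#" * round(value / largest * width)
        lines.append(f"{label:<{label_width}} | {bar} {value}")
    return lines


if __name__ == "__main__":
    print("\n".join(text_bar_chart(TRAFFIC)))
```

In practice a plotting library would do the drawing, but the principle is the same: the chart is a function of the data.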
Canva’s Magic Media is an AI image generator embedded directly within the main Canva app. To get started, you’re offered a choice of image, graphic, or video.
It handles stock photos pretty well. Here’s our prompt for a beautiful home office:
You can pick one of around two dozen specific styles to emulate, and pre-set the aspect ratio of the image. Here’s our New York coffee shop with the Moody style applied:
Here, we begin to see Magic Media’s biggest weakness creeping in: uncanny valley photorealism.
Here’s another stock photo attempt that almost looks good… apart from the deformed hands, confusing arm physics, and background ensemble of melty-faced monsters:
It’s useful for generating vector art too, and the images can be exported directly as PNGs with no background, but the images themselves are a bit amateurish.
Here’s our key-holding cartoon figure again, this time holding a perfectly simple key in one hand and a smaller, seemingly melted key in the other:
Here’s the terrifying result of using the same prompt with the 3D Chrome style applied:
Because Magic Media is embedded in Canva, it’s incredibly easy to add text, resize the finished image, or add effects to the generated images. That’s a big plus, but in my opinion, not enough to compensate for the amateurish quality of the image generation.
Here’s an example of how fast AI tools are developing. As I was writing this blog post, Google added AI image generation capabilities directly into Google Docs. Now, you can use the @image command and select “Help me create an image.”
It’s pretty simple. You can use one of three aspect ratios and specify one of six pre-determined styles, and Google returns four images to choose from.
Here’s a decent little image for the prompt “A cartoon character with ginger hair carrying a giant magnifying glass”:
And here’s “A cartoon character with ginger hair carrying a giant golden key” with the Watercolor style applied:
Although these cartoons are decent, Gemini seems to have a particular skill: photography. It rendered beautiful scenes for my home office prompt with the Photography style selected:
And Gemini for Workspace seems to handle photos of people even better. Here’s a really realistic rendition of “Stock photo of someone working on their laptop in a New York coffee shop”, even down to the Apple logo on the laptop:
And here’s “Photo of a woman giving a talk on stage”. I can’t tell this image was AI-generated:
These images are small and low-resolution, but as a big plus, you can generate them in the flow of work, which is pretty useful for adding in a quick mock-up or placeholder to pass on to your design team or improve in the future.
This is clearly a very new feature (when I tested it, image generation failed for me about 70% of the time), but I’d expect it to improve pretty quickly and become a major contender for best AI image generator.
Final thoughts
AI text-to-image generators are at their best when you ask for simple designs and don’t have a particularly strong opinion of the exact image you want to see. If you want a quick stock photo or blog illustration, and don’t have to worry about pesky brand guidelines, most of these tools are up to the task (apart from maybe ChatGPT… sorry).
But the more specific detail you want from the image (words, numbers, particular brand guidelines) and the stronger your opinion about what you want the final image to look like, the more frustrating these tools become.
I think Adobe Firefly is the best AI image generator because it sits at the intersection between generative AI and traditional design tools. It pairs all the creative benefits of AI with the editing control of Photoshop or Illustrator. That means it can handle complicated design workflows, like creating a series of cohesive characters, or applying particular styles or compositions. If you’re serious about using AI image generators for your brand or business, I’d start with Firefly.
I’ll keep updating this post as new AI image generators are released and existing tools continue to get updated. Want to ask me to review a tool for you? Let me know on LinkedIn.