The AI Image Generator Map
The category of "AI image generator" covers tools that do fundamentally different things. A diffusion model generating an image from a text prompt is not doing the same job as an upscaler, an in-painting editor, or a video generator. Here's how the space breaks down, with a card for each major type and the tools that best represent it.
Text-to-Image (Diffusion Models)
The main category. You provide a text prompt; the model starts from random noise and iteratively denoises it into a coherent image matching the description. The quality and character of the output depend heavily on training data and model architecture.
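The iterative loop can be pictured with a toy sketch. This is not a diffusion model: the function name `toy_denoise`, the step count, and the four-value "image" are all made up for illustration, and the "predicted noise" is computed from a known target, which a real model does not have. A real model uses a trained network, conditioned on the prompt, to make that prediction.

```python
import random


def toy_denoise(target, steps=50, seed=0):
    """Toy illustration of iterative denoising; NOT a real diffusion model."""
    rng = random.Random(seed)
    # Start from pure Gaussian noise, one value per "pixel".
    x = [rng.gauss(0.0, 1.0) for _ in target]
    for step in range(steps):
        # Assumption: we cheat and derive the noise from the known target.
        # A real model estimates it with a neural network.
        predicted_noise = [xi - ti for xi, ti in zip(x, target)]
        # Remove a growing fraction of the predicted noise each step,
        # so that the final step removes all that remains.
        remaining = steps - step
        x = [xi - ni / remaining for xi, ni in zip(x, predicted_noise)]
    return x


if __name__ == "__main__":
    target = [0.0, 0.5, 1.0, 0.5]  # hypothetical four-pixel "image"
    print(toy_denoise(target))
```

The point of the sketch is only the shape of the process: many small corrections applied to noise, rather than one pass that paints the image directly.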
Stable Diffusion
Open-source. Runs locally. The model weights are publicly downloadable, which means it can be fine-tuned, modified, and extended by anyone. The trade-off: setup friction, hardware requirements, and a community ecosystem that assumes technical competence. The best option for anyone who wants control over what the model produces and doesn't want outputs stored on a third-party server.
Open source · Local
Midjourney
Web and Discord-based. The strongest default aesthetic of any text-to-image model — images that look "designed" rather than generated, with coherent lighting and composition. The downside is opacity: you can't fine-tune it, can't run it locally, and the prompt behavior is idiosyncratic in ways that take time to learn. Best for people who want good-looking results without much setup.
Hosted · Subscription
DALL-E 3
OpenAI's model, integrated into ChatGPT and available via API. The distinguishing feature is prompt adherence: DALL-E 3 follows text prompts more literally than most alternatives, which is useful when you want something specific rather than something that looks good. The style is less distinctive than Midjourney and the content filters are more aggressive, but the instruction-following is the best in the category.
API · ChatGPT integrated
Flux
Released by Black Forest Labs in 2024. At the time of writing, the strongest open-weight model for image quality and prompt adherence, ahead of Stable Diffusion 1.5 and SDXL in most benchmarks. Available in several variants, from a distilled fast version to a full pro version; the pro version is offered through an API rather than as downloadable weights. The go-to for people who want local inference at a quality level that competes with hosted models.
Open weight · Local or API
Image-to-Image and Editing
Starting from an existing image and modifying it. The tools in this category accept image input and use it as a starting condition for generation, allowing targeted editing, style transfer, or structural modification.
In-painting
Mask a region of an image; the model fills it with generated content that blends with the surrounding pixels. Used for removing objects, extending images beyond their original borders (out-painting), or replacing specific elements while preserving the rest. Available in Stable Diffusion, DALL-E, and most major interfaces. The skill is in making the mask boundaries invisible.
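The blending step can be sketched on a single row of grayscale pixels. This is a minimal illustration, assuming the generated content already exists; `feather_mask` and `composite` are hypothetical helper names, and the box blur is a simple stand-in for whatever edge softening a given tool applies internally.

```python
def feather_mask(mask, radius=1):
    """Soften a 1-D binary mask with a box blur so the edit fades in."""
    n = len(mask)
    soft = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = mask[lo:hi]
        soft.append(sum(window) / len(window))
    return soft


def composite(original, generated, mask):
    """Blend per pixel: mask=1 takes generated, mask=0 keeps original."""
    return [m * g + (1.0 - m) * o for o, g, m in zip(original, generated, mask)]


if __name__ == "__main__":
    original = [10, 10, 10, 10, 10, 10]   # hypothetical source pixels
    generated = [50, 50, 50, 50, 50, 50]  # hypothetical model output
    hard = [0, 0, 1, 1, 0, 0]             # 1 marks the region to replace
    print(composite(original, generated, hard))
    print(composite(original, generated, feather_mask(hard)))
```

With the hard mask the values jump straight from original to generated at the mask edge, which is what shows up as a visible seam. The feathered mask spreads that transition over neighbouring pixels.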
Technique · Multiple tools
ControlNet
An extension for Stable Diffusion that adds conditioning inputs: pose, depth map, edge detection, or sketch. Lets you generate images that follow a specific structural layout, which is useful when you need consistent composition across multiple images or want variations of a specific pose or structure. No hosted equivalent of comparable flexibility exists; this is the main reason people run Stable Diffusion locally.
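To make "edge detection as a conditioning input" concrete, here is a rough sketch of how a binary edge map could be derived from a grayscale image. It uses plain finite differences with a threshold, which is a simplification swapped in for illustration; in practice edge conditioning is usually produced with a proper detector such as Canny. The function name `edge_map` and the threshold value are assumptions, not part of any ControlNet API.

```python
def edge_map(image, threshold=0.5):
    """Return a binary edge map for a 2-D grayscale image (list of rows)."""
    h, w = len(image), len(image[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Forward differences, clamped at the right and bottom borders.
            dx = image[y][min(x + 1, w - 1)] - image[y][x]
            dy = image[min(y + 1, h - 1)][x] - image[y][x]
            # Mark the pixel when the local brightness change is large.
            if abs(dx) + abs(dy) > threshold:
                edges[y][x] = 1
    return edges


if __name__ == "__main__":
    # Hypothetical 4x4 image: dark left half, bright right half.
    image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
    for row in edge_map(image):
        print(row)
```

A map like this carries only structure (where the outlines are), which is why it can pin down composition while the prompt still decides content and style.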
Extension · SD only
Adobe Firefly
Adobe's model, integrated into Photoshop as the generative fill feature. The main selling point is provenance: Adobe states that Firefly was trained only on licensed and public-domain content, which makes it the defensible option for commercial use cases where intellectual property origin matters. The image quality is competitive. The integration with existing Photoshop workflows is seamless in a way that standalone generators aren't. Requires a Creative Cloud subscription.
Commercial use · CC subscription
Upscaling and Enhancement
Taking a low-resolution or degraded image and generating higher-resolution detail. A different task from generation: these models infer plausible detail that the source image constrains, rather than creating content from scratch.
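For contrast, here is what the simplest non-learned upscaling looks like. This nearest-neighbour sketch (the function name `upscale_nearest` is made up for illustration) produces more pixels but no new detail: every output value is a copy of an input value. Learned upscalers differ in that they predict detail that was not in the input.

```python
def upscale_nearest(image, factor=2):
    """Nearest-neighbour upscaling of a 2-D grayscale image (list of rows)."""
    out = []
    for row in image:
        # Repeat each pixel horizontally...
        wide = [pixel for pixel in row for _ in range(factor)]
        # ...then repeat the widened row vertically.
        out.extend(list(wide) for _ in range(factor))
    return out


if __name__ == "__main__":
    small = [[1, 2], [3, 4]]  # hypothetical 2x2 image
    for row in upscale_nearest(small):
        print(row)
```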
Real-ESRGAN
Open-source upscaler trained specifically on degraded real-world images. Good at recovering detail from compressed photographs, old scans, and low-resolution screenshots. Multiple model variants for different use cases: anime-style images, photographs, and general content each have optimized versions. Runs locally. The standard recommendation for anyone who needs batch upscaling without a subscription.
Open source · Local
Topaz Gigapixel AI
The commercial option. Better results than Real-ESRGAN on portraits and faces, with a specific face recovery model. Expensive (one-time purchase, with major versions charged separately). The interface is simple and requires no technical knowledge. The recommended option for professional photography work where quality matters more than cost and you don't want to configure anything.
Commercial · One-time purchase
Video Generation
Generating video from text prompts or image inputs. A rapidly moving category — these tools are improving faster than anything else in the space, and the limitations that existed six months ago may no longer apply.
Sora
OpenAI's video generator. Produces clips of up to 20 seconds with physically coherent motion and consistent objects across frames. As of early 2026, its output quality is still the benchmark for general video generation. Available via a ChatGPT Plus subscription with a credit-based generation limit. The prompting skill is different from image generation: motion descriptions and camera movement language matter significantly.
Hosted · ChatGPT Plus
Runway Gen-3
The professional-grade option. Strong temporal consistency, useful camera control, and an image-to-video mode that animates still images convincingly. Used in commercial production. The pricing model is credits-based and adds up quickly for heavy use. The API is available for integration into production pipelines. The go-to for anyone doing this professionally rather than experimentally.
Hosted · Credit-based
Kling
Developed by Kuaishou, a Chinese video platform. Produces video with notably good physical simulation — fabric movement, water, hair. Free tier available with watermark. Competitive with Runway on motion quality and significantly cheaper for high-volume use. Access is web-based with an API in beta. Worth knowing about if you need generated video at scale and Runway's pricing doesn't make sense for your budget.
Hosted · Free tier available