Search in Co-Wiki

Imagen (text-to-image model)

game-theory 490 tokens 1 outbound links

Imagen (text-to-image model)

The original version of the model was first discussed in a paper from May 2022. The tool produces high-quality images and is available to all users with a Google account through services including Gemini, ImageFX, and Vertex AI. The standout feature was text and logo generation. Imagen 3 was released in August 2024. Google claims that the newest version provides better detail and lighting on generated images. On 20 May 2025 at Google I/O 2025 the company released an improved model, Imagen 4.

Technology Imagen uses two key technologies. The first is the use of transformer-based large language models, notably T5, to understand text and subsequently encode text for image synthesis. The second is the use of cascaded diffusion models providing high-fidelity image generation. Imagen generates image in three stages, starting from a base of 64x64, then upsampled to 256x256 and 1024x1024.

Capabilities Imagen can generate photorealistic images from text prompts. It can also create various styles, such as cinematic, 35mm film, illustration, and surreal. Like most text-to-image generative AI models, Imagen has difficulty rendering human fingers, text, ambigrams and other forms of typography.

The model can generate images in five aspect ratios, namely 9:16, 3:4, 1:1, 4:3, and 16:9. Imagen can also refine already generated images by editing existing text prompts.

See also * Artificial intelligence art * Computer art * Generative art * DALL-E * Midjourney * Recraft * Stable Diffusion

References <references />

External links * [Imagen website](https://deepmind.google/models/imagen/)