For most of photography’s roughly 200-year history, altering a photograph convincingly required either a darkroom, some Photoshop expertise, or, at minimum, a steady hand with scissors and glue. On Tuesday, OpenAI released a tool that reduces the process to typing a sentence.
It’s not the first company to do so. While OpenAI has had a conversational image-editing model in the works since GPT-4o in 2024, Google beat OpenAI to market in March with a public prototype, then refined it into a popular model called Nano Banana (and Nano Banana Pro). The enthusiastic response to Google’s image-editing model in the AI community got OpenAI’s attention.
OpenAI’s new GPT Image 1.5 is an AI image synthesis model that reportedly generates images up to four times faster than its predecessor and costs about 20 percent less through the API. The model rolled out to all ChatGPT users on Tuesday and represents another step toward making photorealistic image manipulation a casual process that requires no particular visual skills.
The “Galactic Queen of the Universe” added to a photograph of a room with a settee using GPT Image 1.5 in ChatGPT.
GPT Image 1.5 is notable because it’s a “native multimodal” image model, meaning image generation happens inside the same neural network that processes language prompts. (In contrast, DALL-E 3, an earlier OpenAI image generator previously built into ChatGPT, used a different technique called diffusion to generate images.)
This newer type of model, which we covered in more detail in March, treats images and text as the same kind of thing: chunks of data called “tokens” to be predicted, patterns to be completed. If you upload a photo of your dad and type “put him in a tuxedo at a wedding,” the model processes your words and the image pixels in a unified space, then outputs new pixels the same way it would output the next word in a sentence.
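To make the "everything is a token" idea concrete, here is a toy sketch in Python. It is not OpenAI's implementation, and nothing in it reflects the real model's vocabulary or architecture: the vocabulary sizes, the `END_OF_IMAGE` marker, and the `make_toy_predictor` stand-in are all assumptions made up for illustration. The only point is that a single loop predicts the next token, whether the context holds words, image patches, or both.

```python
from typing import Callable, List

# Hypothetical shared vocabulary: text and image tokens live in one ID space.
# These sizes are invented for illustration only.
TEXT_VOCAB_SIZE = 1000    # IDs 0..999 stand for word pieces
IMAGE_VOCAB_SIZE = 500    # IDs 1000..1499 stand for image patches
END_OF_IMAGE = TEXT_VOCAB_SIZE + IMAGE_VOCAB_SIZE  # ID 1500 ends an image


def is_image_token(token_id: int) -> bool:
    """Return True if the ID falls in the image-patch range."""
    return TEXT_VOCAB_SIZE <= token_id < END_OF_IMAGE


def generate(
    prompt: List[int],
    predict_next: Callable[[List[int]], int],
    max_new_tokens: int,
) -> List[int]:
    """Append one predicted token at a time; return only the new tokens."""
    sequence = list(prompt)
    for _ in range(max_new_tokens):
        next_token = predict_next(sequence)
        sequence.append(next_token)
        if next_token == END_OF_IMAGE:
            break
    return sequence[len(prompt):]


def make_toy_predictor(
    prompt_length: int, patches_to_emit: int
) -> Callable[[List[int]], int]:
    """Stand-in for a trained network (hypothetical, not a real model)."""

    def predict_next(sequence: List[int]) -> int:
        generated = len(sequence) - prompt_length
        if generated >= patches_to_emit:
            return END_OF_IMAGE
        # Deterministic placeholder: choose an image token from the context.
        return TEXT_VOCAB_SIZE + (sum(sequence) % IMAGE_VOCAB_SIZE)

    return predict_next


# Words from the request plus patches from the uploaded photo, in one list.
prompt = [12, 57, 301] + [1005, 1010, 1499]
new_tokens = generate(prompt, make_toy_predictor(len(prompt), 4), 16)
# new_tokens holds four image-patch IDs followed by the END_OF_IMAGE marker.
```

A real system would replace the toy predictor with a trained neural network and decode the resulting image tokens back into pixels, but the generation loop has the same basic shape as text generation.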
Using this technique, GPT Image 1.5 can more easily alter visual reality than earlier AI image models, changing someone’s pose or position, or rendering a scene from a slightly different angle, with varying degrees of success. It can also remove objects, change visual styles, adjust clothing, and refine specific areas while preserving facial likeness across successive edits. You can converse with the AI model about a photograph, refining and revising, the same way you might workshop a draft of an email in ChatGPT.
