'Stable Diffusion 3' unveiled… "Introduces a transformer architecture like Sora"

February 24, 2024

An epic animation-style artwork in which a wizard atop a mountain at night casts a cosmic spell into the dark sky, with the words 'Stable Diffusion 3' formed from colorful energy (Photo = Stability AI)

Stability AI has unveiled its next-generation image generation artificial intelligence (AI) model. Its defining feature is a 'diffusion transformer' architecture, similar to that of 'Sora', the video generation AI recently announced by OpenAI.

VentureBeat reported on the 22nd (local time) that Stability AI has released 'Stable Diffusion 3', a next-generation image generation AI model built on a new architecture, as open source. It is currently in the preview stage, and a waiting list is open.

According to the report, Stable Diffusion 3 offers improved quality and accuracy compared with 'SDXL', which was released in July of last year. It comes in a range of sizes, from 800 million to 8 billion parameters, so that it can run on a wide range of devices. Fine-tuning is also supported, so users can create the images they want.
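As a rough illustration of why the parameter range matters for device support, the sketch below estimates the memory needed just to hold the weights. The figure of 2 bytes per parameter (16-bit weights) is an assumption for illustration, not something Stability AI has stated, and the estimate ignores activations, text encoders, and other runtime overhead.

```python
# Rough weight-memory estimate. bytes_per_param=2 assumes 16-bit weights,
# which is an illustrative assumption, not a published Stable Diffusion 3 detail.

def weight_memory_gb(num_params: int, bytes_per_param: int = 2) -> float:
    """Return the memory needed to store the weights, in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9

smallest = weight_memory_gb(800_000_000)    # 800 million parameters
largest = weight_memory_gb(8_000_000_000)   # 8 billion parameters

print(f"800M parameters: about {smallest:.1f} GB of weights")
print(f"8B parameters: about {largest:.1f} GB of weights")
```

Under that assumption, the smallest model needs about 1.6 GB for its weights and the largest about 16 GB, a tenfold spread.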

In addition, Stable Diffusion 3 is built on a new type of architecture, the 'diffusion transformer', similar to OpenAI's video generation AI model 'Sora'.

The diffusion transformer is a new architecture that replaces the U-Net backbone used in diffusion models, the standard approach to image generation AI, with a transformer, the basis of text generation models. Diffusion transformer architectures can use compute more efficiently and produce higher-quality images than other types of diffusion models.
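To make the idea concrete: a transformer operates on a sequence of tokens, so a diffusion transformer first cuts the noisy image (or its latent representation) into small patches and treats each patch as one token. The sketch below shows only that patch-splitting step in plain Python. The function name `patchify`, the 4x4 grid, and the patch size of 2 are hypothetical choices for illustration and do not reflect Stable Diffusion 3's actual configuration.

```python
from typing import List

def patchify(grid: List[List[float]], patch: int) -> List[List[float]]:
    """Split a 2-D grid into non-overlapping patch x patch blocks.

    Each block is flattened row by row into one token (a flat list of values).
    Tokens are returned in row-major order of the blocks.
    """
    height, width = len(grid), len(grid[0])
    if height % patch or width % patch:
        raise ValueError("grid size must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            token = [grid[r][c]
                     for r in range(top, top + patch)
                     for c in range(left, left + patch)]
            tokens.append(token)
    return tokens

# A toy 4x4 "image" whose values are 0..15, read row by row.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
tokens = patchify(image, patch=2)
print(len(tokens))   # number of tokens handed to the transformer
print(tokens[0])     # values of the top-left patch
```

A U-Net would instead process the grid with stacked convolutions at several resolutions. In a diffusion transformer, the token sequence passes through attention layers that produce a prediction for every patch.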

In addition, using 'flow matching', a technique for training generative AI models, allows the model to be trained quickly. This is because it gives the model an optimal path to follow when learning from unstructured data, particularly varied images, which makes generalization easier.
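A minimal sketch of the idea behind flow matching, assuming the common straight-line formulation in which a noise sample x0 is connected to a data sample x1 by x_t = (1 - t) * x0 + t * x1. Along that path, the velocity the model is trained to predict is simply x1 - x0. The article does not say which variant Stable Diffusion 3 uses, so the function names and the one-dimensional numbers here are illustrative only.

```python
def point_on_path(x0: float, x1: float, t: float) -> float:
    """Linear interpolation between noise x0 (t=0) and data x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1

def target_velocity(x0: float, x1: float) -> float:
    """Velocity of the straight path; this is the regression target in training."""
    return x1 - x0

def euler_sample(x0: float, velocity_fn, steps: int) -> float:
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        x += velocity_fn(x, i * dt) * dt
    return x

noise, data = -2.0, 3.0

# Stand-in for a perfectly trained model: it returns the true path velocity.
ideal_model = lambda x, t: target_velocity(noise, data)

midpoint = point_on_path(noise, data, 0.5)
sample = euler_sample(noise, ideal_model, steps=4)
print(midpoint, sample)
```

Because the target path is straight, even a few integration steps land on the data sample in this toy case. A real model only approximates the velocity field, so more steps are typically needed.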

In particular, typography, the ability to render words accurately within generated images and spell them correctly, has been greatly improved.

“This is thanks to the diffusion transformer architecture and additional text encoders,” said Emad Mostaque, CEO of Stability AI. “Complete sentences are now possible, as are consistent styles.”

Furthermore, Stability AI predicted that Stable Diffusion 3 could become the basis for new models, such as video generation and 3D image generation, in the future.

“We create open source models that can be used anywhere and adapted to any need,” said CEO Mostaque. “Stable Diffusion 3 is a series of models of various sizes that will support the development of the next generation of visual models, including video, 3D, and more.”

Meanwhile, a few days ago CEO Mostaque praised OpenAI CEO Sam Altman, saying “You are a magician” in response to a video created with 'Sora'.

Reporter Park Chan cpark@aitimes.com
