Launching the Artificial Analysis Text to Image Leaderboard & Arena

In two short years since the advent of diffusion-based image generators, AI image models have achieved near-photographic quality. How do these models compare? Are the open-source alternatives on par with their proprietary counterparts?

The Artificial Analysis Text to Image Leaderboard aims to answer these questions with rankings based on human preferences. The ELO rating is informed by over 45,000 human image preferences collected in the Artificial Analysis Image Arena. The leaderboard features the leading open-source and proprietary image models: the latest versions of Midjourney, OpenAI’s DALL·E, Stable Diffusion, Playground and more.

Check out the leaderboard here: https://huggingface.co/spaces/ArtificialAnalysis/Text-to-Image-Leaderboard

You can also take part in the Text to Image Arena, and get your personalized model ranking after 30 votes!

Methodology

Comparing the quality of image models has traditionally been even more challenging than evaluation in other AI modalities such as language models, largely because of the inherent variability in people’s preferences for how images should look. Early objective metrics have given way to expensive human preference studies as image models approach very high accuracy. Our Image Arena represents a crowdsourcing approach to gathering human preference data at scale, enabling comparison between key models for the first time.

We calculate an ELO rating for each model via a regression of all preferences, similar to Chatbot Arena. Participants are presented with a prompt and two images, and are asked to select the image that best reflects the prompt. To ensure the evaluation reflects a wide range of use cases, we generate more than 700 images for each model. Prompts span diverse styles and categories including human portraits, groups of people, animals, nature, art and more.
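As a rough illustration of how pairwise votes can be turned into ratings, the sketch below fits a Bradley-Terry model, the kind of model Chatbot Arena describes using, and maps the fitted strengths onto an Elo-like scale. It is not the leaderboard's actual code: the fit uses a simple iterative (minorization-maximization) update rather than a regression library, the function names `fit_bradley_terry` and `to_elo` are hypothetical, the 400-point scale and the 1000-point anchor are assumed conventions, and the votes are made up.

```python
import math
from collections import defaultdict


def fit_bradley_terry(votes, iterations=200):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Assumes every model has at least one win and the two models in a vote differ.
    """
    wins = defaultdict(float)
    games = defaultdict(float)  # number of comparisons per unordered pair
    models = set()
    for winner, loser in votes:
        if winner == loser:
            raise ValueError("a vote must compare two different models")
        wins[winner] += 1
        games[frozenset((winner, loser))] += 1
        models.update((winner, loser))
    if any(wins[m] == 0 for m in models):
        raise ValueError("every model needs at least one win for a finite fit")

    strength = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for m in models:
            denom = 0.0
            for pair, n in games.items():
                if m in pair:
                    (other,) = pair - {m}
                    denom += n / (strength[m] + strength[other])
            updated[m] = wins[m] / denom
        # Rescale so the geometric mean of the strengths stays at 1.
        log_mean = sum(math.log(s) for s in updated.values()) / len(updated)
        strength = {m: s / math.exp(log_mean) for m, s in updated.items()}
    return strength


def to_elo(strength, anchor=1000.0, scale=400.0):
    """Map strengths to an Elo-like scale (anchor and scale are assumptions)."""
    return {m: anchor + scale * math.log10(s) for m, s in strength.items()}


# Made-up votes: each tuple is (preferred image's model, other image's model).
votes = (
    [("model_a", "model_b")] * 7 + [("model_b", "model_a")] * 3
    + [("model_b", "model_c")] * 6 + [("model_c", "model_b")] * 4
    + [("model_a", "model_c")] * 8 + [("model_c", "model_a")] * 2
)
ratings = to_elo(fit_bradley_terry(votes))
for model, rating in sorted(ratings.items(), key=lambda item: -item[1]):
    print(f"{model}: {rating:.0f}")
```

Fitting all votes at once, rather than updating ratings one match at a time, means the result does not depend on the order in which votes arrived.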

Early Insights From the Results 👀

While proprietary models lead, open source is increasingly competitive: Proprietary models including Midjourney, Stable Diffusion 3 and DALL·E 3 HD lead the leaderboard. However, a number of open-source models, currently led by Playground AI v2.5, are gaining ground and surpass even OpenAI’s DALL·E 3.
The space is rapidly advancing: The landscape of image generation models is rapidly evolving. Just last year, DALL·E 2 was a clear leader in the space. Now, DALL·E 2 is selected in the arena less than 25% of the time and is among the lowest ranked models.
Stable Diffusion 3 Medium being open sourced could have a big impact on the community: Stable Diffusion 3 is a contender for the top position on the current leaderboard, and Stability AI’s CTO recently announced during a presentation with AMD that Stable Diffusion 3 Medium will be open sourced on June 12. Stable Diffusion 3 Medium may offer lower quality than the Stable Diffusion 3 model currently served by Stability AI (presumably the full-size variant), but the new model may be a major boost to the open source community. As we have seen with Stable Diffusion 1.5 and SDXL, it is likely we will see many fine-tuned versions released by the community.
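To put a win rate such as the 25% figure for DALL·E 2 in perspective, the sketch below converts between a head-to-head win rate and a rating gap using the standard Elo expected-score curve. The 400-point logistic scale is an assumption, since the scale used by the leaderboard is not stated here, and the function names are hypothetical. An arena win rate is averaged over many different opponents, so the result is only a rough indication.

```python
import math


def elo_gap_from_win_rate(win_rate, scale=400.0):
    """Rating gap (own minus opponent) implied by a head-to-head win rate."""
    if not 0.0 < win_rate < 1.0:
        raise ValueError("win_rate must be strictly between 0 and 1")
    return scale * math.log10(win_rate / (1.0 - win_rate))


def win_rate_from_elo_gap(gap, scale=400.0):
    """Expected win rate for a given rating gap (own minus opponent)."""
    return 1.0 / (1.0 + 10.0 ** (-gap / scale))


gap = elo_gap_from_win_rate(0.25)
print(f"A 25% win rate corresponds to a gap of about {gap:.0f} points")
```

Under these assumptions, a model chosen a quarter of the time sits roughly 190 points below a typical opponent.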

How to contribute or get in touch

To see the leaderboard, check out the space on Hugging Face here: https://huggingface.co/spaces/ArtificialAnalysis/Text-to-Image-Leaderboard

To take part in the ranking and contribute your preferences, select the ‘Image Arena’ tab and choose the image that you believe best represents the prompt. After 30 images, select the ‘Personal Leaderboard’ tab to see your own personalized ranking of image models based on your selections.

For updates, please follow us on Twitter and LinkedIn. (We also compare the speed and pricing of Text to Image model API endpoints on our website at https://artificialanalysis.ai/text-to-image.)

We welcome all feedback! We’re available via message on Twitter, as well as on **our website** via our contact form.

Other Image Model Quality Initiatives

The Artificial Analysis Text to Image leaderboard is not the only image quality ranking or crowdsourced preference initiative. We built our leaderboard to focus on covering both proprietary and open source models to give a full picture of how leading Text to Image models compare.

Check out the following for other great initiatives:
