Why image-generating AI makes spelling mistakes… "Even LLMs can't actually read text"


Result of asking DALL-E to create a sign with 'Hello' written on it

The reason generative artificial intelligence (AI) is weak at 'typography', the rendering of words inside images, is that it does not actually recognize letters. Large language models (LLMs) may compose poetry and write papers, but they do not understand text itself.

On the 22nd (local time), TechCrunch, citing experts, published an in-depth explanation of why image-generating AI output so often contains spelling mistakes.

According to the report, most current image-generation AIs are likely to mess up spelling when rendering letters in images. 'Ideogram', considered one of the best in this field, and more recently 'DALL-E' and 'Stable Diffusion' have solved many of these problems, but they are still not perfect.

For instance, 'Hello' comes out as 'HeLIo' or 'HEELLLLOOOO'. This becomes a significant problem if the generated image is meant for business use.

Experts say this problem stems from the operating principle of generative AI.

"Image generators tend to perform well on artifacts like cars or human faces, but they perform poorly on small things like fingers or writing," said Asmelash Teka Hadgu, co-founder of the machine translation startup Lesan.

Although the underlying techniques of image generators and text generators differ, both kinds of model have similar difficulties with details such as spelling.

Image generators use a 'diffusion model', which reconstructs an image by removing noise. "Image generators mainly learn patterns that cover a large portion of the pixels," Hadgu said. "The text contained in an image is not recognized as an important part."
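Hadgu's point about pixel-level patterns can be caricatured in a few lines of standard-library Python: a toy "denoising" loop that nudges every pixel toward a learned pattern by the same amount, with nothing that singles out a small text-like region. This is an illustrative sketch under invented names and numbers, not any real diffusion model's code.

```python
import random

def toy_denoise(target, steps=50, seed=0):
    """Toy 'diffusion' sketch: start from pure noise and repeatedly
    nudge each pixel toward a learned pattern (here, `target`).
    Every pixel is weighted equally, so broad regions and tiny
    'text-like' details converge the same way -- nothing in the loop
    treats fine detail as special."""
    rng = random.Random(seed)
    image = [rng.random() for _ in target]  # start from pure noise
    for _ in range(steps):
        # move a fraction of the way toward the learned pattern each step
        image = [px + 0.1 * (t - px) for px, t in zip(image, target)]
    return image

# "Pattern" with one big smooth region and one tiny text-like detail
pattern = [0.8] * 9 + [0.05]
result = toy_denoise(pattern)
```

After enough steps every pixel lands near the pattern, but only because of the bulk statistics; a real model that gets the lone fine detail slightly wrong still scores well on the image overall, which is one intuition for why small text suffers.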

In the case of text, a large language model (LLM) may appear to read and respond to prompts like a human, but in fact it simply uses mathematical principles to find patterns and string together the most likely continuations. Because of this, LLMs are sometimes called 'stochastic parrots'.
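The "stochastic parrot" idea can be sketched with a toy bigram model: count which word follows which in a tiny corpus, then emit the most frequent continuation. The corpus below is invented for illustration, and real LLMs use neural networks over tokens rather than raw counts, but the predict-the-likely-next-piece principle is the same.

```python
from collections import Counter

# Toy "stochastic parrot": it never understands words, it just counts
# which token tends to follow which, then lines up the likely ones.
corpus = "the cat sat on the mat the cat ate the fish".split()
following = {}
for prev, nxt in zip(corpus, corpus[1:]):
    following.setdefault(prev, Counter())[nxt] += 1

def next_token(prev):
    # return the statistically most likely continuation
    return following[prev].most_common(1)[0][0]

print(next_token("the"))  # prints "cat" -- it follows "the" most often here
```

The model "answers" fluently without any notion of what a cat is, which is exactly the gap the article describes.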

Moreover, an AI model is built to reproduce something similar to what it saw in its training data, but it does not fundamentally know the rules for spelling words or the number of fingers on a hand.

"Until just last year, image-generation models had trouble getting finger counts right, and in principle text has the same problem," said Matthew Guzdial, an AI researcher and professor at the University of Alberta.

To address this, developers are augmenting datasets with training examples specifically designed to teach AI things like what hands look like. But experts do not expect the spelling problem to be resolved as quickly.

"We can improve performance by training the model, but unfortunately English is really complex," Guzdial said, adding that expanding to other languages makes the amount of learning required grow enormously.

Accordingly, some models, such as Adobe Firefly, are trained not to generate text in images at all. If you request typography, only a blank white sign is output. However, these guardrails can be bypassed simply by putting enough detail in the prompt.

"On top of that, text is much more difficult," he pointed out. "That is why it can't even spell 'ChatGPT' properly."

He gave 'ASCII' art as an example. ASCII art imitates pictures or words using letters and symbols.
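As a concrete example, here is hand-written ASCII art of the letter 'H'. Composing it requires knowing the letter's character-by-character shape, which is exactly what a token-level model never sees:

```python
# Hand-written ASCII art of the letter "H": a human can build it
# because they know the letter's shape row by row; a model that only
# sees whole tokens has no access to that structure.
h = [
    "#   #",
    "#   #",
    "#####",
    "#   #",
    "#   #",
]
print("\n".join(h))
```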

In fact, there are numerous videos on YouTube and X (formerly Twitter) of failed attempts to create ASCII art with ChatGPT. This is evidence that ChatGPT does not understand words themselves.

"LLMs are based on a transformer architecture that doesn't actually read text. When you enter a prompt, it is converted into an encoding," he said. "In other words, there is an encoding for what 'the' means, but it doesn't know what 'T', 'H', or 'E' mean."
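The encoding Guzdial describes can be illustrated with a toy tokenizer. The vocabulary and IDs below are invented for illustration; real tokenizers (e.g. byte-pair encoding) are more sophisticated, but the effect is the same: whole chunks map to single integers, so individual letters never reach the model.

```python
# Toy tokenizer: a hypothetical vocabulary mapping whole words to IDs.
# The model downstream sees only integers, never individual letters.
vocab = {"the": 262, "cat": 415, "ChatGPT": 901}

def encode(text):
    return [vocab[word] for word in text.split()]

ids = encode("the cat")
print(ids)  # [262, 415] -- the spelling T-H-E is gone before the model runs
```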

"The issue isn't just about spelling or the number of fingers," he added. "If developers work hard to solve the finger problem, other problems will stand out instead, such as guitars being drawn with seven strings or the white and black keys of a piano being misplaced."

He pointed out that although generative AI models are improving at an incredible rate, these problems will keep surfacing, because realistically the technology's capacity cannot keep expanding indefinitely.

"Models like this are always making small mistakes; it's just that we're particularly well-tuned to recognize some of them," Guzdial said.

Hadgu also said, "AI is advancing, there is no doubt about it. But this technology is overhyped."

Reporter Lim Da-jun ydj@aitimes.com
