JPEG AI Blurs the Line Between Real and Synthetic

In February of this year, the JPEG AI international standard was published, after several years of research aimed at using machine learning techniques to produce a smaller, more easily transmitted and stored image codec, without a loss in perceptual quality.

Source: https://jpeg.org/jpegai/documentation.html

One possible reason why this advent made few headlines is that the core PDFs for the announcement were (ironically) not available through free-access portals such as Arxiv. Nonetheless, Arxiv had already hosted a number of studies examining the importance of JPEG AI across several aspects, including the method’s unusual compression artifacts and its significance for forensics.

One study compared compression artifacts, including those of an earlier draft of JPEG AI, finding that the new method had a tendency to blur text – not a minor matter in cases where the codec might contribute to an evidence chain. Source: https://arxiv.org/pdf/2411.06810

Source: https://arxiv.org/pdf/2411.06810

Because JPEG AI alters images in ways that mimic the artifacts of synthetic image generators, existing forensic tools have difficulty differentiating real from fake imagery:

After JPEG AI compression, state-of-the-art algorithms can no longer reliably separate authentic content from manipulated regions in localization maps, according to a recent paper (March 2025). The source examples seen on the left are manipulated/fake images, in which the tampered regions are clearly delineated under standard forensic techniques (center image). However, JPEG AI compression lends the fake images a layer of credibility (image on far right). Source: https://arxiv.org/pdf/2412.03261

Source: https://arxiv.org/pdf/2412.03261

One reason is that JPEG AI is trained using a model architecture similar to those used by the generative systems that forensic tools aim to detect:

The new paper illustrates the similarity between the methodologies of AI-driven image compression and actual AI image generation. Source: https://arxiv.org/pdf/2504.03191

Source: https://arxiv.org/pdf/2504.03191

Therefore, from a forensic standpoint, both models may produce some similar underlying visual characteristics.

Quantization

This cross-over occurs due to quantization, a process common to both architectures, which is used in machine learning both as a method of converting continuous data into discrete data points, and as an optimization technique that can significantly reduce the file size of a trained model (casual image synthesis enthusiasts will be familiar with the wait between an unwieldy official model release and a community-led quantized version that can run on local hardware).

In this context, quantization refers to the process of converting the continuous values in the image’s latent representation into fixed, discrete steps. JPEG AI uses this process to store or transmit an image more efficiently by simplifying its internal numerical representation.
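As a rough illustration of what this rounding looks like in practice, the minimal NumPy sketch below quantizes a hypothetical latent tensor to discrete steps and reconstructs it; the step size and tensor shape are illustrative assumptions, not values taken from the JPEG AI specification.

```python
import numpy as np

# Minimal sketch of latent quantization (not the JPEG AI reference code):
# continuous latent values are rounded to discrete steps before entropy
# coding, and de-quantized again at decode time.
rng = np.random.default_rng(0)
latent = rng.normal(loc=0.0, scale=2.0, size=(8, 16, 16))  # hypothetical latent tensor

step = 1.0                               # assumed quantization step size
symbols = np.round(latent / step)        # discrete symbols to be entropy-coded
reconstructed = symbols * step           # de-quantized values used by the decoder

# The rounding error is bounded by half a step, but the reconstruction now
# sits on a regular lattice: the structural regularity discussed below.
print("max rounding error:", np.abs(latent - reconstructed).max())
```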

Though quantization makes encoding more efficient, it also imposes structural regularities that can resemble the artifacts left by generative models – subtle enough to evade perception, but disruptive to forensic tools.

In response, the authors of a new work propose interpretable, non-neural techniques that detect JPEG AI compression; determine whether an image has been recompressed; and distinguish compressed real images from those generated entirely by AI.

Method

Color Correlations

The paper proposes three ‘forensic cues’ tailored to JPEG AI images: color correlations introduced during JPEG AI’s preprocessing steps; rate-distortion patterns across repeated compressions that reveal recompression events; and quantization traces that help distinguish between images compressed by JPEG AI and those generated by AI models.

Regarding the color correlation-based approach: JPEG AI’s preprocessing pipeline introduces statistical dependencies between the image’s color channels, creating a signature that can serve as a forensic cue.

JPEG AI converts RGB images to the YUV color space and performs 4:2:0 chroma subsampling, which involves downsampling the chrominance channels before compression. This process results in subtle correlations between the high-frequency residuals of the red, green, and blue channels – correlations that are not present in uncompressed images, and which differ in strength from those produced by traditional JPEG compression or synthetic image generators.
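The snippet below is a simplified sketch of that idea, assuming a plain BT.601-style YUV conversion, block-averaged 4:2:0 subsampling, and a basic high-pass residual; it is meant only to show how the preprocessing round trip alone can raise inter-channel residual correlations, not to reproduce the paper’s exact measurement.

```python
import numpy as np

def preprocess_roundtrip(rgb):
    """Roughly mimic JPEG AI-style preprocessing only: RGB -> YUV,
    4:2:0 chroma subsampling, then conversion back to RGB.
    A simplified sketch, not the reference pipeline."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.169 * r - 0.331 * g + 0.500 * b
    v = 0.500 * r - 0.419 * g - 0.081 * b

    def down_up(c):
        # Average each 2x2 block of a chroma plane, then upsample back.
        h, w = c.shape
        small = c[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        return np.repeat(np.repeat(small, 2, axis=0), 2, axis=1)

    u, v = down_up(u), down_up(v)
    y = y[:u.shape[0], :u.shape[1]]
    r2 = y + 1.402 * v
    g2 = y - 0.344 * u - 0.714 * v
    b2 = y + 1.772 * u
    return np.stack([r2, g2, b2], axis=-1)

def residual_correlation(rgb):
    """Correlate simple high-pass residuals (pixel minus 3x3 local mean)
    across the three color channels."""
    residuals = []
    for c in range(3):
        ch = rgb[..., c]
        pad = np.pad(ch, 1, mode="edge")
        local_mean = sum(pad[i:i + ch.shape[0], j:j + ch.shape[1]]
                         for i in range(3) for j in range(3)) / 9.0
        residuals.append((ch - local_mean).ravel())
    return np.corrcoef(residuals)  # 3x3 matrix of R/G/B residual correlations

rng = np.random.default_rng(0)
photo = rng.random((128, 128, 3))                         # stand-in for an uncompressed photo
print(residual_correlation(photo))                        # off-diagonals near zero
print(residual_correlation(preprocess_roundtrip(photo)))  # off-diagonals raised
```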

A comparison of how JPEG AI compression alters color correlations in images, using the red channel as an example. Panel (a) compares uncompressed images to JPEG AI-compressed ones, showing that compression significantly increases inter-channel correlation. Panel (b) isolates the effect of JPEG AI’s preprocessing–just the color conversion and subsampling–demonstrating that even this step alone raises correlations noticeably. Panel (c) shows that traditional JPEG compression also increases correlations slightly, but not to the same degree. Panel (d) examines synthetic images, with Midjourney-V5 and Firefly displaying moderate correlation increases, while others remain closer to uncompressed levels.

Above we can see a comparison from the paper illustrating how JPEG AI compression alters color correlations in images, using the red channel as an example.

Panel A compares uncompressed images to JPEG AI-compressed ones, showing that compression significantly increases inter-channel correlation; panel B isolates the effect of JPEG AI’s preprocessing – just the color conversion and subsampling – demonstrating that even this step alone raises correlations noticeably; panel C shows that traditional JPEG compression also increases correlations slightly, but not to the same degree; and panel D examines synthetic images, with Midjourney-V5 and Adobe Firefly displaying moderate correlation increases, while others remain closer to uncompressed levels.

Rate-Distortion

The rate-distortion cue identifies JPEG AI recompression by tracking how image quality, measured by Peak Signal-to-Noise Ratio (PSNR), declines in a predictable pattern across multiple compression passes.

The research contends that repeatedly compressing an image with JPEG AI results in progressively smaller, but still measurable, losses in image quality, as quantified by PSNR, and that this gradual degradation forms the basis of a forensic cue for detecting whether an image has been recompressed.

Unlike traditional JPEG, where earlier methods tracked changes in specific image blocks, JPEG AI requires a different approach, due to its neural compression architecture; the authors therefore propose monitoring how both bitrate and PSNR evolve over successive compressions. Each round of compression alters the image less than the one prior, and this diminishing change (when plotted against bitrate) can reveal whether an image has passed through multiple compression stages:

An illustration of how repeated compression affects image quality across different codecs shows that JPEG AI and the neural codec described at https://arxiv.org/pdf/1802.01436 both exhibit a steady decline in PSNR with each additional compression – even at lower bitrates. In contrast, traditional JPEG maintains relatively stable quality across multiple compressions unless the bitrate is high. This pattern serves as an example of how recompression leaves a measurable trace in AI-based codecs, offering a potential forensic signal.

In the image above, we see charted rate-distortion curves for JPEG AI, a second AI-based codec, and traditional JPEG. JPEG AI and the neural codec show a consistent PSNR decline across all bitrates, while traditional JPEG only shows noticeable degradation at much higher bitrates. This behavior provides a quantifiable signal that can be used to flag recompressed JPEG AI images.

By extracting how bitrate and image quality evolve over multiple compression rounds, the authors similarly constructed a signature that helps flag whether an image has been recompressed, affording a potentially practical forensic cue in the context of JPEG AI.
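A minimal sketch of that measurement might look like the following, where `compress` is a placeholder for any callable wrapping a codec (a JPEG AI encoder/decoder in the paper’s case, not reproduced here) that returns a reconstructed image and its bits-per-pixel cost.

```python
import numpy as np

def psnr(reference, reconstructed, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB between two images."""
    mse = np.mean((reference.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def recompression_trace(image, compress, passes=3):
    """Repeatedly re-encode an image and record (bitrate, PSNR) per pass.
    `compress` is a placeholder callable returning (reconstruction, bpp).
    A steadily diminishing per-pass change in this trace is the kind of
    signal the rate-distortion cue looks for."""
    trace, current = [], image
    for _ in range(passes):
        current, bpp = compress(current)
        trace.append((bpp, psnr(image, current)))
    return trace
```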

Quantization

As we saw earlier, one of the tougher forensic problems raised by JPEG AI is its visual similarity to synthetic images generated by diffusion models. Both systems use encoder–decoder architectures that process images in a compressed latent space and sometimes leave behind subtle upsampling artifacts.

These shared traits can confuse detectors – even those retrained on JPEG AI images. However, a key structural difference remains: JPEG AI applies quantization, a step that rounds latent values to discrete levels for efficient compression, while generative models typically do not.

The new paper uses this distinction to design a forensic cue that obliquely tests for the presence of quantization. The method analyzes how the latent representation of an image responds to rounding, on the assumption that if an image has already been quantized, its latent structure will exhibit a measurable pattern of alignment with rounded values.

These patterns, while invisible to the eye, produce statistical differences that can help separate compressed real images from fully synthetic ones.
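An illustrative proxy for that test, assuming access to an image’s re-encoded latent tensor, is to measure how close its values sit to the nearest quantization lattice point; the statistic below is a simplification for demonstration, not the paper’s exact formulation.

```python
import numpy as np

def rounding_alignment(latent, step=1.0):
    """Mean distance of latent values from the nearest quantization level.
    A latent that has already been through rounding tends to sit close to
    the lattice (value near 0); latents of images that were never quantized
    behave like continuous data (value near 0.25)."""
    residual = latent / step - np.round(latent / step)
    return float(np.mean(np.abs(residual)))

rng = np.random.default_rng(0)
never_quantized = rng.normal(size=(192, 16, 16))   # hypothetical continuous latent
previously_quantized = np.round(never_quantized)   # latent already on the lattice

print(rounding_alignment(never_quantized))         # roughly 0.25
print(rounding_alignment(previously_quantized))    # 0.0
```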

An example of average Fourier spectra reveals that both JPEG AI-compressed images and those generated by diffusion models like Midjourney-V5 and Stable Diffusion XL exhibit regular grid-like patterns in the frequency domain – artifacts commonly linked to upsampling. By contrast, real images lack these patterns. This overlap in spectral structure helps explain why forensic tools often confuse compressed real images with synthetic ones.

Importantly, the authors show that this cue works across different generative models, and remains effective even when compression is strong enough to zero out entire sections of the latent space. In contrast, synthetic images show much weaker responses to this rounding test, offering a practical way to distinguish between the two.

The result is intended as a lightweight, interpretable tool that targets the core difference between compression and generation, rather than relying on brittle surface artifacts.

Data and Tests

Compression

To evaluate whether their color correlation cue could reliably detect JPEG AI compression (i.e., a first pass from an uncompressed source), the authors tested it on high-quality uncompressed images from the RAISE dataset, compressing these at a variety of bitrates using the JPEG AI reference implementation.

They trained a simple random forest on the statistical patterns of color channel correlations (particularly how residual noise in each channel aligned with the others), and compared this to a ResNet50 neural network trained directly on the image pixels.
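A minimal sketch of that classification setup is shown below, using scikit-learn’s random forest on placeholder correlation features; in practice the features would come from measurements like those in the color correlation sketch above, and the synthetic numbers here only stand in for real data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder features: three inter-channel residual correlations per image.
# Real usage would compute these from actual uncompressed and JPEG AI-compressed
# images rather than sampling them from distributions.
uncompressed = rng.normal(0.05, 0.05, size=(500, 3))   # weak correlations
compressed = rng.normal(0.45, 0.10, size=(500, 3))     # raised correlations

X = np.vstack([uncompressed, compressed])
y = np.array([0] * 500 + [1] * 500)                    # 0 = original, 1 = JPEG AI
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```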

Detection accuracy of JPEG AI compression using color correlation features, compared across multiple bitrates. The method is most effective at lower bitrates, where compression artifacts are stronger, and shows better generalization to unseen compression levels than the baseline ResNet50 model.

While the ResNet50 achieved higher accuracy when the test data closely matched its training conditions, it struggled to generalize across different compression levels. The correlation-based approach, although far simpler, proved more consistent across bitrates, especially at lower bitrates, where JPEG AI’s preprocessing has a stronger effect.

These results suggest that even without deep learning, it is possible to detect JPEG AI compression using statistical cues that remain interpretable and resilient.

Recompression

To evaluate whether JPEG AI recompression could be reliably detected, the researchers tested the rate-distortion cue on a set of images compressed at diverse bitrates – some just once, and others a second time using JPEG AI.

This method involved extracting a 17-dimensional feature vector to track how the image’s bitrate and PSNR evolved across three compression passes. This feature set captured how much quality was lost at each step, and how the latent and hyperprior rates behaved – metrics that traditional pixel-based methods cannot easily access.
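The exact composition of the 17-dimensional vector is not reproduced here, but a sketch of the general idea, building on the `recompression_trace` helper above, might concatenate per-pass rates, PSNR values, their successive differences, and the codec-reported latent and hyperprior rates:

```python
import numpy as np

def recompression_features(trace, latent_rates, hyper_rates):
    """Assemble a feature vector describing how rate and quality evolve
    across compression passes. Illustrative only: the paper's actual
    17-dimensional layout is not reproduced here."""
    bpps = np.array([bpp for bpp, _ in trace])     # bits per pixel, per pass
    psnrs = np.array([p for _, p in trace])        # PSNR in dB, per pass
    return np.concatenate([
        bpps, psnrs,
        np.diff(bpps), np.diff(psnrs),             # per-pass changes
        np.asarray(latent_rates),                  # latent rate per pass
        np.asarray(hyper_rates),                   # hyperprior rate per pass
    ])
```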

The researchers trained a random forest on these features and compared its performance to a ResNet50 trained on image patches:

Results for the classification accuracy of a random forest trained on rate-distortion features for detecting whether a JPEG AI image has been recompressed. The method performs best when the initial compression is strong (i.e., at lower bitrates), and consistently outperforms a pixel-based ResNet50 – especially in cases where the second compression is milder than the first.

The random forest proved notably effective when the initial compression was strong (i.e., at lower bitrates), revealing clear differences between single- and double-compressed images. As with the prior cue, the ResNet50 baseline struggled to generalize, particularly when tested on compression levels it had not seen during training.

The rate-distortion features, by contrast, remained stable across a wide range of scenarios. Notably, the cue worked even when applied to a different AI-based codec, suggesting that the approach generalizes beyond JPEG AI.

JPEG AI and Synthetic Images

For the final testing round, the authors evaluated whether their quantization-based features could distinguish between JPEG AI-compressed images and fully synthetic images generated by models such as Midjourney, Stable Diffusion, DALL-E 2, Glide, and Adobe Firefly.

For this, the researchers used a subset of the Synthbuster dataset, mixing real photos from the RAISE database with generated images from a variety of diffusion and GAN-based models.

Examples of synthetic images in Synthbuster, generated using text prompts inspired by natural photographs from the RAISE-1k dataset. The images were created with various diffusion models, with prompts designed to produce photorealistic content and textures rather than stylized or artistic renderings, reflecting the dataset’s focus on testing methods for distinguishing real from generated images.

Source: https://ieeexplore.ieee.org/document/10334046

The real images were compressed using JPEG AI at several bitrate levels, and classification was posed as a two-way task: either JPEG AI versus a specific generator, or a specific bitrate versus Stable Diffusion XL.

The quantization features (correlations extracted from latent representations) were calculated from a fixed 256×256 region and fed to a random forest classifier. As a baseline, a ResNet50 was trained on pixel patches from the same data.

Classification accuracy of a random forest using quantization features to separate JPEG AI-compressed images from synthetic images.

Across most conditions, the quantization-based approach outperformed the ResNet50 baseline, particularly at low bitrates where compression artifacts were stronger.

A projection of the feature space using UMAP showed clear separation between JPEG AI-compressed and synthetic images, with lower bitrates increasing the distance between classes. One consistent outlier was Glide, whose images clustered differently and had the lowest detection accuracy of any generator tested.

Two-dimensional UMAP visualization of JPEG AI-compressed and synthetic images based on quantization features. The left plot shows that lower JPEG AI bitrates create greater separation from synthetic images; the right plot, how images from different generators cluster distinctly within the feature space.

Finally, the authors evaluated how well the features held up under typical post-processing, such as JPEG recompression or downsampling. While performance declined with heavier processing, the drop was gradual, suggesting that the approach retains some robustness even under degraded conditions.

Evaluation of quantization feature robustness under postprocessing, including JPEG recompression (JPG) and image resizing (RS).

Conclusion

It’s not guaranteed that JPEG AI will enjoy wide adoption. For one thing, there’s enough infrastructural debt at hand to impose friction on any new codec; and even a ‘conventional’ codec with a great pedigree and broad consensus as to its value, such as AV1, has a hard time dislodging long-established incumbent methods.

As regards the system’s potential clash with AI generators, the characteristic quantization artifacts that currently aid AI image detectors may be diminished, or ultimately replaced by traces of a different kind, in later systems (assuming that AI generators will always leave forensic residue, which is not certain).

This would mean that JPEG AI’s own quantization characteristics, perhaps together with other cues identified by the new paper, might not end up colliding with the forensic trail of the most effective new generative AI systems.

If, however, JPEG AI continues to operate as an ‘AI wash’, significantly blurring the distinction between real and generated images, it may be hard to make a convincing case for its uptake.

 
