The AI tools for Art Newsletter

First issue 🎉

The AI space is moving so fast it’s hard to imagine that a year ago we were still struggling to generate people with the right number of fingers 😂.

The last couple of years have been pivotal for open source models and tools for artistic usage.
AI tools for creative expression have never been more accessible, and we’re only scratching the surface.
Join us as we look back at the key milestones, tools, and breakthroughs in AI & Art from 2024,
and look forward to what’s to come in 2025 (spoiler 👀: we’re starting a new monthly roundup 👇).



Major Releases of 2024

What were the standout releases of creative AI tools in 2024? We’ll highlight the major releases across creative and
artistic fields, with a particular focus on open-source developments in popular tasks like image and video generation.




Image Generation

Over two years since the OG Stable Diffusion was released and made waves in open-source image generation, it’s now safe to say that when it comes to image generation from text, image editing, and controlled image generation – open-source models are giving closed-source models a run for their money.



Text-to-image generation

2024 was the year diffusion models shifted paradigms – from the traditional UNet-based architecture to the Diffusion Transformer (DiT), along with a switch of training objective to flow matching.

TL;DR – diffusion models and Gaussian flow matching are equivalent. Flow matching proposes a vector-field parametrization of the network output that differs from those commonly used in diffusion models before.

  • We recommend this great blog by Google DeepMind if you’re curious to learn more about flow matching and its connection to diffusion models
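
As a rough sketch of what that parametrization looks like (using the rectified-flow formulation adopted by models like SD3; the notation here is ours, not the blog’s): data $x_0$ and noise $\epsilon$ are linearly interpolated, and the network is trained to predict the velocity between them:

$$
x_t = (1 - t)\,x_0 + t\,\epsilon, \qquad
\mathcal{L}(\theta) = \mathbb{E}_{t,\, x_0,\, \epsilon \sim \mathcal{N}(0, I)}
\big\| v_\theta(x_t, t) - (\epsilon - x_0) \big\|^2 .
$$

Instead of predicting the added noise $\epsilon$ as in classic diffusion training, the network predicts the velocity $\epsilon - x_0$ along the straight path from data to noise – the two objectives are related by a change of parametrization, which is the equivalence mentioned above.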

Back to practice: the first to announce the shift was Stability AI with Stable Diffusion 3; however, it was HunyuanDiT that became the first open-source model with a DiT architecture.
The trend continued with the releases of AuraFlow, Flux.1, and Stable Diffusion 3.5.

Among the many pivotal moments in the (not so long) history of open-source image generation models, it’s safe to say that the release of Flux.1 was one of them. Flux.1 [dev] achieved a new state of the art, surpassing popular closed-source models like Midjourney v6.0 and DALL·E 3 (HD) on various benchmarks.
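
Flux.1 [dev] is supported in diffusers; here’s a minimal text-to-image sketch (the weights are gated on the Hub, so you’ll need to accept the license first – and the prompt and parameters below are just illustrative):

```python
import torch
from diffusers import FluxPipeline

# Load Flux.1 [dev] (gated weights - accept the license on the Hugging Face Hub first)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # trade speed for VRAM on consumer GPUs

image = pipe(
    "a watercolor portrait of a fox, soft morning light",  # illustrative prompt
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=50,
).images[0]
image.save("flux_dev_sample.png")
```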



Personalization & stylization

A positive side effect of the advancements in image models is the significant improvement in personalization techniques for text-to-image models and controlled generation.

Back in August 2022, transformative works like Textual Inversion and DreamBooth enhanced our ability to teach new concepts to text-to-image models, drastically expanding what could be done with them. These opened the door to a stream of improvements and enhancements building on top of those techniques (such as LoRA for diffusion models).
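
In practice, these techniques compose in a few lines with diffusers – here’s a sketch of loading a community LoRA on top of SDXL (the LoRA repo id below is a hypothetical placeholder for whichever fine-tune you pick):

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Load a community LoRA on top of the base model
# ("user/some-sdxl-style-lora" is a hypothetical repo id)
pipe.load_lora_weights("user/some-sdxl-style-lora")
pipe.fuse_lora(lora_scale=0.9)  # optionally bake the LoRA into the weights

image = pipe("a castle in the clouds").images[0]
```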


However, an upper bound on the quality of a fine-tuned model is naturally the base model it was fine-tuned from. In that sense, we can’t neglect Stable Diffusion XL, which was also a significant milestone in personalization for open-source image generation models. A testament to that is that even now, many of the popular techniques and models for personalization and controlled generation are based on SDXL. The advanced abilities of SDXL (and models released after it with similar quality), together with the growing understanding of the semantic roles of different components in the diffusion model architecture, raise the question –
what can we achieve without further optimization?

Cue the rain of zero-shot techniques – 2024 was definitely the year when generating high-quality portraits from
reference photos became possible with as little as a single reference image & without any optimization. Training-free
techniques like IP-Adapter FaceID, InstantID, PhotoMaker, and more came out and demonstrated abilities competitive with,
if not superior to, those of fine-tuned models.
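
To give a flavor of how lightweight zero-shot personalization is, here’s a minimal sketch using the plain IP-Adapter in diffusers (the FaceID / InstantID variants add a face-embedding step on top; the reference image path is illustrative):

```python
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Attach the IP-Adapter image encoder + projection weights
pipe.load_ip_adapter(
    "h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin"
)
pipe.set_ip_adapter_scale(0.6)  # how strongly the reference image steers generation

reference = load_image("reference_portrait.png")  # a single reference image
image = pipe(
    prompt="a portrait, studio lighting, 85mm photo",
    ip_adapter_image=reference,
    num_inference_steps=30,
).images[0]
```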


Similarly, image editing and controlled generation – such as image generation with canny / depth / pose constraints – made progress too, both thanks to the growing quality of the base models and to the community’s growing understanding of the semantic roles different components have (InstantStyle, B-LoRA).
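
On the controlled-generation side, a canny-constrained SDXL generation looks roughly like this in diffusers (a sketch; the input image path and prompt are illustrative):

```python
import cv2
import numpy as np
import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from diffusers.utils import load_image
from PIL import Image

# Extract a canny edge map to use as the structural constraint
source = load_image("input.png")  # illustrative path
edges = cv2.Canny(np.array(source), 100, 200)
control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "a stained glass window",  # the edges constrain layout, the prompt sets content
    image=control_image,
    controlnet_conditioning_scale=0.7,
).images[0]
```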

So what’s next? Since the paradigm shift to DiT architectures and flow matching objectives,
additional models have come out attempting to use DiT-based models like Flux and SD3.5 for similar purposes, but so far not quite beating the quality of the SDXL-based ones despite the superior quality of the underlying base model. This could be attributed to the relative lack of understanding of the semantic roles of different components in the DiT compared to the UNet. 2025 could be the year we discover those roles in DiTs as well, unlocking more possibilities with the next generation of image generation models.



Video Generation


Compared to image generation, with video we still have a way to go.
But it’s safe to say that we’re very far from where we were a year ago. While we’re all about open source,
the credit for (some of) the significant leap in AI video generation goes to OpenAI’s Sora for changing our
expectations of video model capabilities quite radically. And as fofr put it nicely in AI video is having its Stable Diffusion moment (which we recommend reading 🙂) – it
made everyone realize what is possible.

The recent surge of open-source video generation models, including CogVideoX, Mochi, Allegro, LTX Video,
and HunyuanVideo, has also been noteworthy. Video generation is inherently harder than image generation due to the need for motion quality, coherence, and consistency. Moreover, video generation requires substantial computational and memory resources, resulting in significant generation latency. This often hinders local usage, making many new open video models inaccessible to community hardware without extensive memory optimizations and quantization approaches that impact both inference latency and the quality of the generated videos. Nevertheless, the open-source community has made remarkable progress – recently covered in this blog on the state of open video generation models.
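
To make the memory point concrete, here’s roughly what running an open video model on constrained hardware looks like in diffusers, using CogVideoX-2b with CPU offloading and VAE tiling (a sketch; the prompt and parameters are illustrative):

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-2b", torch_dtype=torch.float16
)

# Memory optimizations: keep only the active submodule on the GPU,
# and decode the latents tile by tile instead of all at once.
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_tiling()

video = pipe(
    prompt="a panda playing an acoustic guitar in a bamboo forest",
    num_frames=49,
    num_inference_steps=50,
    guidance_scale=6.0,
).frames[0]
export_to_video(video, "panda.mp4", fps=8)
```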

While this means that most community members are still unable to experiment and develop with open-source video models, it also suggests that we can expect significant advancements in 2025.



Audio Generation

Audio generation has progressed significantly in the past year, going from simple sounds to complete songs with lyrics.
Despite challenges – audio signals are complex and multifaceted, requiring more sophisticated mathematical models than
those that generate text or images, and training data is quite scarce – 2024 saw open-source releases like OuteTTS and
IndicParlerTTS for text-to-speech and OpenAI’s Whisper large v3 turbo for automatic speech recognition.
2025 is already shaping up to be a breakthrough year for audio models, with a remarkable number of releases
in January alone. We have seen the release of three new text-to-speech models: Kokoro, LLasa TTS, and OuteTTS 0.3,
as well as two new music models: JASCO and YuE. At this pace, we can expect even more exciting developments in
the audio space throughout the year.
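
On the speech recognition side, Whisper large v3 turbo runs with the standard transformers pipeline (a minimal sketch; the audio file path is illustrative):

```python
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3-turbo",
    torch_dtype=torch.float16,
    device="cuda:0",
)

# Long-form transcription with timestamped chunks
result = asr("interview.wav", return_timestamps=True)
print(result["text"])
```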

This song👇 was generated with YuE 🤯




Creative Tools that Shined in 2024

The beauty of open source is that it allows the community to experiment, find new uses for existing models / pipelines, and improve on and build new tools together. Many of the creative AI tools that were popular this year are the fruit of a joint community effort.

Here are some of our favorites:



Flux fine-tuning

Many of the amazing Flux fine-tunes created in the last year were trained thanks to the AI-toolkit by ostris.



Face to all

Inspired by fofr’s face-to-many, Face to All combines the viral InstantID model with added ControlNet depth constraints and community fine-tuned SDXL LoRAs to create training-free, high-quality portraits in creative stylizations.




Flux style shaping

Based on a ComfyUI workflow by Nathan Shipley, Flux style shaping combines Flux [dev] Redux and Flux [dev] Depth for style transfer and optical illusion creation.




Outpainting with diffusers

Diffusers Image Outpaint makes use of the diffusers Stable Diffusion XL Fill Pipeline together with an SDXL union ControlNet to seamlessly expand an input image.
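
The Space itself uses a custom fill pipeline, but the core outpainting recipe can be sketched with the stock SDXL inpainting pipeline instead: pad the canvas, mask the new region, and let the model fill it in. This is a simplified stand-in, not the Space’s exact code:

```python
import torch
from diffusers import StableDiffusionXLInpaintPipeline
from PIL import Image

pipe = StableDiffusionXLInpaintPipeline.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1", torch_dtype=torch.float16
).to("cuda")

src = Image.open("input.png").convert("RGB")  # e.g. a 1024x1024 image

# Place the original on a wider canvas; the new strip starts out black.
canvas = Image.new("RGB", (src.width + 512, src.height))
canvas.paste(src, (0, 0))

# White = area for the model to generate, black = keep the original pixels.
mask = Image.new("L", canvas.size, 255)
mask.paste(Image.new("L", src.size, 0), (0, 0))

out = pipe(
    prompt="seamless continuation of the scene",
    image=canvas,
    mask_image=mask,
    width=canvas.width,
    height=canvas.height,
    strength=0.99,  # regenerate the masked strip almost from scratch
).images[0]
out.save("outpainted.png")
```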



Live portrait, Face Poke

Animating a static portrait with facial expressions has never been easier than with Live Portrait and Face Poke.




TRELLIS

TRELLIS is a 3D generation model for versatile and high-quality 3D asset creation that took over the 3D landscape with a bang.




IC Light

IC-Light, which stands for “Imposing Consistent Light”, is a tool for image relighting with a foreground condition.



What should we expect for AI & Art in 2025?

2025 is the year for open source to catch up on video, motion, and audio models, making room for more modalities. With advancements in efficient computing and quantization, we can expect significant leaps in open-source video models. As we approach a (natural) plateau in image generation models, we can shift our focus to other tasks and modalities.



Starting off strong – Open-source releases of January ’25

  1. YuE – a series of open-source music foundation models for full-song generation.
    YuE is possibly the best open-source model for music generation (with an Apache 2.0 license!), achieving results competitive with closed-source models like Suno.

    try it out & read more: demo, model weights.


  2. Hunyuan 3D-2, SPAR3D, DiffSplat – 3D generation models.
    3D models are coming in hot – not long after the release of TRELLIS, Hunyuan 3D-2, SPAR3D, and DiffSplat are here to
    take over the 3D landscape.

    try it out & read more:

  3. Lumina-Image 2.0 – text-to-image model.
    Lumina is a 2B parameter model competitive with the 12B Flux.1 [dev], and with an Apache 2.0 license (!!).

    try it out & read more: demo, model weights.

  4. ComfyUI-to-Gradio
    A step-by-step guide on how to convert a complex ComfyUI workflow into a simple Gradio application, and how to deploy it on Hugging Face Spaces using the ZeroGPU serverless infrastructure, which allows it to be deployed and run for free in a serverless manner.
    read more here.



Announcing Our Newsletter 🗞️

Kicking off with this blog, we (Poli & Linoy) will be bringing you a monthly roundup of the latest in the creative
AI world. In such a fast-evolving space, it’s tough to stay on top of all the new developments,
let alone sift through them. That’s where we come in & hopefully this way we can make creative AI tools more accessible.


