The Rise of Hunyuan Video Deepfakes


Something noteworthy is currently happening in the AI synthesis community, though its significance may take some time to become clear. Hobbyists are training generative AI video models to reproduce the likenesses of individuals, using video-based LoRAs on Tencent’s recently released open-source Hunyuan Video framework.

Sources: civit.ai

In the video shown above, the likenesses of actresses Natalie Portman, Christina Hendricks and Scarlett Johansson, along with tech leader Elon Musk, have been trained into relatively small add-on files for the Hunyuan generative video system, which can be installed without content filters (such as NSFW filters) on a user’s computer.

The creator of the Christina Hendricks LoRA shown above states that only 16 images from the TV show were needed to develop the model (which is a mere 307MB download); multiple posts from the Stable Diffusion community at Reddit and Discord confirm that LoRAs of this sort don’t require large amounts of training data or long training times, in most cases.


Hunyuan LoRAs can be trained on either static images or videos, though training on videos requires greater hardware resources and longer training times.

The Hunyuan Video model features 13 billion parameters, exceeding Sora’s 12 billion parameters, and far exceeding the less capable Hunyuan-DiT model released to open source in summer of 2024, which has only 1.5 billion parameters.

As was the case two and a half years ago with Stable Diffusion and LoRA (see examples of Stable Diffusion 1.5’s ‘native’ celebrities here), the base model in question has a far more limited understanding of celebrity identities, compared to the level of fidelity that can be obtained through ‘ID-injected’ LoRA implementations.

Effectively, a customized, identity-focused LoRA gets a ‘free ride’ on the considerable synthesis capabilities of the base Hunyuan model, offering a notably more effective human synthesis than can be obtained either from 2017-era autoencoder deepfakes or by attempting to add movement to static images via systems such as the feted LivePortrait.
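The reason these add-on files are so small relative to the 13-billion-parameter base model can be sketched numerically: a LoRA leaves the base model’s weights frozen and trains only a low-rank update on top of them. The shapes, scaling and initialization below are illustrative assumptions for a single weight matrix, not Hunyuan Video’s actual configuration:

```python
import numpy as np

# Minimal sketch of Low-Rank Adaptation (LoRA): instead of fine-tuning
# a full weight matrix W, train two small matrices A and B whose
# product forms a low-rank update to the frozen base weight.
d_out, d_in, rank = 512, 512, 16  # illustrative dimensions

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))    # frozen base-model weight
A = rng.standard_normal((rank, d_in)) * 0.01
B = np.zeros((d_out, rank))               # B starts at zero, so the
                                          # update is initially a no-op
scale = 1.0                               # commonly alpha / rank

# Effective weight at inference time:
W_adapted = W + scale * (B @ A)

# Only A and B are stored in the LoRA file, a tiny fraction of W:
full_params = W.size                      # 262144
lora_params = A.size + B.size             # 16384
print(full_params, lora_params)           # prints: 262144 16384
```

This is why a trained identity ships as a ~300MB file rather than a copy of the multi-gigabyte base model: only the small update matrices for each adapted layer are distributed, and they are merged into (or applied alongside) the frozen weights at generation time.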

All of the LoRAs depicted here can be downloaded freely from the highly popular Civit community, while the more abundant variety of older custom-made ‘static-image’ LoRAs can also potentially create ‘seed’ images for the video creation process (i.e., image-to-video, a pending release for Hunyuan Video, though workarounds are possible for the moment).

As I write, the Civit website offers 128 search results for ‘Hunyuan’. The majority of these are in some way NSFW models; 22 depict celebrities; 18 are designed to facilitate the generation of hardcore pornography; and only seven of them depict men rather than women.

So What’s New?

Because of the evolving nature of the term ‘deepfake’, and limited public understanding of the (quite severe) limitations of AI human video synthesis frameworks to date, the importance of the Hunyuan LoRA is not easy to grasp for a person casually following the generative AI scene. Let’s review some of the key differences between Hunyuan LoRAs and prior approaches to identity-based AI video generation.

1: Unfettered Local Installation

The most important aspect of Hunyuan Video is the fact that it can be downloaded locally, and that it puts a very powerful and uncensored AI video generation system into the hands of the casual user, as well as the VFX community (to the extent that licenses may allow across territories).

The last time this happened was the release to open source of the Stability.ai Stable Diffusion model in the summer of 2022. At that time, OpenAI’s DALL-E 2 had captured the public imagination, though DALL-E 2 was a paid service with notable restrictions (which grew over time).

When Stable Diffusion became available, and Low-Rank Adaptation then made it possible to generate images of the identity of any person (celebrity or not), the massive locus of developer and consumer interest helped Stable Diffusion to eclipse the popularity of DALL-E 2; though the latter was a more capable system out of the box, its censorship routines were seen as onerous by many of its users, and customization was impossible.

Arguably, the same scenario now applies between Sora and Hunyuan – or, more accurately, between proprietary generative video systems and open-source rivals, of which Hunyuan is the first – but probably not the last (here, consider that Flux would eventually gain significant ground on Stable Diffusion).

Users who want to create Hunyuan LoRA output, but who lack sufficiently beefy equipment, can, as ever, offload the GPU aspect of training to online compute services such as RunPod. This is not the same as creating AI videos on platforms such as Kaiber or Kling, since there is no semantic or image-based filtering (censoring) entailed in renting an online GPU to support an otherwise local workflow.

2: No Need for ‘Host’ Videos and High Effort

When deepfakes burst onto the scene at the end of 2017, the anonymously-posted code would evolve into the mainstream forks DeepFaceLab and FaceSwap (as well as the DeepFaceLive real-time deepfaking system).

This method required the painstaking curation of thousands of face images of each identity to be swapped; the less effort put into this stage, the less effective the model would be. Moreover, training times varied between 2-14 days, depending on available hardware, stressing even capable systems over time.

When the model was finally ready, it could only impose faces into existing video, and usually needed a ‘target’ (i.e., real) identity that was close in appearance to the superimposed identity.

More recently, ROOP, LivePortrait and various similar frameworks have provided similar functionality with far less effort, and often with superior results – but with no capability to generate accurate full-body deepfakes – or any element other than faces.

Sources: https://www.youtube.com/watch?v=i39xeYPBAAM and https://www.youtube.com/watch?v=QGatEItg2Ns

In contrast, Hunyuan LoRAs (and the similar systems that will inevitably follow) allow for unfettered creation of entire worlds, including full-body simulation of the user-trained LoRA identity.

3: Massively Improved Temporal Consistency

Temporal consistency has been the Holy Grail of diffusion video for several years now. The use of a LoRA, along with apposite prompts, gives a Hunyuan video generation a constant identity reference to adhere to. In theory (these are early days), one could train multiple LoRAs of a particular identity, each wearing specific clothing.

Under those auspices, the clothing too is less likely to ‘mutate’ throughout the course of a video generation (since the generative system bases the next frame on a very limited window of prior frames).

4: Access to the ‘Human Experiment’

As I recently observed, the proprietary and FAANG-level generative AI sector now appears to be so wary of potential criticism relating to the human synthesis capabilities of its projects that actual people rarely appear in project pages for major announcements and releases. Instead, related publicity literature increasingly tends to show ‘cute’ and otherwise ‘non-threatening’ subjects in synthesized results.

With the arrival of Hunyuan LoRAs, for the first time, the community has a chance to push the boundaries of LDM-based human video synthesis in a highly capable (rather than marginal) system, and to fully explore the subject that most interests the majority of us – people.

Implications

Since a search for ‘Hunyuan’ at the Civit community mostly shows celebrity LoRAs and ‘hardcore’ LoRAs, the central implication of the arrival of Hunyuan LoRAs is that they will be used to create AI pornographic (or otherwise defamatory) videos of real people – celebrities and unknowns alike.

For compliance purposes, the hobbyists who create Hunyuan LoRAs and who experiment with them on diverse Discord servers are careful to ban examples of real people from being posted. The truth is that even image-based deepfakes are now severely weaponized; and the prospect of adding truly realistic videos into the mix may finally justify the heightened fears that have been recurrent in the media over the last seven years, and which have prompted new regulations.

The Driving Force

As ever, porn remains the driving force for technology. Whatever our opinion of such usage, this relentless engine of impetus drives advances in the state of the art that can ultimately benefit more mainstream adoption.

In this case, it is possible that the price will be higher than usual, since the open-sourcing of hyper-realistic video creation has obvious implications for criminal, political and ethical misuse.

One Reddit group (which I won’t name here) dedicated to AI generation of NSFW video content has an associated, open Discord server where users are refining ComfyUI workflows for Hunyuan-based video porn generation. Daily, users post examples of NSFW clips – many of which can reasonably be termed ‘extreme’, or at least as straining the restrictions stated in forum rules.

This community also maintains a substantial and well-developed GitHub repository featuring tools that can download and process pornographic videos, in order to provide training data for new models.

Since the most popular LoRA trainer, Kohya-ss, now supports Hunyuan LoRA training, the barriers to entry for unbounded generative video training are lowering daily, along with the hardware requirements for Hunyuan training and video generation.

The crucial aspect of dedicated training schemes for porn-based AI (rather than identity-based models, such as celebrities) is that a standard foundation model like Hunyuan is not specifically trained on NSFW output, and may therefore either perform poorly when asked to generate NSFW content, or fail to disentangle learned concepts and associations in a performative or convincing manner.

By developing fine-tuned NSFW foundation models and LoRAs, it will be increasingly possible to project trained identities into a dedicated ‘porn’ video domain; after all, this is only the video version of something that has already occurred for still images over the last two and a half years.

VFX

The massive increase in temporal consistency that Hunyuan Video LoRAs offer is an obvious boon to the AI visual effects industry, which leans very heavily on adapting open-source software.

Though a Hunyuan Video LoRA approach generates an entire frame and environment, VFX companies have almost certainly begun to experiment with isolating the temporally-consistent human faces that can be obtained by this method, in order to superimpose or integrate faces into real-world source footage.

Like the hobbyist community, VFX companies must wait for Hunyuan Video’s image-to-video and video-to-video functionality, which is potentially the most useful bridge for LoRA-driven, ID-based ‘deepfake’ content; or else improvise, and use the interval to probe the outer capabilities of the framework, of potential adaptations, and even of proprietary in-house forks of Hunyuan Video.

Though the license terms for Hunyuan Video technically allow the depiction of real individuals as long as permission is given, they prohibit its use in the EU, the United Kingdom, and South Korea. On the ‘stays in Vegas’ principle, this doesn’t necessarily mean that Hunyuan Video won’t be used in these regions; however, the prospect of external data audits, to enforce a growing body of regulations around generative AI, could make such illicit usage risky.

Another potentially ambiguous area of the license terms states:

This clause is clearly aimed at the multitude of companies that are likely to ‘middleman’ Hunyuan Video for a relatively tech-illiterate body of users, and who will be required to cut Tencent into the action, above a certain ceiling of users.

Whether or not the broad phrasing could also cover indirect usage (i.e., via the supply of Hunyuan-enabled visual effects output in popular movies and TV) may need clarification.

Conclusion

Since deepfake video has existed for a long time, it would be easy to underestimate the significance of Hunyuan Video LoRA as an approach to identity synthesis, and deepfaking; and to assume that the developments currently manifesting at the Civit community, and at related Discords and subreddits, represent a mere incremental nudge towards truly controllable human video synthesis.

More likely is that the current efforts represent only a fraction of Hunyuan Video’s potential to create completely convincing full-body and full-environment deepfakes; once the image-to-video component is released (rumored to be occurring this month), a far more granular level of generative power will become available to both the hobbyist and professional communities.

When Stability.ai released Stable Diffusion in 2022, many observers could not figure out why the company would just give away what was, at the time, such a valuable and powerful generative system. With Hunyuan Video, the profit motive is built directly into the license – albeit that it may prove difficult for Tencent to determine when a company triggers the profit-sharing scheme.

In any case, the result is the same as it was in 2022: dedicated development communities have formed immediately and with intense fervor around the release. Some of the roads that these efforts will take in the next 12 months are surely set to prompt new headlines.
