Controlled diffusion model can change material properties in images


Researchers from the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) and Google Research may have just performed digital sorcery, in the form of a diffusion model that can change the material properties of objects in images.

Dubbed Alchemist, the system allows users to alter four attributes of both real and AI-generated pictures: roughness, metallicity, albedo (an object's initial base color), and transparency. As an image-to-image diffusion model, it lets you input any photo and then adjust each property on a continuous scale of -1 to 1 to create a new visual. These photo-editing capabilities could potentially extend to improving the models in video games, expanding the capabilities of AI in visual effects, and enriching robotic training data.
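
To make the slider idea concrete, here is a minimal sketch of what a per-attribute edit request could look like in code. The `MaterialEdit` class and its fields are hypothetical illustrations of the four attributes and the -1 to 1 scale described above, not part of any released Alchemist API.

```python
from dataclasses import dataclass

@dataclass
class MaterialEdit:
    """Hypothetical edit request: one relative strength per attribute, each in [-1, 1]."""
    roughness: float = 0.0
    metallic: float = 0.0
    albedo: float = 0.0
    transparency: float = 0.0

    def clamped(self) -> "MaterialEdit":
        # Keep every slider value inside the continuous [-1, 1] range.
        clip = lambda v: max(-1.0, min(1.0, v))
        return MaterialEdit(clip(self.roughness), clip(self.metallic),
                            clip(self.albedo), clip(self.transparency))

# Example: push an object to maximum transparency while leaving the other attributes untouched.
edit = MaterialEdit(transparency=1.0).clamped()
print(edit)
```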

The magic behind Alchemist starts with a denoising diffusion model: in practice, the researchers used Stable Diffusion 1.5, a text-to-image model lauded for its photorealistic results and editing capabilities. Previous work built on the popular model to enable users to make higher-level changes, like swapping objects or altering the depth of images. In contrast, CSAIL and Google Research's method applies the model to low-level attributes, revising the finer details of an object's material properties with a unique, slider-based interface that outperforms its counterparts.
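
The article does not spell out the architecture, but a common way to adapt an image-to-image diffusion model to a scalar control is to concatenate the input image's latent with the noisy latent and inject the slider value alongside the timestep embedding. The PyTorch sketch below is a schematic of that general pattern; the `SliderConditionedDenoiser` module and its layers are assumptions for illustration, not Alchemist's actual implementation.

```python
import torch
import torch.nn as nn

class SliderConditionedDenoiser(nn.Module):
    """Schematic denoiser: the input-image latent is concatenated to the noisy latent,
    and a scalar slider value in [-1, 1] is embedded and added to the timestep embedding."""
    def __init__(self, latent_channels: int = 4, embed_dim: int = 128):
        super().__init__()
        self.slider_embed = nn.Sequential(nn.Linear(1, embed_dim), nn.SiLU(),
                                          nn.Linear(embed_dim, embed_dim))
        self.time_embed = nn.Sequential(nn.Linear(1, embed_dim), nn.SiLU(),
                                        nn.Linear(embed_dim, embed_dim))
        # Stand-in for a U-Net backbone: takes noisy latent + context latent, predicts noise.
        self.backbone = nn.Conv2d(2 * latent_channels, latent_channels, kernel_size=3, padding=1)
        self.film = nn.Linear(embed_dim, latent_channels)  # per-channel modulation from the conditioning

    def forward(self, noisy_latent, context_latent, t, slider):
        # Normalize the integer timestep before embedding; add the slider embedding on top.
        cond = self.time_embed(t[:, None].float() / 1000.0) + self.slider_embed(slider[:, None])
        x = torch.cat([noisy_latent, context_latent], dim=1)
        h = self.backbone(x)
        return h + self.film(cond)[:, :, None, None]  # broadcast conditioning over spatial dims

# Toy forward pass: a batch of two latents, one slider pushed to maximum, one partway negative.
model = SliderConditionedDenoiser()
noisy = torch.randn(2, 4, 64, 64)
context = torch.randn(2, 4, 64, 64)
t = torch.randint(0, 1000, (2,))
slider = torch.tensor([1.0, -0.5])
print(model(noisy, context, t, slider).shape)  # torch.Size([2, 4, 64, 64])
```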

While prior diffusion systems could pull a proverbial rabbit out of a hat for a picture, Alchemist could make that same animal look translucent. The system could also make a rubber duck appear metallic, remove the golden hue from a goldfish, and shine an old shoe. Programs like Photoshop have similar capabilities, but this model can change material properties in a more straightforward way. Modifying the metallic look of a photo, for instance, requires several steps in the widely used application.

“When you look at an image you’ve created, often the result is not exactly what you have in mind,” says Prafull Sharma, MIT PhD student in electrical engineering and computer science, CSAIL affiliate, and lead author on a new paper describing the work. “You want to control the image while editing it, but the existing controls in image editors are not able to change the materials. With Alchemist, we capitalize on the photorealism of outputs from text-to-image models and tease out a slider control that allows us to modify a specific property after the initial picture is provided.”

Precise control

“Text-to-image generative models have empowered everyday users to generate images as effortlessly as writing a sentence. However, controlling these models can be difficult,” says Carnegie Mellon University Assistant Professor Jun-Yan Zhu, who was not involved in the paper. “While generating a vase is simple, synthesizing a vase with specific material properties such as transparency and roughness requires users to spend hours trying different text prompts and random seeds. This can be frustrating, especially for professional users who require precision in their work. Alchemist presents a practical solution to this challenge by enabling precise control over the materials of an input image while harnessing the data-driven priors of large-scale diffusion models, inspiring future works to seamlessly incorporate generative models into the existing interfaces of commonly used content creation software.”

Alchemist’s design capabilities could help tweak the appearance of different models in video games. Applying such a diffusion model in this domain could help creators speed up their design process, refining textures to fit the gameplay of a level. Moreover, Sharma and his team’s project could assist with altering graphic design elements, videos, and movie effects to enhance photorealism and achieve the desired material appearance with precision.

The method could also refine robotic training data for tasks like manipulation. By introducing the machines to more textures, they can better understand the diverse items they’ll grasp in the real world. Alchemist can even potentially help with image classification, analyzing where a neural network fails to recognize the material changes of an image.

Sharma and his team’s work exceeded similar models at faithfully editing only the requested object of interest. For example, when a user prompted different models to tweak a dolphin to maximum transparency, only Alchemist achieved this feat while leaving the ocean backdrop unedited. When the researchers trained the comparable diffusion model InstructPix2Pix on the same data as their method for comparison, they found that Alchemist achieved superior accuracy scores. Likewise, a user study revealed that the MIT model was preferred and seen as more photorealistic than its counterpart.
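
One way to quantify the "leave the backdrop unedited" behavior, offered here as an illustrative check rather than the paper's actual evaluation protocol, is to compute PSNR between the original and edited images restricted to pixels outside the object mask. The `background_psnr` helper below is an assumed, minimal implementation of that idea.

```python
import numpy as np

def background_psnr(original: np.ndarray, edited: np.ndarray, object_mask: np.ndarray) -> float:
    """PSNR over pixels outside the object mask; higher means the background was better preserved.
    Images are float arrays in [0, 1] with shape (H, W, 3); object_mask is boolean (H, W)."""
    background = ~object_mask
    mse = ((original[background] - edited[background]) ** 2).mean()
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(1.0 / mse))

# Toy example: an untouched background scores infinitely high; a slightly drifted one scores lower.
rng = np.random.default_rng(0)
orig = rng.random((64, 64, 3))
mask = np.zeros((64, 64), dtype=bool)
mask[16:48, 16:48] = True  # pretend this region is the edited object
edited = orig.copy()
edited[~mask] += rng.normal(0, 0.01, size=edited[~mask].shape)  # slight background drift
print(round(background_psnr(orig, np.clip(edited, 0, 1), mask), 2))
```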

Keeping it real with synthetic data

According to the researchers, collecting real data was impractical. Instead, they trained their model on a synthetic dataset, randomly editing the material attributes of 1,200 materials applied to 100 publicly available, unique 3D objects in Blender, a popular computer graphics design tool.
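
A data-generation loop along these lines can be scripted with Blender's Python API. The sketch below is a minimal, assumed version of such a pipeline, not the team's actual code: the object name and output paths are placeholders, and it must be run inside Blender. Note that the "Transmission" socket is named "Transmission Weight" in Blender 4.x.

```python
import random
import bpy  # available only inside Blender's bundled Python

def randomize_material(obj):
    """Randomly perturb the Principled BSDF attributes of an object's active material."""
    bsdf = obj.active_material.node_tree.nodes.get("Principled BSDF")
    bsdf.inputs["Roughness"].default_value = random.random()
    bsdf.inputs["Metallic"].default_value = random.random()
    bsdf.inputs["Base Color"].default_value = (random.random(), random.random(), random.random(), 1.0)
    bsdf.inputs["Transmission"].default_value = random.random()  # "Transmission Weight" in Blender 4.x

# Render several randomized material variants of a placeholder object.
obj = bpy.data.objects["TargetObject"]  # placeholder object name
for i in range(10):
    randomize_material(obj)
    bpy.context.scene.render.filepath = f"//renders/variant_{i:03d}.png"
    bpy.ops.render.render(write_still=True)
```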

“The control of generative AI image synthesis has so far been constrained by what text can describe,” says Frédo Durand, the Amar Bose Professor of Computing in the MIT Department of Electrical Engineering and Computer Science (EECS) and a CSAIL member, who is a senior author on the paper. “This work opens new and finer-grain control for visual attributes inherited from decades of computer graphics research.”

“Alchemist is the kind of technique that’s needed to make machine learning and diffusion models practical and useful to the CGI community and graphic designers,” adds Google Research senior software engineer and co-author Mark Matthews. “Without it, you’re stuck with this kind of uncontrollable stochasticity. It’s fun for a while, perhaps, but at some point, you need to get real work done and have it obey a creative vision.”

Sharma’s latest project comes a year after he led research on Materialistic, a machine-learning method that can identify similar materials in an image. This previous work demonstrated how AI models can refine their material understanding skills, and like Alchemist, it was fine-tuned on a synthetic dataset of 3D models from Blender.

Still, Alchemist has a few limitations at the moment. The model struggles to correctly infer illumination, so it occasionally fails to follow a user’s input. Sharma notes that the method sometimes generates physically implausible transparencies, too. Picture a hand partially inside a cereal box, for instance: at Alchemist’s maximum setting for this attribute, you’d see a clear container without the fingers reaching in.

The researchers would like to expand on how such a model could improve 3D assets for graphics at the scene level. Additionally, Alchemist could help infer material properties from images. According to Sharma, this type of work could unlock links between objects’ visual and mechanical traits in the future.

MIT EECS professor and CSAIL member William T. Freeman is also a senior author, joining Varun Jampani and Google Research scientists Yuanzhen Li PhD ’09, Xuhui Jia, and Dmitry Lagun. The work was supported, in part, by a National Science Foundation grant and gifts from Google and Amazon. The group’s work will be highlighted at CVPR in June.
