Computer-aided design (CAD) systems are well-established tools used to design many of the physical objects we use every day. But CAD software requires extensive expertise to master, and many tools incorporate such a high level of detail that they don’t lend themselves to brainstorming or rapid prototyping.
In an effort to make design faster and more accessible for non-experts, researchers from MIT and elsewhere developed an AI-driven robotic assembly system that enables people to construct physical objects by simply describing them in words.
Their system uses a generative AI model to create a 3D representation of an object’s geometry based on the user’s prompt. Then a second generative AI model reasons about the desired object and determines where different components should go, according to the object’s function and geometry.
The system can automatically construct the object from a set of prefabricated parts using robotic assembly. It can also iterate on the design based on feedback from the user.
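To make the division of labor concrete, here is a minimal Python sketch that mirrors the pipeline described above. Every function in it is a hypothetical placeholder stubbed with toy behavior, not the researchers’ actual code.

```python
# Purely illustrative outline of the described pipeline. All functions are
# hypothetical stand-ins, not the team's implementation.

def generate_mesh(prompt: str) -> str:
    return f"<mesh for '{prompt}'>"              # stand-in for a text-to-3D generative model

def assign_components(mesh: str, prompt: str) -> dict:
    return {"structure": mesh, "panels": ["seat", "backrest"]}  # stand-in for the VLM placement step

def assemble(design: dict) -> None:
    print(f"Robot assembling from prefabricated parts: {design}")  # stand-in for robotic assembly

def design_loop(prompt: str) -> None:
    mesh = generate_mesh(prompt)                 # step 1: text -> 3D geometry
    design = assign_components(mesh, prompt)     # step 2: reason about where components go
    while True:
        feedback = input("Refine the design (press Enter to accept): ")
        if not feedback:
            break
        design = assign_components(mesh, f"{prompt}; {feedback}")  # step 3: iterate on user feedback
    assemble(design)                             # step 4: build from reusable parts

# design_loop("make me a chair")
```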
The researchers used this end-to-end system to fabricate furniture, including chairs and shelves, from two types of premade components. The components can be disassembled and reassembled at will, reducing the amount of waste generated during the fabrication process.
They evaluated these designs through a user study and found that more than 90 percent of participants preferred the objects made by their AI-driven system, compared with alternative approaches.
While this work is an initial demonstration, the framework could be especially useful for rapidly prototyping complex objects like aerospace components and architectural elements. In the future, it could be used in homes to fabricate furniture or other objects locally, without the need to have bulky products shipped from a central facility.
“Ultimately, we want to be able to communicate with a robot and AI system the same way we talk with one another to make things together. Our system is a first step toward enabling that future,” says lead author Alex Kyaw, a graduate student in the MIT departments of Electrical Engineering and Computer Science (EECS) and Architecture.
Kyaw is joined on the paper by Richa Gupta, an MIT architecture graduate student; Faez Ahmed, associate professor of mechanical engineering; Lawrence Sass, professor and chair of the Computation Group in the Department of Architecture; senior author Randall Davis, an EECS professor and member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); as well as others at Google DeepMind and Autodesk Research. The paper was recently presented at the Conference on Neural Information Processing Systems.
Generating a multicomponent design
While generative AI models are good at generating 3D representations, known as meshes, from text prompts, most don’t produce uniform representations of an object’s geometry that have the component-level details needed for robotic assembly.
Separating these meshes into components is difficult for a model because assigning components depends on the geometry and functionality of the object and its parts.
The researchers tackled these challenges using a vision-language model (VLM), a powerful generative AI model that has been pre-trained to understand images and text. They task the VLM with determining how two types of prefabricated parts, structural components and panel components, should fit together to form an object.
“There are many ways we can put panels onto a physical object, but the robot must see the geometry and reason over that geometry to make a decision about it. By serving as both the eyes and brain of the robot, the VLM enables the robot to do this,” Kyaw says.
A user prompts the system with text, perhaps by typing “make me a chair,” and gives it an AI-generated image of a chair to start.
Then the VLM reasons about the chair and determines where panel components go on top of structural components, based on the functionality of the many example objects it has seen before. For instance, the model can determine that the seat and backrest need panels to provide surfaces for someone to sit on and lean against.
It outputs this information as text, such as “seat” or “backrest.” Each surface of the chair is then labeled with a number, and this information is fed back to the VLM.
The VLM then chooses the labels that correspond to the geometric parts of the chair that should receive panels on the 3D mesh, completing the design.
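This exchange can be pictured as a two-stage query: one prompt asks which functional parts of the object need panels, and a second maps those parts onto the numbered surfaces of the mesh. The Python sketch below illustrates that flow; the prompts and the query_vlm callable are assumptions made for the example, not the team’s actual interface.

```python
# Illustrative two-stage VLM query for panel placement. The prompt wording,
# the query_vlm() helper, and the surface numbering are assumptions.

from typing import Callable, List

def choose_panel_surfaces(
    object_description: str,
    numbered_surfaces: List[str],       # e.g. ["1: top horizontal slab", "2: rear vertical slab"]
    query_vlm: Callable[[str], str],    # stand-in for a real vision-language model call
) -> List[int]:
    # Stage 1: ask which functional parts of the object need panel surfaces.
    parts_answer = query_vlm(
        f"Which parts of a {object_description} need flat panels for its function? "
        "Answer with short part names, comma separated."
    )  # e.g. "seat, backrest"

    # Stage 2: ask which numbered mesh surfaces correspond to those parts.
    surfaces_answer = query_vlm(
        f"The parts needing panels are: {parts_answer}. "
        f"Here are the numbered surfaces of the mesh: {'; '.join(numbered_surfaces)}. "
        "Return only the numbers of the surfaces that should receive panels."
    )  # e.g. "1, 2"

    return [int(tok) for tok in surfaces_answer.replace(",", " ").split() if tok.isdigit()]


# Toy usage with a mocked VLM so the flow can be run end to end:
def mock_vlm(prompt: str) -> str:
    return "seat, backrest" if "Which parts" in prompt else "1, 2"

print(choose_panel_surfaces("chair", ["1: seat slab", "2: back slab", "3: leg"], mock_vlm))
# -> [1, 2]
```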
Human-AI co-design
The user stays in the loop throughout this process and can refine the design by giving the model a new prompt, such as “only use panels on the backrest, not the seat.”
“The design space is very big, so we narrow it down through user feedback. We believe this is the best way to do it because people have different preferences, and building an idealized model for everyone would be impossible,” Kyaw says.
“The human-in-the-loop process allows users to steer the AI-generated designs and have a sense of ownership in the final outcome,” adds Gupta.
Once the 3D mesh is finalized, a robotic assembly system builds the object using prefabricated parts. These reusable parts can be disassembled and reassembled into different configurations.
The researchers compared the results of their method with an algorithm that places panels on all upward-facing horizontal surfaces, and an algorithm that places panels randomly. In a user study, more than 90 percent of people preferred the designs made by their system.
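The first baseline amounts to a simple geometric rule: any face of the mesh whose normal points roughly straight up gets a panel. Here is a minimal sketch of that heuristic, assuming the mesh is given as vertex and face arrays; the function name and tolerance are illustrative choices, not details from the paper.

```python
# Minimal version of the "panels on all upward-facing horizontal surfaces" baseline.
# The mesh format (vertices + triangular faces) and tolerance are assumptions.

import numpy as np
from typing import List

def upward_facing_faces(vertices: np.ndarray, faces: np.ndarray, tol: float = 0.95) -> List[int]:
    """Return indices of faces whose unit normal is within tol of +Z (horizontal, facing up)."""
    selected = []
    for i, (a, b, c) in enumerate(faces):
        v0, v1, v2 = vertices[a], vertices[b], vertices[c]
        n = np.cross(v1 - v0, v2 - v0)        # face normal (unnormalized)
        n = n / (np.linalg.norm(n) + 1e-12)   # normalize, guarding against degenerate faces
        if n[2] >= tol:                       # nearly parallel to +Z -> gets a panel
            selected.append(i)
    return selected

# Example: a single upward-facing triangle in the XY plane is selected.
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
tris = np.array([[0, 1, 2]])
print(upward_facing_faces(verts, tris))   # -> [0]
```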
They also asked the VLM to explain why it chose to place panels in those areas.
“We learned that the vision-language model is able to understand, to some extent, the functional aspects of a chair, like leaning and sitting, and to know why it’s placing panels on the seat and backrest. It isn’t just randomly spitting out these assignments,” Kyaw says.
In the future, the researchers want to enhance their system to handle more complex and nuanced user prompts, such as a table made from glass and metal. In addition, they want to incorporate additional prefabricated components, such as gears, hinges, or other moving parts, so objects can have more functionality.
“Our hope is to drastically lower the barrier of access to design tools. We have shown that we can use generative AI and robotics to turn ideas into physical objects in a fast, accessible, and sustainable manner,” says Davis.
