Expanding robot perception

Robots have come a long way since the Roomba. Today, drones are starting to deliver door to door, self-driving cars are navigating some roads, robo-dogs are aiding first responders, and still more bots are doing backflips and helping out on the factory floor. Still, Luca Carlone thinks the best is yet to come.

Carlone, who recently received tenure as an associate professor in MIT’s Department of Aeronautics and Astronautics (AeroAstro), directs the SPARK Lab, where he and his students are bridging a key gap between humans and robots: perception. The group does theoretical and experimental research, all toward expanding a robot’s awareness of its environment in ways that approach human perception. And perception, as Carlone often says, is more than detection.

While robots have grown by leaps and bounds in terms of their ability to detect and identify objects in their surroundings, they still have a lot to learn when it comes to making higher-level sense of their environment. As humans, we perceive objects with an intuitive sense not only of their shapes and labels but also of their physics (how they might be manipulated and moved) and of how they relate to each other, their larger environment, and ourselves.

That kind of human-level perception is what Carlone and his group are hoping to impart to robots, in ways that enable them to safely and seamlessly interact with people in their homes, workplaces, and other unstructured environments.

Since joining the MIT faculty in 2017, Carlone has led his team in developing and applying perception and scene-understanding algorithms for various applications, including autonomous underground search-and-rescue vehicles, drones that can pick up and manipulate objects on the fly, and self-driving cars. They might also be useful for domestic robots that follow natural language commands and potentially even anticipate humans’ needs based on higher-level contextual clues.

“Perception is a big bottleneck toward getting robots to help us in the real world,” Carlone says. “If we can add elements of cognition and reasoning to robot perception, I believe they can do a lot of good.”

Expanding horizons

Carlone was born and raised near Salerno, Italy, close to the scenic Amalfi coast, where he was the youngest of three boys. His mother is a retired elementary school teacher who taught math, and his father is a retired history professor and publisher, who has always taken an analytical approach to his historical research. The brothers may have unconsciously adopted their parents’ mindsets, as all three went on to be engineers: the older two pursued electronics and mechanical engineering, while Carlone landed on robotics, or mechatronics, as it was known at the time.

He didn’t come around to the field, however, until late in his undergraduate studies. Carlone attended the Polytechnic University of Turin, where he focused initially on theoretical work, specifically on control theory, a field that applies mathematics to develop algorithms that automatically control the behavior of physical systems, such as power grids, planes, cars, and robots. Then, in his senior year, Carlone signed up for a course on robotics that explored advances in manipulation and how robots can be programmed to move and function.

“It was love at first sight. Using algorithms and math to develop the brain of a robot and make it move and interact with the environment is one of the most fulfilling experiences,” Carlone says. “I immediately decided this is what I want to do in life.”

He went on to a dual-degree program at the Polytechnic University of Turin and the Polytechnic University of Milan, where he received master’s degrees in mechatronics and automation engineering, respectively. As part of this program, called the Alta Scuola Politecnica, Carlone also took courses in management, in which he and students from various academic backgrounds had to team up to conceptualize, build, and draw up a marketing pitch for a new product design. Carlone’s team developed a touch-free table lamp designed to follow a user’s hand-driven commands. The project pushed him to think about engineering from different perspectives.

“It was like having to speak different languages,” he says. “It was an early exposure to the need to look beyond the engineering bubble and think about how to create technical work that can impact the real world.”

The next generation

Carlone stayed in Turin to complete his PhD in mechatronics. During that time, he was given freedom to choose a thesis topic, which he went about, as he recalls, “a bit naively.”

“I was exploring a topic that the community considered to be well-understood, and for which many researchers believed there was nothing more to say,” Carlone says. “I underestimated how established the topic was, and thought I could still contribute something new to it, and I was lucky enough to just do that.”

The topic in question was “simultaneous localization and mapping,” or SLAM: the problem of generating and updating a map of a robot’s environment while simultaneously keeping track of where the robot is within that environment. Carlone came up with a way to reframe the problem, such that algorithms could generate more precise maps without having to start with an initial guess, as most SLAM methods did at the time. His work helped to crack open a field where most roboticists thought one couldn’t do better than the existing algorithms.

“SLAM is about figuring out the geometry of things and how a robot moves among those things,” Carlone says. “Now I’m part of a community asking, what is the next generation of SLAM?”

In search of an answer, he accepted a postdoc position at Georgia Tech, where he dove into coding and computer vision, a field that, in retrospect, may have been inspired by a brush with blindness: as he was finishing up his PhD in Italy, he suffered a medical complication that severely affected his vision.

“For one year, I could have easily lost an eye,” Carlone says. “That was something that got me thinking about the importance of vision, and artificial vision.”

He was able to receive good medical care, and the condition resolved entirely, such that he could continue his work. At Georgia Tech, his advisor, Frank Dellaert, showed him how to code in computer vision and formulate elegant mathematical representations of complex, three-dimensional problems. His advisor was also one of the first to develop an open-source SLAM library, called GTSAM, which Carlone quickly recognized to be a useful resource. More broadly, he saw that making software available to all unlocked an enormous potential for progress in robotics as a whole.

“Historically, progress in SLAM has been very slow, because people kept their code proprietary, and each group had to essentially start from scratch,” Carlone says. “Then open-source pipelines started popping up, and that was a game changer, which has largely driven the progress we have seen over the past 10 years.”

Spatial AI

Following Georgia Tech, Carlone came to MIT in 2015 as a postdoc in the Laboratory for Information and Decision Systems (LIDS). During that time, he collaborated with Sertac Karaman, professor of aeronautics and astronautics, in developing software to help palm-sized drones navigate their surroundings using very little on-board power. A year later, he was promoted to research scientist, and then in 2017, Carlone accepted a faculty position in AeroAstro.

“One thing I fell in love with at MIT was that all decisions are driven by questions like: What are our values? What is our mission? It’s never about low-level gains. The motivation is really about how to improve society,” Carlone says. “As a mindset, that has been very refreshing.”

Today, Carlone’s group is developing ways to represent a robot’s surroundings, beyond characterizing their geometric shape and semantics. He is using deep learning and large language models to develop algorithms that enable robots to perceive their environment through a higher-level lens, so to speak. Over the last six years, his lab has released more than 60 open-source repositories, which are used by hundreds of researchers and practitioners worldwide. The bulk of his work fits into a larger, emerging field known as “spatial AI.”

“Spatial AI is like SLAM on steroids,” Carlone says. “In a nutshell, it has to do with enabling robots to think and understand the world as humans do, in ways that can be useful.”

It’s a huge undertaking that could have wide-ranging impacts, in terms of enabling more intuitive, interactive robots to help out at home, in the workplace, on the roads, and in remote and potentially dangerous areas. Carlone says there will be plenty of work ahead in order to come close to how humans perceive the world.

“I have 2-year-old twin daughters, and I see them manipulating objects, carrying 10 different toys at a time, navigating across cluttered rooms with ease, and quickly adapting to new environments. Robot perception cannot yet match what a toddler can do,” Carlone says. “But we have new tools in the arsenal. And the future is bright.”
