Creating bespoke programming languages for efficient visual AI systems


A single photograph offers glimpses into the creator's world: their interests and feelings about a subject or place. But what about the creators behind the technologies that help make those images possible?

MIT Department of Electrical Engineering and Computer Science Associate Professor Jonathan Ragan-Kelley is one such person. He has designed everything from tools for visual effects in movies to the Halide programming language that is widely used in industry for photo editing and processing. As a researcher with the MIT-IBM Watson AI Lab and the Computer Science and Artificial Intelligence Laboratory, Ragan-Kelley focuses on high-performance, domain-specific programming languages and machine learning that enable 2D and 3D graphics, visual effects, and computational photography.

"The single biggest thrust through a lot of our research is developing new programming languages that make it easier to write programs that run really efficiently on the increasingly complex hardware that's in your computer today," says Ragan-Kelley. "If we want to keep increasing the computational power we can actually exploit for real applications, from graphics and visual computing to AI, we need to change how we program."

Finding a middle ground

Over the past two decades, chip designers and programming engineers have witnessed a slowing of Moore's law and a marked shift from general-purpose computing on CPUs to more varied and specialized computing and processing units like GPUs and accelerators. With this transition comes a trade-off: the ability to run general-purpose code somewhat slowly on CPUs is exchanged for faster, more efficient hardware that requires code to be heavily adapted to it and mapped to it with tailored programs and compilers. Newer hardware with improved programming can better support applications like high-bandwidth cellular radio interfaces, decoding highly compressed videos for streaming, and graphics and video processing on power-constrained cellphone cameras, to name just a few.

"Our work is essentially about unlocking the power of the best hardware we can build to deliver as much computational performance and efficiency as possible for these kinds of applications in ways that traditional programming languages don't."

To accomplish this, Ragan-Kelley breaks his work down into two directions. First, he sacrifices generality to capture the structure of particular and important computational problems, and exploits that structure for better computing efficiency. This can be seen in the image-processing language Halide, which he co-developed and which has helped transform the image-editing industry in programs like Photoshop. Further, because it is specially designed to quickly handle dense, regular arrays of numbers (tensors), it also works well for neural network computations. The second focus targets automation, specifically how compilers map programs to hardware. One such project with the MIT-IBM Watson AI Lab leverages Exo, a language developed in Ragan-Kelley's group.
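Halide's central idea is separating the "algorithm" (what to compute) from the "schedule" (how to organize the computation on hardware). A minimal sketch of that idea in plain Python with NumPy (not Halide itself; function names and the choice of a 3x3 blur are illustrative):

```python
import numpy as np

def blur_reference(img):
    """Algorithm: a 3x3 box blur, written as a straightforward loop nest."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for y in range(h - 2):
        for x in range(w - 2):
            out[y, x] = img[y:y + 3, x:x + 3].mean()
    return out

def blur_tiled(img, tile=8):
    """Same algorithm under a different 'schedule': iterate over tiles
    for better cache locality. The output is identical to the reference;
    only the loop structure (the schedule) changes."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for ty in range(0, h - 2, tile):
        for tx in range(0, w - 2, tile):
            for y in range(ty, min(ty + tile, h - 2)):
                for x in range(tx, min(tx + tile, w - 2)):
                    out[y, x] = img[y:y + 3, x:x + 3].mean()
    return out
```

In Halide proper, the schedule (tiling, vectorization, parallelism) is expressed declaratively and the compiler generates the loop nest, so performance engineers can explore schedules without rewriting the algorithm.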

Over the years, researchers have worked doggedly to automate coding with compilers, which can be a black box; however, there is still a significant need for explicit control and tuning by performance engineers. Ragan-Kelley and his group are developing methods that straddle both approaches, balancing trade-offs to achieve effective and resource-efficient programming. At the core of many high-performance programs like video game engines or cellphone camera processing are state-of-the-art systems that are largely hand-optimized by human experts in low-level, detailed languages like C, C++, and assembly. Here, engineers make specific decisions about how the program will run on the hardware.

Ragan-Kelley notes that programmers can opt for "very painstaking, very unproductive, and very unsafe low-level code," which can introduce bugs, or "safer, more productive, higher-level programming interfaces" that lack the ability to make fine adjustments in a compiler about how the program is run, and often deliver lower performance. So, his team is trying to find a middle ground. "We're trying to figure out how to provide control for the key things that human performance engineers want to be able to control," says Ragan-Kelley, "so we're trying to build a new class of languages that we call user-schedulable languages that give safer and higher-level handles to control what the compiler does or control how the program is optimized."

Unlocking hardware in high-level and underserved ways

Ragan-Kelley and his research group are tackling this through two lines of work. One applies machine learning and modern AI techniques to automatically generate optimized schedules, an interface to the compiler, to achieve better compiler performance. Another uses "exocompilation," which he's working on with the lab. He describes this approach as a way to "turn the compiler inside out," with a skeleton of a compiler that has controls for human guidance and customization. In addition, his team can add their bespoke schedulers on top, which can help target specialized hardware like machine-learning accelerators from IBM Research. Applications for this work span the gamut: computer vision, object recognition, speech synthesis, image synthesis, speech recognition, text generation (large language models), and so on.
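The idea of automatically generating schedules can be illustrated with a toy sketch: search a space of candidate schedules (here, just tile sizes for a matrix multiply) and keep the fastest. Real autoschedulers search far richer spaces and typically use learned cost models rather than brute timing; all names below are made up for illustration.

```python
import time

def matmul_tiled(A, B, n, tile):
    """n x n matrix multiply with a tiled loop nest; `tile` is the
    scheduling knob being searched over."""
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        a = A[i][k]
                        row = C[i]
                        for j in range(jj, min(jj + tile, n)):
                            row[j] += a * B[k][j]
    return C

def autoschedule(n=64, candidates=(4, 8, 16, 32)):
    """Toy 'autoscheduler': time each candidate tile size on the same
    workload and return the fastest one."""
    A = [[float(i * n + j) for j in range(n)] for i in range(n)]
    B = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    best, best_t = None, float("inf")
    for tile in candidates:
        t0 = time.perf_counter()
        matmul_tiled(A, B, n, tile)
        dt = time.perf_counter() - t0
        if dt < best_t:
            best, best_t = tile, dt
    return best
```

The point is that the algorithm (the multiply) never changes; only the schedule is tuned, by machine rather than by hand.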

A big-picture project of his with the lab takes this another step further, approaching the work through a systems lens. In work led by his advisee and lab intern William Brandon, in collaboration with lab research scientist Rameswar Panda, Ragan-Kelley's team is rethinking large language models (LLMs), finding ways to change the computation and the model's programming architecture slightly so that transformer-based models can run more efficiently on AI hardware without sacrificing accuracy. Their work, Ragan-Kelley says, deviates from the standard ways of thinking in significant ways, with potentially large payoffs for cutting costs, improving capabilities, and/or shrinking the LLM to require less memory and run on smaller computers.

It's this more avant-garde thinking, when it comes to computational efficiency and hardware, that Ragan-Kelley excels at and sees value in, especially in the long term. "I think there are areas [of research] that need to be pursued, but are well-established, or obvious, or are conventional-wisdom enough that a lot of people either are already pursuing them or will," he says. "We try to find the ideas that have both large leverage to practically impact the world, and at the same time, are things that wouldn't necessarily happen, or I think are being underserved relative to their potential by the rest of the community."

The course that he now teaches, 6.106 (Software Performance Engineering), exemplifies this. About 15 years ago, the shift from single to multiple processors in a device drove many academic programs to begin teaching parallelism. But, as Ragan-Kelley explains, MIT realized the importance of students understanding not only parallelism but also optimizing memory and using specialized hardware to achieve the best performance possible.

"By changing how we program, we can unlock the computational potential of new machines, and make it possible for people to continue to rapidly develop new applications and new ideas that are able to take advantage of that ever-more complicated and challenging hardware."
