Sarah Alnegheimish’s research interests lie at the intersection of machine learning and systems engineering. Her objective: to make machine learning systems more accessible, transparent, and trustworthy.
Alnegheimish is a PhD student in Principal Research Scientist Kalyan Veeramachaneni’s Data-to-AI group in MIT’s Laboratory for Information and Decision Systems (LIDS). Here, she devotes most of her energy to developing Orion, an open-source, user-friendly machine learning framework and time series library that can detect anomalies without supervision in large-scale industrial and operational settings.
Early influence
The daughter of a university professor and a teacher educator, she learned from an early age that knowledge was meant to be shared freely. “I believe growing up in a house where education was highly valued is part of why I want to make machine learning tools accessible.” Alnegheimish’s own experience with open-source resources only increased her motivation. “I learned to view accessibility as the key to adoption. To strive for impact, new technology must be accessed and assessed by those who need it. That’s the whole purpose of open-source development.”
Alnegheimish earned her bachelor’s degree at King Saud University (KSU). “I was in the first cohort of computer science majors. Before this program was created, the only other available major in computing was IT [information technology].” Being part of the first cohort was exciting, but it brought its own unique challenges. “All of the faculty were teaching new material. Succeeding required an independent learning experience. That’s when I first came across MIT OpenCourseWare: as a resource to teach myself.”
Shortly after graduating, Alnegheimish became a researcher at the King Abdulaziz City for Science and Technology (KACST), Saudi Arabia’s national lab. Through the Center for Complex Engineering Systems (CCES) at KACST and MIT, she began conducting research with Veeramachaneni. When she applied to MIT for graduate school, his research group was her top choice.
Creating Orion
Alnegheimish’s master’s thesis focused on time series anomaly detection — the identification of unexpected behaviors or patterns in data, which can give users crucial information. For instance, unusual patterns in network traffic data can signal cybersecurity threats, abnormal sensor readings in heavy machinery can predict future failures, and monitoring patient vital signs can help reduce health complications. It was through her master’s research that Alnegheimish first began designing Orion.
Orion uses statistical and machine learning-based models that are continuously logged and maintained. Users don’t need to be machine learning experts to make use of the code. They can analyze signals, compare anomaly detection methods, and investigate anomalies in an end-to-end program. The framework, code, and datasets are all open-sourced.
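The core idea behind this kind of unsupervised detection can be pictured with a small, self-contained sketch: forecast each point from its recent history and flag values that deviate sharply from expectation. This is an illustration of the concept only, not Orion’s actual code, and the function name is made up for the example:

```python
import numpy as np

def detect_anomalies(signal, window=10, threshold=3.0):
    # Toy unsupervised detector: forecast each point from its recent
    # history and flag large deviations. Illustrative only -- this is
    # not Orion's implementation.
    signal = np.asarray(signal, dtype=float)
    flagged = []
    for i in range(window, len(signal)):
        history = signal[i - window:i]
        forecast = history.mean()          # naive one-step forecast
        error = abs(signal[i] - forecast)  # how surprising is this point?
        if error > threshold * history.std() + 1e-8:
            flagged.append(i)
    return flagged

# A steady signal with one injected spike; only the spike is flagged.
data = [1.0] * 50
data[30] = 25.0
print(detect_anomalies(data))  # -> [30]
```

Real pipelines replace the rolling mean with learned models, but the end-to-end shape — signal in, anomalous intervals out — is the same.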
“With open source, accessibility and transparency are directly achieved. You have unrestricted access to the code, and you can investigate how the model works by understanding the code. We have increased transparency with Orion: we label every step in the model and present it to the user.” Alnegheimish says this transparency helps users start trusting the model before they ultimately see for themselves how reliable it is.
“We’re trying to take all these machine learning algorithms and put them in one place so anyone can use our models off the shelf,” she says. “It’s not just for the sponsors we work with at MIT. It’s being used by a lot of public users. They come to the library, install it, and run it on their data. It’s proving itself to be a great source for people to find some of the latest methods for anomaly detection.”
Repurposing models for anomaly detection
In her PhD, Alnegheimish is exploring innovative ways to do anomaly detection with Orion. “When I first started my research, all machine-learning models had to be trained from scratch on your data. Now we’re in a time where we can use pre-trained models,” she says. Working with pre-trained models saves time and computational costs. The challenge, though, is that time series anomaly detection is a brand-new task for them. “In their original sense, these models have been trained to forecast, but not to find anomalies,” Alnegheimish says. “We’re pushing their boundaries through prompt-engineering, without any additional training.”
Because these models already capture the patterns of time-series data, Alnegheimish believes they already have everything they need to detect anomalies. So far, her results support this theory. They don’t yet surpass the success rate of models that are independently trained on specific data, but she believes they will one day.
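The repurposing idea — using a model that was only trained to forecast as an anomaly detector, with no additional training — can be sketched as follows. This is a hypothetical illustration, not her prompt-based method; a simple stand-in plays the role of the pre-trained forecaster:

```python
def zero_shot_detect(signal, forecaster, window=10, threshold=2.0):
    # Repurpose a forecaster, with no additional training, as an anomaly
    # detector: points with unusually large forecast error are flagged.
    # A hypothetical sketch of the idea, not the actual prompt-based method.
    errors = [abs(signal[i] - forecaster(signal[i - window:i]))
              for i in range(window, len(signal))]
    cutoff = threshold * (sum(errors) / len(errors) + 1e-8)
    return [window + i for i, e in enumerate(errors) if e > cutoff]

# Stand-in for a pre-trained model that has already captured the
# signal's repeating pattern (a sawtooth counting 0..6).
pattern_model = lambda history: (history[-1] + 1) % 7

series = [float(x % 7) for x in range(60)]
series[40] = 30.0  # injected anomaly
print(zero_shot_detect(series, pattern_model))  # -> [40, 41]
```

Because the forecaster already knows the signal’s pattern, the deviation stands out as a large forecast error (the spike also corrupts the next forecast, so its neighbor is flagged too).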
Accessible design
Alnegheimish talks at length about the efforts she’s made to make Orion more accessible. “Before I came to MIT, I used to think that the crucial part of research was to develop the machine learning model itself or improve on its current state. With time, I realized that the only way you can make your research accessible and adaptable for others is to develop systems that make it so. During my graduate studies, I’ve taken the approach of developing my models and systems in tandem.”
The key element of her system development was finding the right abstractions to work with her models. These abstractions provide a universal representation for all models, with simplified components. “Any model will have a sequence of steps to go from raw input to desired output. We’ve standardized the input and output, which allows the middle to be flexible and fluid. So far, all the models we’ve run have been able to retrofit into our abstractions.” The abstractions she uses have been stable and reliable for the last six years.
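The abstraction she describes — a sequence of steps with standardized input and output, so the middle stays swappable — can be sketched minimally. This is a hypothetical illustration of the idea, not Orion’s actual design; the step names are invented for the example:

```python
import numpy as np

def run_pipeline(steps, raw_signal):
    # Apply a sequence of steps to go from raw input to desired output.
    # Standardizing what flows between steps keeps the middle swappable.
    # A hypothetical sketch of the abstraction idea, not Orion's design.
    data = np.asarray(raw_signal, dtype=float)
    for step in steps:
        data = step(data)
    return data

# Three interchangeable steps sharing one array-in/array-out contract.
detrend   = lambda x: x - np.median(x)
normalize = lambda x: (x - x.mean()) / (x.std() + 1e-8)
score     = lambda x: np.abs(x)  # deviation magnitude as anomaly score

scores = run_pipeline([detrend, normalize, score], [1, 1, 1, 9, 1, 1, 1])
print(scores.argmax())  # the spike at index 3 scores highest
```

Because every step honors the same contract, any one of them can be replaced — the flexibility in the middle she points to — without touching the rest of the sequence.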
The value of simultaneously building systems and models can be seen in Alnegheimish’s work as a mentor. She had the opportunity to work with two master’s students earning their engineering degrees. “All I showed them was the system itself and the documentation of how to use it. Both students were able to develop their own models with the abstractions we’re conforming to. It reaffirmed that we’re taking the right path.”
Alnegheimish also investigated whether a large language model (LLM) could be used as a mediator between users and a system. The LLM agent she has implemented can connect to Orion without users needing to know the fine details of how Orion works. “Think of ChatGPT. You have no idea what model is behind it, but it’s very accessible to everyone.” For her software, users only need to know two commands: Fit and Detect. Fit allows users to train their model, while Detect enables them to detect anomalies.
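That two-command surface can be pictured with a toy detector. This is a hypothetical sketch of the Fit/Detect shape, not Orion’s actual API, and the class name and statistics are invented for the example:

```python
class Detector:
    # A toy two-command interface in the spirit of Fit and Detect.
    # A hypothetical sketch, not Orion's actual API.

    def fit(self, signal):
        # Fit: learn what "normal" looks like -- here, summary statistics.
        mean = sum(signal) / len(signal)
        var = sum((x - mean) ** 2 for x in signal) / len(signal)
        self.mean, self.std = mean, var ** 0.5
        return self

    def detect(self, signal, threshold=3.0):
        # Detect: flag points that deviate strongly from learned behavior.
        return [i for i, x in enumerate(signal)
                if abs(x - self.mean) > threshold * self.std]

detector = Detector().fit([10.0, 11.0, 9.0, 10.5, 9.5] * 20)
print(detector.detect([10.0, 9.8, 42.0, 10.2]))  # -> [2]
```

Keeping the surface this small is what lets a mediator — human or LLM agent — route any request to one of just two operations.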
“The ultimate goal of what I’ve tried to do is make AI more accessible to everyone,” she says. So far, Orion has reached over 120,000 downloads, and over a thousand users have marked the repository as one of their favorites on GitHub. “Traditionally, you used to measure the impact of research through citations and paper publications. Now you get real-time adoption through open source.”