Jason Knight is Co-founder and VP of ML at OctoAI – Interview Series


Jason Knight is Co-founder and Vice President of Machine Learning at OctoAI, a platform that delivers an end-to-end stack for app builders to run, tune, and scale their AI applications in the cloud or on-premises.

OctoAI was spun out of the University of Washington by the original creators of Apache TVM, an open-source stack for ML portability and performance. TVM enables ML models to run efficiently on any hardware backend, and has quickly become a key part of the architecture of popular consumer devices like Amazon Alexa.

Can you share the inspiration behind founding OctoAI and the core problem you aimed to solve?

AI has traditionally been a complex field accessible only to those comfortable with the mathematics and high-performance computing required to build something with it. But AI unlocks the ultimate computing interfaces, those of text, voice, and imagery programmed by examples and feedback, and brings the full power of computing to everyone on Earth. Before AI, only programmers were able to get computers to do what they wanted, by writing arcane programming-language text.

OctoAI was created to accelerate our path to that reality so that more people can use and benefit from AI. People, in turn, can use AI to create yet more benefits by accelerating the sciences, medicine, art, and more.

Reflecting on your experience at Intel, how did your previous roles prepare you for co-founding and leading development at OctoAI?

Intel, and the AI hardware and biotech startups before it, gave me the perspective to see how hard AI is for even the most sophisticated technology companies, and yet how valuable it can be to those who have figured out how to use it. I also saw that the gap between those benefiting from AI and those who aren't yet is primarily one of infrastructure, compute, and best practices, not magic.

What differentiates OctoStack from other AI deployment solutions available on the market today?

OctoStack is the industry's first complete technology stack designed specifically for serving generative AI models anywhere. It offers a turnkey production platform that provides highly optimized inference, model customization, and asset management at enterprise scale.

OctoStack allows organizations to achieve AI autonomy by running any model in their preferred environment with full control over data, models, and hardware. It also delivers unmatched performance and cost efficiency, with savings of up to 12x compared to other solutions like GPT-4.

Can you explain the benefits of deploying AI models in a private environment using OctoStack?

Models today are ubiquitous, but assembling the right infrastructure to run those models and apply them to your own data is where the business-value flywheel truly starts to spin. Using these models on your most sensitive data, and then turning that into insights, better prompt engineering, RAG pipelines, and fine-tuning, is where you can get the most value out of generative AI. However, it's still difficult for all but the most sophisticated companies to do this alone, which is where a turnkey solution like OctoStack can accelerate you and bring best practices together in one place for your practitioners.

Deploying AI models in a private environment using OctoStack offers several benefits, including enhanced security and control over data and models. Customers can run generative AI applications within their own VPCs or on-premises, ensuring that their data remains secure and within their chosen environments. This approach also gives businesses the flexibility to run any model, be it open-source, custom, or proprietary, while benefiting from cost reductions and performance improvements.

What challenges did you face in optimizing OctoStack to support a wide range of hardware, and how were these challenges overcome?

Optimizing OctoStack to support a wide range of hardware involved ensuring compatibility and performance across various devices, such as NVIDIA and AMD GPUs and AWS Inferentia. OctoAI overcame these challenges by leveraging its deep AI systems expertise, developed through years of research and development, to create a platform that continuously adds support for additional hardware types, GenAI use cases, and best practices. This enables OctoAI to deliver market-leading performance and cost efficiency.

Moreover, getting the latest capabilities in generative AI, such as multi-modality, function calling, strict JSON schema following, efficient fine-tune hosting, and more, into the hands of your internal developers will accelerate your AI takeoff point.
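To make the "strict JSON schema following" capability concrete, here is a minimal sketch of how a client might request schema-constrained output from an LLM serving API and validate the reply. The endpoint payload shape, model id, and schema below are illustrative assumptions modeled on common OpenAI-compatible serving conventions, not OctoStack's actual interface.

```python
import json

# Hypothetical schema the model's reply must conform to.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["vendor", "total"],
}


def build_request(prompt: str) -> dict:
    """Build a chat-completion payload asking the server for schema-conforming JSON.

    The model id and response_format shape are assumptions for illustration.
    """
    return {
        "model": "example-llama-3-8b",  # hypothetical model id
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {"type": "json_object", "schema": INVOICE_SCHEMA},
    }


def validate(raw: str) -> dict:
    """Parse the model's reply and check required schema fields client-side."""
    data = json.loads(raw)
    for field in INVOICE_SCHEMA["required"]:
        if field not in data:
            raise ValueError(f"missing required field: {field}")
    return data


# A reply that conforms to the schema parses and validates cleanly:
reply = validate('{"vendor": "Acme", "total": 41.5}')
```

Even when the server enforces the schema, a client-side check like `validate` is a cheap safety net against malformed or truncated responses.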

OctoAI has a rich history of leveraging Apache TVM. How has this framework influenced your platform's capabilities?

We created Apache TVM to make it easier for sophisticated developers to write efficient AI libraries for GPUs and accelerators. We did this because getting the most performance out of GPU and accelerator hardware was as critical for AI inference then as it is now.

We've since applied that same mindset and expertise to the entire GenAI serving stack to deliver automation for a broader set of developers.

Can you discuss any significant performance improvements that OctoStack offers, such as the 10x performance boost in large-scale deployments?

OctoStack offers significant performance improvements, including up to 12x savings compared to other models like GPT-4 without sacrificing speed or quality. It also provides 4x higher GPU utilization and a 50 percent reduction in operational costs, enabling organizations to run large-scale deployments efficiently and cost-effectively.

Can you share some notable use cases where OctoStack has significantly improved AI deployment for your clients?

A notable use case is Apate.ai, a global service combating telephone scams with generative conversational AI. Apate.ai leveraged OctoStack to efficiently run its suite of language models across multiple geographies, benefiting from OctoStack's flexibility, scale, and security. This deployment allowed Apate.ai to deliver custom models supporting multiple languages and regional dialects, meeting its performance and security-sensitive requirements.

In addition, we serve hundreds of fine-tunes for our customer OpenPipe. Were they to spin up dedicated instances for each of those, their customers' use cases would be infeasible as they grow and evolve those use cases and continually re-train their parameter-efficient fine-tunes for maximum output quality at cost-effective prices.
