Moshe Tanach, CEO and Co-Founder at NeuReality – Interview Series


Moshe Tanach is the CEO & co-founder of NeuReality. Before founding NeuReality, Moshe served as Director of Engineering at Marvell and Intel, where he led the development of complex wireless and networking products to mass production. He also served as AVP of R&D at DesignArt Networks (later acquired by Qualcomm), where he contributed to the development of 4G base station products.

NeuReality’s mission is to simplify AI adoption. By taking a system-level approach to AI, NeuReality’s team of industry experts delivers AI inference holistically, identifying pain points and providing purpose-built, silicon-to-software AI inference solutions that make AI both affordable and accessible.

With your extensive experience leading engineering projects at Marvell, Intel, and DesignArt-Networks, what inspired you to co-found NeuReality, and how did your previous roles influence the vision and direction of the company?

NeuReality was built from inception to solve the future cost, complexity, and climate problems that are inevitable in AI inferencing – which is the deployment of trained AI models and software into production-level AI data centers. Where AI training is how AI is created, AI inference is how it is used and how it interacts with billions of people and devices around the globe.

We’re a team of systems engineers, so we look at all angles – all the facets of end-to-end AI inferencing, including GPUs and every class of purpose-built AI accelerator. It became clear to us as far back as 2015 that CPU-reliant AI chips and systems – which is every GPU, TPU, LPU, NRU, ASIC, and FPGA on the market – would hit a major wall by 2020. The system limitation is that the AI accelerators have become bigger and faster in terms of raw performance, but the underlying infrastructure hasn’t kept up.

Because of this, we decided to break away from the big incumbents riddled with bureaucracy that protect successful businesses, like CPU and NIC manufacturers, and disrupt the industry with a better AI architecture that’s open, agnostic, and purpose-built for AI inference. One of the conclusions of reimagining ideal AI inference is that boosting GPU utilization and system-level efficiency requires a new AI compute and network infrastructure – powered by our novel NR1 server-on-chip, which replaces the host CPU and NICs. As an ingredient brand and companion to any GPU or AI accelerator, we can remove the market barriers that deter 65% of organizations from innovating and adopting AI today – underutilized GPUs, which lead to buying more than what’s really needed (because they run idle more than 50% of the time) – all while reducing energy consumption, AI data center real-estate demands, and operational costs.

This is a once-in-a-lifetime opportunity to fundamentally transform AI system architecture for the better, based on everything I have learned and practiced for 30 years – opening the doors for new AI innovators across industries and removing CPU bottlenecks, complexity, and carbon footprint.

NeuReality’s mission is to democratize AI. Can you elaborate on what “AI for All” means to you and how NeuReality plans to achieve this vision?

Our mission is to democratize AI by making it more accessible and affordable to all organizations, big and small – by unleashing the maximum capability of any GPU or AI accelerator so you get more out of your investment; in other words, get MORE from the GPUs you buy, rather than buying more GPUs that run idle more than 50% of the time. We can boost AI accelerators up to 100% of full capacity, while delivering up to 15x energy efficiency and slashing system costs by up to 90%. These are order-of-magnitude improvements. We plan to achieve this vision with our NR1 AI Inference Solution, the world’s first data center system architecture tailored for the AI age. It runs high-volume, high-variety AI data pipelines affordably and efficiently, with the added benefit of a reduced carbon footprint.

Achieving AI for all also means making it easy to use. At NeuReality, we simplify AI infrastructure deployment, management, and scalability; enhance business processes and profitability; and advance sectors such as public health, safety, law enforcement, and customer service. Our impact spans use cases such as medical imaging, clinical trials, fraud detection, AI content creation, and many more.

Currently, our first commercially available NR1-S AI Inference Appliances are available with Qualcomm Cloud AI 100 Ultra accelerators and through Cirrascale, a cloud service provider.

The NR1 AI Inference Solution is touted as the first data center system architecture tailored for the AI age and purpose-built for AI inference. What were the key innovations and breakthroughs that led to the development of the NR1?

NR1™ is the name of the complete silicon-to-software system architecture we’ve designed and delivered to the AI industry – an open, fully compatible AI compute and networking infrastructure that complements any AI accelerator or GPU. If I had to break it down to the most unique and exciting innovations that led to this end-to-end NR1 Solution and differentiate us, I’d say:

  • Optimized AI Compute Graphs: The team designed a Programmable Graph Execution Accelerator to optimize the processing of Compute Graphs, which are crucial for AI and various other workloads like media processing, databases, and more. Compute Graphs represent a series of operations with dependencies, and this broader applicability positions NR1 as potentially disruptive beyond just boosting GPUs and other AI accelerators (see the sketch after this list). It simplifies AI model deployment by generating optimized Compute Graphs (CGs) based on pre-processed AI data and software APIs, resulting in significant performance gains.
  • NR1 NAPU™ (Network Addressable Processing Unit): Our AI inference architecture is powered by the NR1 NAPU™ – a 7nm server-on-chip that enables direct network access for AI pre- and post-processing. We pack 6.5x more punch on a smaller NR1 chip than a typical general-purpose host CPU. Traditionally, pre-processing tasks (like data cleansing, formatting, and feature extraction) and post-processing tasks (like result interpretation and formatting) are handled by the CPU. By offloading these tasks to the NR1 NAPU™, we displace both the CPU and the NIC. This reduces bottlenecks, allowing for faster overall processing, lightning-fast response times, and a lower cost per AI query.
  • NR1™ AI-Hypervisor™ technology: The NR1’s patented hardware-based AI-Hypervisor™ optimizes AI task orchestration and resource utilization, improving efficiency and reducing bottlenecks.
  • NR1™ AI-over-Fabric™ Network Engine: The NR1 incorporates a unique AI-over-Fabric™ network engine that ensures seamless network connectivity and efficient scaling of AI resources across multiple NR1 chips – which are coupled with any GPU or AI accelerator – within the same inference server or NR1-S AI inference appliance.
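
To make the Compute Graph idea concrete, here is a minimal Python sketch of the general concept – a set of operations with declared dependencies, executed in dependency order. The stage names are made up for illustration; this shows what a compute graph is, not NeuReality’s Programmable Graph Execution Accelerator or its APIs.

```python
from graphlib import TopologicalSorter

# A toy inference pipeline expressed as a compute graph.
# Each key is an operation; its list holds the operations it depends on.
# Stage names are hypothetical and purely illustrative.
ops = {
    "decode_jpeg":  [],                 # pre-processing
    "resize":       ["decode_jpeg"],
    "normalize":    ["resize"],
    "run_model":    ["normalize"],      # the accelerator-bound step
    "top_k":        ["run_model"],      # post-processing
    "format_reply": ["top_k"],
}

def execute(graph, run_op):
    """Run each operation once all of its dependencies have completed."""
    for op in TopologicalSorter(graph).static_order():
        run_op(op)

execute(ops, lambda op: print(f"running {op}"))
```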

NeuReality’s recent performance data highlights significant cost and energy savings. Could you provide more details on how the NR1 achieves up to 90% cost savings and 15x higher energy efficiency compared to traditional systems?

NeuReality’s NR1 slashes the cost and energy consumption of AI inference by up to 90% and 15x, respectively. This is achieved through:

  • Specialized Silicon: Our purpose-built AI inference infrastructure is powered by the NR1 NAPU™ server-on-chip, which absorbs the functionality of the CPU and NIC into one chip and eliminates the need for CPUs in inference. Ultimately, the NR1 maximizes the output of any AI accelerator or GPU in the most efficient way possible.
  • Optimized Architecture: By streamlining AI data flow and incorporating AI pre- and post-processing directly within the NR1 NAPU™, we offload and replace the CPU. This results in reduced latency, linear scalability, and a lower cost per AI query.
  • Flexible Deployment: You can buy the NR1 in two primary ways: 1) in the NR1-M™ Module, a PCIe card that houses multiple NR1 NAPUs (typically 10) and is designed to pair with your existing AI accelerator cards; or 2) in the NR1-S™ Appliance, which pairs NR1 NAPUs with an equal number of AI accelerators (GPU, ASIC, FPGA, etc.) as a ready-to-go AI inference system.

At Supercomputing 2024 in November, you will see us demonstrate an NR1-S Appliance with 4x NR1 chips per 16x Qualcomm Cloud AI 100 Ultra accelerators. We have tested the same with Nvidia AI inference chips. NeuReality is revolutionizing AI inference with its open, purpose-built architecture.

How does the NR1-S AI Inference Appliance paired with Qualcomm® Cloud AI 100 accelerators compare against traditional CPU-centric inference servers with Nvidia® H100 or L40S GPUs in real-world applications?

NR1, combined with Qualcomm Cloud AI 100 or NVIDIA H100 or L40S GPUs, delivers a considerable performance boost over traditional CPU-centric inference servers in real-world AI applications across large language models like Llama 3, computer vision, natural language processing, and speech recognition. In other words, running your AI inference system with NR1 optimizes the performance, system cost, energy efficiency, and response times across images, sound, language, and text – whether individually (single modality) or together (multi-modality).

The end result? When paired with NR1, a customer gets MORE from the expensive GPU investments they make, rather than BUYING more GPUs to achieve the desired performance.

Beyond maximizing GPU utilization, the NR1 delivers exceptional efficiency, resulting in 50-90% better price/performance and up to 13-15x greater energy efficiency. This translates into significant cost savings and a reduced environmental footprint for your AI infrastructure.

The NR1-S demonstrates linear scalability with no performance drop-offs. Can you explain the technical features that enable such seamless scalability?

The NR1-S Appliance, coupling our NR1 chips with AI accelerators of any type or quantity, redefines AI infrastructure. We have moved beyond CPU-centric limitations to achieve a new level of performance and efficiency.

Instead of the traditional NIC-to-CPU-to-accelerator bottleneck, the NR1-S integrates direct network access, AI pre-processing, and post-processing within our Network Addressable Processing Units (NAPUs). With typically 10 NAPUs per system, each handling tasks like vision, audio, and DSP processing, and our AI-Hypervisor™ orchestrating workloads, AI data flow stays streamlined. This translates to linear scalability: add more accelerators, get proportionally more performance.
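
As a back-of-the-envelope sketch of what linear scalability means in practice, the snippet below computes aggregate throughput as NAPU-plus-accelerator pairs are added, under the proportional-scaling assumption described above. The per-pair throughput number is a placeholder, not a measured NeuReality figure.

```python
# Assumes perfectly linear scaling, i.e. no CPU/NIC bottleneck between
# the network and the accelerators. The per-pair figure is hypothetical.
PER_PAIR_QPS = 1_000  # illustrative queries/sec for one NAPU + accelerator pair

def aggregate_qps(num_pairs: int, per_pair_qps: float = PER_PAIR_QPS) -> float:
    """Total throughput when each added pair contributes proportionally."""
    return num_pairs * per_pair_qps

for n in (1, 4, 10, 16):
    print(f"{n:>2} pairs -> {aggregate_qps(n):>8,.0f} queries/sec")
```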

The result? 100% utilization of AI accelerators is consistently observed. While overall cost and energy efficiency vary depending on the specific AI chips used, the NR1-S consistently maximizes the hardware investment and improves performance. As AI inference needs scale, the NR1-S provides a compelling alternative to traditional architectures.

NeuReality aims to address the barriers to widespread AI adoption. What are the most significant challenges businesses face when adopting AI, and how does your technology help overcome them?

When poorly implemented, AI software and solutions can become troublesome. Many businesses cannot adopt AI due to the cost and complexity of building and scaling AI systems. Today’s AI solutions are not optimized for inference, with training pods typically suffering from poor efficiency and inference servers from severe bottlenecks. To tackle this challenge and make AI more accessible, we’ve developed the first complete AI inference solution – a compute and networking infrastructure powered by our NAPU – which makes the most of its companion AI accelerator and reduces market barriers around excessive cost and energy consumption.

Our system-level approach to AI inference – as opposed to attempting to develop a better GPU or AI accelerator, where there is already plenty of innovation and competition – means we’re filling a major industry gap for dozens of AI inference chip and system innovators. Our team attacked the shortcomings in AI inference systemically and holistically, by identifying pain points, architecture gaps, and AI workload projections, to deliver the first purpose-built, silicon-to-software, CPU-free AI inference architecture. And by developing a top-to-bottom AI software stack built on open standards such as Python and Kubernetes, combined with the NeuReality Toolchain, Provisioning, and Inference APIs, our integrated set of software tools combines all components into a single high-quality UI/UX.
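
To give a sense of how a developer might consume such an open, Python-and-Kubernetes-based inference stack, here is a minimal hypothetical client sketch. The endpoint URL, payload fields, and model name are assumptions for illustration only and do not reflect NeuReality’s actual Inference APIs.

```python
import requests  # generic HTTP client; everything below is illustrative

# Hypothetical endpoint exposed by an inference service deployed on Kubernetes.
INFERENCE_URL = "http://inference-gateway.example.local/v1/predict"

def classify_image(image_path: str, model: str = "resnet50") -> dict:
    """Send one image to a deployed model and return the parsed JSON result."""
    with open(image_path, "rb") as f:
        response = requests.post(
            INFERENCE_URL,
            files={"input": f},
            data={"model": model},
            timeout=10,
        )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    print(classify_image("cat.jpg"))
```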

In a competitive AI market, what sets NeuReality apart from other AI inference solution providers?

To put it simply, we’re open and accelerator-agnostic. Our NR1 inference infrastructure supercharges any AI accelerator – GPU, TPU, LPU, ASIC, you name it – creating a truly optimized end-to-end system. AI accelerators were originally brought in to help CPUs handle the demands of neural networks and machine learning at large, but AI accelerators have become so powerful that they’re now held back by the very CPUs they were meant to assist.

Our solution? The NR1. It’s a complete, reimagined AI inference architecture. Our secret weapon? The NR1 NAPU™, designed as a co-ingredient to maximize AI accelerator performance without guzzling extra power or breaking the bank. We’ve built an open ecosystem, seamlessly integrating with any AI inference chip and popular software frameworks like Kubernetes, Python, TensorFlow, and more.

NeuReality’s open approach means we’re not competing with the AI landscape; we’re here to enhance it through strategic partnerships and technology collaboration. We offer the missing piece of the puzzle: a purpose-built inference architecture that not only unlocks AI accelerators to reach their benchmark performance, but also makes it easier for businesses and governments to adopt AI. Imagine unleashing the full power of NVIDIA H100s, Google TPUs, or AMD MI300s – giving them the infrastructure they deserve.

NeuReality’s open, efficient architecture levels the playing field, making AI more accessible and affordable for everyone. I’m excited to see different industries – fintech, biotech, healthtech – experience the NR1 advantage firsthand. Compare your AI solutions on traditional CPU-bound systems versus the modern NR1 infrastructure and witness the difference. Today, only 35% of companies and governments have adopted AI, and that is based on incredibly low qualifying criteria. Let’s make it possible for over 50% of enterprise customers to adopt AI by this time next year without harming the planet or breaking the bank.

Looking ahead, what’s NeuReality’s long-term vision for the role of AI in society, and how do you see your company contributing to this future?

I envision a future where AI benefits everyone, fostering innovation and improving lives. We’re not only building technology; we’re building the foundation for a better future.

Our NR1 is essential to that vision. It’s a complete AI inference solution that starts to shatter the cost and complexity barriers hindering mass AI business adoption. We’ve reimagined both the infrastructure and the architecture, delivering a revolutionary system that maximizes the output of any GPU or AI accelerator without increasing operational costs or energy consumption.

The business model really matters for scale, and it gives end customers real choices over a concentrated AI autocracy, as I’ve written about before. So instead, we’re building an open ecosystem where our silicon works with other silicon, not against it. That’s why we designed NR1 to integrate seamlessly with all AI accelerators and with open models and software, making it as easy as possible to install, manage, and scale.

But we’re not stopping there. We’re collaborating with partners to validate our technology across various AI workloads and to deliver “inference-as-a-service” and “LLM-as-a-service” through cloud service providers, hyperscalers, and directly with companion chip makers. We want to make advanced AI accessible and affordable to all.

Imagine the possibilities if we could boost AI inference performance, energy efficiency, and affordability by double-digit percentages. Imagine a robust, AI-enabled society with more voices and more choices becoming a reality. So we must all do the hard work of proving business impact and ROI when AI is implemented in day-to-day data center operations. Let’s focus on revolutionary AI implementation, not only AI model capability.

That is how we contribute to a future where AI benefits everyone – a win for profit margins, people, and the planet.
