Layers of the AI Stack, Explained Simply


Introduction

The AI space is an enormous and sophisticated landscape. Matt Turck famously publishes his Machine Learning, AI, and Data (MAD) landscape every year, and it always seems to get crazier and crazier. Check out the latest one, made for 2024.

Overwhelming, to say the least. 

Nonetheless, we can use abstractions to help us make sense of this crazy landscape of ours. The first one I will be discussing and breaking down in this article is the concept of an AI stack. A stack is simply a combination of technologies that are used to build applications. Those of you familiar with web development likely know of the LAMP stack: Linux, Apache, MySQL, PHP. This is the stack that powers WordPress. Using a catchy acronym like LAMP is a great way to help us humans grapple with the complexity of the web application landscape. Those of you in the data field have likely heard of the Modern Data Stack: typically dbt, Snowflake, Fivetran, and Looker (or the Post-Modern Data Stack. IYKYK).

The AI stack is similar, but in this article we will stay a bit more conceptual. I'm not going to prescribe specific technologies you should be using at each layer of the stack, but will instead simply name the layers and let you decide where you fit in, as well as what tech you will use to achieve success in that layer.

There are many ways to describe the AI stack. I prefer simplicity, so here is the AI stack in four layers, organized from furthest from the end user (bottom) to closest (top):

  • Infrastructure Layer (Bottom): The raw physical hardware necessary to train and run inference with AI. Think GPUs, TPUs, and cloud services (AWS/Azure/GCP).
  • Data Layer (Bottom): The data needed to train machine learning models, as well as the databases needed to store all of that data. Think ImageNet, TensorFlow Datasets, Postgres, MongoDB, Pinecone, etc.
  • Model and Orchestration Layer (Middle): This refers to the actual large language, vision, and reasoning models themselves. Think GPT, Claude, Gemini, or any machine learning model. This also includes the tools developers use to build, deploy, and observe models. Think PyTorch/TensorFlow, Weights & Biases, and LangChain.
  • Application Layer (Top): The AI-powered applications that are used by customers. Think ChatGPT, GitHub Copilot, Notion, Grammarly.
Layers of the AI stack. Image by author.

Many companies dip their toes in several layers. For example, OpenAI has both trained GPT-4o and created the ChatGPT web application. For help with the infrastructure layer, they have partnered with Microsoft to use their Azure cloud for on-demand GPUs. As for the data layer, they built web scrapers to help pull in tons of natural language data to feed to their models during training, not without controversy.

The Virtues of the Application Layer

I agree very much with Andrew Ng and many others in the space who say that the application layer of AI is the place to be.

Why is this? Let's start with the infrastructure layer. This layer is prohibitively expensive to break into unless you have hundreds of millions of dollars of VC money to burn. The technical complexity of attempting to create your own cloud service or craft a new kind of GPU is very high. There is a reason why tech behemoths like Amazon, Google, Nvidia, and Microsoft dominate this layer. Ditto for the foundation model layer. Companies like OpenAI and Anthropic have armies of PhDs to innovate here. In addition, they had to partner with the tech giants to fund model training and hosting. Both of these layers are also rapidly becoming commoditized. This means that one cloud service/model more or less performs like another. They are interchangeable and can be easily replaced. They mostly compete on price, convenience, and brand name.

The data layer is interesting. The advent of generative AI has led to quite a number of companies staking their claim as the most popular vector database, including Pinecone, Weaviate, and Chroma. However, the customer base at this layer is much smaller than at the application layer (there are far fewer developers than there are people who will use AI applications like ChatGPT). This area is also quickly becoming commoditized. Swapping Pinecone for Weaviate is not a difficult thing to do, and if, for example, Weaviate dropped their hosting prices significantly, many developers would likely make the switch from another service.

It's also important to note the innovations happening at the database level. Projects such as pgvector and sqlite-vec are taking tried-and-true databases and making them capable of handling vector embeddings. This is an area where I would love to contribute. However, the path to profit is not clear, and thinking about profit here feels a bit icky (I ♥️ open-source!).
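To make the idea concrete, here is a minimal sketch of storing and querying embeddings in Postgres with pgvector via raw SQL from Python. The table name, toy 3-dimensional vectors, and connection string are purely illustrative assumptions; a real setup would use a proper embedding model and migration tooling.

```python
# Minimal pgvector sketch (assumptions: a local Postgres instance with the
# pgvector extension available; table and column names are made up).
import psycopg2

conn = psycopg2.connect("dbname=demo user=postgres")
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute(
    "CREATE TABLE IF NOT EXISTS docs ("
    "id serial PRIMARY KEY, body text, embedding vector(3));"
)

# Insert a document with a toy 3-dimensional embedding.
cur.execute(
    "INSERT INTO docs (body, embedding) VALUES (%s, %s);",
    ("hello world", "[0.1, 0.2, 0.3]"),
)

# Nearest-neighbor search by L2 distance using pgvector's <-> operator.
cur.execute(
    "SELECT body FROM docs ORDER BY embedding <-> %s LIMIT 5;",
    ("[0.1, 0.2, 0.25]",),
)
print(cur.fetchall())
conn.commit()
```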

That brings us to the application layer. This is where the little guys can notch big wins. The ability to take the latest AI innovations and integrate them into web applications is, and will continue to be, in high demand. The path to profit is clearest when offering products that people love. Applications can either be SaaS offerings or they can be custom-built applications tailored to a company's particular use case.

Remember that the companies working at the foundation model layer are constantly working to release better, faster, and cheaper models. For example, if you are using the gpt-4o model in your app and OpenAI updates the model, you don't need to do a thing to receive the update. Your app gets a nice bump in performance for nothing. It's similar to how iPhones get regular updates, except even better, because no installation is required. The streamed chunks coming back from your API provider are just magically better.

If you want to switch to a model from a new provider, just change a line or two of code to start getting improved responses (remember, commoditization). Consider the recent DeepSeek moment; what may be frightening for OpenAI is thrilling for application builders.
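As a rough illustration of how small that switch can be, here is a sketch using the OpenAI Python client. The DeepSeek base URL and model name reflect their OpenAI-compatible API as I understand it and may change, so treat them as assumptions rather than a definitive setup.

```python
# Sketch only: swapping providers behind an OpenAI-compatible API is often
# just a different base_url and model name. Values below are assumptions.
from openai import OpenAI

# Before: OpenAI's gpt-4o
client = OpenAI()  # reads OPENAI_API_KEY from the environment
model = "gpt-4o"

# After: DeepSeek's OpenAI-compatible endpoint (hypothetical config)
# client = OpenAI(base_url="https://api.deepseek.com", api_key="<DEEPSEEK_KEY>")
# model = "deepseek-chat"

response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Recommend a movie like Blade Runner."}],
)
print(response.choices[0].message.content)
```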

It is important to note that the application layer is not without its challenges. I've noticed quite a bit of hand-wringing on social media about SaaS saturation. It can feel difficult to get users to register for an account, let alone pull out a credit card. It can feel as if you need VC funding for marketing blitzes and yet another in-vogue black-on-black marketing website. The app developer also needs to be careful not to build something that will quickly be cannibalized by one of the big model providers. Think about how Perplexity initially built their fame by combining the power of LLMs with search capabilities. At the time this was novel; nowadays most popular chat applications have this functionality built in.

Another hurdle for the application developer is obtaining domain expertise. Domain expertise is a fancy term for knowing about a niche field like law, medicine, automotive, etc. All the technical skill in the world doesn't mean much if the developer doesn't have access to the necessary domain expertise to ensure their product actually helps someone. As a simple example, one can theorize how a document summarizer might help out a legal firm, but without actually working closely with a lawyer, any usability remains theoretical. Use your network to become friends with some domain experts; they can help power your apps to success.

An alternative to partnering with a domain expert is building something specifically for yourself. If you enjoy the product, it's likely others will as well. You can then proceed to dogfood your app and iteratively improve it.

Thick Wrappers

Early applications with gen AI integration were derided as "thin wrappers" around language models. It's true that taking an LLM and slapping a simple chat interface on it won't succeed. You are essentially competing with ChatGPT, Claude, etc. in a race to the bottom.

The canonical thin wrapper looks something like:

  • A chat interface
  • Basic prompt engineering
  • A feature that will likely be cannibalized by one of the big model providers soon, or that can already be done using their apps

An example would be an "AI writing assistant" that just relays prompts to ChatGPT or Claude with basic prompt engineering. Another would be an "AI summarizer tool" that passes text to an LLM to summarize, with no processing or domain-specific knowledge.
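For a sense of just how thin that is, here is a hedged sketch of such a summarizer; the prompt and model name are placeholders. The point is that the entire "product" adds essentially nothing over pasting the text into ChatGPT yourself.

```python
# The whole "product" of a thin wrapper: one prompt, one API call, no domain
# knowledge, no processing. Model name and prompt are placeholders.
from openai import OpenAI

client = OpenAI()

def summarize(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Summarize this:\n\n{text}"}],
    )
    return response.choices[0].message.content
```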

With our experience developing web apps with AI integration, we at Los Angeles AI Apps have come up with the following criterion for how to avoid creating a thin wrapper application:

If the app can't best ChatGPT with search by a significant factor, then it's too thin.

A few things to note here, starting with the idea of a "significant factor". Even if you are able to exceed ChatGPT's capability in a particular domain by a small factor, it likely won't be enough to ensure success. You really need to be a lot better than ChatGPT for people to even consider using the app.

Let me motivate this insight with an example. When I was learning data science, I created a movie recommendation project. It was a great experience, and I learned a lot about RAG and web applications.

My old film recommendation app. Good times! Image by author.

Would it be a good production app? No.

No matter what query you ask it, ChatGPT will likely give you a movie recommendation that's comparable. Despite the fact that I was using RAG and pulling in a curated dataset of movies, it's unlikely a user will find the responses much more compelling than ChatGPT + search. Since users are familiar with ChatGPT, they'd likely stick with it for movie recommendations, even if the responses from my app were 2x or 3x better than ChatGPT's (of course, defining "better" is difficult here).

Let me use another example. One app we had considered building out was a web app for city government websites. These sites are notoriously large and hard to navigate. We thought that if we could scrape the contents of the website domain and then use RAG, we could craft a chatbot that could effectively answer user queries. It worked fairly well, but ChatGPT with search capabilities is a beast. It oftentimes matched or exceeded the performance of our bot. It would take extensive iteration on the RAG system to get our app to consistently beat ChatGPT + search. Even then, who would want to go to a new domain to get answers to city questions, when ChatGPT + search would yield similar results? Only by selling our services to the city government and having our chatbot integrated into the city website would we get consistent usage.

One way to differentiate yourself is via proprietary data. If there's private data that the model providers aren't privy to, then that can be valuable. In this case, the value is in the collection of the data, not the innovation of your chat interface or your RAG system. Consider a legal AI startup that provides its models with a large database of legal files that can't be found on the open web. RAG could then be used to help the model answer legal questions over those private documents. Can something like this outdo ChatGPT + search? Yes, assuming the legal files can't be found on Google.
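A minimal sketch of that private-document RAG flow is below, using Chroma (mentioned earlier) for retrieval. The collection name, example clauses, and prompt are all hypothetical; a real system would need chunking, metadata, and citation handling on top of this.

```python
# RAG-over-private-documents sketch (assumptions: Chroma's default embedding
# model, a made-up collection, and gpt-4o as the generator).
import chromadb
from openai import OpenAI

chroma = chromadb.Client()
collection = chroma.create_collection("private_legal_docs")

# In reality these would be chunked excerpts from files not on the open web.
collection.add(
    documents=["Clause 4.2: the licensee shall ...", "Clause 9.1: termination ..."],
    ids=["doc-1", "doc-2"],
)

def answer(question: str) -> str:
    hits = collection.query(query_texts=[question], n_results=2)
    context = "\n\n".join(hits["documents"][0])
    llm = OpenAI()
    response = llm.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"Answer using only this context:\n{context}\n\nQuestion: {question}",
        }],
    )
    return response.choices[0].message.content
```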

Going even further, I believe the best way to have your app stand out is to forgo the chat interface entirely. Let me introduce two ideas:

  • Proactive AI
  • Overnight AI

The Return of Clippy

I read a great article from Evil Martians that highlights the innovation starting to happen at the application level. They describe how they've forgone a chat interface entirely, and instead are attempting something they call proactive AI. Recall Clippy from Microsoft Word. As you were typing out your document, it would butt in with suggestions. These were oftentimes not helpful, and poor Clippy was mocked. With the advent of LLMs, you can imagine creating a far more powerful version of Clippy. It wouldn't wait for a user to ask it a question, but instead could proactively give users suggestions. This is similar to the coding copilot that comes with VS Code. It doesn't wait for the programmer to finish typing, but instead offers suggestions as they code. Done with care, this kind of AI can reduce friction and improve user satisfaction.
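As a rough sketch of the pattern (not Evil Martians' actual implementation), a proactive assistant watches what the user is doing, decides whether a suggestion is warranted, and only then calls the model. The pause threshold, minimum draft length, prompt, and model name below are all illustrative placeholders.

```python
# Proactive AI sketch: suggest only when the user has paused and the draft is
# long enough. Thresholds, prompt, and model name are placeholders.
import time
from openai import OpenAI

client = OpenAI()
PAUSE_SECONDS = 3.0
MIN_CHARS = 200

def maybe_suggest(draft: str, last_keystroke_at: float) -> str | None:
    """Return a suggestion, or None if the user shouldn't be interrupted."""
    if len(draft) < MIN_CHARS or time.time() - last_keystroke_at < PAUSE_SECONDS:
        return None  # too early, or the user is still typing
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Offer one brief, concrete improvement to this draft:\n\n{draft}",
        }],
    )
    return response.choices[0].message.content
```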

Of course, there are important considerations when creating proactive AI. You don't want your AI pinging the user so often that it becomes frustrating. One can also imagine a dystopian future where LLMs are constantly nudging you to buy cheap junk or spend time on some mindless app without your prompting. Of course, machine learning models are already doing this, but putting human language on it would make it even more insidious and annoying. It's imperative that the developer ensures their application is used to benefit the user, not swindle or influence them.

Getting Stuff Done While You Sleep

Image of AI working overnight. Image from GPT-4o.

Another alternative to the chat interface is to use LLMs offline rather than online. For example, imagine you wanted to create a newsletter generator. This generator would use an automated scraper to pull in leads from a variety of sources. It would then create articles for the leads it deems interesting. Each new issue of your newsletter would be kicked off by a background job, perhaps daily or weekly. The important detail here: there is no chat interface. There is no way for the user to have any input; they simply get to enjoy the latest issue of the newsletter. Now we're really starting to cook!

I call this overnight AI. The key is that the user never interacts with the AI at all. It just produces a summary, an explanation, an analysis, etc. overnight while you are sleeping. In the morning, you wake up and get to enjoy the results. There need be no chat interface or prompting in overnight AI. Of course, it can be very useful to have a human in the loop. Imagine that the issue of your newsletter comes to you with proposed articles. You can either accept or reject the stories that go into your newsletter. Perhaps you can build in functionality to edit an article's title, summary, or cover photo if you don't like something the AI generated.
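Here is a rough sketch of what that overnight flow could look like: a nightly job drafts articles from scraped leads and queues them for morning approval. The scheduler, the stand-in scraper, the in-memory pending list, and the model name are all assumptions for illustration; a real app would use a proper task queue and database.

```python
# Overnight AI sketch: no chat interface, just a nightly background job that
# drafts content and queues it for human review in the morning.
import time
import schedule  # pip install schedule; cron or a task queue would work just as well
from openai import OpenAI

client = OpenAI()
PENDING: list[dict] = []  # stand-in for a real "drafts awaiting approval" table

def scrape_leads() -> list[str]:
    # Hypothetical stand-in: a real scraper would pull from RSS feeds, APIs, etc.
    return ["New open-source vector database released", "Local startup raises seed round"]

def generate_issue() -> None:
    for lead in scrape_leads():
        draft = client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": f"Write a short newsletter article about:\n{lead}",
            }],
        ).choices[0].message.content
        PENDING.append({"lead": lead, "draft": draft, "approved": None})  # awaits review

# Runs while you sleep; in the morning a human accepts, rejects, or edits each draft.
schedule.every().day.at("02:00").do(generate_issue)
while True:
    schedule.run_pending()
    time.sleep(60)
```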

Summary

In this article, I covered the fundamentals of the AI stack: the infrastructure, data, model/orchestration, and application layers. I discussed why I believe the application layer is the best place to work, mainly due to its lack of commoditization, proximity to the end user, and opportunity to build products that benefit from the work done in the lower layers. We discussed how to prevent your application from being just another thin wrapper, as well as how to use AI in a way that avoids the chat interface entirely.

In part two, I'll discuss why the best language to learn if you want to build web applications with AI integration is not Python, but Ruby. I'll also break down why a microservices architecture is not the best way to build AI apps, despite it being the default that most go with.
