You’ve heard of OpenAI and Nvidia, but do you know who else is involved in the AI wave and how they all fit together?
Several months ago, I visited the MoMA in NYC and saw the work Anatomy of an AI System by Kate Crawford and Vladan Joler. The work examines the Amazon Alexa supply chain, from raw resource extraction to device disposal. It made me think about everything that goes into producing today’s generative AI (GenAI) powered applications. By digging into this question, I came to appreciate the many layers of physical and digital engineering that GenAI applications are built upon.
I’ve written this piece to introduce readers to the key components of the GenAI value chain, the role each plays, and who the key players are at each stage. Along the way, I hope to illustrate the range of companies powering the growth of AI, how different technologies build upon one another, and where vulnerabilities and bottlenecks exist. Starting with the user-facing applications emerging from technology giants like Google and the latest batch of startups, we’ll work backward through the value chain, down to the sand and rare earth metals that go into computer chips.
Technology giants, corporate IT departments, and legions of new startups are in the early phases of experimenting with potential use cases for GenAI. These applications may be the start of a new paradigm in computer applications, marked by radically new modes of human-computer interaction and unprecedented capabilities to understand and leverage unstructured and previously untapped data sources (e.g., audio).
Many of the most impactful advances in computing have come from advances in human-computer interaction (HCI). From the development of the GUI to the mouse to the touchscreen, these advances have greatly expanded the leverage users gain from computing tools. GenAI models will further remove friction from this interface by equipping computers with the power and flexibility of human language. Users will be able to issue instructions and tasks to computers just as they might to a capable human assistant. Some examples of products innovating in the HCI space are:
- Siri (AI Voice Assistant) — Enhances Apple’s mobile assistant with the ability to understand broader requests and questions
- Palantir’s AIP (Autonomous Agents) — Strips complexity from large, powerful tools through a chat interface that directs users to the desired functionality and actions
- Lilac Labs (Customer Service Automation) — Automates drive-through customer ordering with voice AI
GenAI equips computer systems with agency and flexibility that were previously unattainable, when sets of preprogrammed procedures guided their functionality and their data inputs had to fit well-defined rules established by the programmer. This flexibility allows applications to perform more complex and open-ended knowledge tasks that were previously strictly within the human domain. Some examples of new applications leveraging this flexibility are:
- GitHub Copilot (Coding Assistant) — Amplifies programmer productivity by implementing code based on the user’s intent and existing code base
- LenAI (Knowledge Assistant) — Saves knowledge workers time by summarizing meetings, extracting critical insights from discussions, and drafting communications
- Perplexity (AI Search) — Answers user questions reliably with citations by synthesizing traditional web searches with AI-generated summaries of web sources
A diverse group of players is driving the development of these use cases. Hordes of startups are springing up, with 86 of Y Combinator’s W24 batch focused on AI technologies. Major tech firms like Google have also introduced GenAI products and features; for example, Google is leveraging its Gemini LLM to summarize results in its core search products. Traditional enterprises are launching major initiatives to understand how GenAI can complement their strategy and operations. JP Morgan CEO Jamie Dimon said AI is “unbelievable for marketing, risk, fraud. It’ll help you do your job better.” As firms understand how AI can solve problems and drive value, use cases and demand for GenAI will multiply.
With the release of OpenAI’s ChatGPT (powered by the GPT-3.5 model) in late 2022, GenAI exploded into the public consciousness. Today, models like Claude (Anthropic), Gemini (Google), and Llama (Meta) have challenged GPT for supremacy. The model provider market and development landscape are still in their infancy, and many open questions remain, such as:
- Will smaller domain/task-specific models proliferate, or will large models handle all tasks?
- How far can model sophistication and capability advance under the current transformer architecture?
- How will capabilities advance as model training approaches the limit of all human-created text data?
- Which players will challenge the current supremacy of OpenAI?
While speculating about the capability limits of artificial intelligence is beyond the scope of this discussion, the market for GenAI models is likely large (many prominent investors certainly value it highly). What do model builders do to justify such high valuations and so much excitement?
The research teams at firms like OpenAI are responsible for making architectural choices, compiling and preprocessing training datasets, managing training infrastructure, and more. Research scientists in this field are rare and highly valued, with the average engineer at OpenAI reportedly earning over $900k. Not many firms can attract and retain individuals with the highly specialized skill set required to do this work.
Compiling the training datasets involves crawling, collecting, and processing all the text (or audio or visual) data available on the web and from other sources (e.g., digitized libraries). After compiling these raw datasets, engineers layer in relevant metadata (e.g., tagging categories), tokenize the data into chunks for model processing, format it into efficient training file formats, and impose quality control measures.
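To make this concrete, here is a minimal sketch of such a pipeline in Python. The quality filters, metadata fields, and use of the open-source tiktoken tokenizer are illustrative assumptions; the production pipelines at major labs are far larger and not public.

```python
# A minimal, illustrative "collect -> filter -> tokenize -> chunk" pipeline.
# The tokenizer choice and quality rules are assumptions for illustration,
# not any lab's actual preprocessing stack.
import hashlib
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # an open BPE tokenizer

def preprocess(raw_docs, chunk_len=2048):
    seen = set()
    chunks = []
    for doc in raw_docs:
        text = doc["text"].strip()
        # Crude quality controls: drop near-empty docs and exact duplicates.
        if len(text) < 200:
            continue
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        # Tokenize, attach metadata, and split into fixed-length training chunks.
        tokens = enc.encode(text)
        for i in range(0, len(tokens), chunk_len):
            chunks.append({
                "tokens": tokens[i:i + chunk_len],
                "source": doc.get("source", "unknown"),  # e.g., "web", "books"
            })
    return chunks
```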
While the market for AI model-powered products and services may be worth trillions within a decade, many barriers to entry prevent all but the most well-resourced firms from building cutting-edge models. The biggest barrier to entry is the millions to billions of dollars in capital investment required for model training. To train the latest models, firms must either build their own data centers or make significant purchases from cloud service providers to leverage theirs. While Moore’s law continues to rapidly lower the price of computing power, this is more than offset by the rapid scale-up in model sizes and computation requirements. Training the latest cutting-edge models requires billions in data center investment (in March 2024, media reports described a planned investment of $100B by OpenAI and Microsoft in data centers to train next-generation models). Few firms can afford to allocate billions toward training an AI model (only tech giants or exceedingly well-funded startups like Anthropic and Safe Superintelligence).
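A rough back-of-envelope calculation shows where such figures come from. The sketch below uses the widely cited rule of thumb of roughly 6 floating-point operations per parameter per training token; the model size, dataset size, GPU throughput, utilization, and hourly price are all assumed values for illustration, not figures from any actual training run.

```python
# Back-of-envelope training cost using the common ~6 * parameters * tokens
# FLOPs rule of thumb. All inputs below are illustrative assumptions.
params = 1.8e12          # hypothetical 1.8-trillion-parameter model
tokens = 13e12           # hypothetical 13-trillion-token dataset
flops = 6 * params * tokens

gpu_flops = 1.0e15       # ~1 PFLOP/s per GPU at low precision (assumed)
utilization = 0.4        # assumed real-world training efficiency
gpu_hours = flops / (gpu_flops * utilization) / 3600
cost = gpu_hours * 3.0   # assumed $3 per GPU-hour rental price

print(f"{gpu_hours:,.0f} GPU-hours, roughly ${cost / 1e6:,.0f}M")
# Under these assumptions: ~100 million GPU-hours, on the order of $300M,
# before counting data, staff, failed runs, or inference capacity.
```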
Finding the right talent is also incredibly difficult. Attracting this specialized talent requires more than a 7-figure compensation package; it requires connections within the right fields and academic communities, and a compelling value proposition and vision for the technology’s future. Existing players’ ready access to capital and domination of the specialized talent market will make it difficult for new entrants to challenge their position.
Knowing a bit about the history of the AI model market helps us understand the current landscape and how the market may evolve. When ChatGPT burst onto the scene, it felt like a breakthrough revolution to many, but was it? Or was it another incremental (albeit impressive) improvement in a long series of advances that were invisible outside of the development world? The team that developed ChatGPT built upon decades of research and publicly available tools from industry, academia, and the open-source community. Most notable is the transformer architecture itself, the critical insight driving not only ChatGPT but most AI breakthroughs of the past five years. First proposed by Google in its 2017 paper “Attention Is All You Need,” the transformer architecture is the foundation for models like Stable Diffusion, GPT-4, and Midjourney. The authors of that 2017 paper have gone on to found some of the most prominent AI startups (e.g., CharacterAI, Cohere).
Given the common transformer architecture, what will enable some models to “win” against others? Variables like model size, input data quality and quantity, and proprietary research differentiate models. Model size has been shown to correlate with improved performance, and the best-funded players can differentiate by investing more in model training to further scale up their models. Proprietary data sources (such as those Meta draws from its user base and Elon Musk’s xAI from Tesla’s driving videos) can help some models learn what others have no access to. GenAI is still a highly active area of ongoing research, and research breakthroughs at the firms with the best talent will partly determine the pace of advancement. It is also unclear how strategies and use cases will create opportunities for different players. Perhaps application builders will leverage multiple models to reduce dependency risk or to match a model’s unique strengths with specific use cases (e.g., research, interpersonal communications).
We discussed how model providers invest billions to build or rent the computing resources needed to train these models. Where is that spending going? Much of it goes to cloud service providers like Microsoft’s Azure (used by OpenAI for GPT) and Amazon Web Services (used by Anthropic for Claude).
Cloud service providers (CSPs) play a vital role in the GenAI value chain by providing the infrastructure necessary for model training (they also often provide infrastructure to the end application builders, but this section will focus on their interactions with the model builders). Major model builders mostly don’t own and operate their own computing facilities (known as data centers). Instead, they rent vast amounts of computing power from the hyperscaler CSPs (AWS, Azure, and Google Cloud) and other providers.
CSPs produce the resource of computing power (generated by supplying electricity to specialized microchips, thousands of which make up a data center). To train their models, engineers instruct the computers operated by CSPs to run computationally expensive matrix calculations over their input datasets, computing the billions of parameters that make up the model weights. This model training phase is responsible for the high upfront investment cost. Once these weights are calculated (i.e., the model is trained), model providers use them to respond to user queries (i.e., to make predictions on new data). This is a less computationally expensive process known as inference, also performed using CSP computing power.
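The sketch below shows the two phases on a toy model: training repeats a forward pass, backward pass, and weight update over every batch in the dataset, while inference is a single forward pass with frozen weights. A real LLM run follows the same pattern scaled up by many orders of magnitude; the model and data here are placeholders.

```python
# Toy illustration of training (expensive, repeated over the whole dataset)
# versus inference (a cheaper forward pass with already-trained weights).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# --- Training: forward pass, backward pass, weight update for every batch ---
for step in range(100):                       # real runs: millions of batches
    x = torch.randn(32, 128)                  # stand-in for a batch of data
    y = torch.randint(0, 10, (32,))
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()
    loss.backward()                           # the expensive matrix math
    optimizer.step()

# --- Inference: a single forward pass using the trained weights ---
with torch.no_grad():
    prediction = model(torch.randn(1, 128)).argmax(dim=-1)
```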
The cloud service provider’s role is building, maintaining, and administering the data centers where this “computing power” resource is produced and used by model builders. CSP activities include acquiring computer chips from suppliers like Nvidia, “racking and stacking” server units in specialized facilities, and performing regular physical and digital maintenance. They also develop the entire software stack to manage these servers and provide developers with an interface to access the computing power and deploy their applications.
The principal operating expense for data centers is electricity, and AI-fueled data center expansion is likely to drive a significant increase in electricity usage in the coming decades. For perspective, a typical query to ChatGPT uses roughly ten times as much energy as an average Google search. Goldman Sachs estimates that AI demand will double data centers’ share of global electricity usage by the end of the decade. Just as significant investments must be made in computing infrastructure to support AI, similar investments must be made to power that infrastructure.
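To put rough numbers on that comparison, here is a back-of-envelope estimate. The per-query energy figure and daily query volume below are assumed, commonly cited ballpark values, not measurements.

```python
# Rough scale check on the "10x a Google search" comparison above.
# All figures are illustrative assumptions, not measured values.
google_wh = 0.3               # assumed Wh per Google search
chatgpt_wh = 10 * google_wh   # ~3 Wh per ChatGPT query, per the 10x claim
queries_per_day = 1e9         # assumed daily query volume for illustration

daily_gwh = chatgpt_wh * queries_per_day / 1e9   # Wh -> GWh
print(f"~{daily_gwh:.0f} GWh/day, ~{daily_gwh * 365 / 1000:.1f} TWh/year")
# Under these assumptions: ~3 GWh/day, on the order of 1 TWh/year,
# before counting the energy used to train the models in the first place.
```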
Looking ahead, cloud service providers and their model builder partners are in a race to build the largest and most powerful data centers capable of training the next generation of models. The data centers of the future, like those under development by the Microsoft-OpenAI partnership, will require thousands to millions of new cutting-edge microchips. The substantial capital expenditures by cloud service providers to build these facilities are now driving record profits at the companies that help create those microchips, notably Nvidia (design) and TSMC (manufacturing).
At this point, everyone has likely heard of Nvidia and its meteoric, AI-fueled stock market rise. It has become a cliché to say that the tech giants are locked in an arms race and Nvidia is the only supplier, but is it true? For now, it is. Nvidia designs a type of computer microchip known as a graphics processing unit (GPU) that is critical for AI model training. What is a GPU, and why is it so crucial for GenAI? And why do most conversations about AI chip design center on Nvidia and not other microchip designers like Intel, AMD, or Qualcomm?
Graphics processing units (as the name suggests) were initially used to serve the computer graphics market. Graphics for CGI movies like Jurassic Park and video games like Doom require expensive matrix computations, but these computations can be done in parallel rather than in series. Standard computer processors (CPUs) are optimized for fast sequential computation (where the input to one step may be the output of a previous step), but they cannot do large numbers of calculations in parallel. GPUs’ optimization for “horizontally” scaled parallel computation rather than accelerated sequential computation was well-suited to computer graphics, and it also came to be ideal for AI training.
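A small illustration of why this matters: every cell of a matrix product can be computed independently of the others, so the work spreads naturally across many cores. The NumPy sketch below computes the same product two ways, once with a single vectorized call that can exploit that parallelism, and once with an explicit cell-by-cell loop, the way a purely sequential processor would.

```python
# Matrix multiplication is "embarrassingly parallel": each output cell is an
# independent dot product, so many can be computed at once on parallel hardware.
import numpy as np

A = np.random.rand(256, 256)
B = np.random.rand(256, 256)

# Parallel-friendly: one call, internally spread across vector units/cores.
C_fast = A @ B

# Sequential: 256 * 256 independent dot products done one after another.
C_slow = np.zeros((256, 256))
for i in range(256):
    for j in range(256):
        C_slow[i, j] = np.dot(A[i, :], B[:, j])

assert np.allclose(C_fast, C_slow)  # same result, vastly different speed
```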
Given that GPUs served a niche market until the rise of video games in the late 90s, how did they come to dominate the AI hardware market, and how did GPU makers displace Silicon Valley’s original titans like Intel? In 2012, the program AlexNet won the ImageNet machine learning competition by using Nvidia GPUs to speed up model training. Its creators showed that the parallel computation power of GPUs was perfect for training ML models because, like computer graphics, ML model training relies on highly parallel matrix computations. Today’s LLMs have expanded upon AlexNet’s initial breakthrough, scaling up to quadrillions of arithmetic computations and billions of model parameters. With this explosion in parallel computing demand since AlexNet, Nvidia has positioned itself as the only viable chip supplier for machine learning and AI model training, thanks to heavy upfront investment and clever lock-in strategies.
Given the massive market opportunity in GPU design, it is reasonable to ask why Nvidia has no significant challengers (at the time of this writing, Nvidia holds 70–95% of the AI chip market share). Nvidia’s early investments in the ML and AI market, before ChatGPT and even before AlexNet, were key to establishing a hefty lead over other chipmakers like AMD. Nvidia allocated significant research and development investment to the scientific computing (later ML and AI) market segment before there was a clear business use case. Because of these early investments, Nvidia had already developed the best supplier and customer relationships, engineering talent, and GPU technology when the AI market took off.
Perhaps Nvidia’s most important early investment, and now its deepest moat against competitors, is its CUDA programming platform. CUDA is a low-level software toolkit that enables engineers to interface with Nvidia’s chips and write natively parallel algorithms. Many models, such as Llama, leverage higher-level Python libraries built upon these foundational CUDA tools. These lower-level tools let model designers focus on higher-level architectural choices without worrying about the complexities of executing calculations at the GPU processor core level. With CUDA, Nvidia built a software ecosystem that strategically complements its GPU hardware by solving many of the software challenges AI builders face.
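As a glimpse of that layering, the sketch below uses PyTorch (one such higher-level Python library) to run a matrix multiplication on whatever hardware is available. On an Nvidia GPU, the single matmul line dispatches to optimized CUDA kernels under the hood; the developer never writes kernel code. The tensor sizes are arbitrary placeholders.

```python
# Python expresses the math; the framework dispatches it to Nvidia's CUDA
# kernels when a GPU is present. Sizes below are placeholders for illustration.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

weights = torch.randn(4096, 4096, device=device)
activations = torch.randn(64, 4096, device=device)

# On a CUDA device, this one line launches highly optimized GPU kernels
# (cuBLAS/CUDA under the hood) without any kernel-level programming.
output = activations @ weights
print(output.shape, output.device)
```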
CUDA not only simplifies the process of building parallelized AI and machine learning models on Nvidia chips, it also locks developers into the Nvidia ecosystem, raising significant barriers to exit for any firm looking to switch to Nvidia’s competitors. Programs written in CUDA cannot run on competitors’ chips, which means that to switch off Nvidia chips, firms must rebuild not only the functionality of the CUDA platform but also any parts of their tech stack that depend on CUDA outputs. Given the huge stack of AI software built upon CUDA over the past decade, there is a considerable switching cost for anyone looking to move to competitors’ chips.
Firms like Nvidia and AMD design chips, but they don’t manufacture them. Instead, they rely on semiconductor manufacturing specialists known as foundries. Modern semiconductor manufacturing is one of the most complex engineering processes ever invented, and these foundries are a long way from most people’s image of a traditional factory. To illustrate, transistors on the latest chips are only about 12 silicon atoms long, shorter than the wavelength of visible light. Modern microchips have trillions of these transistors packed onto small silicon wafers and etched into atom-scale integrated circuits.
The key to manufacturing semiconductors is a process known as photolithography. Photolithography involves etching intricate patterns onto a silicon wafer, a crystallized form of the element silicon used as the base for the microchip. The process involves coating the wafer with a light-sensitive chemical called photoresist and then exposing it to ultraviolet light through a mask that contains the desired circuit pattern. The exposed areas of the photoresist are then developed, leaving a pattern that can be etched into the wafer. The most critical machines for this process are made by the Dutch company ASML, which produces extreme ultraviolet (EUV) lithography systems and holds a stranglehold on its segment of the AI value chain similar to Nvidia’s.
Just as Nvidia came to dominate the GPU design market, its primary manufacturing partner, Taiwan Semiconductor Manufacturing Company (TSMC), holds a similarly large share of the manufacturing market for the most advanced AI chips. To understand TSMC’s place in the semiconductor manufacturing landscape, it helps to understand the broader foundry landscape.
Semiconductor manufacturers are split between two primary foundry models: pure-play and integrated. Pure-play foundries, such as TSMC and GlobalFoundries, focus exclusively on manufacturing microchips for other firms without designing their own chips (the complement to fabless firms like Nvidia and AMD, which design but don’t manufacture their chips). These foundries specialize in fabrication services, allowing fabless semiconductor firms to design microchips without heavy capital expenditures on manufacturing facilities. In contrast, integrated device manufacturers (IDMs) like Intel and Samsung design, manufacture, and sell their own chips. The integrated model provides greater control over the entire production process but requires significant investment in both design and manufacturing capabilities. The pure-play model has gained popularity in recent decades because of the flexibility and capital efficiency it offers fabless designers, while the integrated model remains advantageous for firms with the resources to maintain both design and fabrication expertise.
It’s impossible to discuss semiconductor manufacturing without considering the vital role of Taiwan and the resulting geopolitical risks. In the late twentieth century, Taiwan transformed itself from a low-margin, low-skill manufacturing island into a semiconductor powerhouse, largely thanks to strategic government investments and a focus on high-tech industries. The establishment and growth of TSMC have been central to this transformation, positioning Taiwan at the heart of the global technology supply chain and giving rise to many smaller firms that support manufacturing. However, this dominance has also made Taiwan a critical focal point in an ongoing geopolitical struggle, as China views the island as a breakaway province and seeks greater control over it. Any escalation of tensions could disrupt the global supply of semiconductors, with far-reaching consequences for the world economy, particularly in AI.
At the most basic level, all manufactured objects are created from raw materials extracted from the earth. For the microchips used to train AI models, silicon and metals are the primary constituents. These, along with the chemicals used in the photolithography process, are the primary inputs foundries use to fabricate semiconductors. While the United States and its allies have come to dominate many parts of the value chain, their AI rival, China, has a firmer grasp on raw metals and other inputs.
The primary ingredient in any microchip is silicon (hence the name Silicon Valley). Silicon is one of the most abundant elements in the earth’s crust and is typically mined as silicon dioxide (i.e., quartz or silica sand). Producing silicon wafers involves mining the mineral quartzite, crushing it, and then extracting and purifying the elemental silicon. Next, chemical firms such as Sumco and Shin-Etsu Chemical convert the pure silicon into wafers using a process called Czochralski growth, in which a seed crystal is dipped into molten high-purity silicon and slowly pulled upward while rotating. This process creates a sizeable single-crystal silicon ingot that is sliced into thin wafers, which form the substrate for semiconductor manufacturing.
Beyond silicon, computer chips also require trace amounts of other elements. A critical step in semiconductor manufacturing is doping, in which impurities are added to the silicon to control its conductivity. Doping is typically done with elements like germanium, arsenic, gallium, and copper, and chipmaking more broadly depends on rare earth elements. China dominates global rare earth metal production, accounting for over 60% of mining and 85% of processing. Other significant rare earth producers include Australia, the United States, Myanmar, and the Democratic Republic of the Congo. The United States’ heavy reliance on China for rare earth metals poses significant geopolitical risks, as supply disruptions could severely impact the semiconductor industry and other high-tech sectors. This dependence has prompted efforts to diversify supply chains and develop domestic rare earth production capabilities in the US and other countries, though progress has been slow due to environmental concerns and the complex nature of rare earth processing.
The physical and digital technology stacks and value chains that support the development of AI are intricate, built upon decades of academic and industrial advances. The value chain encompasses end application builders, AI model builders, cloud service providers, chip designers, chip fabricators, and raw material suppliers, among many other key contributors. While much of the attention has been on major players like OpenAI, Nvidia, and TSMC, significant opportunities and bottlenecks exist at all points along the value chain. Hundreds of new firms will be born to solve these problems. While firms like Nvidia and OpenAI may be the Intel and Google of their generation, the personal computing and internet booms produced thousands of other unicorns to fill niches and solve the problems that came with inventing a new economy. The opportunities created by the shift to AI will take decades to be understood and realized, much as they did with personal computing in the 70s and 80s and the internet in the 90s and 00s.
While entrepreneurship and clever engineering may solve many problems in the AI market, some problems involve far greater forces. No challenge is bigger than the rising geopolitical tension with China, which owns (or claims to own) much of the raw materials and manufacturing capacity in the chain. This contrasts with the United States and its allies, who control most downstream phases, including chip design and model training. The struggle for AI dominance is especially important because the opportunity unlocked by AI is not just economic but also military. Semi-autonomous weapons systems and cyberwarfare agents leveraging AI capabilities may play decisive roles in the conflicts of the coming decades. Modern defense technology startups like Palantir and Anduril already show how AI capabilities can expand battlefield visibility and speed up decision loops to gain a potentially decisive advantage. Given AI’s high potential for disrupting the global order and the delicate balance of power between the United States and China, it is imperative that the two nations seek to maintain a cooperative relationship aimed at the mutually beneficial development of AI technology for the betterment of global prosperity. Only by solving problems across the supply chain, from the scientific to the economic to the geopolitical, can the promise of AI to supercharge humanity’s capabilities be realized.