I decided to write about the history of data for several reasons. First, I work in data, and I want to know the history of my field. Second, I believe the best way to understand what might happen in the future is to understand what has happened in the past. And third, I believe the trends we can learn from the history of data can tell us a lot about the future of AI. Data is the fossil fuel of AI, after all. When I get curious (or afraid) about what AI might mean for humanity, I look online to see what the experts say, and I get confused.
"There's a 10 to 20 percent chance that AI will lead to human extinction within the next three decades." – Geoffrey Hinton ("Godfather of AI") – The Guardian, Dec 2024
"I've always thought of AI as the most profound technology humanity is working on, more profound than fire or electricity." – Sundar Pichai (CEO, Google/Alphabet) – CNBC, Oct 2016
"There's some chance that is above zero that AI will kill us all." – Elon Musk – NBC News, Sept 2023
"AI is the new electricity." – Andrew Ng (Co-founder, Google Brain & Coursera) – 2017
"The development of full artificial intelligence could spell the end of the human race." – Stephen Hawking – BBC interview, Dec 2014
"That's why AI is exciting… What if we can have the kind of economic growth [we enjoyed in the early 20th century], only this time it's far more even?" – Satya Nadella (CEO, Microsoft) – TIME, 2023
So AI will either be the end of the human race, or be as impactful and useful as fire or electricity. I'm no AI expert, and I don't even really understand what it is or how it works, but rather than throw my hands up in the air and say that the future of AI is somewhere between apocalypse and utopia, I started reading. My logic is that if I can understand the history and current state of data, I will have a better idea of the future of AI, or at least a better one than the jokers I just quoted.
I break data into three types based on what it's about: personal, public, and enterprise. Personal data is data about individual people: all the information stored on your personal computer and all the click data that tech companies harvest from you. Public data is data about the world, which doesn't necessarily mean it's free. Enterprise data is data about companies. It's mostly stuff that doesn't live on the public web, though it increasingly lives in the cloud. I know there are other ways to categorize data besides what it's about. The type of data (text, images, video), for instance, can be equally important. We're not going to talk about that here.
My goal with this piece is to understand how data has changed over the past 40 years in terms of what's collected, how it's stored, and what it's used for. To do that, I first needed to explore the devices and architectures that shape those trends. Along the way, I learned that what gets collected is only half the story; how that data is monetized is just as important. The SaaS business model and AdWords (the way Google started placing ads in search results) are just as impactful as any technological breakthrough, for instance. I'm convinced the next wave of AI will be driven by exactly these forces: who captures the data, how they capture it, what kinds they capture, and the business models that turn that data into dollars.
This article is meant for data practitioners who are curious about the future of AI but overwhelmed by articles from people claiming to know what that future will look like. I have no idea what the future holds, but understanding how we got here is a first step. My next piece will get into actual predictions about the future, which will be falsifiable claims so that I can be held accountable. I'll use Philip Tetlock's framework from his book, Superforecasting, to make these predictions. Here's an outline of what this article will cover:
Part 1 is about Stewart Brand, my favorite person I learned about through this research. 🤘
Part 2 of this story is about the personal computer. Personal data really began with the dawn of the PC, which arrived in full force in 1981, when IBM launched the IBM PC. The IBM PC ran MS-DOS, the operating system built and licensed by Microsoft. When "clones" of the IBM PC, like Compaq and Dell, popped up, they also used MS-DOS, benefiting Microsoft. Apple, on the other hand, never licensed their operating system. Microsoft remains, primarily, a software company, and Apple, a hardware company.
Part 3 is about how personal computers enabled enterprise data to move away from mainframes and minicomputers to a client-server architecture: client PCs sharing data on a centralized database. This shift meant more people had access to enterprise data and apps, but it created a nightmare of systems integrations and data alignment that persists to this day.
Part 4 is about how Tim Berners-Lee (TBL) invented the World Wide Web in the early 1990s and personal computers became portals to the Web. The first "Browser War" began, mostly between Netscape and Microsoft's Internet Explorer. It also goes into TBL's original vision and the degree to which it has been realized with public data, notably Wikipedia.
Part 5 is about the rise of Google and Amazon in the 1990s. Google began scraping links off the Web and building a search engine. They eventually learned that the best way to make money on the Web was by harvesting click data (data about how people use the Web) and using that data to serve targeted ads. They called this product AdWords. Amazon began as an online bookstore but quickly grew into an everything store. As they grew, they also built massive data centers and began renting server space to other companies to run applications and store data. "The cloud" was born.
Part 6 is a deeper dive into the move to the cloud, using Nicholas Carr's The Big Switch as a reference. In his book, he draws a parallel between the growth of electricity as a utility in the late nineteenth century and the rise of cloud computing in the early twenty-first century.
Part 7 is about how enterprise data has been moving to the cloud, starting with Salesforce in 1999. The client-server architecture is replaced first by "web-based" architectures, using the technology of the World Wide Web, and then by a software-as-a-service (SaaS) model, where the vendor hosts the entire architecture themselves and sells subscriptions rather than the software itself. Moreover, thanks to technologies like parallelization and virtualization, companies were able to store and compute data across multiple servers, leading to the rise of the "data lake." I take some time here to highlight that the problem of integrating data that flared up during the client-server era has still not been solved, but that Tim Berners-Lee's vision of the Semantic Web might hold promise.
Part 8 is all about Facebook and the rise of social media. They took the business model that Google pioneered with AdWords and applied it to even more personal data.
Part 9 details the launch of the iPhone, which put computers in our pockets and changed the way personal data is captured. This led to entirely new industries like ride sharing and proximity-based dating. It was so successful that Apple became the first company with a half-trillion-dollar market capitalization in 2012 and the first to reach a trillion in 2018 (Haigh and Ceruzzi 401). This also confirms Apple's place as primarily a hardware vendor.
The conclusion goes through the major players' business models and the data they collect. This helps to refine the sorts of questions that I'll try to answer in part two:
Since I know you won't read all of this, here are some major takeaways from my research:
First of all, there are not enough women in this history. Here are a few women I want to highlight as significant in the history of data and computers. Ada, Countess of Lovelace, was an artist and a mathematician and wrote the first computer program in 1843, a full fucking CENTURY before Alan Turing (Isaacson 33). Grace Hopper wrote the first compiler, wrote the first computer manual, and championed COBOL, turning programming from arcane machine code into English-like instructions that anyone could learn (Isaacson 88). Larry Page and Sergey Brin didn't start Google in their garage; they started it in Susan Wojcicki's garage. Wojcicki became employee number 16 and oversaw their advertising and analytics products, including AdWords, "the most successful scheme for making money on the Internet that the world had ever seen" (Levy 83). She then managed the $1.65 billion acquisition of YouTube and served as YouTube's CEO from 2014 to 2023. And Facebook never made a profit until Sheryl Sandberg showed up, ended the dorm-room boys' club, and turned Facebook into a real (and profitable) company (Levy 190).
There's a lot more written about the personal computer era and the Steve Jobs/Bill Gates rivalry than about any other part of this history. It's an interesting period, but we need more books and a biopic about Larry Ellison (starring Sam Rockwell) and the whole enterprise side of data.
There's also a lot written about the personalities of these billionaires. I'm less interested in their psychology than in the outcomes of their decisions, but it is hard not to see some patterns. Generally, the most common personality traits of these guys (Gates, Jobs, Ellison, Bezos, Zuckerberg, Brin, and Page) are that they're stubborn, relentless, and irreverent.
The business model often followed the product. There's probably a word for this that you learn in business school, but I didn't go to business school. Often, the product becomes ubiquitous, and then the company figures out a business model and revenue stream to fund it. Google is the best example: it became the largest search engine in the world before they figured out they could use targeted ads to print money. Same with Facebook: they weren't profitable until Sheryl Sandberg joined and informed them they were an ad company.
Conversely, a product may become ubiquitous and a revenue stream may never develop. Microsoft spent a lot of money and time (and became the defendant in an antitrust lawsuit) destroying Netscape. But once they had the most popular browser in the world, Internet Explorer, it didn't matter. There's not nearly as much money in browsers as in other parts of the Web. That being said, if you don't win wars, you lose wars and die. The browser wars did have an effect on Netscape: it doesn't exist anymore.
Established companies often don't embrace new technology fast enough because of their established success. This is known as the "Innovator's Dilemma," described in Clayton Christensen's book of the same name. Basically, a company that has found product-market fit will incrementally improve its product to meet the needs of its existing customer base. An alternative product or architecture could cannibalize this existing revenue stream, so they ignore it and focus on the thing that works. IBM invented the relational database but didn't commercialize it because they didn't want it to encroach on the revenue of their hierarchical database business line. Similarly, Oracle was able to beat SAP to market with a web-based architecture (the E-Business Suite) because SAP didn't HAVE to pivot: their client-server product (R/3) was massively successful. Barnes and Noble didn't want to risk investing in an online store that wasn't, at the time, as profitable as their brick-and-mortar stores (Stone 59).
The revenue model matters more than just dollars and cents. Companies' actions can be better understood by understanding their underlying revenue model. Google didn't create Chrome or buy Android to make money directly; they were tools to get more people to spend more time online and get served ads. Facebook's content can be horrific and drive outrage, but outrage drives engagement, and engagement drives ad revenue.
Moore's Law (the observation that transistor counts double about every two years) has held but slowed. Intel CEO Pat Gelsinger said in 2023 that the industry is now "doubling effectively closer to every three years." And Butters' Law of Photonics (that the data capacity of an optical fiber roughly doubles every nine months) held true through the 2000s, but advances have slowed to roughly every two years as systems near physical limits. Through much of the 2000-2020 period, Butters' Law enabled fiber to replace legacy telephone lines.
Data > Storage > Computation > Communication: The amount of data created has always been much greater than total storage capacity. Storage capacity has always been greater than processing power. And processing power has always been greater than the ability to communicate the insights of those computations. I'm not a brain doctor, but I think the same holds for humans: we perceive far more information than we can remember (storage); we store more than we can think about at any given time (computation); and we think about more than we can effectively communicate.
There's a positive feedback loop between data, product, and AI. The best product gains market dominance, which allows it to collect more data, which allows it to improve its algorithms, which allows it to expand market share, which…
Data is moving to the cloud. Duh. Enterprise data and apps are increasingly built on the hyperscalers: AWS, Google Cloud, and Microsoft Azure. There are even SaaS-native database companies built on this infrastructure, like Snowflake and Databricks, which are the fastest-growing database management systems (53 percent and 42 percent year-on-year revenue growth, respectively). For personal data, billions of users feed information into apps like Facebook, Instagram, and TikTok, into cloud-based collaboration tools like Google Workspace, and into streaming services like Spotify and Netflix. Spotify has shut down its own data centers and runs everything on Google Cloud, and Netflix completed its cloud migration to AWS in 2016. Even the CIA uses AWS.
Connecting enterprise data has been a headache through every architectural era. Whether in a client-server architecture or in the cloud, connecting data to produce useful insights has been a challenge for decades. Oracle tried to solve this with their "one company, one database" initiative (Symonds 168) but realized that the "key to everything … was a shared data schema, allowing semantic consistency" (Symonds 188). With the rise of cloud computing, companies again tried to solve their siloed-data problem by putting it all in one database, but this time they called it a "data lake." No surprise that this didn't work either, given the lack of a unified semantic layer.
Graph analytics fueled the rise of Google and Facebook. From day one, Google's PageRank and Meta's social graph mined network connections to rank pages, order notifications, and target ads, making graph analysis on metadata the engine of both companies' meteoric rise.
Revenue models and data resources can tell us where AI is going, or at least which questions to ask. At least, that is my theory.
Google and Meta are advertising companies. They're using AI to get users to engage with their products more so they can serve them more ads. They're creating devices (Meta's Ray-Bans and Google's Warby Parkers) to get people online more so they can serve them more ads. AI is a feature of their products to drive engagement. They also have a ton of personal data; Google knows our search history and Meta knows everything about us. Google also has a lot of enterprise data through Google Workspace and Google Cloud Platform, and a lot of public data because they're the largest search engine on the Web.
Apple is a device company, and their revenue is driven by hardware sales. They're embedding AI directly into their devices so they can sell more of them. They have a lot of personal data too, though they don't use it to sell targeted ads.
Microsoft rents out its software and servers, and makes most of its revenue on these subscriptions. It's incorporating AI into these applications (Copilot) to drive expansion. Other enterprise software companies (Google, Oracle, IBM, SAP, Salesforce, Workday, and ServiceNow) are doing the same. Microsoft's Azure is also the second-largest cloud computing platform, behind AWS.
Amazon is a space exploration company funded by some terrestrial enterprises.
1. Acid Foundations
I know I just said we'd start in 1981, but I want to take a moment to acknowledge the greatest person I learned about in all the reading I did for this project: Stewart Brand. The personal computer movement and bringing "power to the people" in the 1970s and 80s were a direct consequence of the hippies and the beats of the 60s, and Brand is the embodiment of this transition. "The counterculture's scorn for centralized authority provided the philosophical foundations of the entire personal-computer revolution," Brand himself wrote (Isaacson 269).
Brand was a part of the "Merry Pranksters" in the 60s, Ken Kesey's LSD-fueled group who rode a bus driven by Neal Cassady (Dean Moriarty from On the Road) cross country, making pit stops to throw psychedelic parties and jam with the Grateful Dead. While tripping one day, he became convinced that seeing an image of the whole Earth from space would change the way people thought about protecting our home, and he petitioned the federal government to take and release a photo from space. Famed inventor, architect, and futurist Buckminster Fuller offered to help, and some NASA employees even wore Brand's pins that said, "Why haven't we seen a photograph of the whole Earth yet?"
After NASA took the photo in 1967, Brand started the Whole Earth Catalog with the image of the whole Earth on the cover. The catalog was a do-it-yourself magazine teaching people how to use tools (including computers), be self-sufficient, share resources, and resist conformity and centralized authority (Isaacson 265). This magazine would inspire many young people, including Steve Jobs, who famously quoted it during his 2005 Stanford commencement address: "Stay hungry, stay foolish."
After starting the Whole Earth Catalog, he met Douglas Engelbart, an engineer running a lab focused on how computers could augment human intelligence. They took LSD together at the lab, and Brand parlayed his experience throwing psychedelic trip fests into helping Engelbart give the "Mother of All Demos" in 1968. This was the first time many fundamental parts of the personal computer were shown: the mouse, on-screen graphics, multiple windows, blog-like publishing, wiki-like collaboration, email, document sharing, instant messaging, hypertext linking, and video conferencing (Isaacson 278).
He realized that computers were the new drugs and "hackers" were the new hippies. He organized the first Hackers Conference in 1984. He started the WELL (The Whole Earth 'Lectronic Link) in 1985, one of the first and most influential virtual communities. It was craigslist before craigslist (though craigslist founder Craig Newmark was a member of the WELL) and "AOL for Deadheads" (AOL founder Steve Case was also a WELL member).
The personal computer was not created by corporate suits. Yes, IBM brought the personal computer into the mainstream, but many of the pieces they put together had been invented by hippie hackers who read the Whole Earth Catalog. These innovations were driven by people fighting straight-laced corporate conformity, trying to bring the power of computers to the individual. Think about how trippy it is that the words you're reading are tiny flashing lights on a screen that you're moving with your finger. That couldn't have been envisioned in a boardroom; it was the product of anti-authoritarianism, irreverence, free love, and psychedelics.
What's wild is that Stewart Brand is still alive today and actively working on futuristic environmental projects like the Long Now Foundation, which is building a 10,000-year clock, and trying to bring the woolly mammoth back to life. He lives on a boat in California with his wife. Check out the documentary We Are As Gods (the title comes from the Whole Earth Catalog's statement of purpose: "We are as gods and might as well get good at it") for more on this awesome dude.
2. The Personal Computer
The year is 1981. Ronald Reagan becomes the 40th US president, Lady Diana Spencer becomes a princess, Indiana Jones prevents the Nazis from using the Ark of the Covenant for evil, and IBM releases their first personal computer, the IBM PC.
The IBM PC is not the first personal computer. The first commercial personal computer was the Altair 8800, built by Ed Roberts in Albuquerque and released in 1975. The Altair was wildly successful among hobbyists and inspired a whole wave of innovation, including a young Bill Gates, who started a company called Microsoft to write and sell code for the Altair. While never a mainstream success, the Altair started the personal computer race. Two years later, in 1977, Radio Shack began selling its TRS-80, Commodore International unveiled the Commodore PET, and two Steves in Cupertino, California began selling their Apple II. While more expensive than its competition, the Apple II was far more popular (Ceruzzi 265).
At the time, IBM was the dominant force in computing, focusing primarily on mainframes. The popularity of the Apple II forced IBM to take personal computers seriously and enter the market. To get a product to market as fast as possible, IBM used third parties and off-the-shelf components.
None of this would have been possible without the microprocessor, built by Intel in 1971. Intel was the product of Robert Noyce, Gordon Moore, and Andrew Grove. Noyce and Moore had earlier fled Shockley Semiconductor because of differences with its erratic founder William Shockley, co-founded Fairchild Semiconductor, and then left Fairchild to start Intel. "He may have been the worst manager in the history of electronics," said Shockley's biographer. Side note: Andy Grove wrote a great management book (High Output Management), which I'd recommend. Larry Ellison even said, "Andy's the one guy whom both Steve Jobs and I agree we'd be willing to work for" (Symonds 271).
Our story starts in 1981 because, while the IBM PC was not the first personal computer, it was when PCs entered the mainstream. IBM had been THE name in computing for decades, and when it launched its first PC, it meant that PCs could become part of the workplace in a way that machines built by startups like Apple never could. The launch of the IBM PC is also significant because of the software it used. It ran PC-DOS, an operating system licensed from Bill Gates at Microsoft. This is important for several reasons. Let's go through them one by one:
First, Bill Gates and his team at Microsoft were able to see the potential in selling software, specifically PC-DOS to IBM, even if it wasn't that profitable up front. They got a flat fee from IBM for the OS (about $80K) and no royalties. But they were free to sell their OS to other vendors as well. They kept the IP and licensed IBM the right to use it, non-exclusively. That would become the standard way Microsoft did business for decades.
Second, Microsoft didn't have an operating system to sell to IBM when IBM asked. They told IBM to talk to Gary Kildall of Digital Research about his OS, but when Kildall wasn't available, Microsoft seized the opportunity and bought an OS from Seattle Computer Products for $50K. The initial success of Microsoft was fueled by a fair amount of luck and by taking other people's products.
This is also significant because it set the stage for DOS becoming "one of the longest-lived and most influential pieces of software ever written" (Ceruzzi 270). IBM sold 750,000 PCs within two years, but then the clones started popping up, starting with Compaq in 1983 (Ceruzzi 277). "[…] companies like Compaq and Dell would earn more profits selling IBM-compatible computers than IBM would. IBM remained a major vendor, but the biggest winner was Microsoft, whose operating system was sold with both IBM computers and their clones" (Ceruzzi 279).
As Robert Cringely puts it in his documentary, "Microsoft bought outright for $50,000 the operating system they needed, and they turned around and licensed it for up to $50 per PC. Think of it. 100 million personal computers running MS-DOS software, funneling billions into Microsoft, the company that, back then, was 50 kids managed by a 25-year-old who needed to wash his hair."
Finally, this is indicative of the lasting difference between computers running Microsoft software, which would become known as "PCs," and Apple products. Apple products are vertically integrated: the hardware, software, and apps are all integrated and tightly controlled. Apple doesn't sell its OS separately. It wants complete control over the user experience. Apple is a hardware company; Microsoft is a software company.
IBM dominated the PC market in the 80s, with Apple trailing behind. Remember the famous Super Bowl ad in 1984 where Apple positioned themselves as the challenger to the dominant "Big Brother" of IBM? Meanwhile, Microsoft pushed forward with DOS and then Windows. Windows 3 launched in 1990 (Haigh and Ceruzzi 266), bringing graphical user interfaces (GUIs) into the mainstream. Apple had been using GUIs for a while, which Steve Jobs stole from Xerox PARC, but Jobs was still upset at Gates for using them.
By 1993, just 12 years after the IBM PC was launched, nearly a quarter of American households (23 percent) had a personal computer, and this was even before the Web. The majority of these computers were what became known as "PCs," which really meant "IBM PC compatible." Because of its open architecture decision, however, IBM lost its market share lead to "clones" like Compaq by 1994 and never regained it.
IBM sold its personal computer business to the Chinese company Lenovo in 2005 for $1.3 billion. Hewlett-Packard bought Compaq in 2002 for $24.2 billion. In 2024, Lenovo (26 percent) and HP (22 percent) still dominate market share, and over 245 million personal computers are sold globally each year.
The personal computer boom reshaped data in two ways. First, it forced enterprises to rethink how they stored and managed information, shifting from a few central mainframes to networks of individual PCs, i.e., the client-server architecture described in the next section. Second, once the Web arrived, adoption exploded: millions of personal computers were already wired and ready to go.
Tangent on the Gates/Jobs bromance: There's a lot written about the young Gates/Jobs rivalry in the 90s. In terms of the personalities of Steve Jobs and Bill Gates, here's my take: they were both entitled, bratty children who became entitled, bratty young men. They would both throw fits when they didn't get their way, and they bullied or manipulated those around them to get their way. And they both smelled terrible. The biggest difference in personalities between the two, as far as I can tell, is that Steve Jobs smelled like shit early on because he convinced himself, despite all evidence to the contrary, that by eating only fruit he didn't need to shower, while Bill Gates smelled like shit because he'd stay up all night coding and forget to shower.
3. Client-Server Architecture
We shouldn't judge IBM too harshly for completely flubbing the personal computer race, since it was busy dominating enterprise data and the relational database wars. Just kidding, they totally fucked that up too. IBM invented the relational database management system (RDBMS) and decided not to pursue it.
In 1970, Edgar F. Codd, while working at IBM, wrote a paper called "A Relational Model of Data for Large Shared Data Banks," which defined the relational database model. A relational database stores data as tables, with keys to uniquely identify each row. Structured Query Language (SQL) is a computer language for retrieving data from and inserting data into those tables. This is, to this day, the standard way data is organized for everything from medical records to airline schedules (O'Regan 274).
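To make the relational model concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names and rows are invented for illustration; they aren't from any system discussed in this article.

```python
import sqlite3

# Toy illustration of Codd's relational model: data lives in tables,
# each row is identified by a key, and SQL queries relate tables to each other.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""CREATE TABLE visits (
    visit_id INTEGER PRIMARY KEY,
    patient_id INTEGER REFERENCES patients(patient_id),
    diagnosis TEXT)""")

cur.execute("INSERT INTO patients VALUES (1, 'Ada Lovelace')")
cur.execute("INSERT INTO visits VALUES (100, 1, 'checkup')")

# A join: rows in different tables are related purely through shared key values.
cur.execute("""
    SELECT p.name, v.diagnosis
    FROM patients p
    JOIN visits v ON v.patient_id = p.patient_id
""")
print(cur.fetchall())  # [('Ada Lovelace', 'checkup')]
conn.close()
```

The join in the final query is the relational idea in miniature: no pointers or hierarchies, just keys shared across tables.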
IBM built the System R research project in 1974, marking the first implementation of SQL (Haigh and Ceruzzi 274). They decided not to commercialize their RDBMS because they wanted to preserve revenue from their existing hierarchical database, an example of the "Innovator's Dilemma" I mentioned in the intro. Codd's paper was public, however, and others read it and understood its commercial value. Michael Stonebraker of UC Berkeley created INGRES during the 70s using the framework described in Codd's paper (Haigh and Ceruzzi 275), and a young Larry Ellison read the paper and started Software Development Laboratories (SDL) in 1977 with Bob Miner and Ed Oates. They changed their name to Oracle Systems Corporation in 1983.
Oracle's first product, Oracle Version 2 (there was no Oracle Version 1 because they wanted their product to seem more mature than it was), was released in 1979. They beat IBM to market. IBM's first commercial relational database management system, SQL/DS, was released in 1981, a full 11 years after Codd's article (Symonds 62).
During the 80s, database products were focused on either a mainframe architecture or minicomputers. By the way, the "mini" in minicomputer meant they were small enough to (hopefully) fit through a doorway, but they were still gigantic. The primary players in the database wars of the 80s were Oracle, Sybase (whose code base Microsoft licensed and later forked into Microsoft SQL Server), IBM, and Informix (Symonds 110).
Oracle came out on top in the database wars. "With the release of Oracle 7 and, specifically, Version 7.1 in 1993, Oracle had, for the first time in several years, unambiguously the best database on the market" (Symonds 105). While Oracle won the database wars, there was a cost. Oracle was so focused on beating other RDBMSs that they neglected the "applications" side of the business. The applications side covers back-office things like financial accounting and procurement (later called Enterprise Resource Planning, or ERP), human resources and payroll (Human Capital Management, or HCM), and sales and marketing (Customer Relationship Management, or CRM). These are the things that use the internal data stored in the relational database. Moreover, the world had moved towards personal computers and away from mainframes, even at the office. That meant a new architecture was required to manage enterprise data.
In 1992, SAP, the German company founded by former IBM engineers, launched SAP R/3. SAP's previous product, SAP R/2, released in 1979, was "widely recognized as the most complete and carefully engineered of the new breed of packaged applications" (Symonds 114). The R/3 version was built for a client-server architecture, capitalizing on the prevalence of personal computers. This was a significant event for many reasons. Let's go through them one by one:
First, R/3 used a three-tier model. Users work on their PCs, usually a Windows machine (the client tier); the client communicates with SAP's business logic, usually hosted on a Unix server (the second tier); and all the data is stored in the third tier, a large database. This was a fundamental architectural shift away from mainframes and towards personal computers. The idea of the client-server architecture was "custom corporate applications running on personal computers that stored their data in a relational database management system running on a server. This combined the best features of personal computing and traditional time sharing systems" (Haigh and Ceruzzi 275).
Second, it highlights the difference between enterprise data and enterprise applications. The way data is stored and the way it's used at an enterprise are very different things, and products meant for one are not built for the other. They are entirely different products, sold differently, marketed differently, and operated differently.
Third, this loss would drive Oracle's business decisions for decades, and they would never catch up to SAP. As Ray Lane of Oracle put it, "R/3 changed the game. Although we'd had some success in that area, we weren't really an application company. Our sales force and our consultants didn't really understand how to compete in the applications business. … Against SAP, we were a fraction. So we went on what became a four-year binge to try to catch up with SAP. From 1993 through to 1997, our entire application effort was dedicated to trying to build features to compete" (Symonds 114-115). Oracle would struggle with applications and eventually buy PeopleSoft and JD Edwards in 2004, Siebel Systems in 2005, and NetSuite in 2016.
And finally, partly as a consequence of the three-tier architecture, this led to a boom in "systems integrators," or SIs, which are companies focused on helping with the transition to this new client-server architecture and digitizing internal systems. "SAP had carefully nurtured relationships within the Big Five consulting firms, especially with Andersen Consulting (now called Accenture), the largest integrator in the world. When companies were deciding whether and how they were going to implement an ERP system, they rarely started off by talking directly to the software vendors. Instead, they would ask one of the consultancies, usually one with which they had an existing relationship, to evaluate their business processes and then recommend the software that would best fit their requirements" (Symonds 116).
Andersen Consulting's revenue from client-server-related projects grew from $309 million in 1990 to almost $2 billion in 1993, employing 10,000 of their people. IBM Global Services, IBM's consulting arm, grew from $4 billion in revenue in 1990 to $24 billion by 1998. In 1997 alone they hired 15,000 people. The dark side of the growth in ERPs and SIs is perhaps best shown by FoxMeyer, a $5 billion drug company that spent $100 million starting in 1993 to implement SAP R/3, failed, and went bankrupt.
The cynical stance on SIs is that they're incentivized to make implementing enterprise software as difficult as possible, because if anything worked out of the box they wouldn't be needed. As Ellison said, "IBM recommends that you buy lots of different applications from lots of different vendors. In fact, IBM resells applications from SAP, Siebel, i2, Ariba, just about everyone I can think of except Oracle. Then IBM makes a bundle by selling you guys with glue guns to stick it all together" (Symonds 281).
The potential nightmare of systems integrations and ballooning IT costs is best captured in Dave McComb's book Software Wasteland (McComb). In it, McComb explains how most enterprise software is middleware and requires integrations with other software. Not only does this mean huge IT costs, it also results in tons of siloed apps. An estimated "35 to 40 percent" of programmer time in corporate IT departments was spent keeping data in files and databases consistent (Haigh and Ceruzzi 276).
Integrating enterprise data became a bigger problem with the rise of the client-server architecture and persisted through web-based and SaaS architectures, as we'll see in the next sections. Time and again, the proposed solution was to put all of your data in the same place, physically or in the cloud, but the differences in underlying schemas still prevented a unified database. A possible solution came from outside the enterprise data world, and from the other side of the Atlantic.
4. The World Wide Web
While Ellison was battling SAP, a young man at the European Organization for Nuclear Research (CERN) was devising a way for the various computers at his research center to talk to one another. The Internet had been around for a while and was established at research centers like CERN, but none of the computers "spoke the same language." Tim Berners-Lee (TBL) built the World Wide Web in the early 1990s, somehow choosing an abbreviation with more syllables than the words themselves.
The World Wide Web laid the foundation for people to navigate the web by establishing things like URLs and HTML, but users still needed a browser to actually surf it. Netscape was founded by Jim Clark and Marc Andreessen in 1994 and launched the first popular web browser. Sixteen months later, in August 1995, they went public with a market value of $4.4 billion, the largest IPO in history, and they had yet to show a profit (Berners-Lee and Fischetti 106). Microsoft, so consumed by the personal computer, didn't see the importance of the web early enough. "Microsoft saw the importance of the web and open standards, but its leadership couldn't imagine solutions that didn't center on the personal computer" (Muglia and Hamm 28).
Bill Gates did realize the magnitude of the Internet in 1995 and issued a now-famous memo to his company in which he stated that the Internet is "crucial to every part of our business" and "the most important single development to come along since the IBM PC was introduced in 1981." One way he planned to dominate the browser wars was by bundling their new browser, Internet Explorer, with their new operating system, Windows 95. This triggered an antitrust lawsuit, United States v. Microsoft Corp. Microsoft LOST the case and was ordered to be broken up into two companies: one producing the Windows operating system and one producing other software components. They appealed and won, largely because the judge had improperly spoken to the media about the case, violating codes of conduct.
Netscape released its source code and started the Mozilla Organization in 1998 to enable open-source versions of its browser. It was acquired by AOL for $4.2 billion one year later. Part of the acquisition required Andreessen to become the CTO of AOL, reporting directly to former WELL member Steve Case. Microsoft, however, was dumping $100 million into IE every year and had 1,000 people focused on it, which eventually paid off. In 2003, just five years after the AOL acquisition of Netscape, IE held 95 percent of the market.
Microsoft won the first browser war, at an enormous cost, but this was before anyone really knew how to make real money from the Web. Netscape sold their browser directly to consumers, and Microsoft gave theirs away for free (to kill Netscape). By the time the second browser war rolled around, the business model for Web companies had become clear: collect user data for targeted ads, something Google had pioneered. This is why, despite veteran CEO Eric Schmidt's reluctance after witnessing the brutality of the first browser war, Google entered the second browser war. Google knew there wasn't money in browsers themselves, but the more people are on the web, the more they search, the more ads they see, and the more money Google makes. "Chrome was always thought of as an operating system for web applications" (Levy 213).
The source code released by Netscape in 1998 was turned into a new browser, appropriately named Phoenix. The browser was renamed Firebird in 2003 and then Firefox in 2004, both times due to trademark claims. Firefox never beat IE but rose to a peak of 32 percent market share in 2009. Google launched Chrome in 2008, which is now the most popular browser, accounting for 68 percent of market share. Apple's Safari is the second most popular at 20 percent, and the successor to IE, Edge, is third with just 5.7 percent.
4.1 Tim Berners-Lee's Vision
In his book Weaving the Web, Tim Berners-Lee describes his vision in two parts (Berners-Lee and Fischetti 157). Part one is about human collaboration on the web. This required standards and protocols so that everyone could access all parts of the web. That was realized through the invention of the URI/URL, HTML, and XML. Because of those standards, browsers like Netscape and Internet Explorer could flourish. But he also saw the web not just as a place to read web pages, but as a place to contribute to them too. This part was never realized in the way he envisioned: a popular browser that allowed editing HTML directly was never built.
The idea of people participating on the web, of course, has been successful. This part of the vision is related to "Web 2.0," a term popularized by Tim O'Reilly of O'Reilly books at the Web 2.0 conference in 2004. If Web 1.0 was about reading static HTML, then Web 2.0 is about users actively contributing to the web. Wikipedia, the online encyclopedia, contains 65 million articles, receives 1.5 billion unique visits a month, and sees 13 million edits per month. Social media sites like Facebook also allow people to contribute directly to the web, though the data is more personal than public (more on Facebook later).
TBL's vision was grander. The second part of his vision is about computers collaborating on the web. "Machines become capable of analyzing all the data on the Web—the content, links, and transactions between people and computers. A 'Semantic Web,' which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy, and our daily lives will be handled by machines talking to machines, leaving humans to provide the inspiration and intuition" (Berners-Lee and Fischetti 158). This is often called "the Semantic Web" or "Web 3.0," not to be confused with Web3, the idea of a decentralized web built on the blockchain.
The idea behind the Semantic Web is that people would attach structured metadata to their HTML so computers can interpret web pages. The format of the metadata (or semantics) is the Resource Description Framework (RDF). RDF data is often called "triples" because, rather than storing data in columns and rows, RDF stores the data as a series of statements of the form subject – predicate – object. These triples make information on the web machine-readable. For example, instead of saying "Kurt Gödel died in Princeton, New Jersey," you would say: Kurt Gödel (subject) – died in (predicate) – Princeton, NJ (object). Likewise, Albert Einstein (subject) – died in (predicate) – Princeton, NJ (object). A machine could then infer that Albert Einstein died in the same town as Gödel. In addition to RDF data, there are languages for describing the RDF metadata, allowing users to create ontologies. For example, we could define the predicate "died in" as being restricted to one location per subject, i.e., you can only die in one place. With rich ontologies and RDF data, users can create large graphs of data, i.e., knowledge graphs, which computers can reason over.
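To make the triple idea concrete, here is a toy sketch in plain Python (not real RDF tooling) using the Gödel/Einstein example above; the data and function names are mine, invented for illustration.

```python
# Toy triple store: (subject, predicate, object) statements, as in RDF.
triples = {
    ("Kurt Gödel", "died_in", "Princeton, NJ"),
    ("Albert Einstein", "died_in", "Princeton, NJ"),
    ("Alan Turing", "died_in", "Wilmslow, England"),
}

def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is asserted."""
    return {o for s, p, o in triples if s == subject and p == predicate}

def died_in_same_place_as(person):
    """The machine 'inference': anyone sharing a place of death with `person`."""
    places = objects(person, "died_in")
    return {s for s, p, o in triples
            if p == "died_in" and o in places and s != person}

print(died_in_same_place_as("Kurt Gödel"))  # {'Albert Einstein'}
```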
The Semantic Web never truly took off, but its core principles are alive and well in pockets of the web. For example, there is a counterpart to Wikipedia called Wikidata that stores Wikipedia's data as a structured knowledge graph and provides facts for Wikipedia pages. They have a public SPARQL API (SPARQL is like SQL but for triples) where you can query the data directly. Here is an example of how you can find all people who died in the same town as Gödel (a sketch of such a query appears below). Most websites don't offer public SPARQL APIs, however. These technologies (SPARQL, RDF, OWL, SHACL, etc.) are all open standards maintained by the World Wide Web Consortium (W3C), the non-profit TBL started to ensure interoperability on the web.
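As a hedged sketch of what such a query can look like, the snippet below sends a SPARQL query to Wikidata's public endpoint using the requests library. P20 is Wikidata's "place of death" property; the item ID used here for Kurt Gödel is an assumption and should be verified before relying on the results.

```python
import requests

# SPARQL: find people who died in the same place as Kurt Gödel.
# wdt:P20 is "place of death"; wd:Q41390 is assumed to be Gödel's item ID.
query = """
SELECT ?personLabel WHERE {
  wd:Q41390 wdt:P20 ?place .
  ?person   wdt:P20 ?place .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 10
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": query, "format": "json"},
    headers={"User-Agent": "history-of-data-example/0.1"},
    timeout=30,
)
for row in resp.json()["results"]["bindings"]:
    print(row["personLabel"]["value"])
```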
While these technologies haven't exactly taken off on the public web, they have had success in enterprise data management. The idea of creating a rich metadata layer to keep track of and query all the data on the Web is a bit overwhelming, but the idea of building a rich metadata layer for a single company, an Enterprise Semantic Layer (a graph of rich metadata linking systems, documents, and policies), is more reasonable.
5. Amazon and Google
In February 1994, a Senior Vice President at the hedge fund D. E. Shaw & Co. read in a newsletter that the amount of data transmitted on the Internet had increased by a factor of about 2,300 between January 1993 and January 1994 (Stone 25). Jeffrey Bezos would claim that this was the reason he quit his hedge fund to start a website to sell books. He would claim in interviews that he "came across this startling statistic that web usage was growing at 2,300 percent a year." This is inaccurate: a factor of 2,300 means a 230,000 percent increase. Luckily for Jeff, he was wrong in the right direction.
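For anyone who wants to check that arithmetic, a growth factor converts to a percent increase like this:

```python
# Growth by a factor of N is a (N - 1) * 100 percent increase.
factor = 2300
percent_increase = (factor - 1) * 100
print(f"{percent_increase:,}%")  # 229,900% -- roughly 230,000%, not 2,300%
```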
Bezos considered names like makeitso.com (a Star Trek reference) and relentless.com but eventually landed on Amazon.com. They grew quickly without making a profit, competing with existing brick-and-mortar bookstores that were also selling books online: Barnes and Noble and Borders. Barnes and Noble struggled to pivot, another case study of the "Innovator's Dilemma." "The Riggios were reluctant to lose money on a relatively small part of their business and didn't want to put their most resourceful employees behind an effort that would siphon sales away from the more profitable stores" (Stone 59). Bezos knew this. In response to a Harvard Business School student who told him he would fail and should sell his company to Barnes and Noble, Bezos said, "I think you might be underestimating the degree to which established brick-and-mortar business, or any company that might be used to doing things a certain way, will find it hard to be nimble or to focus attention on a new channel. I guess we'll see" (Stone 65).
Amazon began as an online retail store similar to eBay but without the auction component. It started spreading into CDs and DVDs and even digital books (tablets), but it wasn't until 2006, with the launch of Amazon Web Services (AWS), that it truly became a tech company and not just another dot-com startup. There's a popular story that AWS was started because Amazon had to build infrastructure to support the holiday shopping season but those servers sat idle the rest of the year. That appears to be untrue; Werner Vogels, the Amazon CTO, even said so. There were a bunch of reasons Amazon started AWS: they were struggling to allocate server space internally fast enough to keep up with growing demand for experimentation; Tim O'Reilly of O'Reilly books made a personal appeal to Bezos to share their product catalog with a broader community so he could better predict trends in the market; and Bezos read the book Creation by Steve Grand (Stone 208-211).
Bezos listened to O'Reilly preach about Web 2.0 and the mutual benefit of sharing data and built APIs as a way for developers to better access the Amazon website (Stone 210). Around the same time, the Amazon executive book club read Creation, by Steve Grand. Grand created a video game called "Creatures" in the 1990s that allowed you to guide and nurture a creature. No, not like a Tamagotchi. This game, apparently, allowed you to "code artificial life organisms from the genetic level upwards using a sophisticated biochemistry and neural network brains, including simulated senses of sight, hearing and touch".
"Grand wrote that sophisticated AI can emerge from cybernetic primitives, and then it's up to the 'ratchet of evolution to alter the design'" (Stone 213). The Amazon team wanted to use this framework to encourage developers to create new and exciting things without prescribing exactly what those things should be. The "primitives" for the developer, they concluded, were storage, compute, and a database. They released the storage primitive (Simple Storage Service, or S3) in March 2006, followed by the primitive for compute (Elastic Compute Cloud, or EC2) a few months later (Stone 213-214).
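For a sense of how bare those primitives are, here is a minimal sketch using boto3, the standard Python SDK for AWS. The bucket name, AMI ID, and region are placeholders; running it requires AWS credentials and would create (and bill for) real resources.

```python
import boto3

# The two original "primitives": object storage (S3) and rented compute (EC2).
s3 = boto3.client("s3", region_name="us-east-1")
s3.create_bucket(Bucket="my-example-bucket-1234")          # placeholder name
s3.put_object(Bucket="my-example-bucket-1234",
              Key="hello.txt", Body=b"hello cloud")

ec2 = boto3.client("ec2", region_name="us-east-1")
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI ID
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
)
```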
Comedy break: Here's a video of Bezos in a documentary from 1998 talking about his "web idea" before he started cosplaying as Jean-Luc Picard. And here's Bo Burnham performing Jeffrey's theme song. Come on, Jeff! Get 'em!
While Bezos was starting to sell books online, two young PhD students at Stanford were looking for dissertation topics. Larry Page thought he could devise a better way of ranking the importance of web pages: by counting the number of other pages that referenced them. An important web page would be referenced by many web pages, just as an important academic journal article is cited by many other articles. The problem is that web pages only tell you what they reference (hyperlinks), not what references them. Links on the web only go in one direction. To know the number of times a page is linked to from other web pages, you need all the backlinks, which means you have to scrape the entire web. Page teamed up with another PhD candidate and math prodigy, Sergey Brin, who specialized in this kind of data mining. They called their project "BackRub" because it was all about harvesting these backlinks. They named the algorithm, a variation of eigenvector centrality, PageRank, after Larry Page (Levy 16-17). "We take advantage of one central idea: the Web provides its own metadata… This is because a substantial portion of the Web is about the Web… simple techniques that focus on a small subset of the potentially useful data can succeed due to the scale of the web" (Wiggins and Jones 213).
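PageRank itself is simple enough to sketch in a few lines. Below is a toy power-iteration version on a made-up four-page link graph; real PageRank adds handling for pages with no outlinks and runs at web scale, but the core idea is the same: a page's score is the sum of shares of the scores of the pages linking to it.

```python
# Minimal PageRank by power iteration on a tiny, made-up link graph.
# outlinks[p] = pages that p links to (every page here has at least one).
outlinks = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
d = 0.85                                   # damping factor
pages = list(outlinks)
rank = {p: 1 / len(pages) for p in pages}  # start with a uniform score

for _ in range(50):
    new_rank = {p: (1 - d) / len(pages) for p in pages}
    for page, links in outlinks.items():
        share = rank[page] / len(links)    # split this page's score among its links
        for target in links:
            new_rank[target] += d * share
    rank = new_rank

for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))      # C ranks highest: most pages link to it
```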
Jon Kleinberg was a postdoctoral fellow at IBM in 1996 and was also playing with the idea of exploiting the link structure of the Web to improve search results. Through mutual friends, he got in touch with Larry Page and learned about BackRub. By this time, IBM had finally learned their lesson and jumped on a technology that would define the next generation of tech companies. Just kidding, they boofed it again. Kleinberg encouraged Page to write an academic paper about the technology, but Page declined. Kleinberg went on to a successful academic career, while Page founded Google but never got his PhD (Levy 26).
Page and Brin eventually realized that this ranking would make for a great search engine, and they created a company they called Google, a misspelling of the word for the large number ten to the hundredth power, googol (Levy 31). They started a search company "even though there was no clear way to make money from search" (Levy 20). Soon, they figured out a way to make money, and it came through a technology that was arguably more important than PageRank: AdWords. They kept their revenue secret because they didn't want anyone else to use the same method for generating revenue. They had to reveal it as part of their IPO in 2004 (Levy 70).
The idea is relatively simple: put sponsored ads at the top of users' search results. But it was different from existing online advertising in several ways. First, the ads were based on the user's search words, so the product or service a user saw an ad for would be relevant. Second, the price of the ads would be the result of an auction: advertisers would bid against one another to determine the price of the ad associated with the keyword. And third, the advertiser would be charged by the number of clicks, not the number of times their ad was seen. Because Google had so much data about how people searched and was so good at getting users the best possible results, they were also experts at putting the right ads in front of the right people. This benefited the advertisers, who got more clicks; Google, who got ad revenue; and often the users, who (hopefully) got ads for exactly what they were looking for.
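Here is a toy sketch of that auction logic in Python. AdWords is commonly described as a generalized second-price auction, where each winner pays roughly the bid of the advertiser ranked just below them, and only per click; the advertiser names, bids, and the one-cent increment here are all made up for illustration.

```python
# Toy keyword ad auction with pay-per-click pricing.
bids = {"flowers-r-us": 1.50, "petal-pushers": 1.20, "bloom-town": 0.90}

def run_auction(bids, slots=2):
    """Rank advertisers by bid; each winner pays just above the next bid down."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    results = []
    for i, (advertiser, _bid) in enumerate(ranked[:slots]):
        next_bid = ranked[i + 1][1] if i + 1 < len(ranked) else 0.0
        results.append((advertiser, round(next_bid + 0.01, 2)))
    return results

for advertiser, cpc in run_auction(bids):
    print(f"{advertiser} wins a slot and pays ${cpc:.2f} per click")
```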
Before they figured out AdWords, they assumed they would have to rent their search engine out to an Internet portal like Yahoo! or Excite to generate revenue; now they could make money directly. Their entire business model changed, and they eventually expanded to advertising on more than just search results (Levy 95). AdSense, launched three years later in 2003, allowed websites to embed ads directly on their pages. Google made sure the ads would be relevant to the content on the site by extracting key themes from the page and matching them to ads. They acquired a startup called Applied Semantics to do this (Levy 103). If you ran a webpage, you could sell a portion of it to Google, who would place relevant ads there and give you a percentage of the revenue. Matching ads to keywords on a webpage doesn't always work, however. An early version of AdSense put an ad for Olive Garden on an article about someone getting food poisoning at Olive Garden (Levy 105).
One year later, in April 2004, Google launched Gmail, a free email service that included a gigabyte of storage for every user. For reference, the largest existing email services were Microsoft's Hotmail and Yahoo!, which offered only 2 and 4 megabytes of storage, respectively (Levy 168). To accommodate the massive amounts of data storage from websites and Gmail, along with all the computation required to index and serve search results for over 200 million queries a day, Google had to build a ton of data centers.
This information is not public, and Google doesn't disclose how many servers it runs, but Steven Levy, in his book In the Plex, said, "According to an industry observer, Data Center Knowledge, there were twenty-four facilities by 2009, a number Google didn't confirm or dispute. Google wouldn't say how many servers it had in those centers. Google did, however, eventually say that it is the largest computer manufacturer in the world—making its own servers requires it to build more units every year than the industry giants HP, Dell, and Lenovo" (Levy 181).
Following Amazon's lead, Google launched Google Cloud Storage (the S3 equivalent) in 2010, allowing users to use its servers for storage, and launched Google Compute Engine (the EC2 equivalent) in 2012. They remain one of the big three cloud providers today (behind AWS and Microsoft Azure). The ability to use third-party servers to run applications and store data, along with increasing bandwidth, led to a fundamental architectural shift in the way applications are built and where data lives. The next section explores that architectural upheaval.
6. The Big Switch
Nicholas Carr wrote a book, The Big Switch, that's so good I sometimes even recommend it to people who are not data nerds. In it, he draws a parallel between the growth of electricity as a utility in the late nineteenth century and the rise of cloud computing in the early twenty-first century. Here's a brief summary, but I definitely recommend the book.
Thomas Edison invented the lightbulb and built all the required components to demonstrate its use at the International Exposition of Electricity in Paris in 1881. There, he also showed blueprints for the world's first central generating station (Carr 28). He got the generator working the next year. He then built a business focused on licensing the patented system and selling all the required components. He thought an electric generator would be an alternative to gas utilities, that many would need to be built, and that currents wouldn't have to travel far. In fact, because his system relied on direct current, it couldn't be transmitted far. "Edison had invented the first viable electric utility, but he couldn't envision the next logical step: the consolidation of electricity production into giant power plants and the creation of a national grid to share the power" (Carr 30).
Samuel Insull, who worked for Edison, realized that electricity could be sold as a utility. The more you sell, the cheaper it gets, which lets you sell more. This plan required convincing business owners that they should stop producing their own electricity and buy it from a centralized power station, something that had never been done before. Eventually, and obviously, we all got electrified. Factories got larger and more productive, and modern corporations were formed (Carr 90). Ice companies disappeared because of refrigeration. Ford created the electrified assembly line to produce the first mass-produced car, the Model T. To hire factory workers, Ford offered higher wages, which others were forced to match, setting in motion the creation of the modern American middle class (Carr 93). As industries became more advanced, they had to hire scientists, engineers, marketers, designers, and other white-collar employees. This new group of "knowledge workers" incentivized investments in education: high school enrollment in 1910 was at most 30 percent even in the wealthiest areas, but rose to between 70 and 90 percent across the country 25 years later (Carr 94).
Let’s return to the client-server architecture of the early 90s. Remember, in this setup users have personal computers that they connect to their company’s centralized data centers. That is like a company running its own electricity generator to power its factory. The logical next step in this architecture is to treat data storage and computation as a utility. This happened (or is currently happening), but it was facilitated by a few things.
First, the Internet had to go from a DARPA research project into mainstream America. In 1991, Tennessee Senator Al Gore wrote and introduced the High Performance Computing Act of 1991, commonly known as the Gore Bill. Yes, that’s right: Al Gore did, to his credit, play a huge part in making the Internet available to all. Before the Gore Bill, it was illegal for ISPs like AOL to connect to the Internet; they were “walled gardens” (Isaacson 402). The Gore Bill allowed AOL to give its users access to the broader Internet. It also put $600 million into Internet infrastructure, including funding the National Center for Supercomputing Applications (NCSA) at the University of Illinois. An undergrad at the university, Marc Andreessen, worked at the NCSA and learned about TBL’s World Wide Web. He created a browser called Mosaic, which he commercialized as Netscape after graduating. As Vice President, Gore pushed forward the National Information Infrastructure Act of 1993, opening the Internet to the general public and to commercial use (Isaacson 402).
By the way, he never said he invented the Internet. Here’s the interview where he said, “During my service in the United States Congress, I took the initiative in creating the Internet.” He misspoke and could have phrased it better, but Vint Cerf and Bob Kahn, who invented the Internet’s protocols, said, “No one in public life has been more intellectually engaged in helping to create the climate for a thriving Internet than the Vice President” (Isaacson 403). Even Newt Gingrich said, “Gore is not the Father of the Internet, but in all fairness, Gore is the person who, in the Congress, most systematically worked to make sure that we got to an Internet” (Isaacson 403). Al Gore had great ideas, but as Jared Dunn from Silicon Valley put it, “People don’t want to follow an idea, they want to follow a leader. Look at the last guy to create a new Internet. Al Gore. His ideas were excellent, but he talked like a narcoleptic plantation owner, so he lost the presidency to a fake cowboy and now he makes apocalypse porn.”
The other reason computing power could become a utility is that Amazon, Microsoft, and Google built a shitload of data centers. Amazon started AWS and began renting out its servers. Google launched GCP in 2010. But renting out servers required some additional technologies, specifically virtualization and parallelization. Virtualization is the ability of one machine to run multiple operating systems—one server can contain a ‘virtual’ PC running Windows and a ‘virtual’ Linux OS (Haigh and Ceruzzi 368). Amazon’s system runs on virtualization. “When you rent a computer through Amazon’s EC2 service, you’re not renting real computers. You’re renting virtual machines that exist only in the memory of Amazon’s physical computers. Through virtualization, a single Amazon computer can be programmed to act as if it were many different computers, and each of them can be controlled by a different customer” (Carr 76). Parallelization is the ability to run a task on many servers at the same time (in parallel). Google pioneered this technology with MapReduce.
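To make the parallelization idea concrete, here is a minimal sketch of the MapReduce pattern in Python (my own toy illustration, not Google’s actual implementation). A map step runs independently on each chunk of data, and a reduce step merges the partial results; local worker processes stand in for Google’s servers.

```python
# Toy MapReduce: count words across documents, with each document mapped in parallel.
from collections import Counter
from multiprocessing import Pool

def map_count_words(document: str) -> Counter:
    """Map step: count words in one document (runs independently on each worker)."""
    return Counter(document.lower().split())

def reduce_counts(partial_counts: list[Counter]) -> Counter:
    """Reduce step: merge the per-document counts into one result."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total

if __name__ == "__main__":
    documents = [
        "the computer hollows out and spreads across the network",
        "the network spreads the work across many servers",
    ]
    with Pool(processes=2) as pool:            # two local processes standing in for servers
        partials = pool.map(map_count_words, documents)
    print(reduce_counts(partials))
```

The point is that each map call can run anywhere, which is what lets the same job spread across thousands of machines.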
But there was still a problem: the Internet was strung together with telephone lines. There was no way to transmit computing power very far, so the benefits of computing could only be realized by having a data center in-house. This would be like being stuck with direct current (DC) electricity, which couldn’t be sent long distances. But we weren’t stuck with DC; we had alternating current (AC), which could be sent long distances. Thanks, Tesla (the man, not the company). And we were soon no longer constrained by telephone poles. Moore’s Law met Grove’s Law. Remember Andy Grove, who both Larry Ellison and Steve Jobs would work for? The two laws were at odds. “Moore’s Law says that the power of microprocessors doubles every year or two. The second was proposed in the 1990s by Moore’s equally distinguished colleague Andy Grove. Grove’s Law says that telecommunications bandwidth doubles only every century” (Carr 58). That isn’t true at all, by the way; telecommunications bandwidth increases much faster than that. Grove said it more as a criticism of telco and regulator progress than as an actual prediction.
Telecommunications was finally catching up, however. With the rise of fiber-optic cables, Internet bandwidth has become fast enough for data to stream like electricity. “When the network becomes as fast as the processor, the computer hollows out and spreads across the network,” said Eric Schmidt (Carr 60). We are now moving on-premise data centers to the cloud, just as we moved electricity generators to the power station. But transitioning computing and storage to the cloud doesn’t just mean we no longer need on-prem data centers. The idea of renting these resources enables an entirely new business model: Software as a Service, or SaaS.
There are a few things to point out in the comparison between electricity and cloud computing. First, the “rebound effect” is real. Lower costs don’t shrink workloads; they increase them. Electricity was supposed to lighten household chores, yet cheaper power led families to run more appliances, and rather than reducing the effort of ironing clothes, people simply came to expect them ironed every day (Carr 99). Cloud promises to cut IT overhead, but as storage and compute get cheaper, companies spin up more microservices, datasets, and integrations than ever. In both cases the rebound effect turns savings into surging demand. The same pattern is emerging with AI: while it is marketed as a way to ease our workloads, its availability is already raising expectations and workload volumes faster than it reduces effort.
The second takeaway from the electricity metaphor is that it led to a golden age of prosperity, but it took a while. Edison invented the lightbulb in 1879, but Henry Ford didn’t create an electrified assembly line until 34 years later, in 1913. Only decades later, after WWII, did the American middle class hit its post-war peak. If AWS was the lightbulb, and we assume the same time delay, a Ford-scale cloud assembly line won’t appear until 2040, and a new middle-class boom will come a generation after that.
7. SaaS / Cloud Computing
7.1 Enterprise Data Moves to the Cloud
As more and more people started using the Internet, an Oracle employee saw the writing on the wall and decided to start his own company focused on enterprise applications hosted entirely in the cloud. Marc Benioff describes how he started Salesforce in his book, Behind the Cloud, which contains advice like how you should take a year-long sabbatical and talk to the Dalai Lama about your business idea before starting a company (Benioff 2) and how you should listen to your customers (Benioff 13).
Salesforce was founded in 1999 and passed a billion dollars in annual revenue about a decade later. Benioff wasn’t the first to think of this, of course. Oracle had been investing heavily in internet technology ever since it got wrecked by SAP’s R/3 in 1992. “Client/server would be all right for departmental use, but for any company that wanted to unify its operations over numerous different sites, it was a nightmare” (Symonds 143). But while Oracle’s E-Business Suite, launched in 2001, used web-based technologies like the browser, it was still hosted on the customer’s infrastructure (on-prem). Salesforce was SaaS from the start—they hosted all the infrastructure themselves and sold subscriptions to their product. Their first “mascot” was SaaSy, which is just the word “software” with a red line through it, signaling the end of software.
Other enterprise application companies caught on, but not as fast as Benioff. ServiceNow was founded in 2004 and Workday in 2005, both SaaS-based enterprise software vendors. At first, Salesforce hosted its own servers, but it eventually began moving to the hyperscalers, along with the other vendors. In 2016, Workday selected AWS as its “primary production cloud platform” and Salesforce chose AWS as its “preferred public cloud infrastructure provider.” In 2019, ServiceNow selected Azure as its preferred cloud provider.
7.2 Semantic Tech in the Enterprise
Connecting enterprise data has been a headache through every architectural era. When personal computers entered the workplace, the number of applications, databases, and integrations exploded. With so many apps, it became impossible to answer even basic questions about a large company, like “How many people work here?” Oracle pushed for “one company, one database” in the 2000s as a way to address this pain point (Symonds 168) but soon realized that to run applications off of this database, you need a unified data structure, or schema. “The key to everything was the seemingly esoteric concept of a common data model uniting each piece of the suite. Every module—and there were about 140 of them—would be written to the same shared data schema, allowing semantic consistency (for example, the definition of a customer remained the same no matter which application the information was coming from and could thus be shared by all the other applications in the suite) as well as a complete view into every transaction” (Symonds 188).
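To make the “common data model” idea concrete, here is a minimal sketch (my own illustration with made-up fields, not Oracle’s actual schema): two hypothetical modules that both work against a single shared Customer definition instead of keeping their own incompatible versions.

```python
# One shared definition of "customer" that every module imports and reuses.
from dataclasses import dataclass

@dataclass
class Customer:
    customer_id: str
    name: str
    country: str

def billing_total(customer: Customer, invoices: list[float]) -> float:
    """The billing module works against the shared definition..."""
    return sum(invoices)

def support_greeting(customer: Customer) -> str:
    """...and so does the support module, so 'customer' means the same thing in both."""
    return f"Hello {customer.name} ({customer.customer_id})"
```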
We didn’t learn that lesson when a new architecture presented itself. MapReduce, the parallelization technology that allowed Google to run computations across millions of servers, was described in papers by Jeffrey Dean, Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung in 2003 and 2004. The technology was turned into an open-source project called Hadoop, which allowed anyone to implement cloud computing (Levy 202-203). Companies could now store and process huge datasets across many servers, which led to the term ‘data lake’. In contrast to data warehouses, which had to follow a predefined schema, data lakes could hold data of any format. Unfortunately, the ability to dump anything into a giant lake without a standard schema or metadata management layer didn’t work out, as Oracle knew all too well.
Data lakes became data swamps. Enterprises stored wastelands of data in the hope it would someday be useful. More recently, Databricks, a cloud-native data management platform, has pushed the idea of a “data lakehouse”: combine the benefit of a data lake (the ability to store data without a predefined schema) with the benefits of a data warehouse (assurance that transactions are complete, correct, conflict-free, and safely stored, a.k.a. ACID).
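Here is a toy illustration of the difference (my own sketch, not Databricks code): the lake accepts records of any shape and worries about structure later (schema-on-read), while the warehouse validates records against a predefined schema before anything is written (schema-on-write).

```python
# Schema-on-read vs schema-on-write, in miniature.
lake: list[dict] = []          # schema-on-read: anything goes in
warehouse: list[dict] = []     # schema-on-write: validated before insert

WAREHOUSE_SCHEMA = {"order_id": str, "amount": float}

def write_to_lake(record: dict) -> None:
    lake.append(record)        # no checks; the mess is dealt with (or not) at read time

def write_to_warehouse(record: dict) -> None:
    for field, field_type in WAREHOUSE_SCHEMA.items():
        if not isinstance(record.get(field), field_type):
            raise ValueError(f"bad or missing field: {field}")
    warehouse.append(record)

write_to_lake({"free_text": "anything at all"})            # fine
write_to_warehouse({"order_id": "A-1", "amount": 9.99})    # fine
# write_to_warehouse({"order_id": "A-2"})                  # would raise ValueError
```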
While the architecture has changed from mainframes to minicomputers to client-server to cloud to SaaS, the underlying problem hasn’t: it is difficult to connect disparate datasets because they don’t speak the same language. That might mean they follow different metadata structures (schemas), are in different formats entirely (JSON vs. relational vs. text), or sit on different servers. This is very similar to the problem TBL solved with the World Wide Web. The move to the SaaS/cloud architecture has only helped with the third problem—keeping data together on the same virtual servers. But colocation doesn’t really help you connect datasets. It’s like putting a bunch of people who speak different languages in the same room and expecting them to collaborate—you’re going to need some shared vocabulary, or translators, or something to bridge the language barrier.
This is where the semantic technologies inspired by TBL come in. While annotating the entire web with structured metadata may be impossible, it is doable at the enterprise level, at least for the most important data. This is often called the enterprise semantic layer, and I believe it will become more important as we start trying to get AI (which wasn’t trained on enterprise data) to interact with enterprise data. AI agents need to understand your data to make use of it. They need to know the meaning of the data, not just the numbers. Semantics is the layer of meaning that connects data and makes it comprehensible to humans and machines.
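As a rough sketch of what a semantic layer does (with hypothetical field names, not any vendor’s actual product): it maps each system’s local column names onto one shared business vocabulary so that records from different applications can actually be lined up.

```python
# Map each source system's local field names onto shared business terms.
SEMANTIC_MAP = {
    "crm":     {"cust_nm": "customer_name", "cust_id": "customer_id"},
    "billing": {"client_name": "customer_name", "acct": "customer_id"},
}

def to_shared_vocabulary(source: str, record: dict) -> dict:
    """Translate a source-specific record into the shared terms."""
    mapping = SEMANTIC_MAP[source]
    return {mapping.get(field, field): value for field, value in record.items()}

crm_row = {"cust_nm": "Acme Corp", "cust_id": "42"}
billing_row = {"client_name": "Acme Corp", "acct": "42"}
# Once translated, the two systems' records describe the same customer in the same terms.
assert to_shared_vocabulary("crm", crm_row) == to_shared_vocabulary("billing", billing_row)
```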
8. Facebook
While the world was starting to use Google as a verb and Bezos was expanding Amazon into a full-on empire, a 20-year-old Harvard student saw the social implications of the web. Mark Zuckerberg, trying to be as cool as the lamest version of Justin Timberlake, started Facebook in his Harvard dorm room.
Facebook started as a way for Harvard students to find one another. A facebook is a hard-copy book of students’ (and faculty) faces that many colleges use to help students get to know each other. It’s basically a boring yearbook distributed at the start of the school year. Zuckerberg let Harvard students make their own online facebook page: a photo of themselves along with some additional data, like relationship status. He then expanded to other campuses and eventually opened it to everyone.
Before making thefacebook, Zuckerberg scraped photos of all the female undergraduates at Harvard and built an app that let users rate their ‘hotness.’ He got in trouble for this and learned two important lessons. One: don’t steal data; let users give it to you. Two: people are more voyeuristic than you’d think (Levy 52).
Social networking sites do just that: they let users upload their own data and they let you look at pictures of your friends. Other social networking sites like Myspace and Friendster already existed, but one thing that made thefacebook different from the start was exclusivity—originally it was only for users with a harvard.edu email address. Even after it expanded to everyone, the idea of exclusivity remained in the sense that only people you “accept” can view your profile. This was different from other social networking sites at the time. Everything you put on Myspace, for example, was visible to everyone, at least when it started. With this barrier in place, people were willing to share much more information about themselves.
As sociologist Danah Boyd said, “Zuckerberg made it interactive. It had a slight social stalking element too. It was addictive. And the fact that you could see only people in your network was crucial—it let you be in public but only within the gaze of eyes you want to be in public to” (Levy 67). Eventually, Facebook built a “News Feed” where you could see updates about your friends. They quickly realized that users responded most to stories about themselves. The secret of Facebook’s success isn’t a secret at all—people just want to stalk their crushes online and see “news” about themselves. I have a theory that the reason the movie The Social Network is so good is that director David Fincher understands this. As Fincher has said, “I think people are perverts. I’ve maintained that. That’s the foundation of my career.”
Facebook collected data on each user and each user’s friends but didn’t have a clear business model. They knew they could sell ads but didn’t want to spend engineering resources on that, so they outsourced all ads to Microsoft (Levy 179). Zuckerberg said, “We don’t want to spend a single resource here working on advertising…It’s not something we care about. Microsoft wants to build an advertising business here…and so we’re going to give our inventory to them and they’re going to pay us” (Levy 179).
Eventually, however, Facebook needed to become profitable. Just as Google hired Schmidt to be the “adult in the room” at a company founded by young people, Facebook hired Sheryl Sandberg in 2008. She came from Google and understood that Facebook, just like Google, was in the advertising business. As Sandberg explained to everyone on her first day, advertising is an inverted pyramid with a wide top of demand and a narrow bottom of intent. Google dominates the bottom—when people go online intending to buy something, they search for it, and Google delivers the relevant ad. Facebook could dominate the wider top of the pyramid by creating and monetizing demand: advertisers can get in front of people before they even know they want the product (Levy 195). So Facebook became an ad company, and its overall goal became getting users to spend more time on Facebook and share more personal information so it could serve more ads (Haigh and Ceruzzi 375).
The Dark Side of Facebook
When TBL created the Web and put forth a vision of a utopia where we all come together, the assumption was that more sharing and more openness was an inherently good thing. Websites should share data and allow others to contribute, and we can all learn more about the world. That is true when it comes to public data, and it is how we got something like Wikipedia: millions of people coming together to build the largest encyclopedia in the history of humankind. But when it comes to personal data, it isn’t so simple. “Walled gardens,” platforms where the admin controls access to data, went against the original tenets of the World Wide Web. But when that data is about people’s personal preferences, habits, family, and health, walled gardens are a necessity. By building a platform that lets users create content that can go viral, and lets anyone pay for ads targeted at specific demographics, however, Facebook enabled propaganda machines.
Understanding a company’s data and revenue model can tell us a lot about its actions. Facebook (and now Instagram, which it owns) collects personal data on people so it can serve targeted ads. The metrics for success, then, are growth in users and engagement on the site. The more often people log in to the apps, the more ads they see and the more revenue for Meta. Unfortunately, a huge driver of engagement is outrage—people are more likely to engage with content if it upsets them, even if it is untrue. “Humans are more likely to be engaged by a hate-filled conspiracy theory than by a sermon on compassion. So in pursuit of user engagement, the algorithms made the fateful decision to spread outrage” (Harari 199). While not doing anything inherently evil, Facebook’s algorithms set the stage for viral misinformation, which has led to hate speech and violence.
What does this mean for the future? Right now the revenue model of OpenAI, along with most AI companies, relies on subscriptions. What if the revenue model changes to targeted ads, like Google and Meta? Then the information AI gives us won’t be aimed at giving us the most ‘accurate’ or ‘truthful’ answer, but the answer that keeps us engaged the longest, keeps us interacting with our friends (or enemies) on their platforms, and encourages us to reveal more personal details about ourselves. In his book Nexus, Yuval Noah Harari describes a man who tried to kill the Queen of England in 2021 because his AI girlfriend encouraged him to (Harari 211). If Facebook could be turned into a propaganda machine that contributes to genocide because of the data it collects and the algorithms serving its business model, then AI can too. The most dystopian AI future I see isn’t Terminator but one where AI girlfriends convince packs of incels that genocide is cool.
9. The iPhone
The popularity of social media wouldn’t have been possible without handheld computers that we carry with us everywhere we go. Improved bandwidth and cloud computing technologies allowed the computer to “hollow out and spread across the network,” as Eric Schmidt said (Carr 60). But the computer also shrank and ended up in the pockets of billions of people.
The iPhone launched in 2007, and there really hasn’t been a more significant or impactful single product since the dawn of the personal computer in 1981. Yes, there were smartphones like the BlackBerry before the iPhone, but the iPhone changed everything. It was a phone, an iPod, and an internet communications device. “Are you getting it? These are not three separate devices. This is one device. And we are calling it: iPhone,” Steve Jobs said during the product launch. It had a full touchscreen with multi-finger gestures, something that had never been done in a mass-produced product. It had a 2-megapixel camera. It also had a full operating system (OS X). It was a device you could keep in your pocket and use to view webpages, something that had never existed before (Haigh and Ceruzzi 395). The operating system also meant that apps could be built for it.
The iPhone didn’t really invent anything new, but it put all of those pieces together in a way that had never happened before. As Jobs said, “We have always been shameless about stealing great ideas.” The idea of having a device in your pocket that you could use to listen to music, watch videos, make phone calls, and browse the web was the stuff of science fiction. In some ways, the iPhone is a fulfillment of Stewart Brand’s vision of personal computing. It makes sense that Jobs—a reader of the Whole Earth Catalog, which espoused individual empowerment, decentralization, and access to tools—would turn Apple into the largest company in the world by building the most personal computer ever made.
Here are just a few of the ways the iPhone fundamentally changed the tech industry and everyday life for most humans.
Having a computer with an operating system in your pocket meant that apps could be developed. Apple controlled the App Store, of course, meaning it controlled which apps users got to use. Games were some of the first popular apps. You could play games like Angry Birds and Candy Crush, which disrupted the gaming industry.
Soon, all sorts of new and creative apps were built that took advantage of iPhone features that weren’t possible before. iPhones had built-in GPS, which meant a restaurant booking site like OpenTable or Resy could now become a booking site for restaurants near your physical location. Likewise, apps for dating based on physical proximity appeared: Grindr launched in 2009, and the hetero version, Tinder, in 2012. GPS also enabled ride-share apps like Uber (2009) and Lyft (2012).
Facebook caught on and invested in a mobile version of its product, which quickly became one of the most popular apps. iPhones had cameras, so you could take pictures with your phone and directly upload them to your Facebook page. As taking pictures with phones took off, Instagram was started in 2010 so people could add artsy filters to pictures of their food.
In 2011, the iPhone launched with Siri, an AI-powered virtual assistant (Haigh and Ceruzzi 394-400). Then Google created its own AI assistant, Microsoft created Cortana, and Amazon created Alexa. By 2011, Apple sold more smartphones than Nokia and made more in profit than all other cell phone makers combined (Haigh and Ceruzzi 401). Apple became the first company with a half-trillion-dollar market capitalization in 2012 and the first to reach a trillion in 2018 (Haigh and Ceruzzi 401). It remains one of the largest companies in the world by market cap to this day.
While there have been many attempts to replace the iPhone as the device of choice, so far no one has succeeded. Not even Apple, with its watches and glasses, can get people to trade their iPhones for something else. However, OpenAI recently acquired the startup of Jony Ive (the designer of the iPhone) for $6.5 billion and has said it will release a device in late 2026.
10. Conclusion
In my next post I’ll go through an accounting of the different sources of data and the major players in each sector. For now, here’s a high-level overview of who owns different kinds of data and their revenue models.
Google and Meta are advertising companies. They make money by collecting personal information about people and serving them targeted ads. About 78 percent of Google’s revenue comes from ads, and nearly 99 percent of Meta’s revenue comes from ads. That is why they want you online: so they can serve you ads. The top four most visited websites in the world, as of June 2025, are Google, YouTube (owned by Google), Facebook, and Instagram (owned by Meta). Google also has a 21 percent market share of the collaboration software industry through Google Workspace and owns Android, the most popular phone OS in the world. Yet these are really just tools to get people online to view ads. Google is also the third largest hyperscaler in the world with Google Cloud Platform, which accounted for over 10 percent of its total revenue in 2023.
Apple is primarily a hardware company—over half its revenue comes from the iPhone and about a quarter from other products like MacBooks, iPads, and wearables. Nearly a quarter comes from “services,” which means AppleCare, cloud services, digital content, and payment services. Apple claims it only collects user data to “power our services, to process your transactions, to communicate with you, for security and fraud prevention, and to comply with law.”
Microsoft is primarily a cloud computing and software company. Azure (and other server and cloud products) accounts for 43 percent of revenue. The second-largest money-maker is Office, followed by Windows. Their revenue model relies on subscriptions to their software or cloud computing resources. They also own LinkedIn (the seventeenth most visited website in the world in June 2025), Bing (the twenty-fourth), and GitHub.
Amazon is a space exploration company funded by an online store and a cloud computing service on Earth. That is not a joke—I genuinely believe it. Zuckerberg and Gates were coders who loved building things; Jobs and Woz turned their love of tinkering into a company that sells computers. Page and Brin were Stanford PhD students with a passion for math and data who turned a dissertation idea into a business. All of them followed the thing they were passionate about, and it led them to riches. Bezos didn’t spend his childhood dreaming of online retail—he spent it dreaming about space exploration and science fiction. He didn’t start selling books online because he loves books; he started selling books online because it was the most practical and lucrative thing to sell online. With Blue Origin, he is finally starting to realize his vision. Congratulations, Jeff!
Amazon’s online sales (including third-party vendors) account for the largest portion of its revenue (39 percent), but AWS is a bigger share of its operating income (thanks to higher margins). AWS is the leader in cloud computing because it got there early—it has 29 percent of the cloud computing market, followed by Azure (22 percent) and Google (12 percent).
Let’s return to our framework of personal, enterprise, and public data:
For personal data, Meta and Google dominate and generate revenue from targeted ads. Apple and Amazon also capture a ton of personal data through their devices; they just don’t use it for targeted ads.
For enterprise data, we can look at both database vendors and applications. When it comes to database management systems (DBMS), the leaders are Amazon, Microsoft, Oracle, and Google, accounting for three quarters of the $100 billion market. IBM and SAP are behind them in the fifth and sixth spots, and Snowflake and Databricks are the fastest-growing challengers. For applications, Microsoft still leads collaboration with its Office suite (38 percent market share), followed by Google (21 percent). Salesforce leads CRM (over 20 percent market share). SAP and Oracle are still the ERP leaders, but they also play in Human Capital Management (HCM), competing with Workday, and in Supply Chain Management. ServiceNow leads IT/Customer Service Management.
Google owns the largest repository of public data in the world—Google’s search index contains over 100 million gigabytes of data. While Google’s index is proprietary, there are truly public data sources. The three big ones are the Internet Archive / Wayback Machine, which holds over 100 petabytes of data; Common Crawl, which has more than 9.5 petabytes; and the Wikimedia projects, which total about 30 terabytes. GPT-3 and other large language models were trained on these public data sources.
I’m convinced the next wave of AI will be driven by the companies that capture the data, how they capture it, what kinds of data they capture, and the business models they use to monetize it.
In my next post, I’ll formalize a list of questions about the future of data, the internet, and AI. I’ll use the framework that Philip Tetlock proposes in his book Superforecasting and implemented in his Good Judgment Project. These will be predictions, with percentages, about falsifiable claims with dates attached. That way, I’ll be able to validate my predictions and improve over time. For example, a question might be whether a device with an LLM will be shipped this year. I’ll place my prediction against this question, say 20 percent, and then use a Brier score to calibrate my answers. If a device with an LLM is shipped this year (the outcome of the question is 1), then the Brier score for this question would be (0.2 - 1)^2 = 0.64. The goal is to get a Brier score as close to zero as possible.
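The Brier arithmetic above is simple enough to check in a few lines of Python (a quick illustration, not Tetlock’s tooling):

```python
# Brier score: squared error between a probability forecast and the 0/1 outcome.
# Lower is better; 0 is a perfect forecast, 1 is maximally wrong.
def brier_score(forecast: float, outcome: int) -> float:
    return (forecast - outcome) ** 2

print(brier_score(0.2, 1))  # 0.64 -- the penalty for saying "20 percent" if it happens
print(brier_score(0.2, 0))  # 0.04 -- much better if it doesn't happen
```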
I’ll create a list of relevant questions and my predictions, along with explanations for those predictions. I’d also like to make this as collaborative as possible by allowing others to make their own predictions so that we can collectively come to a better understanding of the future of AI.
Works Cited
Benioff, Marc. Behind the Cloud. Jossey-Bass, 2009.
Berners-Lee, Tim, and Mark Fischetti. Weaving the Web. HarperCollins, 1999.
Carr, Nicholas. The Big Switch. W. W. Norton, 2013.
Ceruzzi, Paul E. A History of Modern Computing. ebrary, 2003.
Gorelik, Alex. The Enterprise Big Data Lake. O’Reilly Media, 2019.
Grove, Andrew S. . Knopf Doubleday Publishing Group, 1995.
Haigh, Thomas, and Paul E. Ceruzzi. A New History of Modern Computing. MIT Press, 2021.
Harari, Yuval N. Nexus. Random House Publishing Group, 2024.
Isaacson, Walter. The Innovators. Simon & Schuster, 2014.
Isaacson, Walter. Steve Jobs. Simon & Schuster, 2011.
Levy, Steven. Facebook: The Inside Story. Penguin Publishing Group, 2021.
Levy, Steven. In the Plex. Simon & Schuster, 2021.
McComb, Dave. Software Wasteland. Technics Publications, 2018.
Mirchandani, Vinnie. SAP Nation. Deal Architect Incorporated, 2014.
Muglia, Bob, and Steve Hamm. The Datapreneurs. Skyhorse Publishing, 2023.
O’Regan, Gerard. Introduction to the History of Computing. Springer International Publishing, 2016.
Stone, Brad. Amazon Unbound. Simon & Schuster, 2022.
Stone, Brad. The Everything Store. Little, Brown, 2014.
Symonds, Matthew. Softwar. Simon & Schuster, 2004.
Tetlock, Philip E., and Dan Gardner. Superforecasting. Crown, 2015.
Wiggins, Chris, and Matthew L. Jones. How Data Happened. W.W. Norton, 2024.