Pinecone Series B Fundraise

The backstory of two of the smartest people I know, who helped make Pinecone a part of the modern AI stack (with some Avengers references)

Funding announcement posts are often stuffed with over-the-top venture capitalist claims about vision, foresight, and category mastery. I won’t do that here (or will I?). Instead, I’ll talk about our connection to Pinecone — stories that go back over a decade with the founding team, leading to the news today: Pinecone has raised a $100M Series B, led by A16Z, with explosive growth justifying their latest $750M valuation.

Concurrently, I’ll tie in some Avengers analogies. (I’d reference Star Wars, but I couldn’t figure out who’d be Darth Vader.)

They say partnership in venture capital is everything. Thankfully, my partnership with founder Edo Liberty and CTO Ram Sriharsha goes back more than ten years.

I first met Edo when he was at Yahoo research labs and I was leading engineering teams, some of which were using Hadoop to count unique users of Yahoo by counting cookies. Yahoo assigns a unique cookie to every browser instance on a machine; the number of distinct cookies in a given day — the union across multiple browsers, incognito mode, robots, and cookie clearing — can reach the high billions of uniques. “select count(distinct(cookies))” at that scale isn’t fun, especially when the underlying JVM is out of heap.

We needed something better and, of course, reached for HyperLogLog. Dissatisfied, we instead extended stochastic streaming algorithms into DataSketches, which is now a popular OSS project. After scientifically solving big data problems at Yahoo, Edo eventually went on to run AI Research Labs at Amazon. I have always considered him a dynamic, multi-talented, and brilliant person, with an eye for what’s next but a practical approach. He is also someone who lives life to the fullest (I’m excited to use this round to invest in bubble wrap to protect him from his extreme sports hobbies). He’s just like Tony Stark, except Edo loves his family and other people.
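The distinct-counting problem above is exactly what sketch algorithms solve: trade a little accuracy for tiny, mergeable memory. As a rough illustration of the idea only — a toy HyperLogLog, not the DataSketches implementation — here is a minimal sketch:

```python
import hashlib
import math

class HyperLogLog:
    """Toy HyperLogLog: estimates distinct count using 2^p small registers."""

    def __init__(self, p=14):
        self.p = p
        self.m = 1 << p              # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash of the item (e.g. a cookie string)
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                      # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)         # remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1  # position of leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        z = sum(2.0 ** -r for r in self.registers)
        e = alpha * self.m * self.m / z
        # small-range correction: fall back to linear counting when sparse
        zeros = self.registers.count(0)
        if e <= 2.5 * self.m and zeros:
            e = self.m * math.log(self.m / zeros)
        return e
```

Duplicates never inflate the count (a register only keeps its maximum), memory is fixed at 2^p bytes regardless of cardinality, and the standard error is roughly 1.04/√(2^p) — which is why counting billions of cookies without blowing the JVM heap becomes tractable.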

Edo busy designing algorithms while planning a hang gliding adventure in Morocco (this happened)

Ram and I have another parallel and distinct story. Often engineers are described as being “10x” developers. Ram isn’t a 10x developer; he’s a 1000x developer. His intellect reminds me of Vision from the Avengers, with the same huge, caring heart inside — but Ram is human!

Ram at Pinecone, contemplating using the Reality Stone to manifest a custom Linux kernel module so query execution can be either 0.001% faster or 1000% faster. Ram will thank me one day for not calling him Wanda.

Working on the data team together, we were dissatisfied with Hadoop’s performance and wanted more. We went so far as to rewrite the whole thing in C++ with a custom file format that looks exactly like Parquet (including metadata in the footer). Having sniffed around the literature for a better way, we discovered a project at the UC Berkeley AMPLab named Spark. We were intrigued by the graph processing model and immediately hopped on the next BART train to Berkeley to meet with Ion Stoica, Matei Zaharia, and Reynold Xin. In rapid succession, we sponsored the lab and hired some of their grad students as interns at Yahoo. From that, Databricks was born, formed by the AMPLab team. Ram became an early employee at Databricks and one of their most important engineers.

There’s an Avengers analogy with BART somewhere. Maybe UC Berkeley is like Wakanda with its science and engineering. [Carnegie Mellon is better, but I’m not biased at all.] If you read captions this long and have a better idea, or you’re starting an amazing AI/ML company, email me at tim@menlovc.com.

Fast forward to 2021 — I was CTO at Splunk, and Ram was running our machine learning and security research teams. I left to work at Menlo Ventures — Ram stayed, but we chatted often. I wanted to found or incubate a company with Ram, and we quickly landed on vector embeddings — either applying them to cybersecurity problems or as a database. Ram was still in touch with Edo since they’d worked closely together previously. When he learned Edo had started a vector database company, Ram joined Pinecone right away.

At that moment, I knew we had another inflection point in data and AI. I knew this feeling — I’d had it before — it felt exactly like the day we took BART to Berkeley and met the Spark team that formed Databricks.

Vector embedding databases were always going to be the future of data. Vectors are the new oil, the way folks once said, “data is the new oil.” They’re a richer, higher-fidelity way to represent any data — structured or unstructured. Semantic search is clearly superior to lexical search and is going to change the search category for decades. The next great enterprise companies in security, observability, sales, marketing, and more — all of these categories will be built on embeddings.
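To make the lexical-vs-semantic distinction concrete, here is a toy sketch. The three-dimensional vectors are hand-made stand-ins: real embeddings come from a model and have hundreds or thousands of dimensions, but the retrieval logic — nearest neighbors by cosine similarity — is the same:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy "embeddings" (illustrative only; a real model produces these)
docs = {
    "how to reset my password": [0.9, 0.1, 0.0],
    "login credentials recovery": [0.8, 0.2, 0.1],  # shares no keywords with the query
    "quarterly revenue report":  [0.0, 0.1, 0.9],
}
query = "forgot password"
query_vec = [0.85, 0.15, 0.05]  # what an embedding model would return for the query

# Lexical search: keyword overlap misses the paraphrase entirely
lexical_hits = [d for d in docs if set(d.split()) & set(query.split())]

# Semantic search: ranking by vector similarity surfaces it
ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
```

The keyword match finds only the document that literally contains “password,” while the vector ranking also places “login credentials recovery” at the top — same meaning, zero shared words.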

The idea that a company could build a database for vectors in the cloud, as Snowflake did for OLAP, was a mind-blowing opportunity that was both impossibly technically difficult and lucrative. If anyone could build a Snowflake-like cloud database with separation of storage and compute, vertical/horizontal scaling, CRUD semantics, and a custom vector storage layer, it was going to be Edo, Ram, and the Pinecone team.

When I learned Ram had joined Pinecone, I made it my mission to get in front of it. I quickly connected with Edo. After exchanging ideas about the art of the possible with vector databases, and a few dinners (including with Edo’s wife), we eventually reached a deal. Menlo led their Series A in December of 2021.

$17M at a $170M post in December 2021 for a vector database, when nobody understood vector embeddings, sounded daring.

Someone will create a Pinecone coin someday. Please don’t email me with that fundraising pitch. Email shawn@menlovc.com.

We were okay with Menlo looking crazy at that point. It was clear: Pinecone would be an anchor piece in the architecture of AI. Though we couldn’t have predicted the timing of the generative AI hype (crypto was dominant at the time), we did know that Pinecone would be incredible due to semantic search, applications in machine learning, and, eventually, language models like the ones we’re all in love with today.

Pinecone was already going to be a massive hit based on semantic search alone. However, with the rise of LLMs, developers quickly realized that hallucinations and lack of model freshness — due to the untenable pairing of model size and cost — were a problem. Pinecone filled that gap immediately, to the point that the pairing of OpenAI and Pinecone became “a thing,” now known as the OP stack.
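The OP-stack pattern is what is now called retrieval-augmented generation: embed fresh documents into the vector database, retrieve the nearest ones at question time, and prepend them to the model’s prompt so it answers from current facts instead of stale training data. A minimal sketch, using an in-memory stand-in for the index (the `upsert`/`query` helpers below are illustrative, not Pinecone’s actual client API):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Stand-in for a vector index: a list of (vector, text) pairs
index = []

def upsert(vec, text):
    index.append((vec, text))

def query(vec, top_k=2):
    """Return the texts of the top_k nearest vectors by cosine similarity."""
    scored = sorted(index, key=lambda item: cosine(vec, item[0]), reverse=True)
    return [text for _, text in scored[:top_k]]

# 1. Embed fresh documents and upsert them (toy 2-d vectors for illustration)
upsert([1.0, 0.0], "Pinecone raised a $100M Series B led by A16Z.")
upsert([0.0, 1.0], "BART connects San Francisco and Berkeley.")

# 2. At question time, embed the question and retrieve the relevant facts...
question_vec = [0.9, 0.1]
context = query(question_vec, top_k=1)

# 3. ...then ground the LLM prompt in that retrieved context
prompt = "Answer using this context:\n" + "\n".join(context) + "\n\nQ: Who led Pinecone's Series B?"
```

The model never needs retraining: freshness lives in the index, and the retrieval step bounds what the model can hallucinate about.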

That combination sparked incredible and explosive growth at Pinecone. It is clear that vector databases will be one of the key anchor elements of the modern AI data stack, and that Pinecone is the emerging category leader with a proven team. I’m incredibly proud and excited to be on the journey with Edo and Ram. We’re also thrilled to welcome Peter Levine and A16Z to the team as we continue to design the future of AI with Pinecone.

PS: To celebrate this milestone, I cleaned up and promoted the Julia Pinecone API (Pinecone.jl) to 1.0. Thanks to the amazing Pinecone team for keeping me on my toes by using every HTTP 20x status code I didn’t know existed! Silly me for hardcoding HTTP 200 when HTTP 202 would be better!
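For anyone writing a client against an API like this, the general fix is to accept the whole 2xx family rather than one hardcoded value. Sketched in Python (the principle is the same in Julia or any other language):

```python
def is_success(status_code: int) -> bool:
    """Treat the entire 2xx family as success, not just 200 OK."""
    return 200 <= status_code < 300

# 202 Accepted is common for asynchronous operations: the server has
# acknowledged the request but will process it later. Checking
# `status_code == 200` would mislabel that response as a failure.
```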

Also, if you’re a Pinecone user, check out the Pinecone command line interface I wrote, which helps you manage indexes and run CRUD operations against your data.
