Pinecone Series B Fundraise
Edo == Tony Stark
Ram == Vision
AI/Data Architecture Changes: that Spark/Databricks Feeling Hits Different
Closing the Deal

The backstory of two of the smartest people I know, who helped make Pinecone a part of the modern AI stack (with some Avengers references)

Funding announcement posts are often filled with over-the-top venture capitalist claims about vision, foresight, and category mastery. I won’t do that here (or will I?). Instead, I’ll talk about our connection to Pinecone — stories that go back over a decade with the founding team, leading to the news today: Pinecone has raised a $100M Series B, led by A16Z, with explosive growth justifying their new $750M valuation.

At the same time, I’ll tie in some Avengers analogies. (I’d reference Star Wars, but I couldn’t figure out who’d be Darth Vader.)

They say partnership in venture capital is everything. Thankfully, my partnership with founder Edo Liberty and CTO Ram Sriharsha goes back more than ten years.

I first met Edo when he was at Yahoo research labs and I was leading engineering teams, some of which were using Hadoop to count unique users of Yahoo by counting cookies. Yahoo assigns a unique cookie to every browser instance on a machine; the number of cookies in a given day — the union across multiple browsers, incognito mode, robots, and cookie clearing — can reach the high billions of uniques. “select count(distinct(cookies))” at that scale isn’t fun, especially when the underlying JVM is out of heap.

We wanted something better and, of course, reached for HyperLogLog. Dissatisfied, we instead extended stochastic streaming algorithms into DataSketches, which is now a popular OSS project. After scientifically solving big data problems at Yahoo, Edo eventually went on to run AI research labs at Amazon. I have always considered him a dynamic, multi-talented, and brilliant person, with an eye for what’s next but a practical approach. He is also someone who lives life to the fullest (I’m excited to use this round to invest in bubble wrap to protect him from his extreme sports hobbies). He’s just like Tony Stark, except Edo loves his family and other people.
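To make the HyperLogLog idea above concrete, here is a toy sketch of the algorithm in Python — a minimal, illustrative implementation, not what a production pipeline would use (that would be a tuned library like Apache DataSketches). It estimates distinct counts in a few kilobytes of registers instead of holding billions of cookies in a JVM heap:

```python
import hashlib
import math

class TinyHLL:
    """Toy HyperLogLog: approximate distinct count in 2^p registers."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p              # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash; first p bits pick a register, the rest give a "rank"
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        rank = (64 - self.p) - rest.bit_length() + 1   # leading-zero run + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # small-range correction: fall back to linear counting
            return self.m * math.log(self.m / zeros)
        return raw

hll = TinyHLL()
for i in range(100_000):
    hll.add(f"cookie-{i}")
print(round(hll.estimate()))   # close to 100000, within a few percent
```

With p=12 (4096 registers, roughly 4 KB), the standard error is about 1.6% — the trade that makes billion-scale `count(distinct(...))` tractable.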

Edo busy designing algorithms while planning a hang gliding adventure in Morocco (this happened)

Ram and I have another, parallel story. Often engineers are described as “10x” developers. Ram isn’t a 10x developer; he’s a 1000x developer. His intellect reminds me of Vision from the Avengers, with the same big, caring heart inside — but Ram is human!

Ram at Pinecone, contemplating using the Reality Stone to manifest a custom Linux kernel module so query execution can be either 0.001% faster or 1000% faster. Ram will thank me someday for not calling him Wanda.

Working together on the data team, we were dissatisfied with Hadoop’s performance and wanted more. We took it so far as to rewrite the whole thing in C++ with a custom file format that looks exactly like Parquet (including metadata in the footer). Having sniffed around the literature for a better way, we discovered a project at the UC Berkeley AMPLab named Spark. We were intrigued by the graph processing model and immediately hopped on the next BART train to Berkeley to meet with Ion Stoica, Matei Zaharia, and Reynold Xin. In rapid succession, we sponsored the lab and hired some of their grad students as interns at Yahoo. From that, Databricks was born, formed by the AMPLab team. Ram became an early employee at Databricks and one of their most important engineers.

There’s an Avengers analogy with BART somewhere. Maybe UC Berkeley is like Wakanda with its science and engineering. [Carnegie Mellon is better, but I’m not biased at all.] If you read captions this long and have a better idea, or you’re starting a great AI/ML company, email me at tim@menlovc.com.

Fast forward to 2021 — I was CTO at Splunk, and Ram was running our machine learning and security research teams. I left to work at Menlo Ventures; Ram stayed, but we chatted often. I wanted to found or incubate a company with Ram, and we quickly landed on vector embeddings — either applying them to cybersecurity problems or as a database. Ram was still in touch with Edo, since they’d worked closely together in the past. When he learned Edo had started a vector database company, Ram joined Pinecone immediately.

At that point, I knew we had another inflection point in data and AI. I knew this feeling — I’d had it before — it felt exactly like the day we took BART to Berkeley and met the Spark team that formed Databricks.

Vector embedding databases were always going to be the future of data. Vectors are the new oil — as folks once said, “data is the new oil.” They’re a richer, higher-fidelity way to represent any data, structured or unstructured. Semantic search is clearly superior to lexical search and is going to change the search category for decades. The next great enterprise companies in security, observability, sales, marketing, and more — all of these categories will be built on embeddings.
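The core mechanic behind semantic search is simple: embed everything as vectors, then rank by similarity. A toy sketch below uses hand-made 4-dimensional vectors (the vectors, documents, and query here are invented for illustration; real embedding models produce hundreds to thousands of dimensions):

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product of the vectors over their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(y * y for y in b))
    return dot / (mag_a * mag_b)

# Toy 4-d "embeddings" for three documents (values invented for illustration)
docs = {
    "reset your password":      [0.90, 0.10, 0.00, 0.10],
    "quarterly revenue report": [0.00, 0.80, 0.50, 0.10],
    "forgot login credentials": [0.80, 0.20, 0.10, 0.10],
}

# Pretend embedding of the query "can't sign in" -- note it shares no words
# with "reset your password", yet lands nearby in vector space
query = [0.88, 0.12, 0.02, 0.10]

best = max(docs, key=lambda d: cosine(query, docs[d]))
print(best)   # "reset your password"
```

Lexical search would score “can’t sign in” against “reset your password” as zero overlapping terms; in embedding space they are near neighbors, which is the whole point.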

The idea that a company could build a database for vectors in the cloud, as Snowflake did for OLAP, was a mind-blowing opportunity — both impossibly technically difficult and lucrative. If anyone could build a Snowflake-like cloud database with separation of storage and compute, vertical/horizontal scaling, CRUD semantics, and a custom vector storage layer, it was going to be Edo, Ram, and the Pinecone team.

When I learned Ram had joined Pinecone, I made it my mission to get in front of it. I quickly connected with Edo. After exchanging ideas about the art of the possible with vector databases, and a few dinners (including with Edo’s wife), we eventually reached a deal. Menlo led their Series A in December of 2021.

$17M at a $170M post in December 2021 for a vector database, when no one understood vector embeddings, sounded daring.

Someone will create a Pinecone coin someday. Please don’t email me with that fundraising pitch. Email shawn@menlovc.com.

We were okay with Menlo looking crazy at the time. It was clear: Pinecone would be an anchor piece in the architecture of AI. Though we couldn’t have predicted the timing of the generative AI hype (crypto was dominant at the time), we did know that Pinecone would be fantastic thanks to semantic search, applications in machine learning, and, eventually, language models like the ones we’re all in love with today.

Pinecone was already going to be a big hit based on semantic search alone. However, with the rise of LLMs, developers quickly realized that hallucinations and lack of model freshness — due to the untenable pair of model size and cost — were a problem. Pinecone filled that gap immediately, to the point that the pairing of OpenAI and Pinecone became “a thing,” now known as the OP stack.
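The OP-stack pattern boils down to: embed the question, retrieve the most similar fresh documents from the vector database, and hand them to the LLM as context so it answers from facts rather than hallucinating. A minimal sketch, with loud caveats: `embed` here is a stand-in character-frequency function (not a real embedding model), the in-memory list stands in for a Pinecone index, and no real OpenAI or Pinecone API calls are made — this only shows the shape of the flow:

```python
import math

def embed(text):
    """Stand-in embedding: normalized letter-frequency vector.
    A real system would call an embedding model here instead."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - 97] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def top_k(store, query_vec, k=2):
    """Rank stored documents by dot product with the query vector."""
    return sorted(store,
                  key=lambda d: -sum(a * b for a, b in zip(d["vec"], query_vec)))[:k]

# Stand-in for a vector index: documents stored alongside their embeddings
store = [{"text": t, "vec": embed(t)} for t in [
    "Pinecone raised a $100M Series B led by a16z.",
    "HyperLogLog estimates distinct counts in small memory.",
    "Spark came out of the UC Berkeley AMPLab.",
]]

question = "Who led Pinecone's Series B?"
context = "\n".join(d["text"] for d in top_k(store, embed(question)))

# The retrieved context is prepended to the prompt; an LLM call would go here
prompt = f"Answer using only this context:\n{context}\n\nQ: {question}"
print(prompt)
```

Because the retrieved facts can be upserted minutes before the query, the model stays fresh without retraining — that is the gap the paragraph above describes.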

That combination sparked incredible, explosive growth at Pinecone. It is clear that vector databases will be one of the key anchor elements of the modern AI data stack, and that Pinecone is the emerging category leader with a proven team. I’m incredibly proud and excited to be on this journey with Edo and Ram. We’re also thrilled to welcome Peter Levine and A16Z to the team as we continue to design the future of AI with Pinecone.

PS: To celebrate this milestone, I cleaned up and promoted the Julia Pinecone API (Pinecone.jl) to 1.0. Thanks to the amazing Pinecone team for keeping me on my toes by using every HTTP 20x status code I didn’t know existed! Silly me for hardcoding HTTP 200 when HTTP 202 would be better!
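The lesson from that HTTP 202 surprise generalizes: treat the entire 2xx range as success rather than hardcoding a single status code. A one-liner sketch (in Python for brevity, though the fix in Pinecone.jl itself would be Julia):

```python
def is_success(status: int) -> bool:
    # 200 OK, 201 Created, 202 Accepted, 204 No Content... all signal success
    return 200 <= status < 300

print(is_success(200), is_success(202), is_success(404))  # True True False
```

202 Accepted is common for asynchronous writes — the server has queued the work but not finished it — which is exactly the kind of response a database client should expect.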

Also, if you’re a Pinecone user, try the Pinecone command line interface I wrote, which helps you manage indexes and run CRUD operations against data.
