Anthony Deighton is CEO of Tamr. He has 20 years of experience building and scaling enterprise software companies. Most recently, he spent two years as Chief Marketing Officer at Celonis, establishing its leadership in the Process Mining software category and creating demand generation programs that led to 130% ARR growth. Prior to that, he served for 10+ years at Qlik, growing it from an unknown Swedish software company to a public company — in roles spanning product leadership, product marketing, and finally CTO. He began his career at Siebel Systems, learning how to build enterprise software companies in a variety of product roles.
Can you share some key milestones from your journey in the enterprise software industry, particularly your time at Qlik and Celonis?
I started my career in enterprise software at Siebel Systems and learned a lot about building and scaling enterprise software companies from the leadership team there. I joined Qlik when it was a small, unknown Swedish software company, with 95% of the small 60-person team located in Lund, Sweden. I joke that since I wasn't an engineer or a salesperson, I was put in charge of marketing. I built the marketing team there, but over time my interest and contributions gravitated toward product management, and eventually I became Chief Product Officer. We took Qlik public in 2010 and continued as a successful public company. After that, we wanted to do some acquisitions, so I started an M&A team. After a long and fairly successful run as a public company, we eventually sold Qlik to a private equity firm named Thoma Bravo. It was, as I like to say, the full life cycle of an enterprise software company. After leaving Qlik, I joined Celonis, a small German software company trying to succeed in selling in the U.S. Again, I ran marketing as the CMO. We grew very quickly and built a very successful global marketing function.
Both Celonis and Qlik were focused on the front end of the data analytics challenge – how do I see and understand data? In Qlik's case, that was dashboards; in Celonis' case, it was business processes. But a common challenge across both was the data behind those visualizations. Many customers complained that the data was wrong: duplicate records, incomplete records, missing silos of data. That is what attracted me to Tamr, where I felt that for the first time we could solve the challenge of messy enterprise data. The first 15 years of my enterprise software career were spent visualizing data; I hope the next 15 can be spent cleaning that data up.
How did your early experiences shape your approach to building and scaling enterprise software companies?
One important lesson I learned in the shift from Siebel to Qlik was the power of simplicity. Siebel was very powerful software, but it was killed in the market by Salesforce.com, which made a CRM with many fewer features ("a toy," Siebel used to call it) that customers could get up and running quickly because it was delivered as a SaaS solution. It seems obvious today, but at the time the conventional wisdom was that customers bought features; what we learned is that customers invest in solutions to solve their business problems. So, if your software solves their problem faster, you win. Qlik was a simple solution to the data analytics problem, and because it was radically simpler, we could beat more feature-rich competitors such as Business Objects and Cognos.
The second important lesson I learned was in my career transition from marketing to product. We think of these domains as distinct, but in my career I have found that I move fluidly between product and marketing. There is an intimate link between what product you build and how you describe it to potential customers. And there is an equally important link between what prospects demand and what product we should build. The ability to move between these conversations is a critical success factor for any enterprise software company. A common cause of startup failure is believing "if you build it, they will come." This is the belief that if you just build cool software, people will line up to buy it. This never works, and the answer is a robust marketing process connected to your software development process.
The last idea I'll share links my academic work with my professional work. I had the opportunity at business school to take a class on Clay Christensen's theory of disruptive innovation. In my professional work, I have had the chance to experience both being the disruptor and being disrupted. The key lesson I've learned is that any disruptive innovation is the result of an exogenous platform shift that makes the previously impossible finally possible. In Qlik's case, it was the availability of large-memory servers that allowed Qlik to disrupt traditional cube-based reporting. At Tamr, the availability of machine learning at scale allows us to disrupt manual, rules-based MDM in favor of an AI-based approach. It's important to always figure out what platform shift is driving your disruption.
What inspired the development of AI-native Master Data Management (MDM), and how does it differ from traditional MDM solutions?
The development of Tamr came out of academic work at MIT (Massachusetts Institute of Technology) around entity resolution. Under the academic leadership of Turing Award winner Michael Stonebraker, the question the team was investigating was "can we link data records across hundreds of thousands of sources and millions of records?" On the face of it, this is an insurmountable challenge: the more records and sources you have, the more records each possible match must be compared against. Computer scientists call this an "n-squared problem" because the work grows quadratically with scale.
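To make the scaling problem concrete, here is a minimal Python sketch (my illustration, not Tamr's code) of why brute-force matching grows quadratically, along with the common "blocking" mitigation of only comparing records that share a cheap key. The record values and the four-character blocking key are invented for the example.

```python
from itertools import combinations

def naive_comparisons(n_records: int) -> int:
    """Number of pairwise comparisons a brute-force matcher must make."""
    return n_records * (n_records - 1) // 2

# The "n-squared problem": comparisons explode as the record count grows.
for n in (1_000, 100_000, 10_000_000):
    print(f"{n:>12,} records -> {naive_comparisons(n):>20,} comparisons")

# A standard mitigation is "blocking": only compare records that share a
# cheap key (here, an invented 4-character name prefix), pruning most pairs.
records = [
    {"id": 1, "name": "Dell Computer"},
    {"id": 2, "name": "DELL COMPUTERS INC"},
    {"id": 3, "name": "Acme Corp"},
]
blocks: dict[str, list[dict]] = {}
for record in records:
    blocks.setdefault(record["name"][:4].lower(), []).append(record)

candidate_pairs = [
    (a["id"], b["id"])
    for block in blocks.values()
    for a, b in combinations(block, 2)
]
print(candidate_pairs)  # [(1, 2)] -- only records in the same block are compared
```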
Traditional MDM systems try to solve this problem with rules and large amounts of manual data curation. Rules don't scale because you can never write enough rules to cover every corner case, and managing thousands of rules is a technical impossibility. Manual curation is extraordinarily expensive because it relies on humans to work through millions of possible records and comparisons. Taken together, this explains the poor market adoption of traditional MDM (Master Data Management) solutions. Frankly put, nobody likes traditional MDM.
Tamr's simple idea was to train an AI to do the work of source ingestion, record matching, and value resolution. The wonderful thing about AI is that it doesn't eat, sleep, or take vacation; it is also highly parallelizable, so it can take on huge volumes of data and churn away at making it better. So, where MDM was once impossible, it is finally possible to achieve clean, consolidated, up-to-date data (see above).
What are the biggest challenges companies face with their data management, and how does Tamr address these issues?
The first, and arguably the most important, challenge companies face in data management is that their business users don't use the data they generate. Said differently, if data teams don't produce high-quality data that their organizations use to answer analytical questions or streamline business processes, then they're wasting time and money. A primary output of Tamr is a 360 page for every entity record (think: customer, product, part, etc.) that combines all of the underlying first- and third-party data so business users can see and provide feedback on the data. Think of it like a wiki for your entity data. This 360 page is also the input to a conversational interface that lets business users ask and answer questions with the data. So, job one is to give users the data.
Why is it so hard for companies to give users data they love? Because there are three hard problems underlying that goal: loading a new source, matching the new records into the existing data, and fixing the values/fields in the data. Tamr makes it easy to load new sources of data because its AI automatically maps new fields into a defined entity schema. This means that regardless of what a new data source calls a particular field (example: cust_name), it gets mapped to the right central definition of that entity (example: "customer name"). The next challenge is to link records that are duplicates. Duplication in this context means that the records are, in fact, the same real-world entity. Tamr's AI does this, and it even uses external third-party sources as "ground truth" to resolve common entities such as companies and people. A good example of this would be linking all of the records across many sources for an important customer such as "Dell Computer." Lastly, for any given record there may be fields that are blank or incorrect. Tamr can impute the correct field values from internal and third-party sources. A simple sketch of these three steps appears below.
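The toy sketch below walks through those three steps in order: schema mapping, duplicate matching, and value imputation. The field names, similarity function, and 0.8 threshold are placeholders chosen for illustration; this is not Tamr's API.

```python
from difflib import SequenceMatcher

# 1. Schema mapping: route source-specific column names to a central schema.
SCHEMA_MAP = {"cust_name": "customer name", "cust_phone": "phone"}

def map_to_schema(raw: dict) -> dict:
    return {SCHEMA_MAP.get(key, key): value for key, value in raw.items()}

# 2. Duplicate matching: score record similarity and link likely duplicates.
def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def is_duplicate(rec_a: dict, rec_b: dict, threshold: float = 0.8) -> bool:
    return similarity(rec_a["customer name"], rec_b["customer name"]) >= threshold

# 3. Imputation: fill blank fields from a trusted (e.g. third-party) record.
def impute(record: dict, reference: dict) -> dict:
    return {k: (v if v not in (None, "") else reference.get(k)) for k, v in record.items()}

a = map_to_schema({"cust_name": "Dell Computer", "cust_phone": ""})
b = map_to_schema({"cust_name": "DELL COMPUTERS INC", "cust_phone": "800-555-0100"})
print(is_duplicate(a, b))  # True -- same real-world entity
print(impute(a, b))        # phone filled in from the matched record
```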
Can you share a success story where Tamr significantly improved a company's data management and business outcomes?
CHG Healthcare is a major player in the healthcare staffing industry, connecting skilled healthcare professionals with facilities in need. Whether it's temporary doctors through Locums, nurses with RNnetwork, or broader solutions through CHG itself, they provide customized staffing solutions that help healthcare facilities run smoothly and deliver quality care to patients.
Their core value proposition is connecting the right healthcare providers with the right facility at the right time. Their challenge was that they didn't have an accurate, unified view of all the providers in their network. Given their scale (7.5M+ providers), it was impossible to keep their data accurate with legacy, rules-driven approaches without breaking the bank on human curators. They also couldn't ignore the problem, since their staffing decisions depended on it. Bad data for them could mean a provider gets more shifts than they can handle, leading to burnout.
Using Tamr's advanced AI/ML capabilities, CHG Healthcare reduced duplicate physician records by 45% and almost completely eliminated the manual data preparation that was being done by scarce data & analytics resources. Most importantly, by having a trusted, accurate view of providers, CHG is able to optimize staffing and deliver a better customer experience.
What are some common misconceptions about AI in data management, and how does Tamr help dispel these myths?
A common misconception is that AI needs to be "perfect," or that rules and human curation are perfect in contrast to AI. The reality is that rules fail all the time. And, more importantly, when rules fail, the only solution is more rules, so you end up with an unmanageable mess of rules. Human curation is fallible as well. Humans may have good intentions (although not always), but they're not always right. What's worse, some human curators are better than others, or simply make different decisions than others. AI, in contrast, is probabilistic by nature. We can validate statistically how accurate any of these techniques is, and when we do, we find that AI is cheaper and more accurate than any competing alternative.
Tamr combines AI with human refinement for data accuracy. Can you elaborate on how this combination works in practice?
Humans provide something exceptionally important to AI – they provide the training. AI is really about scaling human effort. What Tamr looks to humans for is the small number of examples ("training labels") that the machine can use to set the model parameters. In practice, humans spend a small amount of time with the data, giving Tamr examples of errors and mistakes in the data, and the AI applies those lessons across the full data set(s). In addition, as new data is added or data changes, the AI can surface instances where it is struggling to make a confident decision ("low-confidence matches") and ask a human for input. That input, of course, goes on to refine and update the models.
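Here is a minimal sketch of that human-in-the-loop pattern: confident matches are decided automatically, low-confidence pairs are routed to a reviewer, and the reviewer's answers become new training labels. The thresholds and class names are assumptions made for the example, not Tamr's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class MatchReviewQueue:
    low: float = 0.35   # below this score, auto-reject the candidate match
    high: float = 0.90  # above this score, auto-accept the candidate match
    labels: list[tuple[dict, dict, bool]] = field(default_factory=list)

    def route(self, rec_a: dict, rec_b: dict, score: float) -> str:
        """Decide automatically when confident, otherwise ask a human."""
        if score >= self.high:
            return "auto-match"
        if score <= self.low:
            return "auto-non-match"
        return "ask-human"  # low-confidence: needs a training label

    def record_label(self, rec_a: dict, rec_b: dict, is_match: bool) -> None:
        # Human answers become new training labels for the next model update.
        self.labels.append((rec_a, rec_b, is_match))

queue = MatchReviewQueue()
print(queue.route({"name": "Dell Computer"}, {"name": "Dell Inc."}, score=0.62))
# -> "ask-human"; the reviewer's answer is then stored via record_label()
```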
What role do large language models (LLMs) play in Tamr’s data quality and enrichment processes?
First, it's important to be clear about what LLMs are good at. Fundamentally, LLMs are about language. They produce strings of text that mean something, and they can "understand" the meaning of text that is handed to them. You could say they're language machines. So wherever language matters to Tamr, we use LLMs. One obvious example is our conversational interface, which sits on top of our entity data and which we affectionately call our virtual CDO. When you speak to your real-life CDO, they understand you and respond in language you understand. That is exactly what we'd expect from an LLM, and that is exactly how we use it in that part of our software. What's valuable about Tamr in this context is that we use the entity data as context for the conversation with our vCDO. It's like your real-life CDO having ALL your BEST enterprise data at their fingertips when they answer your questions – wouldn't that be great!
In addition, when cleaning data values or imputing missing values, there are cases where we want to use a language-based interpretation of input values to find or fix a missing value. For example, you might ask, from the text "5mm ball bearing," what the size of the part is, and an LLM (or a person) would correctly answer "5mm."
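As a rough sketch of that kind of extraction, the snippet below asks an LLM to pull the size attribute out of free text, assuming an OpenAI-compatible chat endpoint. The model name and prompt are illustrative and say nothing about Tamr's internal pipeline.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_size(description: str) -> str:
    """Ask the LLM to pull a single attribute value out of free text."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "Extract the size of the part from the product text. "
                        "Reply with the value only, e.g. '5mm'."},
            {"role": "user", "content": description},
        ],
    )
    return response.choices[0].message.content.strip()

print(extract_size("5mm ball bearing"))  # expected: "5mm"
```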
Lastly, underlying LLMs are embedding models, which encode the meaning of language into numeric representations of tokens (think words). These can be very useful for computing linguistic similarity. While "5" and "five" share no characters in common, they are very close in linguistic meaning, and we can use that information to link records together.
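To illustrate the point about embeddings, the snippet below compares "5", "five", and an unrelated word using an off-the-shelf sentence-embedding model. sentence-transformers is simply a convenient open-source choice for the example, not a statement about Tamr's stack.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# "5" and "five" share no characters, but their embeddings sit close together.
values = ["5", "five", "banana"]
vectors = model.encode(values)

print(f'sim("5", "five")   = {util.cos_sim(vectors[0], vectors[1]).item():.2f}')
print(f'sim("5", "banana") = {util.cos_sim(vectors[0], vectors[2]).item():.2f}')

# In a matcher, an embedding-similarity feature like this lets values such as
# "5" and "five" contribute evidence toward a match even though plain string
# comparison would score them as completely unrelated.
```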
How do you see the future of data management evolving, especially with advancements in AI and machine learning?
The "Big Data" era of the early 2000s should really be remembered as the "Small Data" era. While a great deal of data has been created over the past 20+ years, enabled by the commoditization of storage and compute, the majority of data that has had an impact in the enterprise is relatively small scale — basic sales and customer reports, marketing analytics, and other datasets that can easily be depicted in a dashboard. The result is that many of the tools and processes used in data management are optimized for 'small data,' which is why rules-based logic, supplemented with human curation, is still so prominent in data management.
The way people want to use data is fundamentally changing with advancements in AI and machine learning. The idea of "AI agents" that can autonomously perform a significant portion of a person's job only works if the agents have the data they need. If you're expecting an AI agent to serve on the front lines of customer support, but you have five representations of "Dell Computer" in your CRM and none of it is connected to product information in your ERP, how can you expect it to deliver high-quality service when someone from Dell reaches out?
The implication is that our data management tooling and processes will need to evolve to handle scale, which means embracing AI and machine learning to automate more data cleaning activities. Humans will still play a big role in overseeing the process, but fundamentally we need to ask the machines to do more, so that it's not just the data in a single dashboard that's accurate and complete, but the majority of data in the enterprise.
What are the biggest opportunities for businesses today when it comes to leveraging their data more effectively?
Increasing the number of ways that people can consume data. There's no doubt that improvements in data visualization tools have made data much more accessible throughout the enterprise. Now, data and analytics leaders must look beyond the dashboard for ways to deliver value with data. Interfaces like internal 360 pages, knowledge graphs, and conversational assistants are being enabled by new technologies and give potential data consumers more ways to use data in their day-to-day workflow. It's particularly powerful when these are embedded in the systems people already use, such as CRMs and ERPs. The fastest way to create more value from data is to bring the data to the people who can use it.