Matt Hocking is the co-founder and CEO of WellSaid Labs, a leading enterprise-grade AI Voice Generator. He has more than 15 years of experience leading teams and delivering technology solutions at scale.
Your background is fairly entrepreneurial. How did you initially get involved in AI?
I suppose I’ve always considered myself pretty entrepreneurial. I started my first business out of college and, with a background in product design, have found myself gravitating toward helping people with early-stage ideas. Throughout my career, I’ve been lucky enough to work with various startups that have gone on to have some pretty incredible runs. During those experiences, I had first-hand exposure to a lot of great founders, which in turn inspired me to pursue my own ideas as a founder. AI was relatively new to me when I joined AI2; however, that experience gave me a chance to apply my product and startup lens to some truly amazing research and to imagine how these new advancements were going to be able to help a lot of people in the coming years. My goal since the beginning has been to develop real businesses for real people, and I think AI has the potential to create a lot of exciting opportunities and efficiencies in our future if applied thoughtfully.
Could you share the story of how the idea for WellSaid Labs was conceived when you were an entrepreneur in residence at The Allen Institute for AI?
I joined The Allen Institute for Artificial Intelligence (AI2) as an Entrepreneur in Residence in 2018. Arguably the most innovative incubator in the world, AI2 houses the brightest minds in AI, who apply solutions from the edge of what’s possible today to tangible products that solve problems across the globe. My background in design and technology nurtured a long-time interest in the creative fields, and with the AI boom we’re all witnessing today, I wanted to explore a way to connect the two. I was introduced to Michael Petrochuk (WellSaid Labs co-founder and CTO) while developing an interactive healthcare app that guided the patient through various sensitive scenarios. During the process of developing the content for the experience, my team worked with voice talent to pre-record thousands of lines of voiceover for the avatar. When I was exposed to some of the breakthroughs Michael had achieved during his research, we both quickly saw how human-parity text-to-speech (TTS) could transform not only the product I was working on but also various other applications and industries. Technology and tooling had struggled to keep up with the needs of producers creating with voice as a medium. We saw a path to putting this technology in the hands of all creators, allowing voice to be an integral part of all stories.
WellSaid Labs is one of the few companies that provides voice actors with an avenue into the AI voiceover space. Why did you believe it was important to integrate real voices into the product?
Our answer to this is two-pronged: first, we wanted to create solutions that complemented professional voice actors’ capabilities, expanding opportunities for voice work. And second, we strive for the highest level of human quality in our products. Our voice actors are long-term collaborative partners and receive compensation and revenue share for both their voice data and the subsequent content produced with it. Every voice actor we hire to create an AI voice avatar based on the likeness of their voice is paid based on how much their voice is used on our platform. We encourage talent to partner with us; fair compensation for their contributions is incredibly important to us.
To offer the highest level of human-quality products on the market, we have to be rigorous about where we get our data. This process gives us more control over quality, as we train our deep learning models to speak both at human parity and in specific, contextually relevant styles. We don’t just create a voice that recites the provided input. Our models offer a variety of voice styles that perform what’s on the page. Whether users are creating voiceover by using an avatar from our library or with a custom-built voice for their brand, we use real voice data to ensure a seamless process and an easy-to-use platform. If our customers had to manipulate and edit our voices in post-production, the process of getting the desired output would be clunky and long. Our voices take the context of the written content and provide a contextually accurate reading. We provide voices for all sorts of use cases – whether it’s reading the news, making an audio ad, or automating call center support – so partnering with professional voice talent specific to each use case provides us with both the context and high-quality voice data.
We regularly update and add new styles and accents to our avatar library to ensure that we represent the voices of our customers. In WellSaid Labs’ Studio, customers and brands can audition different voices based on region, style, and use case, allowing for a more seamless, unified production of audio content personalized to the maker’s needs. Once an initial recording is sampled, users can cue specific words, spellings, and pronunciations to ensure the AI consistently speaks to their specific needs.
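To make that workflow concrete, here is a minimal, hypothetical sketch of how a pronunciation cue might be applied to a script before synthesis. The `VoiceoverRequest` and `PronunciationCue` types and the respelling syntax are illustrative assumptions for this article, not WellSaid Labs’ actual API.

```python
# Hypothetical sketch only: these types and the respelling syntax are
# illustrative assumptions, not WellSaid Labs' real interface.
from dataclasses import dataclass, field


@dataclass
class PronunciationCue:
    word: str        # the word as it appears in the script
    respelling: str  # phonetic respelling the voice model should use instead


@dataclass
class VoiceoverRequest:
    avatar: str                           # which voice avatar to render with
    style: str                            # e.g. "narration", "promo", "support"
    script: str
    cues: list[PronunciationCue] = field(default_factory=list)

    def resolved_script(self) -> str:
        """Apply pronunciation cues by substituting respellings into the script."""
        text = self.script
        for cue in self.cues:
            text = text.replace(cue.word, cue.respelling)
        return text


request = VoiceoverRequest(
    avatar="Ava M.",
    style="narration",
    script="Welcome to Yakima, the heart of Washington wine country.",
    cues=[PronunciationCue(word="Yakima", respelling="YAK-ih-maw")],
)
print(request.resolved_script())
# -> "Welcome to YAK-ih-maw, the heart of Washington wine country."
```

The point of a cue like this is that it is set once and then applied consistently to every subsequent render, so the voice never drifts on a brand name or regional place name.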
WellSaid Labs is staking its claim as the first ethical AI voice platform. Why are AI ethics important to you?
As AI adoption increases and becomes more mainstream, fears of harmful use cases and bad actors are at the center of every conversation – and these concerns are unfortunately validated by real-world occurrences. AI voice is no exception; nearly every day, a new report of a celebrity, public figure, or politician being deepfaked for advertisements or political purposes makes news headlines. Though formal federal regulation of this technology is still evolving, detecting and combating malicious actors and uses of synthetic voice will become increasingly difficult as the technology continues to advance.
Coming from AI2, where AI ethics is a core principle, Michael and I had these conversations on day one. Developing AI speech technology comes with significant responsibilities regarding consent, privacy, and overall safety. We know that we, as developers, must build our technology safely, address ethical concerns, and lay the groundwork for the future development of synthetic voices. We recognize AI speech technology’s potential for misuse and embrace our responsibility to reduce the potential for misuse of our product. We wanted to lay this foundation from day one rather than move fast and make mistakes along the way. That wouldn’t be doing right by our enterprise customers and voice actors, who count on us to build a high-quality, trustworthy product.
We fully support the call for legislation in this field; however, we won’t wait for federal regulations to be enacted. We’ve always prioritized, and will continue to prioritize, practices that support privacy, security, transparency, and accountability.
We strictly abide by our company’s ethical code of intent, which is based on building with responsible innovation in every decision we make. This is in the best interest of our global customers – enterprise brands.
How do you develop an ethical AI voice platform?
WellSaid Labs has been committed to ethical innovation from the start. We centralize trust and transparency through the use of in-house data models, explicit consent requirements, our content moderation program, and our commitment to brand protection. At WellSaid, we lean on the principles of Responsible AI to shape our decisions and designs, and those principles extend to the use of our voices. Our code of ethics represents these principles as Accountability, Transparency, Privacy and Security, and Fairness.
Accountability: We maintain strict standards for appropriate content, prohibiting the use of our voices for content that is harmful, hateful, fraudulent, or intended to incite violence. Our Trust & Safety team upholds these standards with a rigorous content moderation program, blocking and removing users who attempt to violate our Terms of Service.
Transparency: We require explicit consent before building a synthetic voice with someone’s voice data. Users are not able to upload voice data from politicians, celebrities, or anyone else to create a clone of their voice unless we have that person’s explicit, written consent.
Privacy and Security: We protect the identities of our voice actors by using stock images and aliases to represent the synthetic voices. We also encourage them to exercise caution about how and with whom they share their association with WellSaid Labs or other synthetic voice companies, to reduce the risk of their voice being misused.
Fairness: We compensate all voice actors who provide voice data for our platform, and we provide them with an ongoing revenue share for the use of the synthetic voice we build with their data.
Along with these principles, we also strictly respect intellectual property. We don’t claim ownership over the content provided by our users or voice actors. We prioritize integrity, fairness, and transparency in everything we do, ensuring that our synthetic speech technology is used responsibly and ethically. We actively seek partnerships with voices from diverse backgrounds and experiences to ensure that we provide a voice for everyone.
Our commitment to responsible innovation and to developing AI voice technology with ethics in mind sets us apart from others in the space who are seeking to capitalize on a new, unregulated industry through any means. Our early investments in ethics, safety, and privacy establish trust and loyalty among our voice actors and customers, who increasingly seek ethically made services from the companies at the forefront of innovation.
WellSaid Labs has created its own in-house AI model that enabled its AI voices to achieve human parity, and it has done so by giving them the imperfections humans bring to conversations. What is it about these imperfections that makes the AI better, and how are these imperfections implemented?
WellSaid Labs isn’t just another TTS generator. Where early TTS technology was unable to recognize human speech qualities like pitch, tone, and dialect that convey the context and emotion behind the words, WellSaid voices have achieved human parity, bringing uniquely human imperfections to AI-generated speech.
Our primary measure of voice quality is, and has always been, human naturalness. This guiding belief has shaped our technology at every stage, from the script libraries we’ve built to the instructions we give talent and, more recently, how we iterate on our core TTS algorithms.
We train on authentic human vocalizations. Our voice talent reads their scripts authentically and engagingly when they record for us. Speech perfection, on the other hand, is a mechanical concept that leads to robotically flawless, unnatural output. When professional voice talent performs, their rate of speech fluctuates. Their loudness moves in step with the content they’re reading. Their vocal pitch may rise in a passage requiring an excited read and fall again in a more somber line. These dynamic variations make up an engaging human vocal performance.
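Those dynamic variations are measurable. As a rough sketch (assuming the open-source librosa audio library, and not reflecting WellSaid Labs’ actual evaluation code), one could compare the pitch and loudness spread of a human performance against a flat synthetic read:

```python
# Minimal sketch, not WellSaid Labs' pipeline: quantify the pitch and
# loudness variation that distinguishes a natural read from a flat one.
import numpy as np
import librosa


def prosody_variation(path: str) -> dict:
    y, sr = librosa.load(path, sr=None)  # load audio at its native sample rate

    # Fundamental frequency (pitch) contour over time, voiced frames only.
    f0, voiced_flag, _ = librosa.pyin(
        y,
        sr=sr,
        fmin=librosa.note_to_hz("C2"),  # ~65 Hz, a low speaking pitch
        fmax=librosa.note_to_hz("C6"),  # ~1047 Hz, a high speaking pitch
    )
    f0_voiced = f0[voiced_flag]

    # Short-time loudness (RMS energy) contour.
    rms = librosa.feature.rms(y=y)[0]

    return {
        "pitch_std_hz": float(np.nanstd(f0_voiced)),       # pitch spread
        "loudness_cv": float(np.std(rms) / np.mean(rms)),  # relative loudness spread
    }


# Usage: compare a human performance against a flat synthetic read.
# print(prosody_variation("human_read.wav"))
# print(prosody_variation("robotic_read.wav"))
```

A natural performance would typically show a wider pitch spread and more loudness variation than a monotone, robotic one, which is exactly the quality the training data is meant to capture.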
By building AI processes that work in coordination with the dynamic performances of our professional talent, we have built a truly natural TTS platform. We developed the first long-form TTS system with predictive controls throughout the entire creative process. Our phonetic library holds a diverse collection of audio data, allowing users to incorporate specific vocal cues, like pronunciation guidance or controllability, into the model during the production phase. In a single platform, WellSaid users can record, edit, and stylize their voiceover without having to import external data.
Could you discuss some of the challenges behind building a text-to-speech (TTS) AI company?
The development of AI voice technology has created an entirely new set of obstacles for both its producers and consumers. One of the major challenges is not getting caught up in the noise and hype that flood the AI sector. With such a new, buzzy technology, many organizations try to cash in on short-term AI voiceover developments. We want to provide a voice for everyone, guided by central ethical principles and authenticity. This adherence to authenticity can delay the development and deployment of our technologies, but it solidifies the safety and security of WellSaid voices and their data.
Another challenge in developing our TTS platform was creating specific consent guidelines to ensure that organizations or individual actors won’t misuse our technology. To meet this challenge, we seek out collaborative, long-term partnerships and stay fully involved in voiceover development to increase accountability, transparency, and user security. We actively seek partnerships with voice talent from various backgrounds, organizations, and experiences to ensure that WellSaid Labs’ library of voices reflects its creators and audiences. These processes are designed to be intentional and detail-oriented so that our technology is used as safely and ethically as possible, which can slow the development and launch timeline.
What is your vision for the future of generative AI voices?
For the longest time, AI speech technology had not reached high enough quality to enable companies to create meaningful content at scale. Now that audio technology no longer requires expensive equipment and hardware, all written content can be produced and published in an audio format to create engaging, multi-modal experiences.
Today, AI voices can produce human-like audio and capture the nuance required to make digital storytelling more accessible and natural. The future of generative AI voice will be all-encompassing audible experiences that touch every aspect of our lives. As the technology continues to advance, we will see increasingly natural and expressive synthetic voices blur the line between human and machine-generated speech – opening new doors for business, communications, accessibility, and how we interact with the world around us.
Businesses will find enhanced personalization in AI voice interfaces and use them to make interactions with virtual assistants more immersive and user-friendly. These enhancements are happening already, from intelligent call center agents to fast-food drive-thrus. Content creation, including advertising, product marketing, news narration, podcasts, audiobooks, and other multimedia, will see increased efficiency through tools that help develop engaging content – ultimately increasing lift and revenue for organizations, especially now that multilingual models can expand a company’s reach from a single point of origin to a global presence. Production teams will find great benefit in synthetic voices tailor-made to a brand’s needs or customized to the listener.
Before the introduction of AI, TTS technology lacked the human emotion, intonation, and pronunciation abilities required to tell a full story at scale and with ease. Now, AI-powered TTS offers more immersive and accessible experiences, including real-time speech capabilities and interactive conversational agents.
Achieving human-like speech has been a journey, but now that it’s attainable, we’re witnessing the full scope of AI voice creating real business value for organizations.