Elevating Voices: Pixigon's Quest for Better Text-to-Speech with ElevenLabs
At Pixigon, we’re constantly striving to enhance the quality of marketing videos for our users. One crucial element that can make or break a video’s impact is the voice-over. That’s why we’ve been on a quest to find the best text-to-speech (TTS) solution to complement our video creation platform.
The Search for Better Quality Speech
From day one, we’ve relied on Microsoft Azure for our TTS needs. While it has served us well, especially for English content, we felt there was room for improvement, particularly in other languages. As a Swedish company, we were keen to offer good Swedish voice-overs to cater to our local network and early adopters.
Our journey led us to explore various TTS providers, including industry giants like Amazon Polly and Google Cloud, as well as specialized services like Speechify and Narakeet. While each had its strengths, none quite hit the mark we were aiming for.
Enter ElevenLabs: A Promising Alternative
During our search, we discovered ElevenLabs. Their TTS technology immediately stood out, offering a level of quality and expressiveness that surpassed our previous experiences. While still not perfect, it represented a significant leap forward, especially for our Swedish language needs.
Weighing the Pros and Cons
Integrating a new TTS provider isn’t a decision we take lightly. We carefully evaluated ElevenLabs against our current Azure setup:
API Integration: Unlike Azure’s client-side API, ElevenLabs requires a server-side implementation. This meant additional development work for our team, but the potential quality improvement made it worthwhile.
Voice Expressiveness: ElevenLabs’ voices offer more natural intonation and pronunciation, and are generally more expressive. However, this comes with greater variability between outputs, which we managed by fine-tuning the stability settings.
Voice Selection: While ElevenLabs excels in voice cloning technology, their stock voice selection is limited, especially for non-English languages. This posed some challenges for our use case, as we prefer a wider range of ready-to-use voices.
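To make the server-side integration and the stability setting more concrete, here is a minimal Python sketch of how such a request could be assembled. It is an illustration under stated assumptions, not Pixigon's actual code: the endpoint path, the `xi-api-key` header, and the `voice_settings` fields reflect our understanding of ElevenLabs' public REST API and should be checked against their current documentation. The function name `build_tts_request`, the default values, and the voice ID and key used below are hypothetical placeholders.

```python
import json
import urllib.request

# Assumed endpoint shape; verify against the current ElevenLabs API reference.
ELEVENLABS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(text, voice_id, api_key,
                      stability=0.5, similarity_boost=0.75,
                      model_id="eleven_multilingual_v2"):
    """Build (but do not send) a text-to-speech request.

    Higher stability values are meant to reduce variation between
    generations, at the cost of some expressiveness.
    """
    if not 0.0 <= stability <= 1.0:
        raise ValueError("stability must be between 0.0 and 1.0")

    payload = {
        "text": text,
        "model_id": model_id,  # assumed model name
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
        },
    }
    return urllib.request.Request(
        ELEVENLABS_URL.format(voice_id=voice_id),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "xi-api-key": api_key,  # the key stays on the server
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
        method="POST",
    )


# Hypothetical usage: the voice ID and key are placeholders.
request = build_tts_request("Hej och välkommen!", "VOICE_ID", "API_KEY",
                            stability=0.7)
# Sending it would look like: urllib.request.urlopen(request).read()
```

Building the request separately from sending it keeps the API key on the server and makes the stability value easy to adjust and test per voice.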
A Phased Approach to Integration
After thorough testing and consideration, we decided to incorporate ElevenLabs into Pixigon. We’re taking a measured approach, starting with Swedish voice-overs while maintaining Azure for other languages. This allows us to leverage ElevenLabs’ strengths where they’re most impactful while ensuring broad language support through Azure.
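The phased approach amounts to a simple routing rule: Swedish goes to ElevenLabs, and everything else stays on Azure. A minimal sketch in Python, assuming languages are identified by locale codes such as `sv-SE`; the names `ELEVENLABS_LANGUAGES` and `pick_provider` are hypothetical and only serve the illustration.

```python
# Languages currently routed to ElevenLabs; all others fall back to Azure.
ELEVENLABS_LANGUAGES = {"sv"}


def pick_provider(locale: str) -> str:
    """Return the TTS provider name for a locale such as 'sv-SE' or 'en_US'."""
    # Normalize the separator, then keep only the primary language subtag.
    language = locale.replace("_", "-").split("-")[0].lower()
    return "elevenlabs" if language in ELEVENLABS_LANGUAGES else "azure"


print(pick_provider("sv-SE"))
print(pick_provider("en-US"))
```

With this shape, extending ElevenLabs to another language is a one-line change to the set, while Azure remains the default for anything not listed.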
Looking Ahead: Expanding Language Support
While our initial focus has been on enhancing Swedish voice-overs, our ultimate goal is to provide natural-sounding speech across multiple languages. We’re closely monitoring ElevenLabs’ development and plan to expand our use of their service as they broaden their language offerings and stock voice selection.
The Future of Voice in Pixigon
Integrating ElevenLabs marks an exciting step forward for Pixigon. It reflects our commitment to providing cutting-edge tools that help our users create compelling marketing videos. As we continue to refine this integration, we’re excited about the possibilities it opens up for more engaging, natural-sounding voice-overs across various languages.