Although Google Translate has helped bridge language barriers for years, it has drawn plenty of criticism for inaccurate translations. Google’s response? Develop a new system: Google’s AI translator, dubbed the Translatotron.
Before getting to know the Translatotron, here’s a quick breakdown on what has led to its development.
From speech to text and back again: the original cascaded approach
Image from Google Translate
Before Google’s AI translator came into the picture, the Google Translate app had been the go-to tool for filling the language gap for years. The app was commonly used while travelling, as it could automatically translate unfamiliar signs and instructions into the user’s own language.
When Google’s first translator launched in 2006, it used statistical machine translation. The system was a cascade broken into three separate components:
- Automatic speech recognition to transcribe the speech into text
- Machine translation from the transcribed text into a certain language
- Text-to-speech synthesis (TTS) to generate speech in the target language from the translated text
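The three-step cascade above can be sketched as a composition of stages. The stage functions below are hypothetical stand-ins (stubbed with tiny lookup tables), not Google’s actual components; the point is only to show how the output of each stage feeds the next, so an error early on compounds downstream.

```python
# Toy sketch of a cascaded speech-to-speech pipeline. All three stages
# are stubbed placeholders for illustration, not real ASR/MT/TTS systems.

def recognize_speech(audio):
    # Stage 1: automatic speech recognition (stubbed with a lookup)
    return {"hola-mundo.wav": "hola mundo"}[audio]

def translate_text(text):
    # Stage 2: machine translation (stubbed word-for-word dictionary)
    table = {"hola": "hello", "mundo": "world"}
    return " ".join(table.get(word, word) for word in text.split())

def synthesize_speech(text):
    # Stage 3: text-to-speech synthesis (stubbed as a labelled payload)
    return f"<audio:{text}>"

def cascaded_translate(audio):
    # A mistake in any stage propagates into the next -- the compounding
    # errors that a direct speech-to-speech model tries to avoid.
    return synthesize_speech(translate_text(recognize_speech(audio)))

result = cascaded_translate("hola-mundo.wav")  # "<audio:hello world>"
```

Because each stage only sees the previous stage’s output, a misrecognized word in stage 1 is faithfully translated and spoken aloud in stages 2 and 3.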
During the translation process, the system drew on transcripts from the UN and the European Parliament, looking for patterns to find the best translation. Although this three-step process served many of Google’s speech-to-speech products like Google Translate well, its translations were often inaccurate and left room for improvement.
Related Article: 5G in the UK: Launch Dates, Devices, and Health Concerns
A shift to direct speech-to-speech translation
More recently, Google has moved to neural machine translation: a sequence-to-sequence model that produces direct speech-to-speech translation without going through the three-step translation process.
The new system uses a vast artificial neural network to predict the likelihood of a sequence of words in one single integrated model.
Unlike the previous method of translation, the new system:
- Translates sentences as a whole, instead of piece by piece or word by word.
- Uses a broader context to help figure out the most relevant translation.
- Rearranges and adjusts the text to create a more human-like response.
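The value of whole-sentence context can be seen in a tiny example. The Spanish word “banco” can mean either “bank” or “bench”; a word-by-word table must commit to one meaning, while a model that sees the whole sentence can choose correctly. The vocabulary and rule below are invented for illustration only and are nothing like a real neural model.

```python
# Toy illustration of word-by-word vs whole-sentence translation.
# "banco" is ambiguous: "bank" (money) or "bench" (seat).

WORD_TABLE = {"me": "I", "siento": "sit", "en": "on",
              "el": "the", "banco": "bank"}

def word_by_word(sentence):
    # Piece-by-piece lookup: no context, so "banco" is always "bank".
    return " ".join(WORD_TABLE.get(w, w) for w in sentence.split())

def with_context(sentence):
    # Crude stand-in for context: "siento" (sit) implies a bench.
    words = sentence.split()
    out = []
    for w in words:
        if w == "banco" and "siento" in words:
            out.append("bench")
        else:
            out.append(WORD_TABLE.get(w, w))
    return " ".join(out)

naive = word_by_word("me siento en el banco")       # "I sit on the bank"
contextual = with_context("me siento en el banco")  # "I sit on the bench"
```

A real sequence-to-sequence model learns this kind of disambiguation from data rather than from a hand-written rule, but the failure mode of ignoring context is the same.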
Implementing the neural network in Google’s translation process lets the app skip steps like translating audio to text and back again, processing translations at a faster rate.
Along with a more efficient system, Google’s prototype AI translator can encode speech and preserve the original speaker’s voice. The Translatotron aims to bridge the language gap more efficiently.
Here’s what you need to know about the Translatotron
Image from Google AI
According to Google’s blog on the new AI translator, development of the Translatotron started in 2016, with the intention of creating an end-to-end sequence model for speech-to-text translation.
The new system aims to provide several advantages that make the translation process much simpler than the cascaded systems used previously. Such advantages include:
- Faster translation process
- Ability to avoid compounding errors between recognition and translation
- Ability to accurately retain the voice of the original speaker after translation
- Detection and handling of words that do not need to be translated
The Translatotron aims to take translating technology a step further by proving that “a single sequence-to-sequence model can directly translate speech from one language into speech in another language,” as stated in the blog.
Related Article: The Future of Drone Technology
What will Google’s AI translator sound like?
As the Translatotron is based on a sequence-to-sequence network, it takes spectrograms (visual representations of sound frequencies) as input and generates spectrograms of the translated speech in the target language. A neural vocoder (a voice encoder/decoder) then converts the output spectrograms into an audio waveform that is ready to be played. With this audio, the original speaker’s vocal characteristics are layered back into the final output.
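To make the input representation concrete, a spectrogram is just a matrix of frequency magnitudes over overlapping time frames of the signal. The sketch below computes one with a naive DFT in pure Python; it is a toy illustration of the concept, not the mel-spectrogram front end a real speech model would use.

```python
import math

def spectrogram(signal, frame_size=64, hop=32):
    """Magnitude spectrogram of a 1-D signal via a naive DFT over
    overlapping frames (toy illustration, not production DSP)."""
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = signal[start:start + frame_size]
        bins = []
        # Magnitude of each frequency bin up to the Nyquist frequency
        for k in range(frame_size // 2 + 1):
            re = sum(x * math.cos(-2 * math.pi * k * n / frame_size)
                     for n, x in enumerate(frame))
            im = sum(x * math.sin(-2 * math.pi * k * n / frame_size)
                     for n, x in enumerate(frame))
            bins.append(math.hypot(re, im))
        frames.append(bins)
    return frames  # time x frequency matrix

# A pure sine wave at 8 cycles per frame should peak in frequency bin 8
signal = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
spec = spectrogram(signal)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])  # 8
```

A model like Translatotron maps input matrices of this kind directly to output spectrograms, which a vocoder then turns back into audio.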
Translatotron’s translation quality is measured with the BLEU (Bilingual Evaluation Understudy) score, an algorithm for evaluating the quality of machine-translated text. Since the system outputs speech rather than text, a speech recognition system first transcribes the translated audio so that the BLEU score can be computed.
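BLEU compares n-gram overlap between a candidate translation and a reference, combined with a brevity penalty for candidates that are too short. The sketch below is a simplified single-reference, sentence-level version; real evaluations use corpus-level BLEU with smoothing.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of 1..max_n n-gram
    precisions, times a brevity penalty. Single reference, no smoothing."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        overlap = sum((cand & ref).values())  # clipped n-gram matches
        total = max(sum(cand.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # unsmoothed BLEU collapses if any precision is zero
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    brevity = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return brevity * geo_mean

reference = "the cat sat on the mat".split()
perfect = bleu("the cat sat on the mat".split(), reference)  # 1.0
wrong = bleu("completely different words here".split(), reference)  # 0.0
```

For Translatotron, the `candidate` tokens would come from a speech recognizer’s transcript of the translated audio rather than directly from the model.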
Here's what it could sound like
This is how Google’s AI translator is predicted to sound in comparison to Google Translate. All sound clips are from google-research.github.io
Here’s a sound clip in Spanish (source language):
Here’s what it sounds like in English (target language):
Here’s what the translation sounds like using Google Translate’s statistical machine translation system:
Here’s what the Translatotron will sound like:
From listening to the predicted sound of the new Translatotron, its human-like accuracy could address the criticisms that Google’s current translator has received. As the artificial neural network can better detect and carry human-like nuances into its target language, it can create a smoother, more organic conversation for its users.
As of now, only a few speech translation apps similar to Google Translate are out on the market, such as SayHi, iTranslate, TripLingo, and Microsoft Translator, all of which still lack the human-like characteristics needed to keep a conversation between individuals flowing naturally.
However, with Google’s AI translator, the Translatotron, there is hope that conversations across different languages can become possible, navigating foreign lands can be a lot easier, and strong bonds among like-minded individuals can be formed despite the language barrier.
Need help bringing your tech startup idea to life? We at Cloud Employee can help you get started! You can hire dedicated offshore developers with us across many technologies. Talk to us, learn more about how Cloud Employee works, or see our Developer Pricing Guide.