- SRI-B developed Hindi language support for Galaxy AI, covering more than 20 regional dialects for the best results
GURUGRAM, India – July 1, 2024: The development of Galaxy AI involved multiple R&D teams working across cultures and borders. SRI-Bangalore, Samsung’s largest R&D centre outside Korea, collaborated with teams around the world to develop AI language models for British, Indian and Australian English as well as Thai, Vietnamese and Indonesian.
Recently, core engineers from other Samsung Research centers visited Bangalore, India, where the SRI-B team helped them ramp up the technology needed to bring Vietnamese, Thai and Indonesian to Galaxy AI.
SRI-B also developed Hindi language support for Galaxy AI. Developing the Hindi AI model wasn’t simple. The team had to ensure more than 20 regional dialects, tonal inflections, punctuation and colloquialisms were covered. Additionally, it is common for Hindi speakers to mix English words into their conversations. This required the team to carry out multiple rounds of AI model training with a combination of translated and transliterated data.
“Every language has its challenges,” said Giridhar Jakki, Head of Language AI at Samsung R&D Institute India – Bangalore (SRI-B). “But when you consider the end goal of bringing people the ability to communicate in other languages, it’s worth every ounce of effort. We couldn’t wait to bring Hindi to Galaxy AI.”
“Hindi has a complex phonetic structure that includes retroflex sounds (sounds made by curling the tongue back in the mouth), which are not present in many other languages,” said Jakki. “To build the speech synthesis element of the AI solution, we carefully reviewed data with native linguists to understand all the unique sounds and created a special set of phonemes to support specific dialects of the language.”
Collaborative efforts between Samsung and academic partners were instrumental in developing an AI language model that reflects the cultural nuances of India’s regions. The Vellore Institute of Technology helped secure almost a million lines of segmented and curated audio data on conversational speech, words and commands. Data was a crucial component for a task as critical as incorporating the fourth most spoken language in the world into Galaxy AI. Working with universities ensured Samsung was using the highest quality data.
Galaxy AI now supports 16 languages, so more people can expand their language capabilities, even when offline, thanks to on-device translation in features such as Live Translate, Interpreter, Note Assist and Browsing Assist.
This project perfectly encapsulates Samsung’s philosophy of open collaboration and the company’s belief that sharing expertise and perspectives ensures meaningful innovation. In the case of SRI-B, this includes not only working with academia but also sharing insights and best practices with other Samsung research centers around the world.
“I’m extremely proud of what we’ve achieved with the help of our partners,” said Jakki. “AI innovation through collaboration is a big part of what we do. We will continue to better understand, collect and analyze language data so more people can have access to AI tools in the future.”