Telegram Group & Telegram Channel
18B speech recognition models

https://huggingface.co/collections/espnet/owls-scaling-laws-for-speech-recognition-and-translation-67ab7f991c194065f057ce8d

https://arxiv.org/abs/2502.10373

OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models

William Chen, Jinchuan Tian, Yifan Peng, Brian Yan, Chao-Han Huck Yang, Shinji Watanabe

Neural scaling laws offer valuable insights for designing robust sequence processing architectures. While these laws have been extensively characterized in other modalities, their behavior in speech remains comparatively underexplored. In this work, we introduce OWLS, an open-access, reproducible suite of multilingual speech recognition and translation models spanning 0.25B to 18B parameters, with the 18B version being the largest speech model, to the best of our knowledge. OWLS leverages up to 360K hours of public speech data across 150 languages, enabling a systematic investigation into how data, model, and compute scaling each influence performance in multilingual speech tasks. We use OWLS to derive neural scaling laws, showing how final performance can be reliably predicted when scaling. One of our key findings is that scaling enhances performance on low-resource languages/dialects, helping to mitigate bias and improve the accessibility of speech technologies. Finally, we show how OWLS can be used to power new research directions by discovering emergent abilities in la



tg-me.com/speechtech/2110
Create:
Last Update:

18B speech recognition models

https://huggingface.co/collections/espnet/owls-scaling-laws-for-speech-recognition-and-translation-67ab7f991c194065f057ce8d

https://arxiv.org/abs/2502.10373

OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models

William Chen, Jinchuan Tian, Yifan Peng, Brian Yan, Chao-Han Huck Yang, Shinji Watanabe

Neural scaling laws offer valuable insights for designing robust sequence processing architectures. While these laws have been extensively characterized in other modalities, their behavior in speech remains comparatively underexplored. In this work, we introduce OWLS, an open-access, reproducible suite of multilingual speech recognition and translation models spanning 0.25B to 18B parameters, with the 18B version being the largest speech model, to the best of our knowledge. OWLS leverages up to 360K hours of public speech data across 150 languages, enabling a systematic investigation into how data, model, and compute scaling each influence performance in multilingual speech tasks. We use OWLS to derive neural scaling laws, showing how final performance can be reliably predicted when scaling. One of our key findings is that scaling enhances performance on low-resource languages/dialects, helping to mitigate bias and improve the accessibility of speech technologies. Finally, we show how OWLS can be used to power new research directions by discovering emergent abilities in la

BY Speech Technology




Share with your friend now:
tg-me.com/speechtech/2110

View MORE
Open in Telegram


Speech Technology Telegram | DID YOU KNOW?

Date: |

How Does Telegram Make Money?

Telegram is a free app and runs on donations. According to a blog on the telegram: We believe in fast and secure messaging that is also 100% free. Pavel Durov, who shares our vision, supplied Telegram with a generous donation, so we have quite enough money for the time being. If Telegram runs out, we will introduce non-essential paid options to support the infrastructure and finance developer salaries. But making profits will never be an end-goal for Telegram.

That growth environment will include rising inflation and interest rates. Those upward shifts naturally accompany healthy growth periods as the demand for resources, products and services rise. Importantly, the Federal Reserve has laid out the rationale for not interfering with that natural growth transition.It's not exactly a fad, but there is a widespread willingness to pay up for a growth story. Classic fundamental analysis takes a back seat. Even negative earnings are ignored. In fact, positive earnings seem to be a limiting measure, producing the question, "Is that all you've got?" The preference is a vision of untold riches when the exciting story plays out as expected.

Speech Technology from us


Telegram Speech Technology
FROM USA