Friday January 31, 2025 12:15pm - 2:15pm IST

Authors - Mrudul Dixit, Rajiya Landage, Prachi Raut
Abstract - The paper presents a comprehensive comparison of Speech-to-Text (STT) and Text-to-Speech (TTS) models, two foundational technologies in natural language processing and human-computer interaction. It examines the evolution of these models, focusing on state-of-the-art approaches: Whisper Automatic Speech Recognition (ASR), DeepSpeech, Wav2vec, Kaldi, and SpeechBrain for STT, and Tacotron, WaveNet, gTTS, and FastSpeech for TTS. Through an analysis of architectures, performance metrics, and applications, the paper highlights the strengths and limitations of each model, particularly in domains requiring high accuracy, multilingual support, and real-time processing. It also explores the challenges faced by STT and TTS systems, including handling diverse languages, coping with background noise, and generating natural-sounding speech. Recent advances in end-to-end models, transfer learning, and multimodal approaches are pushing the boundaries of both STT and TTS technologies. By providing a detailed comparison and identifying future research directions, this review aims to guide researchers and practitioners in selecting and developing speech models for various applications, particularly those that enhance accessibility for specially-abled individuals.
Virtual Room E Pune, India
