Authors - Sathiyapriya K, S Bharath, Rohith Sundharamurthy, Prithivi Raaj K, Rakesh Kumar S, Rakkul Pravesh M, N Arun Eshwer

Abstract - The convenience and security offered by voice-based authentication systems have led to their increasing adoption in sectors such as banking, e-commerce, and telecommunications. However, these systems remain vulnerable to voice spoofing attacks, including replay, synthesis, and voice conversion. This work combines Mel-Frequency Cepstral Coefficients (MFCC), the Constant-Q Transform (CQT), and the Res2Net deep learning model into a framework that distinguishes genuine from spoofed voices: MFCC and CQT serve as the feature-extraction front end, and the Res2Net model classifies the audio. The system was evaluated on the ASVspoof 2021 dataset, chosen for its diverse collection of nearly 180,000 audio samples and its wide recognition in the research community. Our system achieved a low Equal Error Rate (EER) of 0.0332 and a tandem Detection Cost Function (t-DCF) of 0.2246. This framework contributes to the advancement of secure voice authentication systems, addressing critical challenges in modern cybersecurity.
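
To make the front end described in the abstract concrete, the sketch below shows one common way to compute MFCC and CQT features from an utterance using librosa. It is an illustrative assumption, not the authors' exact configuration; the sampling rate, n_mfcc, and hop_length values are placeholders chosen only for demonstration.

```python
# Illustrative MFCC + CQT feature extraction with librosa.
# Parameter values (sr, n_mfcc, hop_length) are assumptions for illustration,
# not the configuration used in the paper.
import numpy as np
import librosa

def extract_features(wav_path, sr=16000, n_mfcc=20, hop_length=256):
    # Load the utterance at a fixed sampling rate
    y, sr = librosa.load(wav_path, sr=sr)

    # Mel-Frequency Cepstral Coefficients: shape (n_mfcc, frames)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, hop_length=hop_length)

    # Constant-Q Transform magnitude, converted to a log (dB) scale: shape (bins, frames)
    cqt = np.abs(librosa.cqt(y=y, sr=sr, hop_length=hop_length))
    log_cqt = librosa.amplitude_to_db(cqt, ref=np.max)

    # Either representation can then be fed to a classifier such as Res2Net
    return mfcc, log_cqt
```

Time-frequency features of this kind are typically stacked into fixed-size spectrogram-like inputs before being passed to the classification network.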