Authors - Sathiyapriya K, S Bharath, Rohith Sundharamurthy, Prithivi Raaj K, Rakesh Kumar S, Rakkul Pravesh M, N Arun Eshwer

Abstract - The convenience and security offered by voice-based authentication systems have led to their increasing adoption in sectors such as banking, e-commerce, and telecommunications. However, these systems remain vulnerable to voice spoofing attacks, including replay, synthesis, and voice conversion. This work combines Mel-Frequency Cepstral Coefficients (MFCC), the Constant-Q Transform (CQT), and the Res2Net deep learning model into a framework that distinguishes genuine from spoofed voices: MFCC and CQT serve as the feature-extraction front end, and the Res2Net model classifies the audio. The system was evaluated on the ASVspoof 2021 dataset, chosen for its diverse collection of nearly 180,000 audio samples and its wide recognition in the research community. Our system achieved a low Equal Error Rate (EER) of 0.0332 and a tandem Detection Cost Function (t-DCF) of 0.2246. This framework contributes to the advancement of secure voice authentication systems, addressing critical challenges in modern cybersecurity.
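
To make the front end described in the abstract concrete, the sketch below shows one common way to compute MFCC and CQT features from an utterance using librosa. It is an illustrative assumption, not the authors' exact configuration; the sampling rate, n_mfcc, and hop_length values are placeholders chosen only for demonstration.

```python
# Illustrative MFCC + CQT feature extraction with librosa.
# Parameter values (sr, n_mfcc, hop_length) are assumptions for illustration,
# not the configuration used in the paper.
import numpy as np
import librosa

def extract_features(wav_path, sr=16000, n_mfcc=20, hop_length=256):
    # Load the utterance at a fixed sampling rate
    y, sr = librosa.load(wav_path, sr=sr)

    # Mel-Frequency Cepstral Coefficients: shape (n_mfcc, frames)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, hop_length=hop_length)

    # Constant-Q Transform magnitude, converted to a log (dB) scale: shape (bins, frames)
    cqt = np.abs(librosa.cqt(y=y, sr=sr, hop_length=hop_length))
    log_cqt = librosa.amplitude_to_db(cqt, ref=np.max)

    # Either representation can then be fed to a classifier such as Res2Net
    return mfcc, log_cqt
```

Time-frequency features of this kind are typically stacked into fixed-size spectrogram-like inputs before being passed to the classification network.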