Authors - Yasharth Sonar, Piyush Wajage, Khushi Sunke, Anagha Bidkar

Abstract - Emotion recognition from speech is a crucial part of human-computer interaction, with applications in entertainment, healthcare, and customer service. This work presents a speech emotion recognition system that integrates machine learning and deep learning techniques. The system processes speech data using Mel Frequency Cepstral Coefficient (MFCC), Chroma, and Mel Spectrogram features extracted from the RAVDESS dataset. A variety of classifiers are employed, including a neural-network-based multi-layer perceptron (MLP), Random Forest, Decision Tree, Support Vector Machine, and other traditional machine learning models. To capture both the temporal and spatial components of speech signals, we also built a hybrid deep learning model that combines convolutional neural networks (CNN) with long short-term memory (LSTM) networks. In classifying eight emotions (neutral, calm, happy, sad, angry, fearful, disgusted, and surprised), the CNN-LSTM model achieved the highest accuracy, outperforming all other classifiers. This study demonstrates how effectively deep learning and conventional machine learning approaches can be combined for speech emotion recognition.
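The abstract itself contains no code; the following is a minimal sketch of the feature-extraction step it describes, assuming the librosa library is used for audio processing. The function name extract_features, the sampling rate, and the choice of 40 MFCC coefficients are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
import librosa

def extract_features(path, sr=22050, n_mfcc=40):
    """Extract MFCC, Chroma, and Mel-spectrogram features from one audio file.

    Averaging each feature over time yields a fixed-length vector suitable
    for classical classifiers such as SVM, Random Forest, or an MLP.
    """
    signal, sr = librosa.load(path, sr=sr)
    stft = np.abs(librosa.stft(signal))
    mfcc = np.mean(librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc), axis=1)
    chroma = np.mean(librosa.feature.chroma_stft(S=stft, sr=sr), axis=1)
    mel = np.mean(librosa.feature.melspectrogram(y=signal, sr=sr), axis=1)
    # Concatenate into one vector: 40 MFCC + 12 chroma + 128 mel bands = 180 values
    return np.concatenate([mfcc, chroma, mel])
```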
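Likewise, a compact sketch of a CNN-LSTM hybrid of the kind the abstract describes, written against the Keras API; the layer counts, kernel sizes, and input shape are assumptions chosen for illustration, since the paper's exact architecture is not given here.

```python
from tensorflow.keras import layers, models

def build_cnn_lstm(n_frames=174, n_features=40, n_classes=8):
    """Hybrid model: Conv1D layers capture local spectral patterns,
    and an LSTM layer models how those patterns evolve over time."""
    model = models.Sequential([
        layers.Input(shape=(n_frames, n_features)),    # (time steps, features per frame)
        layers.Conv1D(64, kernel_size=5, activation="relu", padding="same"),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(128, kernel_size=5, activation="relu", padding="same"),
        layers.MaxPooling1D(pool_size=2),
        layers.LSTM(64),                               # summarize temporal dynamics
        layers.Dropout(0.3),
        layers.Dense(n_classes, activation="softmax")  # 8 RAVDESS emotion classes
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```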