Authors - Varun Maniappan, Praghaadeesh R, Bharathi Mohan G, Prasanna Kumar R

Abstract - This paper presents a comprehensive review of how language models have evolved, focusing on the shift toward smaller, more efficient models rather than large, resource-hungry ones. We discuss technological progress in attention mechanisms, positional embeddings, and architectural enhancements for language models. The principal bottleneck of LLMs has been their high computational requirements, which has kept them from becoming more widely used tools. In this paper, we outline how recent innovations, notably Flash Attention and small language models (SLMs), address these limitations, paying special attention to the Mamba architecture, which is built on state-space models. Moreover, we describe the emerging trend of open-source language models, reviewing efforts by major technology companies such as Microsoft's Phi and Google's Gemma series. We trace the evolution from early transformer models to current open-source implementations and report on future work needed to make AI more accessible and efficient. Our analysis shows how these advances democratize AI technology while maintaining high performance standards.