Authors - Vidisha Deshpande, Gauri Shelke, Bhakti Kadam

Abstract - Advancements in deep learning are fundamentally transforming assistive technologies, providing visually impaired users with unprecedented access to information and enhanced interaction with their surroundings. This paper comprehensively surveys traditional and emerging assistive technologies, with a focus on real-time image caption generation systems. It highlights modern advancements that bridge sensory limitations and digital interaction, covering technologies such as Optical Character Recognition (OCR)-based text readers, object detection systems, image captioning systems, and intelligent haptic feedback devices. In particular, it examines the critical role of vision-language models and multimodal systems, which enable real-time auditory descriptions of visual scenes. The survey also identifies significant gaps in real-world deployment, particularly in terms of adaptability, cost, and inclusivity. These findings emphasize the need for more accessible, affordable, and real-time solutions that cater to the diverse needs of visually impaired individuals.