Friday January 31, 2025 9:30am - 11:30am IST

Authors - Hrudai Aditya Dharmala, Ajay Kumar Thallada, Kovvur Ram Mohan Rao
Abstract - Recent advances in vision-language models have demonstrated remarkable multimodal generation capabilities. However, these models typically rely on training large networks on massive datasets, which is demanding in both data and computational resources. Drawing inspiration from the expert-based architecture of Prismer, we propose a novel framework for contextual visual question answering tailored to remote sensing imagery. Our methodology extends the Prismer architecture in two stages: first, by incorporating a domain-specific segmentation expert trained on remote sensing datasets, and second, by integrating a Large Language Model (Mistral 7B) fine-tuned for remote sensing terminology using Parameter-Efficient Fine-Tuning (PEFT) with QLoRA, with hyperparameters tuned with the help of the Unsloth framework. The segmentation expert analyzes the remote sensing imagery, while the language model acts as a reasoning expert, combining domain-specific knowledge with natural language understanding to process visual contexts and generate accurate responses. In our framework, the Unsloth fine-tuning approach helps the language model maintain high performance within the defined scope of remote sensing classes and terminology while avoiding hallucination or deviation from established classification schemas. The framework demonstrates significant improvements in accuracy and reliability compared to traditional approaches, and opens an exciting direction for making Earth observation data more accessible to end users. Experimental results validate that this architecture effectively balances domain expertise with computational efficiency, providing a practical solution for remote sensing visual question answering that requires substantially fewer computational resources than end-to-end training of massive models.
Virtual Room E Pune, India