Thursday January 30, 2025 9:30am - 11:30am IST

Authors - Madhuri Thorat, Priyanshu Kapadnis, Neel Kothimbire, Rameshkumar Choudhary, Atharva Jadhav
Abstract - The emergence of Generative AI has led to the development of various tools that present new opportunities for businesses and professionals engaged in content creation. The education sector is undergoing a significant transformation in the methods of content development and delivery. AI models and tools facilitate the creation of customized learning materials and effective visuals that enhance and simplify the educational experience. The advent of Large Language Models (LLMs) such as GPT and Text-to-Image models like Stable Diffusion has fundamentally changed and expedited the content generation process. The capability to generate high-quality visuals from textual descriptions has exceeded expectations from just a few years ago. Nevertheless, current research predominantly concentrates on text generation from text, with a notable lack of studies exploring the use of multimodal generation capabilities to tackle critical challenges in instruction supported by multimodal data. In this paper, we propose a framework for generating situational video content based on English poetry, which is executed through several phases: context analysis, prompt generation, image generation, and video synthesis. This comprehensive process necessitates various types of AI models, including text-to-text, text-to-video, text-to-audio, and image-to-image. This project illustrates the potential of combining multiple generative AI models to produce rich multimedia experiences derived from textual content.
Virtual Room D Pune, India
