Friday January 31, 2025 12:15pm - 2:15pm IST

Authors - Satya Kiranmai Tadepalli, Sujith Kumar Akkanapelli, Sree Harsha Deshamoni, Pranav Bingi
Abstract - This paper analyzes in detail how generative AI and encoder-based architectures are transforming video generation from multimodal inputs such as images and text. CNNs, RNNs, and Transformers encode the divergent modalities, which are then blended into the synthesis of realistic video sequences. The work draws on emerging generative models such as GANs and VAEs to bridge the gap from static images to video generation, a significant leap forward in video-creation technology. It also examines the complexities of multimodal input, balancing temporal coherence with semantic alignment of the generated output, and explains how encoders translate visual and textual information into actionable representations for video generation. The paper closes with a survey of recent progress in adopting generative AI and multimodal encoders, a discussion of current challenges, and possible future directions, emphasizing their potential to support video-related tasks and reshape the multimedia and AI communities.
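
The encoder pipeline the abstract describes can be illustrated with a minimal sketch. The code below is not the authors' implementation; it is an illustrative example in which the ImageEncoder, TextEncoder, and MultimodalEncoder modules, the 64x64 RGB input size, and the 10,000-token vocabulary are all assumptions. It shows a CNN and a Transformer encoding an image and a caption into one fused representation that a generative video model (e.g., a GAN or VAE decoder) could condition on.

```python
# Minimal sketch (not the paper's method): encode image + text into a
# shared conditioning vector for a downstream video generator.
import torch
import torch.nn as nn

class ImageEncoder(nn.Module):
    """CNN that maps a 3x64x64 image to a d-dim embedding."""
    def __init__(self, d=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(), # 16 -> 8
            nn.Flatten(),
        )
        self.proj = nn.Linear(128 * 8 * 8, d)

    def forward(self, x):
        return self.proj(self.conv(x))

class TextEncoder(nn.Module):
    """Transformer that maps a token sequence to a d-dim embedding."""
    def __init__(self, vocab=10000, d=256, heads=4, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        layer = nn.TransformerEncoderLayer(d, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, layers)

    def forward(self, tokens):
        h = self.encoder(self.embed(tokens))
        return h.mean(dim=1)  # pool over the token dimension

class MultimodalEncoder(nn.Module):
    """Fuses image and text embeddings into one conditioning vector."""
    def __init__(self, d=256):
        super().__init__()
        self.image_enc = ImageEncoder(d)
        self.text_enc = TextEncoder(d=d)
        self.fuse = nn.Linear(2 * d, d)

    def forward(self, image, tokens):
        z = torch.cat([self.image_enc(image), self.text_enc(tokens)], dim=-1)
        return self.fuse(z)

# One image and one caption yield a single actionable representation.
enc = MultimodalEncoder()
image = torch.randn(1, 3, 64, 64)
tokens = torch.randint(0, 10000, (1, 12))
print(enc(image, tokens).shape)  # torch.Size([1, 256])
```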
Paper Presenter
Virtual Room B, Pune, India
