Friday January 31, 2025 9:30am - 11:30am IST

Authors - Akshay Honnavalli, Hrishi Preetham G L, Aditya Rao, Preethi P
Abstract - In today's information-driven world, organizing vast amounts of textual data is crucial. Topic modelling, a subfield of NLP, discovers thematic structures in large text corpora, summarizing and categorizing documents by identifying their prevalent topics. Because most topic modelling research has focused on English, adapting these methods to Hindi would benefit Hindi speakers. This research addresses that gap by applying topic modelling to a Hindi news category dataset and comparing traditional approaches (LDA, LSA and NMF) with BERT-based approaches. Six open-source embedding models supporting Hindi were evaluated. Among these, the l3cube-pune/hindi-sentence-similarity-sbert model performed strongly, achieving coherence scores of 0.783 and 0.797 for N-gram ranges (1,1) and (1,2), respectively. The average coherence scores of all embedding models significantly exceeded those of the traditional models, highlighting the potential of embedding models for Hindi topic modelling. This research also introduces a novel method that assigns meaningful category labels to discovered topics using the dataset annotations, making topic clusters easier to interpret. The findings illustrate both the strengths of these models and the areas where they need improvement to better capture the nuances of Hindi text.
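The abstract does not describe how dataset annotations are turned into topic labels. One simple way to do it is majority voting. Each discovered topic gets the most frequent annotated news category among the documents assigned to it, along with the share of documents that carry that category. The sketch below works under that assumption. The function name `label_topics`, the treatment of topic `-1` as an outlier bucket (the BERTopic convention), and the category strings in the usage example are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def label_topics(topic_ids, categories):
    """Hypothetical majority-vote labelling of topics from dataset annotations.

    topic_ids[i]  -- topic assigned to document i by a topic model
    categories[i] -- annotated news category of document i
    Returns {topic: (most_common_category, share_of_topic_documents)}.
    Topic -1 is assumed to be an outlier bucket and is skipped.
    """
    if len(topic_ids) != len(categories):
        raise ValueError("topic_ids and categories must have the same length")

    # Count annotated categories per topic.
    counts = defaultdict(Counter)
    for topic, category in zip(topic_ids, categories):
        if topic == -1:
            continue
        counts[topic][category] += 1

    # Pick the dominant category; ties go to the category seen first.
    labels = {}
    for topic, counter in counts.items():
        category, n = counter.most_common(1)[0]
        labels[topic] = (category, n / sum(counter.values()))
    return labels


if __name__ == "__main__":
    topics = [0, 0, 0, 1, 1, -1]
    cats = ["sports", "sports", "politics", "business", "business", "sports"]
    print(label_topics(topics, cats))
```

The share returned with each label gives a rough purity measure. A topic whose dominant category covers only a small fraction of its documents may need manual inspection rather than an automatic label.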
Virtual Room D, Pune, India
