SentenceLDA: Discriminative and Robust Document Representation with Sentence Level Topic Model

Taehun Cha, Donghun Lee

Main: Semantics and Applications Oral Paper

Session 9: Semantics and Applications (Oral)
Conference Room: Marie Louise 2
Conference Time: March 20, 09:00-10:30 (CET) (Europe/Malta)
Abstract: A subtle difference in context can produce entirely different nuances, even for lexically identical words. Conversely, two different words can convey similar meanings given a homogeneous context. Word spelling information alone is therefore insufficient to obtain a quality text representation. We propose SentenceLDA, a sentence-level topic model that combines modern SentenceBERT with classical LDA to extend the semantic unit from the word to the sentence. By extending the semantic unit, we verify that SentenceLDA returns more discriminative document representations than other topic models while maintaining LDA's elegant probabilistic interpretability. We also verify the robustness of SentenceLDA by comparing inference results on original and paraphrased texts. Finally, we demonstrate one possible application of SentenceLDA, corpus-level key opinion mining, by applying it to an argumentative corpus, DebateSum.
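The abstract does not spell out how sentence embeddings are combined with LDA. One plausible sketch, purely an assumption and not the authors' actual method, is to discretize SentenceBERT-style sentence embeddings into a "sentence vocabulary" via clustering and then run classical LDA over each document's bag of sentence-cluster counts. Random vectors stand in for real SentenceBERT embeddings below so the snippet is self-contained.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)

# Stand-in for SentenceBERT embeddings: 30 documents, 5 sentences each,
# 16-dimensional vectors. In practice these would come from a model such
# as sentence-transformers (assumed here, not taken from the paper).
docs = [rng.normal(size=(5, 16)) for _ in range(30)]
all_sents = np.vstack(docs)

# Step 1 (assumed quantization step): cluster sentence embeddings so each
# sentence maps to a discrete "sentence word" in a small vocabulary.
n_clusters = 8
km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(all_sents)

# Step 2: represent each document as a bag of sentence-cluster counts,
# analogous to a bag-of-words matrix in word-level LDA.
counts = np.zeros((len(docs), n_clusters), dtype=int)
start = 0
for i, d in enumerate(docs):
    for label in km.labels_[start:start + len(d)]:
        counts[i, label] += 1
    start += len(d)

# Step 3: classical LDA over the counts yields a per-document topic
# distribution, preserving LDA's probabilistic interpretation.
lda = LatentDirichletAllocation(n_components=4, random_state=0)
doc_topics = lda.fit_transform(counts)
print(doc_topics.shape)        # (30, 4)
print(doc_topics.sum(axis=1))  # each row is a distribution summing to ~1.0
```

The resulting `doc_topics` rows are the document representations: each row is a probability distribution over topics, which is what makes the representation directly interpretable.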