Multi-document Topic Segmentation Using Bayesian Estimation

Mota P., Eskenazi M., Coheur L.

Proceedings - 2016 IEEE 10th International Conference on Semantic Computing, ICSC 2016

2016

pp. 443-447

Abstract:

This paper proposes using lexical similarity across different documents to improve topic segmentation. Given a set of topically related documents, segmentation is carried out within a Bayesian framework: by drawing on similar sentences from different documents, more accurate segment likelihood estimates are obtained. The proposed approach was tested in an educational domain, where a set of learning materials from different media sources had to be segmented so that students could browse through them more efficiently. Initial results show that the proposed method yields better segmentation than one of the current state-of-the-art algorithms, a Bayesian baseline that segments each document individually.
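The abstract describes the approach only at a high level. The sketch below illustrates one way the idea could work. It rests on assumptions that are ours, not the paper's. First, each candidate segment is scored with a Dirichlet-compound-multinomial likelihood, in which the segment's word distribution is integrated out under a symmetric Dirichlet `prior`. Second, cross-document evidence enters by adding to each sentence the word counts of its most cosine-similar sentence from the other documents, provided the similarity reaches a `threshold`. Third, the best split into `k` segments is found by dynamic programming. All function and parameter names (`segment`, `borrow_similar`, `dcm_loglik`, `prior`, `threshold`) are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def dcm_loglik(counts: Counter, vocab_size: int, prior: float) -> float:
    """Log-likelihood of a segment's word counts under a symmetric
    Dirichlet-compound multinomial (word distribution integrated out).
    Words with zero count contribute nothing, so only nonzero counts are summed."""
    n = sum(counts.values())
    ll = math.lgamma(vocab_size * prior) - math.lgamma(n + vocab_size * prior)
    for c in counts.values():
        ll += math.lgamma(c + prior) - math.lgamma(prior)
    return ll


def borrow_similar(doc, others, threshold):
    """Assumption: for each sentence in `doc`, borrow the bag of words of the
    most similar sentence in `others` if the similarity reaches `threshold`."""
    pool = [Counter(s) for other in others for s in other]
    borrowed = []
    for sent in doc:
        bag = Counter(sent)
        best = max(pool, key=lambda p: cosine(bag, p), default=Counter())
        borrowed.append(best if cosine(bag, best) >= threshold else Counter())
    return borrowed


def segment(doc, others, k, prior=0.1, threshold=0.3):
    """Split `doc` (a list of token lists) into `k` contiguous segments that
    maximize the summed segment log-likelihood, using counts augmented with
    borrowed cross-document sentences. Returns the start index of every
    segment except the first."""
    if not 1 <= k <= len(doc):
        raise ValueError("k must be between 1 and the number of sentences")
    vocab = {w for d in [doc, *others] for s in d for w in s}
    bags = [Counter(s) + b for s, b in zip(doc, borrow_similar(doc, others, threshold))]
    T = len(doc)

    def seg_ll(i, j):  # score of the segment covering sentences i..j-1
        total = Counter()
        for bag in bags[i:j]:
            total.update(bag)
        return dcm_loglik(total, len(vocab), prior)

    neg = float("-inf")
    # best[m][j]: best score for splitting the first j sentences into m segments
    best = [[neg] * (T + 1) for _ in range(k + 1)]
    back = [[0] * (T + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, T + 1):
            for i in range(m - 1, j):
                if best[m - 1][i] == neg:
                    continue
                score = best[m - 1][i] + seg_ll(i, j)
                if score > best[m][j]:
                    best[m][j], back[m][j] = score, i
    # Follow back-pointers from the full document to recover segment starts.
    starts, j = [], T
    for m in range(k, 0, -1):
        j = back[m][j]
        starts.append(j)
    return sorted(s for s in starts if s > 0)


if __name__ == "__main__":
    doc = [["cell", "membrane", "protein"], ["protein", "cell", "dna"],
           ["market", "price", "demand"], ["demand", "supply", "price"]]
    others = [[["cell", "protein", "membrane", "dna"]],
              [["price", "supply", "demand", "market"]]]
    print(segment(doc, others, 2))  # → [2]
```

On this toy input, the boundary falls between the biology and economics sentences. Pooling the borrowed counts unweighted is only one possible choice. Weighting each borrowed sentence by its similarity would be a natural variant.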