Journal: ACM Transactions on Asian Language Information Processing

An EDU-Based Approach for Thai Multi-Document Summarization and Its Application


Abstract

Because Thai text lacks explicit word, phrase, and sentence boundaries, summarizing multiple Thai documents poses challenges in unit segmentation, unit selection, duplication elimination, and evaluation dataset construction. In this article, we introduce Thai Elementary Discourse Units (TEDUs) and their derivatives, called Combined TEDUs (CTEDUs), and then present our three-stage method of Thai multi-document summarization, that is, unit segmentation, unit-graph formulation, and unit selection and summary generation. To examine the performance of the proposed method, we conduct a number of experiments using 50 sets of Thai news articles with manually constructed reference summaries. Measured by ROUGE-1, ROUGE-2, and ROUGE-SU4, the experimental results show that (1) TEDU-based summarization outperforms paragraph-based summarization; (2) the proposed graph-based TEDU weighting with importance-based selection achieves the best performance; and (3) considering unit duplication and recalculating weights help improve summary quality.
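The abstract names the stages but gives no formulas, so the following is only a minimal sketch of what the last two stages (unit-graph formulation, then unit selection with duplication handling and weight recalculation) could look like. It is not the authors' method. It assumes the units have already been segmented and tokenized, which for Thai is the hard part and is not shown, and it swaps in a generic PageRank-style weighting over cosine similarity. The function names `rank_units` and `select_units` and the parameters `damping`, `iterations`, `budget`, and `penalty` are all hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_units(units, damping=0.85, iterations=50):
    """Build a similarity graph over units and weight nodes PageRank-style.

    `units` is a list of token lists (assumed pre-segmented).
    Returns (weights, similarity_matrix).
    """
    n = len(units)
    if n == 0:
        return [], []
    vecs = [Counter(u) for u in units]
    # Edge weight = cosine similarity; no self-loops.
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    out = [sum(row) for row in sim]  # total outgoing edge weight per node
    w = [1.0 / n] * n
    for _ in range(iterations):
        w = [(1 - damping) / n
             + damping * sum(sim[j][i] * w[j] / out[j]
                             for j in range(n) if out[j] > 0)
             for i in range(n)]
    return w, sim


def select_units(units, budget, penalty=1.0):
    """Greedily pick up to `budget` units by weight.

    After each pick, the weights of the remaining units are recalculated:
    each is scaled down by its similarity to the unit just chosen, so
    near-duplicates become unlikely to be selected.
    Returns the chosen indices in original document order.
    """
    w, sim = rank_units(units)
    remaining = set(range(len(units)))
    chosen = []
    while remaining and len(chosen) < budget:
        # Highest weight wins; ties go to the earlier unit.
        best = max(remaining, key=lambda i: (w[i], -i))
        chosen.append(best)
        remaining.discard(best)
        for i in remaining:
            w[i] *= max(0.0, 1.0 - penalty * sim[best][i])
    return sorted(chosen)
```

Calling `select_units(units, budget=2)` on a list that contains two identical units should not return both of them, because the weight of the second copy is driven to zero once the first is picked. Scaling by similarity is only one way to combine duplication handling with weight recalculation; the paper may define both differently.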
