Conference: 22nd International Conference on Computational Linguistics

A Framework for Identifying Textual Redundancy

Abstract

Identifying redundant information in documents that are generated from multiple sources poses a significant challenge for summarization and QA systems. Traditional clustering techniques detect redundancy at the sentence level and do not guarantee that all information in the document is preserved. We discuss an algorithm that generates a novel graph-based representation for a document and then uses a set cover approximation algorithm to remove redundant text from it. Our experiments show that this approach offers a significant performance advantage over clustering when evaluated on an annotated dataset.
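
The set cover step mentioned in the abstract can be illustrated with the standard greedy approximation. This is only a minimal sketch, not the paper's method: the abstract does not describe the graph-based representation, so the example assumes each sentence has already been mapped to a set of "information units". The function name `greedy_set_cover`, the sentence ids, and the unit labels are all hypothetical.

```python
from typing import Dict, Hashable, List, Set


def greedy_set_cover(candidates: Dict[str, Set[Hashable]]) -> List[str]:
    """Return candidate keys whose sets jointly cover every element.

    Standard greedy approximation: repeatedly pick the candidate that
    covers the most elements that are still uncovered.
    """
    # The universe is everything that any candidate expresses.
    uncovered: Set[Hashable] = set().union(*candidates.values())
    chosen: List[str] = []
    while uncovered:
        # Ties go to the earliest candidate in dict insertion order.
        best = max(candidates, key=lambda k: len(candidates[k] & uncovered))
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen


# Hypothetical input: sentence id -> information units it expresses.
sentences = {
    "s1": {"a", "b"},
    "s2": {"b"},        # fully contained in s1
    "s3": {"c", "d"},
    "s4": {"a", "d"},   # adds nothing once s1 and s3 are kept
}
kept = greedy_set_cover(sentences)
```

Under these assumptions the sentences left out of `kept` are the redundant ones, and every information unit is still expressed by at least one kept sentence, which is the preservation property the abstract says clustering does not guarantee.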