Published in: Expert Systems with Applications

An unsupervised method for extractive multi-document summarization based on centroid approach and sentence embeddings

Abstract

Extractive multi-document summarization (MDS) is the process of automatically summarizing a collection of documents by ranking sentences according to their importance and informativeness. Text representation is a fundamental step that affects the effectiveness of many text summarization methods. Word embedding representations have been shown to be effective for several Natural Language Processing (NLP) tasks, including Automatic Text Summarization (ATS). However, most of these representations do not consider the order of, or the semantic relationships between, the words in a sentence, so they cannot fully capture sentence semantics or the syntactic relationships between sentence constituents. In this paper, to overcome this problem, we propose an unsupervised method for generic extractive multi-document summarization based on sentence embedding representations and the centroid approach. The proposed method selects relevant sentences according to a final score obtained by combining three scores: sentence content relevance, sentence novelty, and sentence position. The sentence content relevance score is computed as the cosine similarity between the centroid embedding vector of the cluster of documents and each sentence embedding vector. The sentence novelty metric is explicitly adopted to deal with redundancy. The sentence position metric assumes that the first sentences of a document are more relevant to the summary and assigns them high scores. Moreover, this paper provides a comparative analysis of nine sentence embedding models used to represent sentences as dense vectors in a low-dimensional vector space in the context of extractive multi-document summarization. Experiments are performed on the standard DUC'2002-2004 benchmark datasets and the Multi-News dataset. Overall, the results show that our method outperforms several state-of-the-art methods and achieves promising results compared to the best-performing methods, including supervised deep-learning-based methods.
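The three-score ranking described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so everything beyond "cosine similarity to the centroid" is an assumption made for illustration. Novelty is taken as one minus the highest similarity to any sentence ranked ahead by relevance, position as the reciprocal of the sentence's 1-based index in its document, and the final score as a weighted sum with the hypothetical weights `alpha`, `beta`, and `gamma`. The function names are also hypothetical, and the toy 2-dimensional vectors stand in for real sentence embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def centroid(vectors):
    """Component-wise mean of the sentence embeddings in the document cluster."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def score_sentences(embeddings, positions, alpha=0.6, beta=0.2, gamma=0.2):
    """Return one combined score per sentence.

    embeddings: one vector per sentence in the cluster.
    positions:  1-based index of each sentence within its own document.
    alpha, beta, gamma: hypothetical weights (not taken from the paper).
    """
    c = centroid(embeddings)
    relevance = [cosine(e, c) for e in embeddings]

    scores = []
    for i, e in enumerate(embeddings):
        # Assumed novelty: penalize similarity to sentences ranked ahead of
        # sentence i by relevance (ties broken by original order).
        ahead = [
            cosine(e, embeddings[j])
            for j in range(len(embeddings))
            if relevance[j] > relevance[i]
            or (relevance[j] == relevance[i] and j < i)
        ]
        novelty = 1.0 - max([0.0] + ahead)
        # Assumed position score: earlier sentences score higher.
        position = 1.0 / positions[i]
        scores.append(alpha * relevance[i] + beta * novelty + gamma * position)
    return scores

def summarize(embeddings, positions, k):
    """Indices of the k highest-scoring sentences, best first."""
    scores = score_sentences(embeddings, positions)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

# Toy cluster: sentences 0 and 1 are duplicates, sentence 2 is different.
toy_embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
toy_positions = [1, 2, 1]
selected = summarize(toy_embeddings, toy_positions, k=2)
```

In this toy cluster the duplicate sentence receives a novelty of zero, so the less central but non-redundant sentence is selected ahead of it. A real pipeline would replace the toy vectors with the output of a sentence embedding model and tune the weights on held-out data.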
