An unsupervised method for extractive multi-document summarization based on centroid approach and sentence embeddings

Lamsiyah Salima; El Mahdaouy Abdelkader; Espinasse Bernard; Ouatik Said El Alaoui

Abstract

Extractive multi-document summarization (MDS) is the process of automatically summarizing a collection of documents by ranking sentences according to their importance and informativeness. Text representation is a fundamental process that affects the effectiveness of many text summarization methods. Word embedding representations have been shown to be effective for several Natural Language Processing (NLP) tasks, including Automatic Text Summarization (ATS). However, most of these representations do not consider the order of the words in a sentence or the semantic relationships between them, so they cannot fully capture the sentence semantics and the syntactic relationships between sentence constituents. In this paper, to overcome this problem, we propose an unsupervised method for generic extractive multi-document summarization based on sentence embedding representations and the centroid approach. The proposed method selects relevant sentences according to a final score obtained by combining three scores: sentence content relevance, sentence novelty, and sentence position. The sentence content relevance score is computed as the cosine similarity between the centroid embedding vector of the cluster of documents and the sentence embedding vectors. The sentence novelty metric is explicitly adopted to deal with redundancy. The sentence position metric assumes that the first sentences of a document are more relevant to the summary, and it assigns high scores to these sentences. Moreover, this paper provides a comparative analysis of nine sentence embedding models used to represent sentences as dense vectors in a low-dimensional vector space in the context of extractive multi-document summarization. Experiments are performed on the standard DUC'2002-2004 benchmark datasets and the Multi-News dataset. The overall results show that our method outperforms several state-of-the-art methods and achieves promising results compared to the best-performing methods, including supervised deep learning based methods.
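
The three-score ranking described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so several details here are assumptions made only for illustration: the centroid is taken as the mean of the sentence vectors, novelty is one minus the highest similarity to any other sentence, the position score is 1 / (1 + index), and the three scores are combined by a weighted sum with hypothetical weights. The function names (cosine, centroid, score_sentences) are likewise hypothetical, and the sentence embeddings are assumed to be precomputed by some sentence embedding model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of the vectors (assumed centroid definition)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def score_sentences(embeddings, positions, weights=(0.6, 0.2, 0.2)):
    """Return one combined score per sentence.

    embeddings: list of sentence vectors for the whole document cluster.
    positions:  0-based index of each sentence inside its own document.
    weights:    hypothetical weights for (relevance, novelty, position).
    """
    center = centroid(embeddings)
    w_rel, w_nov, w_pos = weights
    scores = []
    for i, (vec, pos) in enumerate(zip(embeddings, positions)):
        # Content relevance: cosine similarity to the cluster centroid.
        relevance = cosine(vec, center)
        # Novelty (assumed form): penalize sentences that closely match another one.
        others = [cosine(vec, other) for j, other in enumerate(embeddings) if j != i]
        novelty = (1.0 - max(others)) if others else 1.0
        # Position (assumed form): earlier sentences score higher.
        position = 1.0 / (1 + pos)
        scores.append(w_rel * relevance + w_nov * novelty + w_pos * position)
    return scores


if __name__ == "__main__":
    toy_embeddings = [[1.0, 0.0], [0.0, 1.0]]
    toy_positions = [0, 1]
    scores = score_sentences(toy_embeddings, toy_positions)
    ranking = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    print(ranking)
```

A summary would then be built by taking sentences in ranked order until a length budget is reached. The weights would normally be tuned on held-out data rather than fixed as above.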

Bibliographic record

Source
Expert Systems with Applications | 2021, Issue 4 | 1114152.1-1114152.16 | 16 pages
Authors
Lamsiyah Salima; El Mahdaouy Abdelkader; Espinasse Bernard; Ouatik Said El Alaoui;
Author affiliations

Ibn Tofail Univ Natl Sch Appl Sci Lab Engn Sci Kenitra Morocco|Sidi Mohamed Ben Abdellah Univ Lab Informat Signals Automat & Cognitivism FSDM Fes Morocco;

Sidi Mohamed Ben Abdellah Univ Lab Informat Signals Automat & Cognitivism FSDM Fes Morocco;

Univ Toulon & Var Aix Marseille Univ LIS UMR CNRS 7020 Marseille France;

Ibn Tofail Univ Natl Sch Appl Sci Lab Engn Sci Kenitra Morocco|Sidi Mohamed Ben Abdellah Univ Lab Informat Signals Automat & Cognitivism FSDM Fes Morocco;

Original format: PDF
Language: English
Keywords
Extractive text summarization; Word embeddings; Sentence embeddings; Centroid approach; Transfer learning;

Similar literature

1. A semantic approach to extractive multi-document summarization: Applying sentence expansion for tuning of conceptual densities [J]. Mohammad Bidoki, Mohammad R. Moosavi, Mostafa Fakhrahmad. Information Processing & Management. 2020, Issue 6.
2. MSCSO: Extractive Multi-document Summarization Based on a New Criterion of Sentences Overlapping [J]. Khaleghi Zeynab, Fakhredanesh Mohammad, Hourali Maryam. Iranian Journal of Science and Technology, Transactions of Electrical Engineering. 2021, Issue 1.
3. Extractive multi-document summarization based on textual entailment and sentence compression via knapsack problem [J]. Naserasadi Ali, Khosravi Hamid, Sadeghi Faramarz. Natural Language Engineering. 2019, Issue PTa1.
4. Multi-Document Extractive Summarization Using Window-Based Sentence Representation [C]. Yong Zhang, Meng Joo Er, Rui Zhao. IEEE Symposium Series on Computational Intelligence. 2015.
5. Multi-document Summarization Based on Document Clustering and Neural Sentence Fusion [D]. Fuad, Tanvir Ahmed. 2018.
6. Unsupervised Extraction of Diagnosis Codes from EMRs Using Knowledge-Based and Extractive Text Summarization Techniques [O]. Ramakanth Kavuluru, Sifei Han, Daniel Harris.
7. STRASS: A Light and Effective Method for Extractive Summarization Based on Sentence Embeddings [O]. Léo Bouscarrat, Antoine Bonnefoy, Thomas Peel. 2019.
