Journal: Information Processing & Management
Text summarization using a trainable summarizer and latent semantic analysis


Abstract

This paper proposes two approaches to text summarization: a modified corpus-based approach (MCBA) and an LSA-based T.R.M. approach (LSA + T.R.M.). The first is a trainable summarizer that generates summaries from several features, including position, positive keyword, negative keyword, centrality, and resemblance to the title. It exploits two new ideas: (1) sentence positions are ranked to emphasize the significance of different sentence positions, and (2) the score function is trained by a genetic algorithm (GA) to obtain a suitable combination of feature weights. The second approach uses latent semantic analysis (LSA) to derive the semantic matrix of a document or a corpus and uses semantic sentence representations to construct a semantic text relationship map. We evaluate LSA + T.R.M. both on single documents and at the corpus level to investigate the competence of LSA in text summarization. The two novel approaches were evaluated at several compression rates on a corpus of 100 political articles. At a compression rate of 30%, the average F-measures were 49% for MCBA, 52% for MCBA + GA, and 44% and 40% for LSA + T.R.M. at the single-document and corpus levels, respectively. (C) 2004 Elsevier Ltd. All rights reserved.
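To make the trainable-summarizer idea concrete, here is a minimal sketch of a weighted score function over the five features named in the abstract, together with the F-measure used to compare a summary against a reference. Everything specific is an assumption made for illustration: the linear form of the score, the feature values, the weights (including the negative weight on negative keywords), the reading of the compression rate as the fraction of sentences kept, and the function names. The abstract does not give the paper's actual formulas.

```python
from typing import Dict, List, Set

# Feature names follow the abstract; all numeric values below are made up.
FEATURES = ["position", "positive_keyword", "negative_keyword",
            "centrality", "title_resemblance"]


def score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Assumed form: a weighted linear combination of the feature values."""
    return sum(weights[name] * features[name] for name in FEATURES)


def summarize(sentences: List[Dict[str, float]],
              weights: Dict[str, float],
              compression_rate: float) -> Set[int]:
    """Return the indices of the top-scoring sentences.

    The compression rate is interpreted here as the fraction of
    sentences kept in the summary (an assumption).
    """
    k = max(1, round(len(sentences) * compression_rate))
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i], weights),
                    reverse=True)
    return set(ranked[:k])


def f_measure(selected: Set[int], reference: Set[int]) -> float:
    """Harmonic mean of precision and recall over sentence indices."""
    overlap = len(selected & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(selected)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def make(position, positive, negative, centrality, title):
    """Hypothetical helper that builds one sentence's feature dictionary."""
    return dict(zip(FEATURES, [position, positive, negative, centrality, title]))


# Five toy sentences with invented feature values.
SENTENCES = [
    make(1.0, 0.5, 0.0, 0.6, 0.8),
    make(0.8, 0.1, 0.4, 0.2, 0.1),
    make(0.6, 0.7, 0.0, 0.7, 0.5),
    make(0.4, 0.2, 0.1, 0.3, 0.2),
    make(0.2, 0.6, 0.0, 0.5, 0.6),
]

# Hand-picked weights; negative keywords are assumed to lower the score.
WEIGHTS = make(1.0, 1.0, -1.0, 1.0, 1.0)

summary = summarize(SENTENCES, WEIGHTS, compression_rate=0.4)
print(sorted(summary))                   # [0, 2]
print(f_measure(summary, {0, 4}))        # 0.5
```

In this sketch the weights are fixed by hand. In the approach the abstract describes, a genetic algorithm searches for the weight combination; one plausible fitness for such a search would be the average `f_measure` over a training set, though the abstract does not state which objective is used.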
