...
Journal: Pomiary Automatyka Kontrola

Automatyczne tworzenie podsumowań tekstów metodami algebraicznymi

English translation of the title: Automatic creation of text summaries using algebraic methods


Abstract

The large number of documents returned (for example, by various kinds of Internet search engines) means that we are often forced to review them in a time-consuming way in order to verify the relevance of the returned results. When the documents are long, the time needed to review them grows considerably. It could be shortened substantially if meaningful summaries (abstracts) could be generated automatically. In this paper we discuss selected algebraic methods for automatically extracting the most important key words and the most important sentences from a text.

Text summarization is a real practical problem because of the explosion in the volume of textual information available today. To address it, text summarization systems are built that extract brief information from a given text. By looking only at the summary, the end user can decide whether the document is of interest. Summaries can take two fundamental forms. Extractive summarization collects important sentences from the input text to constitute the summary. Abstractive summarization tries to capture the main concepts of the text and then generates new sentences that summarize it. At present, however, the latter approach still seems to need extensive work before it becomes really useful.

A summary can be extracted from a single document or from multiple documents. In the paper the authors build summaries of one document only. Extending the approach to multi-document summaries is straightforward when a set of semantically uniform texts is summarized. Summaries may also be categorized as generic or query-based. Generic summaries contain the main topics of a document, while query-based summaries contain the sentences that are related to given queries. The paper builds generic summaries.

Summarization systems use different approaches to determine important sentences. The paper uses a semantics-oriented approach based on a method known as Latent Semantic Analysis (LSA). LSA is an algebraic method that extracts the meaning of words and the similarity of sentences from information about how the words are used in context. It uses Singular Value Decomposition (SVD) to find semantically similar words and sentences. Using the results of SVD, the authors try to select the best sentences, which together constitute the best summary of the text.

The paper is organized as follows. Section 2 formulates the problem. Section 3 shows how a document may be represented in a useful algebraic format, the so-called Term-Sentence Matrix (TSM). The authors also point out some preliminary tasks that must be performed for the subsequent analysis to succeed. Subsection 3.2 briefly presents the idea of LSA as based on SVD. The final Section 4 gives two examples of text summaries, built for Polish and English texts. The two methods used differ slightly from each other. The key words and key sentences extracted by the authors appear to be appropriate, content-related summaries of the input texts.
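The pipeline described in the abstract (term-sentence matrix, SVD, selection of key sentences and key words) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the exact selection rule, so the scoring used here (the length of each sentence or term vector in the singular-value-weighted latent space) is an assumption, as are the function names `term_sentence_matrix`, `lsa_rank`, and `summarize`, the raw-count weighting, and the parameter `k`. The preliminary tasks the abstract mentions (which typically include stop-word removal or stemming) are omitted.

```python
import re
import numpy as np


def term_sentence_matrix(sentences):
    """Build a raw-count term-sentence matrix (rows: terms, columns: sentences)."""
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    vocab = sorted({t for tokens in tokenized for t in tokens})
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))
    for j, tokens in enumerate(tokenized):
        for t in tokens:
            A[index[t], j] += 1.0
    return A, vocab


def lsa_rank(sentences, k=2):
    """Score sentences and terms in the k-dimensional latent space from SVD."""
    A, vocab = term_sentence_matrix(sentences)
    # A = U @ diag(s) @ Vt; columns of Vt describe sentences, rows of U describe terms.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    # Assumed scoring rule: vector length after weighting by singular values.
    sentence_scores = np.sqrt(((s[:k, None] * Vt[:k, :]) ** 2).sum(axis=0))
    term_scores = np.sqrt(((U[:, :k] * s[:k]) ** 2).sum(axis=1))
    return sentence_scores, term_scores, vocab


def summarize(sentences, n_sentences=1, n_terms=3, k=2):
    """Return the top-scoring sentences (in original order) and top-scoring terms."""
    sentence_scores, term_scores, vocab = lsa_rank(sentences, k)
    top_sentences = sorted(np.argsort(-sentence_scores)[:n_sentences])
    top_terms = [vocab[i] for i in np.argsort(-term_scores)[:n_terms]]
    return [sentences[i] for i in top_sentences], top_terms
```

Calling `summarize(list_of_sentences, n_sentences=2)` returns a pair: the selected sentences and the selected key words. A smaller `k` keeps only the strongest latent topics, and when `k` equals the full rank the scores reduce to plain column and row norms of the matrix, so the latent structure only matters for truncated `k`.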
