Indexing Text Documents Based on Topic Identification

Abstract

This work provides algorithms and heuristics to index text documents by determining important topics in the documents. To index text documents, the work provides algorithms to generate topic candidates, determine their importance, detect similar and synonymous topics, and eliminate incoherent topics. The indexing algorithm uses topic frequency to determine the importance and the existence of the topics. Repeated phrases are topic candidates. For example, since the phrase 'index text documents' occurs three times in this abstract, the phrase is one of the topics of this abstract. It is shown that this method is more effective than either a simple word count model or approaches based on term weighting.
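
The idea that repeated phrases are topic candidates can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes a phrase is a contiguous run of words inside one punctuation-delimited segment, and the function name `repeated_phrases` and its length and frequency thresholds are hypothetical. It covers only candidate generation by frequency; importance scoring, synonym detection, and the elimination of incoherent topics are not attempted.

```python
import re
from collections import Counter


def repeated_phrases(text, min_words=2, max_words=4, min_count=2):
    """Return {phrase: count} for word n-grams occurring at least min_count times.

    Hypothetical helper: the thresholds are illustrative assumptions,
    not values taken from the abstract.
    """
    counts = Counter()
    # Split on punctuation so that no phrase spans a sentence or clause boundary.
    for segment in re.split(r"[.!?;:,]+", text.lower()):
        words = re.findall(r"[a-z0-9]+", segment)
        for n in range(min_words, max_words + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    # Keep only phrases that repeat; these are the topic candidates.
    return {phrase: c for phrase, c in counts.items() if c >= min_count}


if __name__ == "__main__":
    sample = (
        "To index text documents, we count phrases. "
        "We index text documents by topic. "
        "Repeated phrases help index text documents."
    )
    # Print candidates, most frequent first, then longest first.
    for phrase, c in sorted(repeated_phrases(sample).items(),
                            key=lambda kv: (-kv[1], -len(kv[0]))):
        print(c, phrase)
```

Because sub-phrases such as 'index text' repeat whenever 'index text documents' does, a fuller version would need a rule for preferring the longest repeated phrase; the sketch leaves that choice open.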
