Information Processing & Management

Dictionary-based text categorization of chemical web pages

Abstract

A new dictionary-based text categorization approach is proposed for efficiently classifying chemical web pages. By using a chemistry dictionary, the approach extracts chemistry-related information from web pages more accurately. After the documents are automatically segmented to find dictionary terms for document expansion, the approach applies latent semantic indexing (LSI) to produce the final document vectors, and the relevant categories are then assigned to each test document with the k-NN text categorization algorithm. The paper discusses how the characteristics of the chemistry dictionary and of the test collection affect categorization efficiency, and introduces a new voting method, based on the collection characteristics, that further improves categorization performance. Experimental results show that the proposed approach outperforms the traditional categorization method and is applicable to the classification of chemical web pages. (c) 2005 Elsevier Ltd. All rights reserved.
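The abstract does not say how segmentation or voting are implemented, so the following is only a minimal sketch of two of the steps under stated assumptions, not the authors' method. It assumes regex word tokenization, a hypothetical toy dictionary `CHEM_DICT`, and forward maximum matching (longest dictionary term first) as the segmentation strategy. It reads "document expansion" as appending matched dictionary terms, prefixed `term:`, to the plain word counts. The LSI step is omitted: in the full pipeline the `expand()` vectors would be projected through a truncated SVD before the similarity computation. The vote shown is ordinary similarity-weighted k-NN, a common baseline, not the paper's new voting method. All names (`extract_terms`, `expand`, `cosine`, `knn_classify`) are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy chemistry dictionary; multi-word terms are stored as token tuples.
CHEM_DICT = {
    ("sodium", "chloride"), ("benzene",), ("covalent", "bond"),
    ("polymer",), ("catalyst",), ("ionic", "bond"),
}
MAX_TERM_LEN = max(len(term) for term in CHEM_DICT)


def tokenize(text):
    """Assumed tokenizer: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_terms(text, dictionary=CHEM_DICT, max_len=MAX_TERM_LEN):
    """Forward maximum matching: at each position keep the longest dictionary term."""
    tokens = tokenize(text)
    terms, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in dictionary:
                terms.append(" ".join(candidate))
                i += n
                break
        else:
            i += 1  # no dictionary term starts here; skip one token
    return terms


def expand(text):
    """Feature vector = word counts plus counts of matched dictionary terms
    (an assumed reading of "document expansion")."""
    words = Counter(tokenize(text))
    terms = Counter("term:" + t for t in extract_terms(text))
    return words + terms


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(weight * b.get(key, 0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(vector, training, k=3):
    """Similarity-weighted k-NN vote: each of the k nearest neighbours adds
    its cosine similarity to its label's score; the top-scoring label wins."""
    ranked = sorted(((cosine(vector, vec), label) for vec, label in training),
                    reverse=True)
    scores = Counter()
    for sim, label in ranked[:k]:
        scores[label] += sim
    return max(scores, key=scores.get)


# Tiny illustrative training set (invented examples, not data from the paper).
training = [
    (expand("Sodium chloride forms an ionic bond in the crystal lattice."), "inorganic"),
    (expand("Ionic bond strength in sodium chloride and other salts."), "inorganic"),
    (expand("Benzene rings and the covalent bond in aromatic compounds."), "organic"),
    (expand("A polymer made with a catalyst from benzene derivatives."), "organic"),
]

print(knn_classify(expand("Covalent bond angles in benzene."), training, k=3))  # → organic
```

In this toy run the three nearest neighbours are one "organic" page with high similarity and two "inorganic" pages with low similarity. A plain majority vote would therefore choose "inorganic", while the similarity-weighted vote chooses "organic". This illustrates why the choice of voting scheme can matter for k-NN categorization.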