IEEE/WIC/ACM International Conference on Web Intelligence

The Role of Different Thesauri Terms and Captions in Automated Subject Classification

Abstract

The paper explores to what degree different types of terms in the Engineering Information (Ei) thesaurus and classification scheme influence automated subject classification performance. Preferred terms, their synonyms, broader, narrower, and related terms, and captions are examined in combination with a stemmer and a stop-word list. The algorithm performs string-to-string matching between words in the documents to be classified and words in term lists derived from the Ei thesaurus and classification scheme. The data collection for evaluation consists of some 35,000 scientific paper abstracts from the Compendex database. A subset of the Ei thesaurus and classification scheme is used, comprising 92 classes at up to five hierarchical levels from General Engineering. The results show that preferred terms perform best, whereas captions perform worst. Stemming improves performance in most cases, whereas the stop-word list has no significant impact.
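
The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state any weighting or cut-off scheme, so the sketch simply counts matching words. The class labels (class_A, class_B), the term lists, the stop-word list, and the crude suffix-stripping stem function are all hypothetical stand-ins chosen for illustration, not material from the Ei thesaurus or from the paper.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the list used in the paper is not given in the abstract.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "for", "is", "on"}


def stem(word):
    """Crude suffix stripper standing in for a real stemmer (an assumption)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text, use_stemming=True, use_stop_words=True):
    """Lower-case the text, split it into words, then filter and stem as requested."""
    words = re.findall(r"[a-z]+", text.lower())
    if use_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    if use_stemming:
        words = [stem(w) for w in words]
    return words


def classify(text, term_lists, **options):
    """Score each class by counting document words that match words in its term list."""
    doc_counts = Counter(tokenize(text, **options))
    scores = {}
    for label, terms in term_lists.items():
        term_words = set()
        for term in terms:
            term_words.update(tokenize(term, **options))
        score = sum(doc_counts[w] for w in term_words)
        if score:
            scores[label] = score
    # Highest score first; ties are broken alphabetically by class label.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical classes and terms, invented for this example only.
TERM_LISTS = {
    "class_A": ["bridges", "bridge piers"],
    "class_B": ["welding", "welded joints"],
}

ranking = classify("Fatigue of welded joints in steel bridges", TERM_LISTS)
print(ranking)  # [('class_B', 2), ('class_A', 1)]
```

Passing use_stemming=False or use_stop_words=False to classify switches the corresponding step off, which mirrors the kind of comparison the abstract reports. Swapping in term lists built from only preferred terms, only synonyms, or only captions would mirror the comparison between term types.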