Journal: 《计算机应用与软件》 (Computer Applications and Software)

An Improved TF-IDF Feature Selection Algorithm Based on an Industry Proprietary Dictionary

Abstract

An industry proprietary dictionary collects the terms specific to a particular industry. Applying such a dictionary to TF-IDF-based feature selection improves the completeness of the text feature space. The core goal of improved TF-IDF algorithms is to extract low-frequency keywords, but existing improvements based on statistical features increase the computational complexity of the original algorithm and reduce its efficiency. To address this problem, the proposed method adds dictionary mapping to the original TF-IDF feature selection algorithm, extracting low-frequency keywords to build a complete feature space. Experimental results show that, in subsequent secondary clustering verification experiments, features extracted by the dictionary-based TF-IDF algorithm effectively improve clustering recall and precision compared with features extracted by feature selection algorithms that do not use the industry proprietary dictionary.
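The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea, under stated assumptions: documents are already tokenized, features are chosen as the top-k TF-IDF terms per document, and "dictionary mapping" is approximated by keeping every token that appears in the industry dictionary regardless of its score. The names `tfidf_scores`, `select_features`, `industry_terms`, and `top_k` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {term: tf-idf} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def select_features(docs, industry_terms, top_k):
    """Top-k TF-IDF terms per document, plus any dictionary terms present.

    Assumption: this union is a stand-in for the paper's dictionary
    mapping step, whose exact form the abstract does not specify.
    """
    features = set()
    for doc, doc_scores in zip(docs, tfidf_scores(docs)):
        # Rank by descending score, breaking ties alphabetically.
        ranked = sorted(doc_scores, key=lambda t: (-doc_scores[t], t))
        features.update(ranked[:top_k])
        # Keep industry terms even when their TF-IDF score is low.
        features.update(t for t in doc if t in industry_terms)
    return features


docs = [
    ["market", "market", "market", "report", "gasket"],
    ["weather", "weather", "weather", "report", "flange"],
    ["report", "gasket", "flange", "summary", "summary"],
]
baseline = select_features(docs, set(), top_k=1)
with_dictionary = select_features(docs, {"gasket", "flange"}, top_k=1)
```

In this toy corpus, `gasket` and `flange` occur only once per document and never reach the top of the TF-IDF ranking, so they enter the feature set only through the dictionary lookup. The lookup is a set membership test per token, which adds no corpus-wide statistics beyond what plain TF-IDF already computes.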
