Published in: Database and Expert Systems Applications; Lecture Notes in Computer Science; 4080

Interactions Between Document Representation and Feature Selection in Text Categorization



Abstract

Many studies in automated Text Categorization focus on the performance of classifiers, with or without considering feature selection methods, but almost as a rule take into account just one document representation. Only relatively recently did detailed studies on the impact of various document representations step into the spotlight, showing that there may be statistically significant differences in classifier performance even among variations of the classical bag-of-words model. This paper examines the relationship between the idf transform and several widely used feature selection methods, in the context of Naive Bayes and Support Vector Machine classifiers, on datasets extracted from the dmoz ontology of Web-page descriptions. The described experimental study shows that the idf transform considerably affects the distribution of classification performance over feature selection reduction rates, and offers an evaluation method, independent of absolute differences in classification performance, that permits the discovery of relationships between different document representations and feature selection methods.
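To make the idf transform mentioned in the abstract concrete, here is a minimal sketch of how it reweights a bag-of-words vector. The abstract does not state which idf formula the paper uses, so the common textbook form idf(t) = log(N / df(t)) is an assumption, and the toy corpus and function names are hypothetical illustrations rather than anything taken from the paper.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Compute idf(t) = log(N / df(t)) for each term (assumed textbook form)."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        # Count each term at most once per document (document frequency).
        df.update(set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tf_vector(doc):
    """Plain bag-of-words representation: raw term counts."""
    return dict(Counter(doc))


def tfidf_vector(doc, idf):
    """Bag-of-words counts rescaled by the idf weight of each term."""
    return {term: tf * idf[term] for term, tf in Counter(doc).items()}


# Hypothetical toy corpus of tokenized Web-page descriptions.
docs = [
    ["web", "page", "directory"],
    ["web", "search", "engine"],
    ["web", "page", "design"],
]

idf = idf_weights(docs)
plain = tf_vector(docs[0])
weighted = tfidf_vector(docs[0], idf)
```

In this sketch a term that occurs in every document ("web") receives weight zero, while a term unique to one document ("directory") receives the largest weight. Because the transform changes how much each term contributes to a document vector, it is plausible that it interacts with feature selection methods that rank and discard terms, which is the kind of relationship the abstract describes.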