Experiments in High-Dimensional Text Categorization

Abstract

We present results for automated text categorization of the Reuters-810000 collection of news stories. Our experiments use the entire one-year collection of 810,000 stories and the entire subject index. We divide the data into monthly groups and provide an initial benchmark of text categorization performance on the complete collection. Experimental results show that efficient sparse-feature implementations of linear methods and decision trees, using a global unstemmed dictionary, can readily handle applications of this size. Predictive performance is approximately as strong as the best results for the much smaller, older Reuters collections. Detailed results are provided for individual time periods. It is shown that a smaller time horizon does not diminish predictive quality, implying reduced demands for retraining when the sample size is large.
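The abstract's main technical claim is that sparse-feature linear classifiers built on a single global, unstemmed dictionary scale to a collection of this size. The Python sketch below illustrates only that data representation and a monthly train/test split. It is not the authors' method: the tokenizer, the binary feature weights, the plain perceptron (standing in for the unspecified "linear methods"), and all function names and toy documents are assumptions chosen for brevity.

```python
import re

# Assumed tokenizer: lowercase alphanumeric runs with NO stemming,
# so "rise" and "rises" remain distinct dictionary entries.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def build_dictionary(docs):
    """Build one global dictionary (token -> feature index) shared by all categories."""
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def to_sparse(doc, vocab):
    """Binary sparse vector: store only the indices of tokens present in the dictionary."""
    return {vocab[t]: 1.0 for t in tokenize(doc) if t in vocab}


def dot(w, x):
    # Cost depends on the number of non-zero features in x, not on the dictionary size.
    return sum(w.get(i, 0.0) * v for i, v in x.items())


def train_perceptron(xs, ys, epochs=10):
    """Plain perceptron for one category; ys are +1 (in category) or -1."""
    w, b = {}, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            if y * (dot(w, x) + b) <= 0:  # misclassified or on the boundary
                for i, v in x.items():
                    w[i] = w.get(i, 0.0) + y * v
                b += y
    return w, b


def predict(w, b, x):
    return 1 if dot(w, x) + b > 0 else -1


# Hypothetical toy stand-ins for two monthly groups (not data from the paper):
# train on "month 1", then score later stories from "month 2".
month1_docs = ["oil prices rise", "crude oil output cut",
               "stocks fall on earnings", "bank shares rally"]
month1_labels = [1, 1, -1, -1]  # +1 = hypothetical "crude oil" category
month2_docs = ["oil output rises", "bank earnings fall"]

vocab = build_dictionary(month1_docs)
w, b = train_perceptron([to_sparse(d, vocab) for d in month1_docs], month1_labels)
preds = [predict(w, b, to_sparse(d, vocab)) for d in month2_docs]
print(preds)  # → [1, -1]
```

Because no stemming is applied in this sketch, "rises" in the second-month story is simply absent from a dictionary built on "rise", and the prediction rests on "oil" and "output" alone. Each update touches only the weights of features present in the document, which is what keeps per-example cost tied to document length rather than dictionary size.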

Bibliographic record

Source
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval | 2002 | 2 pages
Conference location
Tampere, Finland
Authors
Fred J. Damerau; Tong Zhang; Sholom M. Weiss; Nitin Indurkhya
Original format: PDF
CLC classification: Information retrieval
Keywords
experimentation


Similar literature

1. The Hybrid Filter Feature Selection Methods for Improving High-Dimensional Text Categorization [J] . Le Nguyen Hoai Nam, Ho Bao Quoc International Journal of Uncertainty, Fuzziness, and Knowledge-based Systems . 2017, No. 2

2. Contextual Text Categorization: An Improved Stemming Algorithm to Increase the Quality of Categorization in Arabic Text [J] . Gadri Said, Moussaoui Abdelouahab The International Arab Journal of Information Technology . 2017, No. 6

3. Experiment on Methods for Clustering and Categorization of Polish Text [J] . Wielgosz Maciej, Fraczek Rafał, Russek Paweł, Computing and Informatics . 2017, No. 1

4. Experiments in High-Dimensional Text Categorization [C] . Fred J. Damerau, Tong Zhang, Sholom M. Weiss, The Twenty-Fifth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Aug 11-15, 2002, Tampere, Finland . 2002

5. The implementation of dynamic document organization using the integration of text clustering and text categorization. [D] . Jo, Taeho. 2006

6. SANAD: Single-label Arabic News Articles Dataset for automatic text categorization [O] . Omar Einea, Ashraf Elnagar, Ridhwan Al Debsi 2019

7. Experiments in High-Dimensional Text Categorization [O] . Fred Damerau, Tong Zhang, Sholom M. Weiss, 2007
