Cross-Comparison for Two-Dimensional Text Categorization

Abstract

The organization of large text collections is the main goal of automated text categorization. In particular, the final aim is to classify documents into a certain number of pre-defined categories efficiently and as accurately as possible. On-line and run-time services, such as personalization and information filtering services, have increased the importance of effective and efficient document categorization techniques. In recent years, a wide range of supervised learning algorithms has been applied to this problem. Recently, a new approach that exploits a two-dimensional summarization of the data for text classification was presented. This method does not go through a word-selection phase; instead, it uses the whole dictionary to present data in an intuitive way on two-dimensional graphs. Although successful in terms of classification effectiveness and efficiency (as recently shown in [3]), this method leaves some key issues unsolved: the design of the training algorithm seems to be ad hoc for the Reuters-21578 collection; the evaluation has been done only on the 10 most frequent classes of the Reuters-21578 dataset; the evaluation lacks measures of significance in most parts; and the method adopted lacks a mathematical justification. We focus on the first three aspects, leaving the fourth as future work.

Bibliographic record

Source: International Conference on String Processing and Information Retrieval (SPIRE 2004); 5-8 October 2004; Padova (IT) | 2004 | pp. 125-126 | 2 pages
Conference location: Padova (IT)
Author: Giorgio Maria Di Nunzio
Affiliation: Department of Information Engineering, University of Padua
Original format: PDF
Language: English
Chinese Library Classification: Data backup and recovery

Similar literature

1. Contextual Text Categorization: An Improved Stemming Algorithm to Increase the Quality of Categorization in Arabic Text [J]. Gadri Said, Moussaoui Abdelouahab. The international arab journal of information technology. 2017, No. 6
2. Text Document Categorization using Enhanced Sentence Vector Space Model and Bi-Gram Text Representation Model Based on Novel Fusion Techniques [J]. Abdisa Demissie Amensisa. New Media and Mass Communication. 2020, No. 4
3. A Novel Text Representation Model to Categorize Text Documents using Convolution Neural Network [J]. M. B. Revanasiddappa, B. S. Harish. International Journal of Intelligent Systems and Applications. 2019, No. 5
4. Cross-Comparison for Two-Dimensional Text Categorization [C]. Giorgio Maria Di Nunzio. International Conference on String Processing and Information Retrieval. 2004
5. The implementation of dynamic document organization using the integration of text clustering and text categorization [D]. Jo, Taeho. 2006
6. Categorization of Two-Dimensional and Three-Dimensional Stimuli by 18-Month-Old Infants [O]. Martha E. Arterberry, Marc H. Bornstein, Julia B. Blumenstyk
7. Two-dimensional Clustering for Text Categorization [O]. Hiroya Takamura, Yuji Matsumoto. 2002
