Influence of Word Normalization and Chi-Squared Feature Selection on Support Vector Machine (SVM) Text Classification

Abstract

In this study, we used a support vector machine (SVM) for text classification. Words were normalized by either stemming or lemmatization, and Chi-squared feature selection was then applied before classification. Pre-processing consisted of stopword removal and tokenization. We used the BBC dataset, which contains 2,225 documents in 5 categories. Stemming produced 21,813 features and lemmatization produced 31,007 features. Each feature represents the number of times a word occurs in a document. We used a confusion matrix to evaluate the text classification results. SVM text classification using stemming with Chi-squared selection (method 1) performed better than SVM text classification using lemmatization with Chi-squared selection (method 2). The best performance was obtained with 80% feature reduction, where method 1 achieved a precision of 95%, a recall of 95%, and an accuracy of 95.05%. With the same amount of feature reduction, method 2 achieved a precision of 93%, a recall of 93%, and an accuracy of 93.24%.
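
The Chi-squared selection step with 80% feature reduction can be illustrated with a minimal sketch. The abstract does not say which Chi-squared variant or tooling the authors used, so the following is an assumption: it scores each term with the common 2x2 term/class contingency statistic on document presence, takes the maximum over classes, and keeps the top 20% of the vocabulary. The function names `chi2_term_class` and `select_terms`, the `keep_fraction` parameter, and the toy documents are hypothetical and chosen only for illustration.

```python
from collections import Counter


def chi2_term_class(a, b, c, d):
    """Chi-squared statistic for one 2x2 term/class contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the term carries no information
    return n * (a * d - c * b) ** 2 / denom


def select_terms(docs, labels, keep_fraction=0.2):
    """Rank terms by their best per-class chi-squared score.

    docs: list of token lists, assumed already tokenized, stopword-filtered
          and normalized (stemmed or lemmatized).
    labels: one class label per document.
    keep_fraction: share of the vocabulary to keep; 0.2 corresponds to the
          80% feature reduction mentioned in the abstract (an assumption
          about how the reduction is counted).
    """
    n_docs = len(docs)
    class_sizes = Counter(labels)
    df = Counter()           # term -> number of documents containing it
    df_in_class = Counter()  # (term, class) -> documents of that class containing it
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            df_in_class[(term, label)] += 1

    scores = {}
    for term, n_t in df.items():
        best = 0.0
        for cls, n_c in class_sizes.items():
            a = df_in_class[(term, cls)]
            b = n_t - a
            c = n_c - a
            d = n_docs - n_t - c
            best = max(best, chi2_term_class(a, b, c, d))
        scores[term] = best

    n_keep = max(1, round(keep_fraction * len(scores)))
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:n_keep], scores


# Toy data, not taken from the BBC dataset.
docs = [
    ["match", "goal", "win"],
    ["goal", "team", "the"],
    ["market", "profit", "the"],
    ["profit", "shares", "market"],
]
labels = ["sport", "sport", "business", "business"]

selected, scores = select_terms(docs, labels, keep_fraction=0.2)
print(selected)
```

In this sketch a term that appears equally often in every class, such as "the" in the toy data, scores zero and is dropped first, while terms confined to one class rank highest. The surviving terms would then serve as the count features passed to the SVM.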

Bibliographic record

Source
International Seminar on Application for Technology of Information and Communication | 2018 | 616p | 5 pages
Authors
Ardy Wibowo Haryanto; Edy Kholid Mawardi; Muljono
Original format: PDF
Chinese Library Classification: TN91-53
Keywords
Support vector machines; Text categorization; Business; Sports; Feature extraction; Training
