Conference: International Conference on Engineering MIS

Arabic text classification using linear discriminant analysis


Abstract

Linear discriminant analysis (LDA) is a dimensionality reduction technique that is widely used in pattern recognition applications. LDA aims to generate effective feature vectors by projecting the original data (e.g., a bag-of-words textual representation) into a lower-dimensional space. Hence, LDA is a convenient method for text classification, a task characterized by very high-dimensional feature vectors. In this paper, we empirically investigated two LDA-based methods for Arabic text classification. The first method is based on computing the generalized eigenvectors of the ratio of between-class to within-class scatter; the second uses linear classification functions that assume equal population covariance matrices (i.e., a pooled sample covariance matrix). We used a textual data collection that contains 1,750 documents belonging to five categories. The testing set contains 250 documents belonging to five categories (50 documents per category). The experimental results show that the linear classification functions method outperforms the eigenvalue decomposition method. We emphasize that the goal of this work is to demonstrate how to employ the LDA algorithm in text classification rather than to compare its performance with other well-known text classification algorithms.
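The two approaches named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: the data are synthetic Poisson term counts rather than an Arabic corpus, a small ridge term `reg` is added so the within-class scatter is invertible, documents are assigned to the nearest class centroid after projection, and equal class priors are used in the linear classification functions. All function names are hypothetical.

```python
import numpy as np

def scatter_matrices(X, y):
    """Class labels, class means, within-class and between-class scatter."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    S_w, S_b, means = np.zeros((d, d)), np.zeros((d, d)), []
    for c in classes:
        Xc = X[y == c]
        m = Xc.mean(axis=0)
        means.append(m)
        centered = Xc - m
        S_w += centered.T @ centered
        diff = (m - overall_mean)[:, None]
        S_b += len(Xc) * (diff @ diff.T)
    return classes, np.array(means), S_w, S_b

def lda_projection(X, y, n_components, reg=1e-3):
    """Method 1: generalized eigenvectors of S_b w = lambda * S_w w."""
    classes, means, S_w, S_b = scatter_matrices(X, y)
    # Assumption: ridge term keeps S_w positive definite for Cholesky.
    L = np.linalg.cholesky(S_w + reg * np.eye(X.shape[1]))
    # Reduce to a symmetric standard problem: A = L^{-1} S_b L^{-T}.
    A = np.linalg.solve(L, np.linalg.solve(L, S_b).T)
    eigvals, V = np.linalg.eigh((A + A.T) / 2)
    order = np.argsort(eigvals)[::-1][:n_components]
    W = np.linalg.solve(L.T, V[:, order])  # map back: w = L^{-T} v
    return W, classes, means

def predict_nearest_centroid(X, W, classes, means):
    """Assumed decision rule: nearest class centroid in the projected space."""
    Z, C = X @ W, means @ W
    dists = ((Z[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dists, axis=1)]

def linear_classification_functions(X, y, reg=1e-3):
    """Method 2: one linear score per class from a pooled covariance matrix."""
    classes, means, S_w, _ = scatter_matrices(X, y)
    n, d = X.shape
    S_pooled = S_w / (n - len(classes)) + reg * np.eye(d)
    coef = np.linalg.solve(S_pooled, means.T).T      # rows: S^{-1} m_k
    intercept = -0.5 * np.sum(coef * means, axis=1)  # equal priors assumed
    return classes, coef, intercept

def predict_linear(X, classes, coef, intercept):
    scores = X @ coef.T + intercept
    return classes[np.argmax(scores, axis=1)]

def make_toy_counts(rng, docs_per_class, n_classes=3, terms_per_class=4):
    """Synthetic bag-of-words counts: each class favors its own block of terms."""
    d = n_classes * terms_per_class
    X, y = [], []
    for k in range(n_classes):
        rates = np.full(d, 0.5)
        rates[k * terms_per_class:(k + 1) * terms_per_class] = 5.0
        X.append(rng.poisson(rates, size=(docs_per_class, d)))
        y.append(np.full(docs_per_class, k))
    return np.vstack(X).astype(float), np.concatenate(y)

rng = np.random.default_rng(0)
X_train, y_train = make_toy_counts(rng, docs_per_class=40)
X_test, y_test = make_toy_counts(rng, docs_per_class=20)

W, classes, means = lda_projection(X_train, y_train, n_components=2)
acc_eig = float(np.mean(predict_nearest_centroid(X_test, W, classes, means) == y_test))

classes_lin, coef, intercept = linear_classification_functions(X_train, y_train)
acc_lin = float(np.mean(predict_linear(X_test, classes_lin, coef, intercept) == y_test))

print("eigenvector method accuracy:", acc_eig)
print("linear classification functions accuracy:", acc_lin)
```

With C classes the between-class scatter has rank at most C - 1, so the projection keeps at most C - 1 useful directions (two in this toy setup, four for the five categories described in the abstract). On real bag-of-words data the vocabulary is usually far larger than the number of documents, which is why some form of regularization or prior feature selection is typically needed before the within-class scatter can be inverted.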
