Because question-classification data are massive, highly correlated, and nonlinear, the key to improving the generalization ability of a question classifier is extracting the essential, intrinsic characteristics of the original data. This paper discusses current feature selection methods based on all features, on bags of words, and on bags of word sequences, and proposes a feature selection approach that combines random forests with support vector machines (SVMs). Experiments show that this method is simple and effective at selecting classification features and improves both the efficiency and the accuracy of question classification.
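The abstract does not define how the bag-of-words and bag-of-word-sequences features are built. The following is a minimal sketch of one common interpretation. It assumes regex tokenization on lower-cased text and treats a "word sequence" as a contiguous n-gram. Both choices are our assumptions and not necessarily the paper's definitions. The function names are hypothetical, and the sketch covers only feature extraction, not the random-forest/SVM selection step.

```python
import re
from collections import Counter


def tokenize(question: str) -> list[str]:
    """Lower-case the question and keep alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", question.lower())


def bag_of_words(question: str) -> Counter:
    """Bag of words: token counts with word order discarded."""
    return Counter(tokenize(question))


def bag_of_word_sequences(question: str, n: int = 2) -> Counter:
    """Bag of word sequences, assumed here to mean contiguous n-grams."""
    tokens = tokenize(question)
    return Counter(
        " ".join(tokens[i : i + n]) for i in range(len(tokens) - n + 1)
    )


def combined_features(question: str, max_n: int = 2) -> Counter:
    """Hypothetical union of word features and n-gram features up to length max_n."""
    features = bag_of_words(question)
    for n in range(2, max_n + 1):
        features.update(bag_of_word_sequences(question, n))
    return features


if __name__ == "__main__":
    print(bag_of_word_sequences("What is the capital of France?"))
```

A bag of words maps "is the capital" and "capital is the" to identical counts. N-grams preserve local word order, but they enlarge and sparsify the feature space. That larger, sparser space is one reason the feature selection step described in the abstract matters.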