Journal: International Journal of Pattern Recognition and Artificial Intelligence

TRANSDUCTIVE LEARNING FOR SHORT-TEXT CLASSIFICATION PROBLEMS USING LATENT SEMANTIC INDEXING


Abstract

This paper presents work that uses Transductive Latent Semantic Indexing (LSI) for text classification. In addition to relying on labeled training data, we improve classification accuracy by incorporating the set of test examples in the classification process. Rather than performing LSI's singular value decomposition (SVD) solely on the training data, we use an expanded term-by-document matrix that includes both the labeled data and any available test examples. We report the performance of LSI on data sets both with and without the inclusion of the test examples, and we show that tailoring the SVD process to the test examples can be even more useful than adding more training data. This method can be especially useful for combating the possible inclusion of unrelated data in the original corpus and for compensating for limited amounts of data. Additionally, we evaluate the vocabulary of the training and test sets and present the results of a series of experiments to illustrate how the test set is used in an advantageous way.
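
The idea of running the SVD on an expanded term-by-document matrix can be illustrated with a small sketch. This is not the authors' implementation: the function names (`term_document_matrix`, `lsi_classify`), the raw term-count weighting, the whitespace tokenizer, and the nearest-neighbor cosine rule for assigning labels are all assumptions made for illustration, since the abstract does not specify them. The `transductive` flag switches between building the latent space from the training documents alone and from the training and test documents together.

```python
import numpy as np

def term_document_matrix(docs, vocab):
    """Raw term counts: one row per vocabulary term, one column per document."""
    index = {term: i for i, term in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc.lower().split():
            if tok in index:          # terms outside the vocabulary are dropped
                X[index[tok], j] += 1.0
    return X

def lsi_classify(train_docs, train_labels, test_docs, k=2, transductive=True):
    """Label each test document with the label of its nearest training
    document in a k-dimensional LSI space (hypothetical helper)."""
    # Documents that shape the latent space: training only (standard LSI)
    # or training plus unlabeled test examples (transductive variant).
    svd_docs = train_docs + test_docs if transductive else train_docs
    vocab = sorted({tok for d in svd_docs for tok in d.lower().split()})

    X = term_document_matrix(svd_docs, vocab)
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    k = min(k, len(s))
    Uk = U[:, :k]                     # top-k left singular vectors (term space)

    # Project every document into the latent space.
    train_vecs = Uk.T @ term_document_matrix(train_docs, vocab)
    test_vecs = Uk.T @ term_document_matrix(test_docs, vocab)

    def unit(M):
        # Normalize columns; the small constant guards against empty documents.
        return M / (np.linalg.norm(M, axis=0, keepdims=True) + 1e-12)

    sims = unit(test_vecs).T @ unit(train_vecs)   # cosine similarities
    return [train_labels[i] for i in sims.argmax(axis=1)]
```

Calling `lsi_classify(train_docs, train_labels, test_docs, transductive=False)` gives the training-only baseline for comparison. In the transductive setting, terms that appear only in the test examples also enter the vocabulary and influence the singular vectors, which is one plausible reading of how the test set helps when labeled data is limited.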
