Journal: Intelligent Data Analysis

Multi-label text classification based on the label correlation mixture model

Abstract

In this paper, we propose a probabilistic generative model, the label correlation mixture model (LCMM), to describe multi-labeled document data; the model can be used for multi-label text classification. LCMM assumes two stochastic generative processes, corresponding to two submodels: 1) a label correlation model and 2) a label mixture model. The first submodel formulates the generative process of labels, in which a label correlation network is constructed to capture the dependencies between labels. We also present an efficient inference algorithm for computing the generative probability of a multi-label class, and a parameter-learning algorithm based on gradient descent for optimizing the label correlation network. The second submodel describes how the words of a document are generated given its labels. Various traditional mixture models can be adopted for this generative process, such as mixtures of language models or topic models. In the multi-label classification stage, we propose a two-step strategy, grounded in Bayes decision theory, to make the most efficient use of the LCMM. We conduct extensive multi-label classification experiments on three standard text data sets. The results show significant performance improvements compared with existing approaches; for example, on the OHSUMED data set, the improvements in accuracy and macro F-score reach 28.3% and 37.0%, respectively. These gains demonstrate the effectiveness of the proposed models and solutions.
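The abstract does not give the model's equations. What follows is therefore only a minimal sketch of the general idea under stated assumptions, not the authors' LCMM. It assumes, hypothetically, that the label correlation network can be approximated by pairwise co-occurrence scores that define a prior P(Y) over label sets. It also assumes that the label mixture submodel is an equal-weight mixture of per-label unigram language models, and that classification selects the label set maximizing log P(Y) + log P(words | Y). All label names, probabilities, and function names (`classify`, `correlation_log_prior`, `log_likelihood`, `candidate_label_sets`) are illustrative inventions.

```python
import math
from itertools import combinations

# Toy per-label unigram language models P(word | label). These numbers are
# hypothetical illustration values, not estimates from any data set.
LABEL_LMS = {
    "cardio": {"heart": 0.5, "blood": 0.3, "tumor": 0.1, "cell": 0.1},
    "onco": {"heart": 0.1, "blood": 0.1, "tumor": 0.5, "cell": 0.3},
    "hema": {"heart": 0.1, "blood": 0.5, "tumor": 0.1, "cell": 0.3},
}

# Hypothetical pairwise co-occurrence scores standing in for a label
# correlation network: a positive score makes sets containing the pair likelier.
PAIR_SCORES = {frozenset({"onco", "hema"}): 1.0}

FLOOR = 1e-12  # avoids log(0) for words unseen under every active label


def candidate_label_sets(labels):
    """Enumerate every non-empty subset of the label vocabulary."""
    labels = sorted(labels)
    for r in range(1, len(labels) + 1):
        for combo in combinations(labels, r):
            yield frozenset(combo)


def correlation_log_prior(label_sets, pair_scores):
    """log P(Y), proportional to the summed pair scores inside Y, normalised over candidates."""
    scores = {
        y: sum(pair_scores.get(frozenset(p), 0.0) for p in combinations(sorted(y), 2))
        for y in label_sets
    }
    log_z = math.log(sum(math.exp(s) for s in scores.values()))
    return {y: s - log_z for y, s in scores.items()}


def log_likelihood(words, label_set, lms):
    """log P(words | Y): each word drawn from an equal-weight mixture of Y's models."""
    total = 0.0
    for w in words:
        p = sum(lms[label].get(w, 0.0) for label in label_set) / len(label_set)
        total += math.log(max(p, FLOOR))
    return total


def classify(words, lms, pair_scores):
    """Bayes decision by exhaustive search: argmax_Y log P(Y) + log P(words | Y)."""
    label_sets = list(candidate_label_sets(lms))
    log_prior = correlation_log_prior(label_sets, pair_scores)
    return max(label_sets, key=lambda y: log_prior[y] + log_likelihood(words, y, lms))


if __name__ == "__main__":
    print(sorted(classify(["tumor"], LABEL_LMS, PAIR_SCORES)))  # ['hema', 'onco']
    print(sorted(classify(["tumor"], LABEL_LMS, {})))  # ['onco']
```

With these toy values, a document consisting only of "tumor" is assigned {onco, hema} when the pairwise scores are used. With no pair scores, which gives a uniform prior, the same document is assigned only {onco}. The correlation prior therefore pulls in a label that tends to co-occur with the best single label. Exhaustive search over all 2^L − 1 label sets is feasible only for a handful of labels. The paper's own inference algorithm, gradient-descent learning, and two-step decision strategy are not reproduced here.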