Journal: IEEE Transactions on Knowledge and Data Engineering

Hierarchical Taxonomy-Aware and Attentional Graph Capsule RCNNs for Large-Scale Multi-Label Text Classification



Abstract

CNNs, RNNs, GCNs, and CapsNets have yielded significant insights into representation learning and are widely used in text mining tasks such as large-scale multi-label text classification. Most existing deep models for multi-label text classification consider either non-consecutive, long-distance semantics or sequential semantics, but how to account for both coherently remains largely unstudied. In addition, most existing methods treat output labels as independent medoids, ignoring the hierarchical relationships among them, which discards useful semantic information. In this paper, we propose a novel hierarchical taxonomy-aware and attentional graph capsule recurrent CNN framework for large-scale multi-label text classification. Specifically, we first model each document as a word-order-preserving graph-of-words and normalize it into a corresponding word-matrix representation that preserves non-consecutive, long-distance, and local sequential semantics. The word matrix is then fed into the proposed attentional graph capsule recurrent CNN to learn semantic features effectively. To leverage the hierarchical relations among class labels, we propose a hierarchical taxonomy embedding method to learn label representations and define a novel weighted margin loss that incorporates label-representation similarity. Extensive evaluations on three datasets show that our model significantly outperforms state-of-the-art approaches on large-scale multi-label text classification.
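The order-preserving graph-of-words construction mentioned in the abstract can be illustrated with a minimal sketch. The sliding-window co-occurrence criterion and binary edge weights below are illustrative assumptions, not the paper's exact normalization procedure:

```python
import numpy as np

def graph_of_words(tokens, window=3):
    """Build a word-order-preserving graph-of-words.

    Nodes are unique words indexed in first-occurrence order (so word
    order is preserved in the node ordering); an undirected edge links
    two words that co-occur within `window` consecutive tokens.
    Window size and 0/1 edge weights are assumptions for illustration.
    """
    vocab, index = [], {}
    for t in tokens:
        if t not in index:
            index[t] = len(vocab)
            vocab.append(t)
    n = len(vocab)
    adj = np.zeros((n, n), dtype=int)
    for i, t in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            a, b = index[t], index[tokens[j]]
            if a != b:
                adj[a, b] = adj[b, a] = 1
    return vocab, adj
```

The resulting adjacency matrix captures both local sequential links (adjacent words always fall in the same window) and non-consecutive links created when a word recurs later in the document.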
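The weighted margin loss described in the abstract can also be sketched. The exact weighting form below (penalizing score-margin violations less when the negative label's embedding is similar to the true label's) is a hypothetical instantiation, not the paper's definition:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two label embeddings."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def weighted_margin_loss(scores, positives, negatives, label_emb, margin=1.0):
    """Margin ranking loss weighted by label-embedding dissimilarity.

    For each (positive, negative) label pair, the hinge violation
    max(0, margin - (s_pos - s_neg)) is scaled by 1 - cos(e_pos, e_neg),
    so confusing two semantically close labels in the taxonomy costs
    less than confusing distant ones. Illustrative sketch only.
    """
    loss = 0.0
    for p in positives:
        for n in negatives:
            w = 1.0 - cosine(label_emb[p], label_emb[n])
            loss += w * max(0.0, margin - (scores[p] - scores[n]))
    return loss
```

With taxonomy-derived embeddings, sibling labels in the hierarchy get high cosine similarity and hence a small weight, which is one plausible way to inject hierarchical label structure into the training objective.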


