Journal: Information Processing & Management

Optimization and label propagation in bipartite heterogeneous networks to improve transductive classification of texts


Abstract

Transductive classification is useful for classifying texts when labeled training examples are scarce. Several algorithms have been proposed to perform transductive classification on text collections represented in a vector space model. However, the use of these algorithms is infeasible in practical applications because of the independence assumption among instances or terms and other drawbacks of these algorithms. Network-based algorithms emerged to avoid the drawbacks of algorithms based on the vector space model and to improve transductive classification. Networks are mostly used for label propagation, in which labeled objects propagate their labels to other objects through the network connections. Bipartite networks are useful for representing text collections as networks and performing label propagation. Generating this type of network avoids requirements such as collections with hyperlinks or citations, the computation of similarities among all texts in the collection, and the setup of a number of parameters. In a bipartite heterogeneous network, objects correspond to documents and terms, and the connections are given by the occurrences of terms in documents. Label propagation is performed iteratively, from documents to terms and then from terms to documents. Nevertheless, instead of using terms only as a means of label propagation, in this article we propose using the bipartite network structure to define the relevance scores of terms for classes through an optimization process, and then propagating these relevance scores to define labels for unlabeled documents. The new document labels are used to redefine the relevance scores of terms, which in turn redefine the labels of unlabeled documents in an iterative process. We demonstrated that the proposed approach surpasses algorithms for transductive classification based on the vector space model or on networks. Moreover, we demonstrated that the proposed algorithm makes effective use of unlabeled documents to improve classification and that it is faster than other transductive algorithms.
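
The document-to-term and term-to-document propagation that the abstract describes as the starting point can be illustrated with a minimal sketch. This is not the optimization-based method proposed in the article, whose details the abstract does not give. The function name `propagate_labels`, the count-weighted averaging, the clamping of labeled documents, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def propagate_labels(docs, seed_labels, n_classes, n_iter=20):
    """Illustrative bipartite label propagation (hypothetical helper).

    docs: list of {term: count} bags; seed_labels: {doc_index: class_index}.
    Returns one predicted class index per document.
    """
    # Labeled documents start one-hot; unlabeled documents start uniform.
    doc_scores = []
    for i in range(len(docs)):
        if i in seed_labels:
            doc_scores.append(
                [1.0 if c == seed_labels[i] else 0.0 for c in range(n_classes)]
            )
        else:
            doc_scores.append([1.0 / n_classes] * n_classes)

    for _ in range(n_iter):
        # Documents -> terms: each term takes the count-weighted average
        # of the class scores of the documents in which it occurs.
        term_sum = defaultdict(lambda: [0.0] * n_classes)
        term_weight = defaultdict(float)
        for i, bag in enumerate(docs):
            for term, count in bag.items():
                term_weight[term] += count
                for c in range(n_classes):
                    term_sum[term][c] += count * doc_scores[i][c]
        term_scores = {
            t: [s / term_weight[t] for s in sums] for t, sums in term_sum.items()
        }

        # Terms -> documents: only unlabeled documents are updated
        # (an assumption: labeled documents stay clamped to their seeds).
        for i, bag in enumerate(docs):
            if i in seed_labels or not bag:
                continue
            total = sum(bag.values())
            doc_scores[i] = [
                sum(count * term_scores[t][c] for t, count in bag.items()) / total
                for c in range(n_classes)
            ]

    return [max(range(n_classes), key=lambda c: s[c]) for s in doc_scores]


# Toy collection (made up): documents 0 and 1 are labeled, the rest are not.
docs = [
    {"goal": 2, "match": 1},
    {"election": 2, "vote": 1},
    {"goal": 1, "team": 1},
    {"vote": 1, "senate": 1},
    {"team": 2, "match": 1},
]
predicted = propagate_labels(docs, seed_labels={0: 0, 1: 1}, n_classes=2)
print(predicted)
```

In this toy collection, the unlabeled documents that share terms with document 0 end up in class 0, and the one that shares a term with document 1 ends up in class 1. Document 4 shares `match` with document 0 directly and is also reached through the term `team`, which links it to the unlabeled document 2.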
