Optimization and label propagation in bipartite heterogeneous networks to improve transductive classification of texts

Rafael Geraldeli Rossi; Alneu de Andrade Lopes; Solange Oliveira Rezende

Abstract

Transductive classification is a useful way to classify texts when labeled training examples are insufficient. Several algorithms have been proposed to perform transductive classification on text collections represented in a vector space model. However, these algorithms are infeasible in practical applications because they assume independence among instances or terms and because of their other drawbacks. Network-based algorithms emerged to avoid the drawbacks of algorithms based on the vector space model and to improve transductive classification. Networks are mostly used for label propagation, in which labeled objects propagate their labels to other objects through the network connections. Bipartite networks are useful for representing text collections as networks and performing label propagation. Generating this type of network does not require collections with hyperlinks or citations, the computation of similarities among all texts in the collection, or the setup of numerous parameters. In a bipartite heterogeneous network, objects correspond to documents and terms, and connections are given by the occurrences of terms in documents. Label propagation is performed iteratively from documents to terms and then from terms to documents. Instead of using terms merely as a means of label propagation, in this article we propose using the bipartite network structure to define the relevance scores of terms for classes through an optimization process, and then propagating these relevance scores to define labels for unlabeled documents. The new document labels are used to redefine the relevance scores of terms, which in turn redefine the labels of unlabeled documents, in an iterative process. We demonstrated that the proposed approach surpasses transductive classification algorithms based on the vector space model or on networks. Moreover, we demonstrated that the proposed algorithm makes effective use of unlabeled documents to improve classification and is faster than other transductive algorithms.
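
The alternating propagation that the abstract describes (documents to terms, then terms to documents) can be illustrated with a small sketch. This is not the algorithm proposed in the article: the optimization step that defines term relevance scores is replaced here by a plain frequency-weighted average, which is an assumption made only for illustration. The function name propagate_labels, its parameters, and the toy documents are all hypothetical.

```python
from collections import defaultdict


def propagate_labels(docs, seed_labels, classes, iterations=10):
    """Toy label propagation on a bipartite document-term network.

    docs: dict mapping doc id -> {term: frequency}; these are the edges.
          Frequencies are assumed to be positive.
    seed_labels: dict mapping each labeled doc id -> class name.
    Returns (predicted class per document, class scores per term).
    NOTE: illustrative averaging scheme, not the article's optimization.
    """
    n = len(classes)
    # Labeled documents start one-hot; unlabeled documents start at zero.
    doc_scores = {
        d: [1.0 if seed_labels.get(d) == c else 0.0 for c in classes]
        for d in docs
    }
    term_scores = {}
    for _ in range(iterations):
        # Step 1: documents -> terms (frequency-weighted average).
        sums = defaultdict(lambda: [0.0] * n)
        weights = defaultdict(float)
        for d, terms in docs.items():
            for t, w in terms.items():
                weights[t] += w
                for k in range(n):
                    sums[t][k] += w * doc_scores[d][k]
        term_scores = {t: [s / weights[t] for s in sums[t]] for t in sums}
        # Step 2: terms -> unlabeled documents; labeled ones stay fixed.
        for d, terms in docs.items():
            total = sum(terms.values())
            if d in seed_labels or total == 0:
                continue
            doc_scores[d] = [
                sum(w * term_scores[t][k] for t, w in terms.items()) / total
                for k in range(n)
            ]
    # Pick the highest-scoring class; ties fall back to the first class.
    labels = {
        d: classes[max(range(n), key=lambda k: doc_scores[d][k])]
        for d in docs
    }
    return labels, term_scores


# Hypothetical toy collection: two labeled and two unlabeled documents.
docs = {
    "d1": {"goal": 2, "match": 1},
    "d2": {"vote": 2, "election": 1},
    "d3": {"goal": 1, "team": 1},
    "d4": {"election": 1, "party": 1},
}
labels, term_scores = propagate_labels(
    docs, {"d1": "sports", "d2": "politics"}, ["sports", "politics"]
)
print(labels)
```

In this toy setup the unlabeled documents receive class information only through terms they share with labeled documents, which is why no pairwise document similarity needs to be computed.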

Bibliographic record

Source
Information Processing & Management, 2016, Issue 2, pp. 217-257 (41 pages)
Authors
Rafael Geraldeli Rossi; Alneu de Andrade Lopes; Solange Oliveira Rezende
Affiliations

Department of Computer Science, Institute of Mathematics and Computer Science, University of Sao Paulo, Brazil;

Institute of Mathematics and Computer Science, University of Sao Paulo, Brazil;

Institute of Mathematics and Computer Science, University of Sao Paulo, Brazil;

Original format: PDF
Language: English
Keywords
Text classification; Transductive learning; Graph-based learning; Text mining; Label propagation; Bipartite heterogeneous network;
