Journal: 《情报学报》 (Journal of the China Society for Scientific and Technical Information)

Exploring the Effectiveness of Identifying Core Text Themes by Superimposing Multiple Text Term Relationships

Abstract

Most studies that mine text themes from graphs or networks rely on a single type of term relationship. A text, however, consists of semantically meaningful terms arranged according to a logical structure: besides co-occurring in physical proximity, terms are linked by syntactic dependency relationships and by implicit semantic associations. Analyzing text content with only one of these relationships inevitably loses information. We therefore superimposed the co-occurrence, syntactic, and semantic relationships between terms and explored how effectively this multi-relationship approach identifies core themes. The experimental data were 249 papers on the topic "migraine disorders" published from 2012 to 2014 and indexed in the PubMed database. The results show that superimposing terms and their relationships makes the theme information of a text more prominent, and that the terms and edges present in all three relationships represent the important content of the text. Further analysis of the multi-relationship graph built from those terms and edges shows that its cliques contain fewer nodes and edges than those of the co-occurrence graph, but more than those of the semantic and syntactic graphs. Among the 20 most cohesive cliques, those in the superimposed graph are more cohesive than those in any of the co-occurrence, syntactic, or semantic graphs, and the difference is statistically significant. Superimposing multiple term relationships balances the three: it reduces the influence of co-occurrence while strengthening the contribution of the semantic and syntactic relationships, and by combining the information carried by all three it overcomes the information loss incurred when only one term relationship is considered.
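The superposition idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three edge lists are hypothetical toy data, the helper names (`norm`, `overlay`, `maximal_cliques`) are invented for illustration, and the basic Bron-Kerbosch recursion is assumed here as one common way to enumerate cliques. The cohesion measure used in the paper is not specified in the abstract, so it is left out.

```python
from collections import Counter
from itertools import chain

# Hypothetical toy edge lists, one per relationship type (not data from the paper).
cooccurrence = [("migraine", "aura"), ("migraine", "triptan"), ("aura", "triptan"),
                ("migraine", "headache"), ("headache", "nausea")]
syntactic = [("migraine", "aura"), ("migraine", "triptan"), ("aura", "triptan"),
             ("headache", "nausea")]
semantic = [("migraine", "aura"), ("migraine", "triptan"), ("aura", "triptan"),
            ("migraine", "headache")]


def norm(edges):
    """Treat edges as undirected: store each one as a frozenset of two distinct terms."""
    return {frozenset(e) for e in edges}


def overlay(*layers):
    """Count, for every edge, how many relationship layers contain it."""
    return Counter(chain.from_iterable(norm(layer) for layer in layers))


def maximal_cliques(edges):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    adj = {}
    for e in edges:
        u, v = tuple(e)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    found = []

    def expand(r, p, x):
        # r: current clique, p: candidates to extend it, x: already processed vertices
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


weights = overlay(cooccurrence, syntactic, semantic)
# Keep only edges supported by all three relationships (the "core" overlay).
core_edges = {e for e, n in weights.items() if n == 3}
core_terms = set().union(*core_edges)
cliques = maximal_cliques(core_edges)
```

In this toy example only the three edges among "migraine", "aura", and "triptan" appear in every layer, so they form the core graph and a single clique. Edges supported by one or two relationships, such as "migraine"-"headache", stay in `weights` with a lower count and could be used for a weighted overlay instead of a strict intersection.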
