Conference: Discovery Science

A Methodology for Mining Document-Enriched Heterogeneous Information Networks

Abstract

The paper presents a new methodology for mining heterogeneous information networks, motivated by the fact that, in many real-life scenarios, documents are available in heterogeneous information networks, such as interlinked multimedia objects containing titles, descriptions, and subtitles. The methodology consists of transforming documents into bag-of-words vectors, decomposing the corresponding heterogeneous network into separate graphs and computing structural-context feature vectors with PageRank, and finally constructing a common feature vector space in which knowledge discovery is performed. We exploit this feature vector construction process to devise an efficient classification algorithm. We demonstrate the approach by applying it to the task of categorizing video lectures. We show that our approach exhibits low time and space complexity without compromising classification accuracy.
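The three steps named in the abstract (bag-of-words vectors, per-graph PageRank features, and a common feature space) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the whitespace tokenizer, the damping factor of 0.85, the fixed iteration count, the use of PageRank with restart at each node, and the `weights` used to scale each block before concatenation.

```python
import math
from collections import Counter


def l2_normalise(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def bag_of_words(docs):
    """Return (vocabulary, unit-length term-count vectors), one vector per document."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0.0] * len(vocab)
        for w, count in Counter(d.lower().split()).items():
            v[index[w]] = float(count)
        vectors.append(l2_normalise(v))
    return vocab, vectors


def personalised_pagerank(adj, source, damping=0.85, iters=50):
    """Power iteration for PageRank with restart at `source`.

    `adj` is a list of neighbour lists; node i links to every node in adj[i].
    """
    n = len(adj)
    rank = [1.0 / n] * n
    for _ in range(iters):
        nxt = [0.0] * n
        for u, nbrs in enumerate(adj):
            if nbrs:
                share = damping * rank[u] / len(nbrs)
                for v in nbrs:
                    nxt[v] += share
            else:
                # Dangling node: send its mass back to the restart node.
                nxt[source] += damping * rank[u]
        nxt[source] += 1.0 - damping
        rank = nxt
    return rank


def combined_features(docs, graphs, weights):
    """Concatenate the text block and one PageRank block per graph.

    `weights` (hypothetical) holds one scale factor per block, text first.
    """
    _, text = bag_of_words(docs)
    n = len(docs)
    blocks = [text] + [
        [l2_normalise(personalised_pagerank(g, i)) for i in range(n)]
        for g in graphs
    ]
    rows = []
    for i in range(n):
        row = []
        for w, block in zip(weights, blocks):
            row.extend(w * x for x in block[i])
        rows.append(row)
    return rows


# Toy data: three lecture descriptions and one "same author" graph.
docs = ["graph mining lecture", "graph kernels lecture", "protein folding"]
same_author = [[1], [0], []]
features = combined_features(docs, [same_author], weights=[0.5, 0.5])
```

Each row of `features` can then be passed to any vector-space classifier. Normalising every block before scaling keeps one information source from dominating the concatenated vector purely because of its length.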
