Incorporating World Knowledge to Document Clustering via Heterogeneous Information Networks

Abstract

One of the key obstacles to making learning protocols realistic in applications is the need to supervise them, a costly process that often requires hiring domain experts. We consider a framework that uses world knowledge as indirect supervision. World knowledge is general-purpose knowledge that is not designed for any specific domain. The key challenges are then how to adapt world knowledge to domains and how to represent it for learning. In this paper, we provide an example of using world knowledge for domain-dependent document clustering. We provide three ways to specify the world knowledge to domains by resolving the ambiguity of the entities and their types, and we represent the data together with world knowledge as a heterogeneous information network. We then propose a clustering algorithm that can cluster multiple types and incorporate the sub-type information as constraints. In the experiments, we use two existing knowledge bases as our sources of world knowledge. One is Freebase, which is collaboratively collected knowledge about entities and their organizations. The other is YAGO2, a knowledge base that is automatically extracted from Wikipedia and maps knowledge to the linguistic knowledge base WordNet. Experimental results on two text benchmark datasets (20newsgroups and RCV1) show that incorporating world knowledge as indirect supervision can significantly outperform state-of-the-art clustering algorithms as well as clustering algorithms enhanced with world knowledge features.
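
As a rough illustration of the representation the abstract describes, the sketch below builds a tiny typed network linking documents to words and entities, and derives pairwise constraints from entity sub-types. It is not the authors' algorithm: the toy documents, the `entity_subtype` map (standing in for a knowledge base lookup), the function names, and the rule that entities sharing a sub-type are must-linked while others are cannot-linked are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Toy input: every document, word, entity, and type here is made up.
docs = {
    "d1": {"words": ["election", "vote"], "entities": ["Obama"]},
    "d2": {"words": ["vote", "senate"], "entities": ["Clinton"]},
    "d3": {"words": ["game", "score"], "entities": ["Lakers"]},
}

# Hypothetical entity -> sub-type map, standing in for a knowledge base lookup.
entity_subtype = {
    "Obama": "politician",
    "Clinton": "politician",
    "Lakers": "sports_team",
}


def build_hin(docs):
    """Return typed edge sets keyed by (source node type, target node type)."""
    edges = defaultdict(set)
    for doc_id, content in docs.items():
        for word in content["words"]:
            edges[("document", "word")].add((doc_id, word))
        for entity in content["entities"]:
            edges[("document", "entity")].add((doc_id, entity))
    return edges


def subtype_constraints(entity_subtype):
    """Assumed rule: same sub-type -> must-link, different sub-type -> cannot-link."""
    must_link, cannot_link = set(), set()
    for a, b in combinations(sorted(entity_subtype), 2):
        if entity_subtype[a] == entity_subtype[b]:
            must_link.add((a, b))
        else:
            cannot_link.add((a, b))
    return must_link, cannot_link


hin = build_hin(docs)
must_link, cannot_link = subtype_constraints(entity_subtype)
```

A constrained clustering routine could then take `hin` as its data and `must_link` / `cannot_link` as side information; how the paper actually formulates and optimizes this is not stated in the abstract.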

Bibliographic details

Journal: other
Authors: Chenguang Wang; Yangqiu Song; Ahmed El-Kishky; Dan Roth; Ming Zhang; Jiawei Han
Year: 2015
Pages: 1215–1224
Total pages: 31
Original format: PDF
Keywords: World Knowledge; Heterogeneous Information Network; Document Clustering; Knowledge Base; Knowledge Graph

Similar literature

1. Knowledge Popularity in a Heterogeneous Network: Exploiting the Contextual Effects of Document Popularity in Knowledge Management Systems [J]. Xiqing Sha, Klarissa Ting-Ting Chang, Cheng Zhang. Journal of the American Society for Information Science, 2013, no. 9.
2. Resource Efficient Clustering and Next Hop Knowledge Based Routing in Multiple Heterogeneous Wireless Sensor Networks [J]. Sunil Kumar, Priya Ranjan, Radhakrishnan Ramaswami. International Journal of Grid and High Performance Computing, 2017, no. 2.
3. Reconceptualizing knowledge networks for enterprise systems implementation: incorporating domain expertise of knowledge sources and knowledge flow intensity [J]. Sasidharan Sharath. Information & Management, 2019, no. 3.
4. Incorporating Semantic and Syntactic Information in Document Representation for Document Clustering [C]. Yong Wang, Julia Hodges. The 9th World Multi-Conference on Systemics, Cybernetics and Informatics (WMSCI 2005), vol. 8, 2005.
5. Incorporating background knowledge in document clustering [D]. Fodeh, Samah Jamal. 2010.
6. KnowSim: A Document Similarity Measure on Structured Heterogeneous Information Networks [O]. Chenguang Wang, Yangqiu Song, Haoran Li.
7. Load balancing in heterogeneous wireless communications networks. Optimized load aware vertical handovers in satellite-terrestrial hybrid networks incorporating IEEE 802.21 media independent handover and cognitive algorithms [O]. Ali Muhammad. 2012.
