Conference: International Conference on Information Science and Cloud Computing Companion
A Three-Stage Clustering Framework Based on Multiple Feature Combination for Chinese Person Name Disambiguation

Abstract

To resolve name ambiguity and improve the performance of person name disambiguation, we propose a three-stage clustering algorithm. In the first stage, organizations and locations (OLs) are used to cluster documents about the same person, so that texts with similar OLs are assigned to the same category; this stage is simply document clustering based on OL similarity. In the second stage, the clustered documents serve as a new data source from which additional features, such as co-author names, are extracted. These newly extracted features are used for further clustering of the documents. We also propose a method that resolves name ambiguity by constructing social networks from the relationships among co-authors. In the third stage, Web pages are further clustered using a content-based hierarchical agglomerative clustering (HAC) algorithm, which analyzes useful content, including titles, abstracts, and keywords (TAKs), to disambiguate the ambiguous names. Experimental results show that the three-stage clustering algorithm effectively improves the performance of person name disambiguation.
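
The staged pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the similarity measures, linkage, or thresholds, so this sketch assumes Jaccard similarity over pooled feature sets, a greedy agglomerative merge for every stage, and arbitrary thresholds. The field names (`ols`, `coauthors`, `taks`), the function names, and the toy records are all hypothetical. In particular, the co-author social network of stage two is approximated here by co-author set overlap.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pooled(cluster, docs, field):
    """Union of one feature field over all documents in a cluster."""
    out = set()
    for i in cluster:
        out |= docs[i][field]
    return out

def agglomerate(clusters, docs, field, threshold):
    """Greedy agglomerative merging (assumed linkage: pooled feature sets).
    Repeatedly merge the most similar pair until none reaches the threshold."""
    clusters = [set(c) for c in clusters]
    while len(clusters) > 1:
        best, pair = 0.0, None
        for i, j in combinations(range(len(clusters)), 2):
            s = jaccard(pooled(clusters[i], docs, field),
                        pooled(clusters[j], docs, field))
            if s > best:
                best, pair = s, (i, j)
        if pair is None or best < threshold:
            break
        i, j = pair
        clusters[i] |= clusters[j]   # merge j into i (i < j)
        del clusters[j]
    return clusters

def three_stage(docs, t_ol=0.5, t_co=0.2, t_tak=0.3):
    """Thresholds are illustrative assumptions, not values from the paper."""
    clusters = [{i} for i in range(len(docs))]
    clusters = agglomerate(clusters, docs, "ols", t_ol)         # stage 1: OLs
    clusters = agglomerate(clusters, docs, "coauthors", t_co)   # stage 2: co-authors
    clusters = agglomerate(clusters, docs, "taks", t_tak)       # stage 3: TAKs
    return clusters

# Hypothetical toy records, all mentioning the same ambiguous name.
docs = [
    {"ols": {"Tsinghua University", "Beijing"}, "coauthors": {"Li Na"},
     "taks": {"clustering", "name", "disambiguation"}},
    {"ols": {"Tsinghua University", "Beijing"}, "coauthors": {"Zhang Min"},
     "taks": {"entity", "resolution"}},
    {"ols": {"Peking University"}, "coauthors": {"Li Na", "Zhang Min"},
     "taks": {"web", "mining"}},
    {"ols": {"Fudan University", "Shanghai"}, "coauthors": {"Chen Bo"},
     "taks": {"protein", "folding"}},
    {"ols": set(), "coauthors": set(),
     "taks": {"protein", "folding", "structure"}},
]

result = three_stage(docs)
print(result)
```

With these toy records, documents 0 and 1 merge in stage one (identical OLs), document 2 joins them in stage two (shared co-authors), and documents 3 and 4 merge in stage three (overlapping TAKs). Running the stages in sequence lets each later feature type connect documents that the earlier features left apart.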
