Published in: IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technologies
A POI Categorization by Composition of Onomastic and Contextual Information

Abstract

Point of interest (POI) categorization is the task of finding the categories of POIs within a document. Because documents that mention POIs contain clue words that indicate POI categories, the task can be treated as document classification. However, this approach misses two crucial factors for identifying the category of a POI. First, it pays no attention to onomastic information, even though POI names often reveal much categorical information. Second, it ignores the fact that most clue words for identifying a POI category are located near the POI name. This paper proposes a novel method that incorporates both onomastic and local contextual information in POI categorization. The proposed method uses support vector machines (SVMs) to categorize POIs. To utilize the onomastic information of POIs, the method adopts a string kernel that handles variations of POI names efficiently at the character level. The method also applies a Gaussian weighting to the content words in a document. By setting the mean of the Gaussian at the position of the POI name, the method assigns higher weights to words near the POI name and lower weights to words far from it. These two types of information are then combined by a composite kernel of the string kernel and a linear kernel with the Gaussian weighting. A series of experiments shows that SVMs with the combined information outperform those using either type of information alone.
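The idea of combining a name-level kernel with a position-weighted context kernel can be illustrated with a minimal sketch. Several choices here are assumptions and are not taken from the paper: a character n-gram (spectrum) kernel stands in for the string kernel, the two kernels are normalized and mixed by a convex combination with a hypothetical weight `alpha`, and the Gaussian width `sigma` is an arbitrary default. All function names are illustrative.

```python
import math
from collections import Counter


def gaussian_weighted_bow(tokens, poi_index, sigma=3.0):
    """Bag of words whose token weights follow a Gaussian centred on the POI position."""
    bow = {}
    for i, tok in enumerate(tokens):
        # Weight is 1.0 at the POI position and decays with distance from it.
        w = math.exp(-((i - poi_index) ** 2) / (2.0 * sigma ** 2))
        bow[tok] = bow.get(tok, 0.0) + w
    return bow


def linear_kernel(a, b):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(v * b[k] for k, v in a.items() if k in b)


def ngram_counts(name, n=2):
    """Character n-gram counts of a lower-cased POI name."""
    s = name.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def spectrum_kernel(x, y, n=2):
    """Assumed stand-in for the string kernel: shared character n-gram counts."""
    return linear_kernel(ngram_counts(x, n), ngram_counts(y, n))


def normalized(kernel, x, y):
    """Cosine-style normalization so each kernel lies in [0, 1] for non-negative features."""
    denom = math.sqrt(kernel(x, x) * kernel(y, y))
    return kernel(x, y) / denom if denom > 0 else 0.0


def composite_kernel(doc_a, doc_b, alpha=0.5, sigma=3.0, n=2):
    """Convex combination (an assumption) of the name kernel and the context kernel.

    Each document is a tuple (poi_name, tokens, poi_index).
    """
    name_a, tokens_a, pos_a = doc_a
    name_b, tokens_b, pos_b = doc_b
    k_name = normalized(lambda x, y: spectrum_kernel(x, y, n), name_a, name_b)
    bow_a = gaussian_weighted_bow(tokens_a, pos_a, sigma)
    bow_b = gaussian_weighted_bow(tokens_b, pos_b, sigma)
    k_ctx = normalized(linear_kernel, bow_a, bow_b)
    return alpha * k_name + (1.0 - alpha) * k_ctx


if __name__ == "__main__":
    # Toy documents; "POI" marks where the name occurs in the token sequence.
    doc_a = ("Blue Bottle Cafe", "we had coffee at POI near the park yesterday".split(), 4)
    doc_b = ("Blue Door Cafe", "the POI serves great coffee and cake".split(), 1)
    print(composite_kernel(doc_a, doc_b))
```

Because a convex combination of positive semidefinite kernels is itself positive semidefinite, a Gram matrix built from `composite_kernel` could in principle be passed to any SVM implementation that accepts precomputed kernels.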