Conference: International Symposium on Intelligent Data Analysis

Automatic POI Matching Using an Outlier Detection Based Approach

Abstract

Points of Interest (POI) are widely used in many applications nowadays mainly due to the increasing amount of related data available online, notably from volunteered geographic information (VGI) sources. Being able to connect these data from different sources is useful for many things like validating, correcting and also removing duplicated data in a database. However, there is no standard way to identify the same POIs across different sources and doing it manually could be very expensive. Therefore, automatic POI matching has been an attractive research topic. In our work, we propose a novel data-driven machine learning approach based on an outlier detection algorithm to match POIs automatically. Surprisingly, works that have been presented so far do not use data-driven machine learning approaches. The reason for this might be that such approaches need a training dataset to be constructed by manually matching some POIs. To mitigate this, we have taken advantage of the Crosswalk API, available at the time we started our project, which allowed us to retrieve already matched POI data from different sources in US territory. We trained and tested our model with a dataset containing Factual, Facebook and Foursquare POIs from New York City and were able to successfully apply it to another dataset of Facebook and Foursquare POIs from Porto, Portugal, finding matches with an accuracy around 95%. These are encouraging results that confirm our approach as an effective way to address the problem of automatically matching POIs. They also show that such a model can be trained with data available from multiple sources and be applied to other datasets with different locations from those used in training. Furthermore, as a data-driven machine learning approach, the model can be continuously improved by adding new validated data to its training dataset.
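The abstract does not say which features or which outlier detection algorithm the authors use, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes two hypothetical features per candidate pair (name dissimilarity and distance in meters), fits a simple one-sided z-score detector on pairs already known to match, and treats a new pair as a match when it is not an outlier. All POI records, coordinates, names, and the threshold are made up for illustration.

```python
import math
from difflib import SequenceMatcher
from statistics import mean, pstdev


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in meters."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def pair_features(a, b):
    """Assumed features: (name dissimilarity in [0, 1], distance in meters)."""
    sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    dist = haversine_m(a["lat"], a["lon"], b["lat"], b["lon"])
    return (1.0 - sim, dist)


class ZScoreOutlierDetector:
    """Toy detector: a row is an outlier if any feature lies more than
    `threshold` standard deviations ABOVE the training mean. The test is
    one-sided because both assumed features measure dissimilarity."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold

    def fit(self, rows):
        cols = list(zip(*rows))
        self.means = [mean(c) for c in cols]
        # Guard against zero variance in a feature column.
        self.stds = [pstdev(c) or 1e-9 for c in cols]
        return self

    def is_inlier(self, row):
        return all(
            (x - m) / s <= self.threshold
            for x, m, s in zip(row, self.means, self.stds)
        )


def poi(name, lat, lon):
    return {"name": name, "lat": lat, "lon": lon}


# Hypothetical pairs already known to refer to the same place (training data).
matched_pairs = [
    (poi("Joe's Pizza", 40.7306, -74.0021), poi("Joes Pizza", 40.7307, -74.0020)),
    (poi("Central Park Zoo", 40.7678, -73.9718), poi("Central Park Zoo NYC", 40.7674, -73.9721)),
    (poi("Strand Bookstore", 40.7333, -73.9910), poi("Strand Book Store", 40.7332, -73.9908)),
    (poi("Katz's Delicatessen", 40.7223, -73.9874), poi("Katz's Deli", 40.7217, -73.9869)),
]

detector = ZScoreOutlierDetector(threshold=3.0).fit(
    [pair_features(a, b) for a, b in matched_pairs]
)


def is_match(detector, a, b):
    """A candidate pair counts as a match when it is not an outlier."""
    return detector.is_inlier(pair_features(a, b))


# Hypothetical candidates from a different city than the training data.
lello_a = poi("Livraria Lello", 41.1469, -8.6148)
lello_b = poi("Livraria Lello Porto", 41.1470, -8.6149)
clerigos = poi("Torre dos Clerigos", 41.1458, -8.6139)

print(is_match(detector, lello_a, lello_b))
print(is_match(detector, lello_a, clerigos))
```

A real system would likely use richer features and a stronger detector, and, as the abstract notes, could be retrained as newly validated matches are added to the training set.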
