Cluster detection and analysis with geo-spatial datasets using a hybrid statistical and neural networks hierarchical approach

Abstract

Spatial datasets record the locations of incidents of phenomena such as crime and disease. Areas that contain a higher than expected incidence of the phenomenon, given background population and census datasets, are of particular interest. By analysing the locations of potential influence, it may be possible to establish where a cause-and-effect relationship is present in the observed process.

Cluster detection techniques can be applied to such datasets in order to reveal information about the spatial distribution of the cases. Research in this area has mainly concentrated on either the computational or the statistical aspects of cluster detection. Each clustering algorithm has its own strengths and weaknesses. The main weaknesses that make these algorithms unreliable include estimating the number of clusters, testing the number of components, selecting initial seeds (centroids), running time and memory requirements. Consequently, a new cluster detection methodology, based on knowledge drawn from both the statistical and the computing domains, has been developed in this thesis. The methodology is a hybrid of statistical methods that uses properties of probability, rather than distance, to associate data with clusters. No prior knowledge of the dataset is required and the number of clusters is not predetermined. It performs efficiently in terms of memory requirements, running time and cluster quality. The algorithm, which determines both the centres of clusters and the existence of the clusters themselves, was applied to and tested on simulated and real datasets. The hotspots it identified were compared with the results of other available algorithms such as CLAP (Cluster Location Analysis Procedure), SaTScan and GAM (Geographical Analysis Machine). The outputs are very similar.

XVIGIS presented in this thesis encompasses the SCS algorithm, statistics and neural networks. It is used for developing a hybrid predictive crime model, for mapping and visualizing crime data and the corresponding population in the study region, and for visualizing the locations of the obtained clusters and the burglary incidence concentration ‘hotspots’ identified by the SCS clustering algorithm. Naturally, the quality of the results is subject to the accuracy of the data used. GIS is used in this thesis to develop a methodology for modelling data containing multiple functions. The census data used throughout provided a useful source of geo-demographic information. The resulting datasets were used for predictive crime modelling.

This thesis draws on several existing methodologies to develop a hybrid modelling approach. The methodology was applied to real data on the distribution of burglary incidence in the study region. Relevant principles of statistics, Geographical Information Systems, neural networks and the SCS algorithm were used to analyse the observed data. Regression analysis was used to build a predictive crime model and was combined with neural networks, with the aim of developing a new hierarchical neural network approach that generates more reliable predictions. The promising results were compared with those of a non-hierarchical back-propagation (BP) neural network and of multiple regression analysis. At the testing stage, the average percentage accuracy achieved by the new methodology was 13% higher than that of the non-hierarchical BP network. In general, the analysis reveals a number of predictors that increase the risk of burglary in the study region. Specifically, these include living in a ‘one person’ or ‘lone parent’ household, and households whose members are in elementary or intermediate occupations or are unemployed. As for household space, the results indicate that the risk of burglary is higher for households living in shared houses.
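The idea of flagging areas with "higher than expected incidence, given background population" can be illustrated with a minimal sketch. This is not the SCS algorithm from the thesis, whose details the abstract does not give. It assumes a simple Poisson model in which each area's expected count is its population multiplied by the overall rate; the function names, the sample figures and the `alpha` threshold are all hypothetical, and no correction for multiple testing is applied.

```python
import math


def poisson_upper_tail(observed, expected):
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    # Accumulate P(X = k) for k = 0 .. observed - 1, then take the complement.
    term = math.exp(-expected)  # P(X = 0)
    cdf = 0.0
    for k in range(observed):
        cdf += term
        term *= expected / (k + 1)  # P(X = k + 1) from P(X = k)
    return max(0.0, 1.0 - cdf)


def flag_hotspots(areas, alpha=0.01):
    """Flag areas whose case count is unusually high for their population.

    areas: list of (name, population, cases) tuples (illustrative format).
    Returns a list of (name, expected_cases, p_value) for flagged areas.
    """
    total_population = sum(pop for _, pop, _ in areas)
    total_cases = sum(cases for _, _, cases in areas)
    overall_rate = total_cases / total_population

    flagged = []
    for name, pop, cases in areas:
        expected = pop * overall_rate  # assumed null: uniform risk everywhere
        p_value = poisson_upper_tail(cases, expected)
        if cases > expected and p_value < alpha:
            flagged.append((name, expected, p_value))
    return flagged


# Made-up example data: four areas of equal population.
sample_areas = [("A", 1000, 10), ("B", 1000, 12), ("C", 1000, 40), ("D", 1000, 8)]
hotspots = flag_hotspots(sample_areas)
```

With these made-up figures every area has the same expected count, so only the area with a markedly larger case count is flagged. Scan-based tools such as SaTScan and GAM search over many candidate regions rather than fixed areas, which this sketch does not attempt.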

Bibliographic record

  • Author

    Majeed Salar Mustafa;

  • Year 2012
  • Original format PDF
  • Text language English
