Journal: Computers, Environment and Urban Systems

GSAM: A deep neural network model for extracting computational representations of Chinese addresses fused with geospatial feature



Abstract

Addresses are one of the most important geographical reference systems in natural language. In China, address planning has lagged behind, so a large number of non-standard addresses are in use. Such unstructured text makes Chinese addresses much harder to manage and apply. Extracting computational representations of addresses, however, allows them to be structured and makes related applications easier to extend. This paper therefore uses a deep neural language model from natural language processing (NLP) to extract computational representations automatically through an address language model (ALM), which is trained in an unsupervised way and is suitable for a large-scale address corpus. We propose a solution for fusing addresses with geospatial features and construct a geospatial-semantic address model (GSAM) that supports a variety of downstream tasks. The GSAM construction process consists of three phases. First, we build an ALM using bidirectional encoder representations from Transformers (BERT) to learn the semantic representations of addresses. Then, we obtain fused clustering results for the semantic and geospatial information with a high-dimensional clustering algorithm. Finally, we construct the GSAM from the fused clustering results using novel fine-tuning techniques. We further apply the computational representations extracted by the GSAM to the address location prediction task. The experimental results indicate that the target task accuracy of the ALM is 90.79%, and that the result of semantic-geospatial fusion clustering correlates strongly with fine-grained urban neighbourhood area division. The GSAM can accurately identify clustering labels, with all evaluation metric values above 0.96. We also demonstrate that our model outperforms purely ALM-based and word2vec-based models on the address location prediction task.
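The second phase, fusing semantic and geospatial information before clustering, can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm or the fusion scheme, so everything here is an assumption: plain k-means stands in for the "high-dimensional clustering algorithm", random vectors stand in for the ALM's address embeddings, and `fuse_features`, `kmeans`, and the `geo_weight` parameter are hypothetical names introduced only for illustration.

```python
import numpy as np


def fuse_features(semantic, coords, geo_weight=0.5):
    """Concatenate standardized semantic vectors and coordinates.

    geo_weight is a hypothetical knob in [0, 1] balancing the two views;
    it is not taken from the paper.
    """
    semantic = np.asarray(semantic, dtype=float)
    coords = np.asarray(coords, dtype=float)

    def standardize(x):
        std = x.std(axis=0)
        std[std == 0] = 1.0  # avoid division by zero for constant columns
        return (x - x.mean(axis=0)) / std

    # Scale each view by 1/sqrt(dim) so neither dominates by width alone.
    sem = standardize(semantic) / np.sqrt(semantic.shape[1])
    geo = standardize(coords) / np.sqrt(coords.shape[1])
    return np.hstack([(1.0 - geo_weight) * sem, geo_weight * geo])


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in); returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]

    def assign(cents):
        # Squared distance from every point to every centroid.
        dists = ((points[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assign(centroids), centroids


# Toy data: two neighbourhoods, each with its own semantic and spatial signature.
rng = np.random.default_rng(42)
sem_a = rng.normal(0.0, 0.1, size=(20, 8))   # placeholder for ALM embeddings
sem_b = rng.normal(3.0, 0.1, size=(20, 8))
xy_a = rng.normal([120.1, 30.2], 0.001, size=(20, 2))  # made-up lon/lat pairs
xy_b = rng.normal([120.3, 30.4], 0.001, size=(20, 2))

fused = fuse_features(np.vstack([sem_a, sem_b]), np.vstack([xy_a, xy_b]))
labels, _ = kmeans(fused, k=2)
```

In a pipeline like the one described, the resulting cluster labels would presumably serve as targets for the fine-tuning phase; how the authors actually do this is not stated in the abstract.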
