Computing the semantic relatedness between words is a key problem in information retrieval and natural language processing. To address it, we propose WGR, an improved Wikipedia-based method for computing semantic relatedness. WGR uses the Wikipedia dataset as its background knowledge base. Building on traditional methods, it incorporates the layout information of Wikipedia articles and processes the incoming links (backlinks) and outgoing links of Wikipedia concepts with different methods. In addition, WGR draws on Google search resources: after classification and filtering, the search results are modeled with LDA to compute relatedness. Finally, the results from the two datasets are combined to obtain the WGR semantic relatedness. Experimental analysis shows that WGR achieves better accuracy than existing algorithms.
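The abstract names the components of WGR but not their formulas. The sketch below shows one plausible way such a pipeline could be wired together, and every concrete choice in it is an assumption rather than the paper's method: in-links are scored with a Milne-Witten style link measure, out-links are scored with cosine similarity over vectors weighted by hypothetical per-section layout weights (`LAYOUT_WEIGHTS`), the Google-side score is taken as 1 minus the Jensen-Shannon divergence between two LDA topic distributions (assumed to be already inferred), and the final score is a linear fusion with hypothetical weights `alpha` and `beta`. All function names are illustrative.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical layout weights (NOT from the paper): links in an article's lead
# section are assumed to signal relatedness more strongly than later links.
LAYOUT_WEIGHTS = {"lead": 2.0, "body": 1.0, "see_also": 0.5}


def inlink_relatedness(in_a: Set[str], in_b: Set[str], total_articles: int) -> float:
    """Milne-Witten style measure over incoming links (assumed choice)."""
    common = in_a & in_b
    if not common:
        return 0.0
    big, small = max(len(in_a), len(in_b)), min(len(in_a), len(in_b))
    denom = math.log(total_articles) - math.log(small)
    if denom <= 0:
        raise ValueError("total_articles must exceed the in-link set sizes")
    distance = (math.log(big) - math.log(len(common))) / denom
    return max(0.0, 1.0 - distance)


def layout_vector(links: List[Tuple[str, str]]) -> Dict[str, float]:
    """Turn (target_article, section) out-links into a layout-weighted vector."""
    vec: Dict[str, float] = {}
    for target, section in links:
        # Unknown sections fall back to a neutral weight of 1.0.
        vec[target] = vec.get(target, 0.0) + LAYOUT_WEIGHTS.get(section, 1.0)
    return vec


def outlink_relatedness(out_a: Dict[str, float], out_b: Dict[str, float]) -> float:
    """Cosine similarity of layout-weighted out-link vectors (assumed choice)."""
    dot = sum(w * out_b[t] for t, w in out_a.items() if t in out_b)
    norm_a = math.sqrt(sum(w * w for w in out_a.values()))
    norm_b = math.sqrt(sum(w * w for w in out_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def topic_relatedness(p: List[float], q: List[float]) -> float:
    """1 - Jensen-Shannon divergence (base 2) between LDA topic distributions."""
    m = [(x + y) / 2 for x, y in zip(p, q)]

    def kl(a: List[float], b: List[float]) -> float:
        # Terms with a zero probability contribute nothing to the divergence.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 1.0 - 0.5 * kl(p, m) - 0.5 * kl(q, m)


def wgr_relatedness(inlink: float, outlink: float, topic: float,
                    alpha: float = 0.5, beta: float = 0.5) -> float:
    """Hypothetical linear fusion of the Wikipedia-side and Google-side scores."""
    wiki_score = beta * inlink + (1 - beta) * outlink
    return alpha * wiki_score + (1 - alpha) * topic


if __name__ == "__main__":
    a_out = layout_vector([("Computer", "lead"), ("Keyboard", "body")])
    b_out = layout_vector([("Computer", "lead"), ("Mouse", "see_also")])
    score = wgr_relatedness(
        inlink_relatedness({"PC", "Laptop"}, {"PC", "Tablet"}, total_articles=1000),
        outlink_relatedness(a_out, b_out),
        topic_relatedness([0.7, 0.2, 0.1], [0.6, 0.3, 0.1]),
    )
    print(f"WGR-style relatedness: {score:.3f}")
```

Keeping each component as a separate function makes it easy to swap in whichever in-link, out-link, or topic measure the full paper actually specifies; the fusion weights `alpha` and `beta` would in practice need to be tuned on a human-judged relatedness benchmark.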