Journal: 《计算机应用与软件》 (Computer Applications and Software)

Web Information Extraction Based on an Improved Genetic Annealing HMM

         

Abstract

To further improve the accuracy of Web information extraction, and to address the shortcomings of the hidden Markov model (HMM) and its hybrid methods in parameter optimisation, we present a Web extraction algorithm based on improved genetic annealing and HMM. First, the algorithm builds a novel HMM with a backward dependency assumption. Second, it applies the improved genetic annealing algorithm to optimise the HMM parameters: after the genetic operators and the simulated annealing (SA) parameters have been improved, the subpopulations are classified according to the adaptive crossover and mutation probabilities of the genetic algorithm (GA), which enables multi-population parallel search and information exchange, avoids premature convergence, and accelerates convergence. SA is then used as a GA operator to strengthen the local search capability. Finally, the bi-order Viterbi algorithm is used for decoding. Compared with existing HMM optimisation methods, the comprehensive Fβ=1 value in the experiments increases by 6% on average, which shows that the improved algorithm effectively raises extraction accuracy and search performance.
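The abstract names two ingredients without giving formulas: adaptive crossover and mutation probabilities, and SA used as a GA operator. The sketch below illustrates one common way these ideas are realised; it is not the authors' method. The linear adaptive rule, the Gaussian perturbation, the cooling factor, and all function names (`adaptive_prob`, `metropolis_accept`, `perturb_row`, `sa_operator`) are assumptions made for illustration, and the fitness function is a hypothetical stand-in for whatever objective is used to score an HMM parameter set.

```python
import math
import random


def adaptive_prob(f, f_avg, f_max, p_high, p_low):
    """Assumed linear rule: below-average individuals keep p_high,
    fitter ones get a probability that falls towards p_low."""
    if f < f_avg or f_max == f_avg:
        return p_high
    return p_high - (p_high - p_low) * (f - f_avg) / (f_max - f_avg)


def metropolis_accept(f_old, f_new, temperature, rng):
    """Standard SA acceptance test for a maximisation problem."""
    if f_new >= f_old:
        return True
    return rng.random() < math.exp((f_new - f_old) / temperature)


def perturb_row(row, scale, rng):
    """Add Gaussian noise to one probability row, then renormalise
    so the row still sums to 1 (as an HMM matrix row must)."""
    noisy = [max(1e-12, x + rng.gauss(0.0, scale)) for x in row]
    total = sum(noisy)
    return [x / total for x in noisy]


def sa_operator(individual, fitness, temperature, rng,
                steps=20, scale=0.05, cooling=0.9):
    """Local search applied to one GA individual (a list of
    probability rows). Returns the final state and its fitness."""
    current, f_cur = individual, fitness(individual)
    for _ in range(steps):
        candidate = [perturb_row(r, scale, rng) for r in current]
        f_cand = fitness(candidate)
        if metropolis_accept(f_cur, f_cand, temperature, rng):
            current, f_cur = candidate, f_cand
        temperature *= cooling  # geometric cooling (an assumption)
    return current, f_cur


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical fitness: prefer a transition matrix close to the identity.
    toy_fitness = lambda rows: sum(rows[i][i] for i in range(len(rows)))
    start = [[0.5, 0.5], [0.5, 0.5]]
    refined, score = sa_operator(start, toy_fitness, temperature=1.0, rng=rng)
    print(refined, score)
```

In a full hybrid, `sa_operator` would be called on selected individuals after crossover and mutation, with `adaptive_prob` deciding how often each individual undergoes those two operators.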
