Published in: 《计算机应用研究》 (Application Research of Computers)

Research on Constructing a Web Page Semantic Concept Tree Based on a Thesaurus and FCA

Abstract

To address the low efficiency with which users work with websites and the poor quality of many sites, and to support the construction of a Web semantic model, this paper presents an approach and framework for learning from Web pages that uses formal concept analysis (FCA) to build a semantic concept tree. First, information extraction and natural language processing tools extract and segment the text of a set of Web pages, and statistical methods identify the feature words that describe the semantics of the text. Second, taking a thesaurus as the reference, a search-engine-based word similarity algorithm converts every extracted feature word into a thesaurus term. Third, the extracted semantic information is converted into a formal context, which is then reduced using rules, clustering, and other techniques. Finally, a lattice construction algorithm designed for this purpose builds the concept lattice, and the resulting hierarchy is transformed into the concept tree. Experimental results show that the concept tree built this way can serve as the basis of a website ontology model and is useful for semantic assessment. The approach has practical value and may serve as a reference for related work.
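The last two steps of the pipeline (formal context, concept lattice, concept tree) can be illustrated with a minimal Python sketch. It is not the lattice construction algorithm of the paper, which the abstract does not specify. It uses the textbook approach of closing the object intents under intersection. The page names, the thesaurus terms, and the rule for choosing a single parent per concept (the more general concept with the largest intent) are all assumptions made for illustration.

```python
def extent_of(attrs, context):
    """Objects (pages) that carry every attribute in `attrs`."""
    return frozenset(obj for obj, obj_attrs in context.items() if attrs <= obj_attrs)


def build_concepts(context):
    """Return all formal concepts (extent, intent) of a formal context.

    `context` maps each object to its set of attributes. The intents of a
    context are exactly the intersections of object intents, plus the full
    attribute set, so we close the object intents under intersection.
    """
    all_attrs = frozenset().union(*context.values())
    intents = {all_attrs}
    for obj_attrs in context.values():
        intents |= {frozenset(obj_attrs) & intent for intent in intents}
    return [(extent_of(intent, context), intent) for intent in intents]


def build_tree(concepts):
    """Map each concept intent to one parent intent (the root maps to None).

    Assumption: the concept with an empty extent is dropped, and each
    remaining concept is attached to the more general concept with the
    largest intent, which is one of its upper covers in the lattice.
    """
    def rank(intent):
        return (len(intent), sorted(intent))

    intents = sorted((i for extent, i in concepts if extent), key=rank)
    parent = {}
    for intent in intents:
        more_general = [p for p in intents if p < intent]  # proper subsets
        parent[intent] = max(more_general, key=rank, default=None)
    return parent


# Hypothetical formal context: pages as objects, thesaurus terms as attributes.
pages = {
    "page1": {"library", "catalog"},
    "page2": {"library", "catalog", "search"},
    "page3": {"library", "news"},
}

concepts = build_concepts(pages)
tree = build_tree(concepts)

for intent, par in tree.items():
    print(sorted(intent), "<-", sorted(par) if par is not None else None)
```

In this toy context the root concept is the one whose intent is the term shared by all pages, and more specific concepts hang beneath it. A real lattice can give a concept several upper covers, so any conversion to a tree has to pick one of them; the rule used here is only one possible choice.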
