Journal: Computer Applications and Software (《计算机应用与软件》)

Deep Web Data Source Classification Based on a VSM of Query Interface Text

Abstract

With the rapid development of Internet technology, a large number of Web databases have emerged, and their number is still growing quickly. To organise and utilise the information hidden deep in these Web databases effectively, they need to be classified and integrated by domain. Because the query interface on a Web page is the only channel through which users can access a Web database, Deep Web data sources can be classified by classifying their query interfaces. This paper proposes a classification method based on a vector space model (VSM) of query interface text. First, a VSM is built from the text information of the query interfaces. Then, a typical data mining classification algorithm is used to train one or more classifiers, which classify the query interfaces by the domain they belong to. Experimental results show that the proposed method achieves good classification performance.
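The two steps named in the abstract (build a VSM from query interface text, then train a classifier) can be illustrated with a minimal sketch. The abstract does not say which term weighting or which classification algorithm the paper uses, so the TF-IDF weighting and the nearest-centroid cosine classifier below are stand-in assumptions. The domain names (`books`, `airfares`, `automobiles`), the toy interface texts, and all function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the interface text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_idf(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def to_vector(tokens, idf):
    """Map tokens to a sparse, L2-normalised TF-IDF vector (a dict).

    Terms that never occurred in the training corpus are ignored.
    """
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train(samples):
    """Build the VSM and one centroid per domain.

    samples: list of (interface_text, domain) pairs.
    Centroids are summed rather than averaged; cosine similarity is
    scale-invariant, so the ranking of domains is unaffected.
    """
    docs = [tokenize(text) for text, _ in samples]
    idf = build_idf(docs)
    centroids = defaultdict(lambda: defaultdict(float))
    for tokens, (_, domain) in zip(docs, samples):
        for term, weight in to_vector(tokens, idf).items():
            centroids[domain][term] += weight
    return idf, {d: dict(v) for d, v in centroids.items()}


def classify(text, idf, centroids):
    """Return the domain whose centroid is most similar to the interface text."""
    vec = to_vector(tokenize(text), idf)
    return max(centroids, key=lambda d: cosine(vec, centroids[d]))


# Hypothetical training data: visible text taken from query interfaces.
TRAINING = [
    ("title author isbn publisher keyword", "books"),
    ("author title subject isbn format", "books"),
    ("departure city arrival city depart date return date passengers", "airfares"),
    ("from to departure date adults children cabin class", "airfares"),
    ("make model year price zip code mileage", "automobiles"),
    ("make model price range body style year", "automobiles"),
]

idf, centroids = train(TRAINING)
print(classify("search by author or isbn", idf, centroids))  # prints "books"
```

Any standard classifier could be trained on the same vectors in place of the centroid rule. The sketch only shows how interface text becomes a point in the vector space and how a domain label is assigned to it.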
