To address the low sharing rate and scattered distribution of agricultural science and technology information, we apply a Web information extraction method to automatically collect agricultural science and technology information from multiple sources and store it in a database, and we use XML files to implement a failure-retry mechanism. By processing Web log files and applying an improved k-means clustering method, we build user access patterns, obtain for each access pattern the set of webpage feature words and their weights, and construct a library of user interest models that is used to push webpages to visiting sessions. In practical application, the user model library is updated periodically, which ensures the timeliness of the site's content and the reliability and availability of the push service.
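The abstract only names the steps of the log-mining and push pipeline. As a rough illustration, here is a minimal Python sketch of those steps under several assumptions that do not come from the paper. Each session is encoded as a visit-count vector over a fixed page list. Plain Lloyd's k-means stands in for the paper's unspecified "improved" k-means. A pattern's interest model is built by weighting each page's feature words by that page's centroid weight. A visiting session is matched to its nearest pattern by cosine similarity. All function names, page names and keywords are hypothetical, and the XML-based retry mechanism is not shown.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def session_vectors(sessions: Sequence[Sequence[str]], pages: Sequence[str]) -> List[Vector]:
    """Encode each session as a visit-count vector over a fixed page list."""
    index = {p: i for i, p in enumerate(pages)}
    vectors = []
    for session in sessions:
        v = [0.0] * len(pages)
        for page in session:
            v[index[page]] += 1.0  # assumes every logged page is in `pages`
        vectors.append(v)
    return vectors


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: List[Vector], k: int, max_iter: int = 100) -> List[Vector]:
    """Plain Lloyd's k-means (NOT the paper's improved variant).

    Initial centroids are simply the first k vectors, a hypothetical choice
    made to keep the example deterministic.
    """
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(max_iter):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda j: sq_dist(v, centroids[j]))
            clusters[nearest].append(v)
        # New centroid = mean of its members; an empty cluster keeps its old centroid.
        updated = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:  # assignments have stabilized
            break
        centroids = updated
    return centroids


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def pattern_keywords(centroid: Vector, pages: Sequence[str],
                     page_terms: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Interest model for one access pattern: feature words weighted by
    (page weight in the centroid) x (term weight within the page)."""
    profile: Dict[str, float] = {}
    for weight, page in zip(centroid, pages):
        if weight <= 0:
            continue
        for term, w in page_terms.get(page, {}).items():
            profile[term] = profile.get(term, 0.0) + weight * w
    return profile


def recommend(session: Sequence[str], pages: Sequence[str],
              centroids: List[Vector], top_n: int = 3) -> List[str]:
    """Match a visiting session to its closest pattern (cosine similarity)
    and push that pattern's unvisited pages, heaviest first."""
    v = session_vectors([session], pages)[0]
    best = max(centroids, key=lambda c: cosine(v, c))
    visited = set(session)
    candidates = [(w, p) for w, p in zip(best, pages) if w > 0 and p not in visited]
    candidates.sort(key=lambda wp: -wp[0])
    return [p for _, p in candidates[:top_n]]


if __name__ == "__main__":
    pages = ["wheat", "rice", "pest", "fert", "price"]
    sessions = [["wheat", "fert"], ["wheat", "fert", "wheat"],
                ["pest", "rice"], ["rice", "pest", "price"]]
    centroids = kmeans(session_vectors(sessions, pages), k=2)
    print(recommend(["rice"], pages, centroids))   # ['pest', 'price']
    print(recommend(["wheat"], pages, centroids))  # ['fert']
```

Seeding with the first k sessions keeps the example reproducible; a real deployment might prefer k-means++ seeding or several restarts, and would still need some way to choose k. Re-running `kmeans` on fresh logs corresponds to the periodic update of the user model library described above.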