METHOD AND DEVICE FOR WEBPAGE TEXT CLASSIFICATION, METHOD AND DEVICE FOR WEBPAGE TEXT RECOGNITION

Abstract

A method and device for webpage text classification, and a method and device for webpage text recognition. The method for webpage text classification comprises: collecting text data from a webpage (101); segmenting the text data to obtain basic text segments (102); calculating a first attribute value and a second attribute value of each basic text segment (103); calculating a characteristic value of each basic text segment from its first and second attribute values (104); screening the basic text segments by characteristic value to select characteristic text segments (105); calculating a weight for each characteristic text segment (106); and using the weights as the characteristic vector of the characteristic text segments and training a classification model with that vector (107). The method and device of the present invention ensure that features are extracted objectively and accurately and also take into account each feature's influence on classification, thereby increasing the accuracy of webpage text classification and helping users obtain useful information accurately and promptly from massive amounts of text.
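
The abstract does not define the first and second attribute values, the characteristic value, the weight, or the classification model. The Python sketch below is one possible reading under stated assumptions, not the patented method: a small in-memory labeled corpus stands in for webpage collection (101); regex word tokens serve as basic text segments (102); a segment's in-class and out-of-class document frequencies are assumed to be the first and second attribute values (103); a chi-square statistic built from those counts is assumed to be the characteristic value (104); the top-k segments by that value become the characteristic text segments (105); TF-IDF is assumed as the weight (106); and a nearest-centroid (Rocchio-style) classifier is swapped in for the unspecified model (107). All function names and the sample data are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical sketch: every step below is an ASSUMED stand-in for the
# unspecified steps 102-107 of the abstract, not the patented method.


def segment(text):
    """Step 102 (assumed): lowercase word tokens as basic text segments."""
    return re.findall(r"\w+", text.lower())


def chi_square(a, b, c, d):
    """Chi-square statistic of a 2x2 segment/class contingency table."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def select_features(docs, labels, k):
    """Steps 103-105 (assumed): score every segment, keep the top k."""
    classes = sorted(set(labels))
    class_size = Counter(labels)
    df = defaultdict(Counter)  # df[c][t]: class-c documents containing t
    for tokens, y in zip(docs, labels):
        df[y].update(set(tokens))
    vocab = set().union(*(df[c] for c in classes))
    n = len(docs)
    scores = {}
    for t in vocab:
        best = 0.0
        for c in classes:
            a = df[c][t]  # assumed first attribute value: in-class doc frequency
            b = sum(df[o][t] for o in classes if o != c)  # assumed second: out-of-class
            chi = chi_square(a, b, class_size[c] - a, n - class_size[c] - b)
            best = max(best, chi)  # assumed characteristic value: max over classes
        scores[t] = best
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


def idf_table(docs, features):
    """Smoothed inverse document frequency of each selected segment."""
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    return {t: math.log((1 + n) / (1 + sum(t in s for s in doc_sets))) + 1
            for t in features}


def vectorize(tokens, features, idf):
    """Step 106 (assumed): L2-normalised TF-IDF weights as the feature vector."""
    tf = Counter(tokens)
    vec = [tf[t] * idf[t] for t in features]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def train(texts, labels, k=10):
    """Step 107 (assumed): nearest-centroid model over the feature vectors."""
    docs = [segment(t) for t in texts]
    features = select_features(docs, labels, k)
    idf = idf_table(docs, features)
    sums = defaultdict(lambda: [0.0] * len(features))
    for tokens, y in zip(docs, labels):
        for i, v in enumerate(vectorize(tokens, features, idf)):
            sums[y][i] += v
    counts = Counter(labels)
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}
    return features, idf, centroids


def classify(model, text):
    """Return the class whose centroid has the largest dot product with text."""
    features, idf, centroids = model
    vec = vectorize(segment(text), features, idf)
    return max(sorted(centroids),
               key=lambda y: sum(a * b for a, b in zip(vec, centroids[y])))


# Hypothetical toy corpus standing in for text collected from webpages (101).
corpus = [
    ("cheap phone sale discount price", "shopping"),
    ("buy shoes discount price today", "shopping"),
    ("football match score goal team", "sports"),
    ("team wins match in final goal", "sports"),
]
model = train([t for t, _ in corpus], [y for _, y in corpus], k=6)
print(classify(model, "great discount on phone price"))  # shopping
```

Chi-square and TF-IDF appear here only because they are standard, well-understood choices for scoring and weighting text features. Chinese webpage text would also need a real word segmenter in place of the regex tokenizer.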

Bibliographic data

  • Publication number: WO2017167067A1
  • Publication date: 2017-10-05
  • Original format: PDF
  • Applicant/patentee: ALIBABA GROUP HOLDING LIMITED; DUAN BINGNAN
  • Application number: WO2017CN77489
  • Inventor: DUAN BINGNAN
  • Filing date: 2017-03-21
  • Classification: G06F17/27
  • Country: WO
