METHOD AND DEVICE FOR WEBPAGE TEXT CLASSIFICATION, METHOD AND DEVICE FOR WEBPAGE TEXT RECOGNITION
Abstract
A method and device for webpage text classification, and a method and device for webpage text recognition. The method for webpage text classification comprises: collecting text data from a webpage (101); segmenting the text data to obtain basic text segments (102); calculating a first attribute value and a second attribute value for each of the basic text segments (103); calculating a characteristic value for each of the basic text segments according to the first attribute value and the second attribute value (104); selecting characteristic text segments from the basic text segments according to the characteristic value (105); calculating a weight corresponding to each of the characteristic text segments (106); and treating the weights as a characteristic vector corresponding to the characteristic text segments and using the characteristic vector to train a classification model (107). The method and device of the present invention ensure objectivity and accuracy in feature extraction and also take into account the influence of each feature on classification, thereby increasing the accuracy of webpage text classification and helping a user obtain useful information from a massive amount of text accurately and promptly.
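Steps 101 to 107 can be illustrated with a minimal Python sketch. The abstract does not say what the two attribute values are, how they are combined, how the weight is computed, or which classification model is trained, so every concrete choice below is an assumption made for illustration only: whitespace tokenization stands in for segmentation, the first attribute is document frequency, the second is class concentration, the characteristic value is their product, the weight is a smoothed TF-IDF score, and the model is a nearest-centroid classifier. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def segment(text):
    # Step 102 (assumed): lowercase whitespace tokens stand in for basic text segments.
    return text.lower().split()

def attribute_values(docs, labels):
    # Step 103 (assumed): first attribute = fraction of documents containing the segment,
    # second attribute = largest share of those documents that belong to one class.
    df = Counter()
    per_class = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for tok in set(tokens):
            df[tok] += 1
            per_class[tok][label] += 1
    first = {t: df[t] / len(docs) for t in df}
    second = {t: max(per_class[t].values()) / df[t] for t in df}
    return first, second, df

def select_features(first, second, k):
    # Steps 104-105 (assumed): characteristic value = product of the two attributes;
    # keep the k highest-scoring segments, breaking ties alphabetically.
    score = {t: first[t] * second[t] for t in first}
    return sorted(score, key=lambda t: (-score[t], t))[:k]

def weight_vector(tokens, features, df, n_docs):
    # Step 106 (assumed): smoothed TF-IDF weight for each characteristic segment.
    tf = Counter(tokens)
    return [tf[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1.0) for t in features]

def train(texts, labels, k):
    # Step 107 (assumed model): average the weight vectors of each class into a centroid.
    docs = [segment(t) for t in texts]
    first, second, df = attribute_values(docs, labels)
    features = select_features(first, second, k)
    sums = defaultdict(lambda: [0.0] * len(features))
    counts = Counter(labels)
    for tokens, label in zip(docs, labels):
        vec = weight_vector(tokens, features, df, len(docs))
        sums[label] = [a + b for a, b in zip(sums[label], vec)]
    centroids = {c: [v / counts[c] for v in sums[c]] for c in sums}
    return {"features": features, "df": df, "n_docs": len(docs), "centroids": centroids}

def classify(model, text):
    # Recognition: assign the class whose centroid is closest in squared Euclidean distance.
    vec = weight_vector(segment(text), model["features"], model["df"], model["n_docs"])
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(vec, model["centroids"][label]))
    return min(sorted(model["centroids"]), key=dist)

# Step 101 is represented here by a tiny, made-up labeled corpus.
texts = [
    "cheap flights hotel deals",
    "hotel booking cheap travel",
    "football match score goal",
    "goal scored football league",
]
labels = ["travel", "travel", "sport", "sport"]

model = train(texts, labels, k=4)
print(model["features"])
print(classify(model, "cheap hotel near beach"))
```

Scoring each segment from two attributes before weighting means a segment is kept only if it is both reasonably common and concentrated in one class, which is one plausible reading of how the selection step accounts for a feature's influence on classification.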