International Conference on Advanced Data Mining and Applications

Extracting Novel Features for E-Commerce Page Quality Classification

Abstract

E-commerce websites host a huge number of web pages describing the same product, and their quality varies greatly. There is therefore a growing need for automated, accurate, and efficient quality classification methods. Several link-based, click-based, and content-based approaches have been proposed to evaluate page quality for general search engines. However, these methods consider only the surface features of HTML documents. Moreover, features such as link relations have drawbacks when applied to e-commerce pages, because the hypothesis that a link implies an endorsement does not always hold in e-commerce. In this paper, we propose two kinds of features that directly indicate content quality. We analyze the content structure of pages using a corpus of labeled texts, and we evaluate property completeness with the help of an ontology. We then combine these features with other features commonly used in the literature. We apply several learning methods to train classifiers that label pages as good or bad. Experiments on real e-commerce pages show that the proposed novel features can greatly improve classification accuracy.
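The abstract does not say how property completeness is computed. As a rough illustration only, one plausible reading is the fraction of properties an ontology expects for a product category that actually appear on the page. In the sketch below, the `ONTOLOGY` mapping, the category name, the property names, and the substring matching are all assumptions made for the example, not the authors' method.

```python
# Hypothetical ontology: product category -> properties a complete page should state.
# The category and property names are illustrative assumptions, not from the paper.
ONTOLOGY = {
    "mobile_phone": {"brand", "model", "screen size", "battery", "price"},
}


def property_completeness(page_text: str, category: str, ontology=ONTOLOGY) -> float:
    """Return the fraction of expected properties mentioned in the page text.

    Uses naive case-insensitive substring matching; a real system would
    likely need proper property extraction.
    """
    expected = ontology.get(category, set())
    if not expected:
        # Unknown category: no expected properties, so report zero completeness.
        return 0.0
    text = page_text.lower()
    found = {prop for prop in expected if prop in text}
    return len(found) / len(expected)


if __name__ == "__main__":
    page = "Brand: Acme. Model: X1. Price: $199"
    print(property_completeness(page, "mobile_phone"))
```

A score like this could then be used as one numeric feature alongside the content-structure and commonly used features before training a classifier.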
