Research on an Automatic Extraction Method for Product Specification Information in E-commerce Web Pages

Zhao Xiaoyong (赵晓永); Wang Lei (王磊)


Abstract

Automatically mining the billions of product specifications on the Web has important application value for e-commerce tasks such as market analysis, product recommendation, and after-sales service. However, existing specification extraction methods have not effectively balanced manual annotation workload, scalability, and accuracy. This paper proposes TSAE (Title Seed Automatic Extract), a method for automatically extracting specification information from product web pages. TSAE uses unsupervised learning with the page title as the seed and combines statistical features, natural semantics, and machine semantics, achieving high accuracy while reducing workload and improving scalability. Experiments show that TSAE delivers better automatic extraction results while offering good performance and scalability, can support massive data processing, and has good practical value.
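The abstract gives no algorithmic details, so the following is only a minimal sketch of the general "page title as seed" idea and should not be read as the TSAE method itself. It assumes that specifications appear as two-cell table rows and that the specification table is the one whose values share the most tokens with the page title. It uses a simple overlap count in place of the statistical and semantic features the paper mentions. The names `RowCollector`, `tokens`, `extract_specs`, and `SAMPLE` are hypothetical, and the tokenizer only handles ASCII alphanumerics (a Chinese page would need a word segmenter instead).

```python
from html.parser import HTMLParser
import re


class RowCollector(HTMLParser):
    """Collect the <title> text and every two-cell table row of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.tables = []   # one list of (key, value) rows per <table>
        self._tag = None   # tag whose text is currently being captured
        self._cells = []   # cell texts of the row being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr":
            self._cells = []
        elif tag in ("td", "th"):
            self._cells.append("")
        if tag in ("title", "td", "th"):
            self._tag = tag

    def handle_endtag(self, tag):
        if tag == "tr":
            # Keep only rows that look like a key/value pair.
            if len(self._cells) == 2 and self.tables:
                self.tables[-1].append(tuple(c.strip() for c in self._cells))
            self._cells = []
        if tag in ("title", "td", "th"):
            self._tag = None

    def handle_data(self, data):
        if self._tag == "title":
            self.title += data
        elif self._tag in ("td", "th") and self._cells:
            self._cells[-1] += data


def tokens(text):
    """Lower-cased ASCII alphanumeric tokens (an assumption of this sketch)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def extract_specs(html):
    """Return the key/value table whose values best overlap the title seed."""
    parser = RowCollector()
    parser.feed(html)
    seed = tokens(parser.title)
    best, best_score = [], 0
    for table in parser.tables:
        # Score = number of title tokens that reappear in the row values.
        score = sum(len(seed & tokens(value)) for _, value in table)
        if score > best_score:
            best, best_score = table, score
    return dict(best)


# Hypothetical sample page: a navigation table and a specification table.
SAMPLE = """<html><head><title>Acme X200 Laptop 16GB 512GB SSD</title></head>
<body>
<table><tr><td>Home</td><td>Cart</td></tr></table>
<table>
<tr><th>Memory</th><td>16GB</td></tr>
<tr><th>Storage</th><td>512GB SSD</td></tr>
<tr><th>Color</th><td>Silver</td></tr>
</table>
</body></html>"""
```

Calling `extract_specs(SAMPLE)` selects the second table, because its values repeat tokens from the title while the navigation table shares none. No labeled training pages are involved, which mirrors the unsupervised setting the abstract describes, although a real system would need far more robust features than token overlap.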

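Bibliographic information

Source
Computer Engineering and Applications (《计算机工程与应用》) | 2017, No. 24 | pp. 168-171 | 4 pages
Authors
Zhao Xiaoyong (赵晓永); Wang Lei (王磊)
Affiliation
School of Information Management, Beijing Information Science and Technology University, Beijing 100129
Original format: PDF
Language: Chinese
CLC classification: Program design, software engineering
Keywords
Information extraction; automatic extraction; product specification information; e-commerce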

