A novel text mining approach for scholar information extraction from web content in Chinese

Abstract

Text mining is the process of deriving high-quality information from text; it focuses on extracting useful information from text or web documents. IoT devices generate massive amounts of structured and unstructured data, including text. The opportunities that come with big data and unstructured data give governments and companies a strong incentive to adopt solutions based on text mining to improve strategic business activities and support decision making. Expert information is an important reference for decision making, but collecting it from text or web documents is a problem. In this paper, we introduce a text mining approach that crawls the Internet and extracts expert information. We build a basic framework and main modules, including information extraction, data cleaning and deduplication, and an expert recommendation model, to process text data from web content. We also define several metrics and data structures and propose algorithms to support text mining. Finally, experiments on datasets show that our text mining approach extracts expert attributes accurately.
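
The abstract names data cleaning and deduplication as a module but does not say how duplicate expert records are detected. The following is a minimal, hypothetical sketch of one common approach, not the authors' method. It assumes each scraped scholar record has a name, an affiliation, and a dictionary of extracted attributes, and it treats two records as duplicates when their whitespace-normalized, lowercased name and affiliation match. `ExpertRecord`, `normalize`, and `deduplicate` are illustrative names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class ExpertRecord:
    """Hypothetical record for one scholar extracted from a web page."""
    name: str
    affiliation: str
    attributes: dict[str, str] = field(default_factory=dict)


def normalize(text: str) -> str:
    # str.split() with no argument splits on any Unicode whitespace,
    # including the full-width space U+3000 common in Chinese pages.
    return " ".join(text.split()).lower()


def deduplicate(records: list[ExpertRecord]) -> list[ExpertRecord]:
    """Merge records sharing a normalized (name, affiliation) key.

    Later records only fill attributes that are missing or empty in the
    first record seen for that key; existing values are never overwritten.
    """
    merged: dict[tuple[str, str], ExpertRecord] = {}
    for rec in records:
        key = (normalize(rec.name), normalize(rec.affiliation))
        if key not in merged:
            # Copy the attribute dict so the input records are not mutated.
            merged[key] = ExpertRecord(rec.name, rec.affiliation, dict(rec.attributes))
            continue
        target = merged[key].attributes
        for attr, value in rec.attributes.items():
            if value and not target.get(attr):
                target[attr] = value
    return list(merged.values())


if __name__ == "__main__":
    records = [
        ExpertRecord("张三", "示例大学", {"title": "教授"}),
        ExpertRecord("张三\u3000", "示例大学", {"email": "zhangsan@example.edu"}),
        ExpertRecord("李四", "示例大学", {"title": "副教授"}),
    ]
    for expert in deduplicate(records):
        print(expert.name, expert.attributes)
```

Here the two "张三" records collapse into one whose attributes combine the title and the email. Exact matching on a normalized key is the simplest design; it misses variants such as abbreviated affiliations or transliterated names, which would need fuzzy matching or name disambiguation.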

Bibliographic details

  • Source
    Future generation computer systems | 2020, Issue 10 | pp. 859-872 | 14 pages
  • Author affiliations

    National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China;

    National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China;

    National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China;

    School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Henan University, Kaifeng 475004, China;

    Mobile E-Business Collaborative Innovation Center of Hunan Province, Key Lab of Hunan Province for Mobile Business Intelligence, Hunan University of Commerce, Changsha 410205, China;

  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Full-text language: English
  • Keywords

    Big data; Text mining; Expert database; Information extraction;
