Journal: Arabian Journal for Science and Engineering

NLP-MTFLR: Document-Level Prioritization and Identification of Dominant Multi-word Named Products in Customer Reviews



Abstract

The growing availability of large datasets in commercial domains has heightened the importance of data mining in recent years. Practitioners and researchers alike rely on these datasets to gauge the magnitude and effect of data-related problems that need solving in business environments. Over the same period, the volume of online data submissions (e-commerce data) on products, services and organizations has increased exponentially. However, the submitted data are highly unstructured and largely language-dependent. Mining and extracting useful information from such data is a colossal task, as the analysis should include opinion word identification/extraction, aspect extraction and entity extraction. Of the three, entity extraction is one of the governing approaches in text analysis and plays a major role in the e-commerce, biomedical and automobile industries. It supports the categorization of records by entity name, the generation of short summaries of entities and the grouping of similar records. Existing entity extraction approaches are capable of recognizing and extracting single-word named entities. However, product names are often given as a sequence of words (multi-word named entities) and therefore cannot be recognized by the existing methods. To resolve this issue, this paper presents a novel approach, NLP-Modified Token-based Frequencies of Left and Right (NLP-MTFLR), which is considered an effective way to detect and extract the multi-word named products, and the dominant multi-word named product, from a customer review corpus. Using NLP-MTFLR, subwords and multi-subwords are identified in the review corpus and mapped to their multi-word named products in order to recognize the dominant product of that corpus. Identifying the dominant product shows that, within the corpus, it is reviewed far more heavily than the other products. The NLP-MTFLR approach achieves 97% accuracy, 77% precision, 89% recall and an 82% F-score.
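As a quick sanity check on how the reported precision, recall and F-score figures relate, the sketch below computes the F-score as the harmonic mean of precision and recall. It assumes the abstract's "F-score" is the balanced F1 measure computed from overall precision and recall, which the abstract does not state; the function name `f_score` and the rounding bounds are illustrative choices, not part of the paper.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score: weighted harmonic mean of precision and recall.

    beta = 1.0 gives the balanced F1 measure (assumed here; the
    abstract does not say which variant was used).
    """
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0.0:
        return 0.0
    return (1.0 + b2) * precision * recall / denom


# Rounded figures as reported in the abstract.
precision, recall = 0.77, 0.89
f1 = f_score(precision, recall)
print(f"F1 from rounded values: {f1:.3f}")

# F1 increases with both inputs, so the extremes of the rounding
# intervals bound every F1 compatible with "77%" and "89%".
low = f_score(0.765, 0.885)
high = f_score(0.775, 0.895)
print(f"compatible range: {low:.3f} to {high:.3f}")
```

Under these assumptions the rounded inputs give an F1 of about 0.826, and unrounded values anywhere in the compatible range give between roughly 0.821 and 0.831, so the reported 82% is consistent with the other two figures once rounding is taken into account.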
