Conference: Chinese Lexical Semantics Workshop

A Classification Method for Chinese Word Semantic Relations Based on TF-IDF and CNN



Abstract

Classifying the semantic relations between words is an important part of semantic analysis in natural language research. Automating this classification is significant for the construction of knowledge graphs and for information retrieval. In the NLPCC2017 shared task on Chinese Word Semantic Relations Classification, semantic relations are divided into four categories: synonym, antonym, hyponym and meronym. This paper presents a classification method for Chinese word semantic relations based on TF-IDF and CNN that uses both the literal and the semantic features of words. Four new literal features are proposed, including whether one word is part of the other and the ratio of their common substring. Semantic features are extracted in four steps: training a word vector model on the BaiduBaike corpus, selecting the set of words most related to a given word from BaiduBaike based on TF-IDF, constructing a vector matrix for that set of related words, and using a CNN to obtain the semantic features of the given word from the vector matrix. Experiments on the NLPCC2017 dataset show an F1-score of up to 83.91%, which indicates that the method is effective at eliminating the influence of OOV words.
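
The two literal features named in the abstract, and the TF-IDF selection step, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the choice to normalise the common-substring ratio by the longer word's length, the plain `tf * log(N / df)` weighting, and the `entries` mapping (word to the token list of its encyclopedia entry) are all assumptions made for illustration; the abstract does not specify them.

```python
import math
from collections import Counter


def is_part_of(a: str, b: str) -> bool:
    """Literal feature: True if either word occurs inside the other."""
    return a in b or b in a


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the match that ended at the previous characters.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def common_substring_ratio(a: str, b: str) -> float:
    """Literal feature: shared substring length over the longer word's length.

    Assumption: the abstract does not define the denominator.
    """
    longer = max(len(a), len(b))
    return longest_common_substring(a, b) / longer if longer else 0.0


def top_related_words(target: str, entries: dict, k: int = 3) -> list:
    """Rank the tokens of the target's entry by TF-IDF and keep the top k.

    `entries` is a hypothetical mapping: word -> list of tokens in its entry.
    """
    tokens = entries[target]
    tf = Counter(tokens)
    n_docs = len(entries)
    df = Counter()
    for doc in entries.values():
        df.update(set(doc))  # count each token once per entry
    scores = {
        w: (c / len(tokens)) * math.log(n_docs / df[w])
        for w, c in tf.items()
    }
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]
```

For the pair "汽车" and "汽车轮胎", `is_part_of` returns `True` and `common_substring_ratio` returns `0.5` under the normalisation assumed here. In the pipeline the abstract describes, the vectors of the words returned by a step like `top_related_words` would then be stacked into the matrix that the CNN consumes.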
