International Journal on Semantic Web and Information Systems

Efficient Weighted Semantic Score Based on the Huffman Coding Algorithm and Knowledge Bases for Word Sequences Embedding

Abstract

Learning text representations is at the core of numerous natural language processing applications. Word embedding is a type of text representation that allows words with similar meanings to have similar representations. Word embedding techniques quantify and categorize semantic similarities between linguistic items based on their distributional properties in large samples of text data. Although these techniques are very efficient, handling semantic and pragmatic ambiguity with high accuracy remains a challenging research task. In this article, we propose a new feature, a semantic score, that handles ambiguities between words. We use external knowledge bases and the Huffman Coding algorithm to compute this score, which depicts the semantic relatedness between all fragments composing a given text. We combine this feature with word embedding methods to improve text representation. We evaluate our method on a hashtag recommendation system for Twitter, where texts are short and noisy. The experimental results demonstrate that our method achieves good results compared with state-of-the-art algorithms.
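The abstract does not say how the Huffman codes enter the semantic score, so the following is only a minimal sketch under stated assumptions. The `huffman_codes` function is standard Huffman coding over fragment frequencies. Everything after it is hypothetical and is not the authors' method: `fragment_weights` assumes longer codes (rarer fragments) deserve more weight, `semantic_score` assumes a toy `relatedness` table standing in for the external knowledge bases, and `embed` assumes the score is simply appended to a mean word vector.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(freqs: dict[str, int]) -> dict[str, str]:
    """Build a prefix-free binary Huffman code from symbol frequencies."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A lone symbol still needs a non-empty code.
        (sym,) = freqs
        return {sym: "0"}
    tiebreak = count()  # keeps heap comparisons away from the dicts
    heap = [(f, next(tiebreak), {s: ""}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing 0 / 1.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def fragment_weights(fragments: list[str]) -> dict[str, float]:
    """HYPOTHETICAL: weight each fragment by its normalized Huffman code length."""
    codes = huffman_codes(Counter(fragments))
    total = sum(len(c) for c in codes.values())
    return {f: len(c) / total for f, c in codes.items()}


def semantic_score(fragments: list[str],
                   relatedness: dict[frozenset, float]) -> float:
    """HYPOTHETICAL: weighted mean pairwise relatedness from a toy knowledge base."""
    weights = fragment_weights(fragments)
    uniq = sorted(weights)
    num = den = 0.0
    for i, a in enumerate(uniq):
        for b in uniq[i + 1:]:
            w = weights[a] * weights[b]
            num += w * relatedness.get(frozenset((a, b)), 0.0)
            den += w
    return num / den if den else 0.0


def embed(fragments: list[str], vectors: dict[str, list[float]],
          relatedness: dict[frozenset, float]) -> list[float]:
    """HYPOTHETICAL: mean word vector with the semantic score appended.

    Assumes `vectors` is non-empty and all vectors share one dimension.
    """
    dim = len(next(iter(vectors.values())))
    known = [vectors[f] for f in fragments if f in vectors]
    mean = ([sum(col) / len(known) for col in zip(*known)]
            if known else [0.0] * dim)
    return mean + [semantic_score(fragments, relatedness)]


if __name__ == "__main__":
    vectors = {"apple": [1.0, 0.0], "fruit": [0.0, 1.0]}
    kb = {frozenset(("apple", "fruit")): 0.8}  # toy relatedness value
    print(embed(["apple", "fruit"], vectors, kb))
```

Appending a single scalar keeps the embedding method unchanged and lets a downstream model (for example, a hashtag recommender) learn how much to trust the extra feature; a real system would substitute actual knowledge-base relatedness measures for the toy table.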
