Effects of Positivization on the Paragraph Vector Model

Abstract

Natural language processing (NLP) is an important field of artificial intelligence. One of the fundamental problems in NLP is to create vector (distributed) representations of words such that the vectors of words with similar meanings lie close together in the vector space. Among the most popular algorithms for creating these representations are word embedding models such as word2vec and fastText. Similarly, the paragraph vector model (doc2vec) creates distributed representations of documents while simultaneously creating distributed representations of the words in those documents. These models produce dense, low-dimensional (usually in the low hundreds) vector representations, which may include negative values. In this study, we focus on these negative values and introduce a family of regularization methods in which the document, word, and/or context vectors of the paragraph vector model are forced to have only positive components. We measure the effects of these methods on several tasks: text classification, semantic similarity, and analogy. Although positivization greatly increases the sparsity of the word embeddings and would be expected to cause a loss of information, our results show almost no reduction in the performance of the regularized embeddings on these tasks. We also observe an increase in classification accuracy in one case. We foresee that these approaches can be beneficial in machine learning systems that require non-negative vectors.
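The abstract does not say how the vectors are forced to be non-negative. One plausible reading, offered here only as an assumption, is a projection step that zeroes negative components after each gradient update. The NumPy sketch below illustrates that idea on random data; the function names (`positivize`, `sparsity`, `sgd_step_with_projection`) and the learning rate are hypothetical and are not taken from the paper.

```python
import numpy as np


def positivize(vectors):
    """Zero out negative components (projection onto the non-negative orthant).

    Assumption: this clipping rule is one possible form of positivization,
    not necessarily the regularizer used in the paper.
    """
    return np.maximum(vectors, 0.0)


def sparsity(vectors):
    """Fraction of components that are exactly zero."""
    return float(np.mean(vectors == 0.0))


def sgd_step_with_projection(vectors, grad, lr=0.025):
    """One plain gradient step followed by the projection above (hypothetical)."""
    return positivize(vectors - lr * grad)


# Toy data standing in for embedding vectors and their gradients.
rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 8))
grad = rng.normal(size=emb.shape)

updated = sgd_step_with_projection(emb, grad)

print("all components non-negative:", bool(updated.min() >= 0.0))
print("sparsity before:", sparsity(emb))
print("sparsity after: ", sparsity(updated))
```

Under this reading, the increase in sparsity mentioned in the abstract follows directly: every component that would have been negative is set to zero.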