
Glyph-aware Embedding of Chinese Characters


Abstract

Given the advantages and recent success of English character-level and subword-unit models in several NLP tasks, we consider the equivalent modeling problem for Chinese. Chinese script is logographic, and many Chinese logograms are composed of common substructures that provide semantic, phonetic and syntactic hints. In this work, we propose to explicitly incorporate the visual appearance of a character's glyph in its representation, resulting in a novel glyph-aware embedding of Chinese characters. Inspired by the success of convolutional neural networks in computer vision, we use them to capture the spatio-structural patterns of Chinese glyphs as rendered in raw pixels. In the context of two basic Chinese NLP tasks, language modeling and word segmentation, the model learns to represent each character's task-relevant semantic and syntactic information in the character-level embedding.
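To make the idea of a convolutional glyph encoder concrete, the following is a minimal, dependency-free sketch, not the authors' implementation. It assumes a glyph has already been rendered to a small binary bitmap; the 5x5 bitmaps, the two hand-picked 3x3 kernels, and the function names `conv2d_valid` and `glyph_embedding` are all hypothetical and chosen only for illustration. A real model would use much larger glyph images, several convolutional layers, and kernels learned jointly with the downstream task.

```python
def conv2d_valid(image, kernel):
    """2-D cross-correlation of a bitmap with one kernel, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def glyph_embedding(bitmap, kernels):
    """One embedding dimension per kernel: ReLU, then global max pooling."""
    embedding = []
    for kernel in kernels:
        feature_map = conv2d_valid(bitmap, kernel)
        embedding.append(max(max(0.0, v) for row in feature_map for v in row))
    return embedding


# Hypothetical 5x5 bitmaps loosely resembling two glyphs (assumption:
# real inputs would be rendered from a font at a higher resolution).
GLYPH_SHI = [  # cross shape, like the character for "ten"
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
GLYPH_YI = [  # single horizontal stroke, like the character for "one"
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

# Hand-picked stroke detectors standing in for learned kernels.
HORIZONTAL = [[-1, -1, -1], [1, 1, 1], [-1, -1, -1]]
VERTICAL = [[-1, 1, -1], [-1, 1, -1], [-1, 1, -1]]
KERNELS = [HORIZONTAL, VERTICAL]

shi_vec = glyph_embedding(GLYPH_SHI, KERNELS)
yi_vec = glyph_embedding(GLYPH_YI, KERNELS)
print("cross glyph:", shi_vec)
print("horizontal stroke glyph:", yi_vec)
```

In this toy setting the horizontal-stroke glyph responds only to the horizontal kernel, while the cross-shaped glyph responds equally to both, which illustrates how shared visual substructures can surface as shared embedding features.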
