Conference: European Conference on Speech Communication and Technology

Say-as Classification for Alphabetic Words in Japanese Texts


Abstract

Modern Japanese texts often include words of Western origin written in the Roman alphabet. For example, a shopping directory on a web portal that lists more than 8,000 shops includes a total of 6,400 alphabetic words. Because most of them are very new and idiosyncratic proper nouns, it is impractical to assume that all of them can be registered in the word dictionary of a text-to-speech synthesis system; their pronunciations must be derived automatically. Our solution consists of two steps. Step 1 classifies each unknown alphabetic word into a say-as class (English, Japanese, French, Italian, or English spell-out), which indicates how the word is to be read. Step 2 derives the pronunciation using the grapheme-to-phoneme conversion rules for the assigned say-as class. This paper proposes a method for say-as classification (i.e., Step 1) that uses a Support Vector Machine. After some trial and error, we achieved 89.2% accuracy on web shop data, which we consider sufficient for practical use.
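To make Step 1 concrete, the following is a minimal sketch of say-as classification with a linear SVM, not the authors' implementation. The abstract does not state which features, kernel, or training procedure were used, so everything here is an assumption for illustration: the character n-gram features, the one-vs-rest setup, the subgradient training on the hinge loss, all function names, and the toy word list (which is invented and is not the paper's data). Step 2 (grapheme-to-phoneme conversion) is left out.

```python
from collections import Counter

# The five say-as classes named in the abstract.
SAY_AS_CLASSES = ["English", "Japanese", "French", "Italian", "English spell-out"]

# Invented toy examples for illustration only; not data from the paper.
TOY_SAMPLES = [
    ("shop", "English"), ("garden", "English"),
    ("sakura", "Japanese"), ("matsuya", "Japanese"),
    ("maison", "French"), ("chateau", "French"),
    ("gelato", "Italian"), ("cucina", "Italian"),
    ("dvd", "English spell-out"), ("atm", "English spell-out"),
]


def char_ngrams(word, n_max=3):
    """Count character n-grams (n = 1..n_max) of the lowercased word.

    '^' and '$' mark the word boundaries. This feature set is an assumption.
    """
    padded = "^" + word.lower() + "$"
    counts = Counter(
        padded[i:i + n]
        for n in range(1, n_max + 1)
        for i in range(len(padded) - n + 1)
    )
    return {gram: float(c) for gram, c in counts.items()}


def dot(weights, feats):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def train_one_vs_rest(samples, classes, epochs=20, lr=0.1, reg=0.01):
    """Train one linear SVM per class (hinge loss + L2 penalty, subgradient steps)."""
    models = {c: {} for c in classes}
    for _ in range(epochs):
        for word, label in samples:
            feats = char_ngrams(word)
            for c in classes:
                w = models[c]
                y = 1.0 if label == c else -1.0
                violated = y * dot(w, feats) < 1.0  # margin check before the update
                for f in w:                          # L2 shrinkage of all weights
                    w[f] *= 1.0 - lr * reg
                if violated:                         # hinge-loss subgradient step
                    for f, v in feats.items():
                        w[f] = w.get(f, 0.0) + lr * y * v
    return models


def classify(models, word):
    """Step 1: return the say-as class whose SVM gives the highest score."""
    feats = char_ngrams(word)
    return max(models, key=lambda c: dot(models[c], feats))


models = train_one_vs_rest(TOY_SAMPLES, SAY_AS_CLASSES)
print(classify(models, "sakurai"))
```

With only ten training words the prediction is not meaningful; the point is the shape of the pipeline, in which the predicted class would then select the grapheme-to-phoneme rule set used in Step 2.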
