A soft computing based approach for multi-accent classification in IVR systems.

Abstract

A speaker's accent is the most important factor affecting the performance of Natural Language Call Routing (NLCR) systems because accents vary widely, even within the same country or community. This variation also occurs when nonnative speakers start to learn a second language, the substitution of native language phonology being a common process. Such substitution leads to fuzziness between the phoneme boundaries and phoneme classes, which reduces out-of-class variations and increases the similarities between the different sets of phonemes. This fuzziness is therefore the main cause of reduced NLCR system performance. The main requirement of commercial enterprises using an NLCR system is a robust system that provides call understanding and routing to appropriate destinations. The chief motivation for the present work is to develop an NLCR system that eliminates multilayered menus and employs a sophisticated speaker accent-based automated voice response system around the clock. Currently, NLCR systems are not fully equipped with accent classification capability. Our main objective is to develop both speaker-independent and speaker-dependent accent classification systems that understand a caller's query, classify the caller's accent, and route the call to the acoustic model that has been thoroughly trained on a database of speech utterances recorded by such speakers. In the field of accent classification, the dominant approaches are the Gaussian Mixture Model (GMM) and the Hidden Markov Model (HMM). Of the two, GMM is the more widely implemented for accent classification. However, GMM performance depends on the initial partitions and the number of Gaussian mixtures, both of which can reduce performance if poorly chosen. To overcome these shortcomings, we propose a speaker-independent accent classification system based on a distance metric learning approach and an evolution strategy.

This approach depends on side information from dissimilar pairs of accent groups to transfer data points to a new feature space where the Euclidean distances between similar and dissimilar points are at their minimum and maximum, respectively. Finally, a Non-dominated Sorting Evolution Strategy (NSES)-based k-means clustering algorithm is employed on the training data set processed by the distance metric learning approach. The main objectives of the NSES-based k-means approach are to find the cluster centroids as well as the optimal number of clusters for a GMM classifier. For the speaker-dependent application, a new method based on fuzzy canonical correlation analysis is proposed to find appropriate Gaussian mixtures for a GMM-based accent classification system. In our proposed method, we implement a fuzzy clustering approach to minimize the within-group sum of squared errors, and canonical correlation analysis to maximize the correlation between the speech feature vectors and cluster centroids. We conducted a number of experiments using the TIMIT database, the speech accent archive, and the foreign accent English databases to evaluate the performance of the speaker-independent and speaker-dependent applications. Our assessment and analysis of the applications show that the proposed methodologies outperform the HMM, GMM, vector quantization GMM, and radial basis neural networks.
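
The abstract does not spell out the fuzzy clustering step, so the following is only a minimal sketch of standard fuzzy c-means, which minimizes a membership-weighted within-group sum of squared errors. It is not the thesis's method: the canonical correlation analysis step is omitted, and the function name `fuzzy_c_means`, the fuzzifier `m`, and the synthetic two-dimensional points standing in for speech feature vectors are all assumptions made for illustration.

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Standard fuzzy c-means (hypothetical helper, not taken from the thesis).

    Returns (centroids, memberships, objective), where the objective is the
    membership-weighted within-group sum of squared errors.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; each row is normalised to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        U.append([u / total for u in row])

    def update_centroids(U):
        # Each centroid is the mean of all points weighted by membership**m.
        centroids = []
        for k in range(n_clusters):
            w = [U[i][k] ** m for i in range(n)]
            total = sum(w)
            centroids.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )
        return centroids

    def squared_distances(centroids):
        # Squared Euclidean distances, floored to avoid division by zero.
        return [
            [max(sum((x[d] - c[d]) ** 2 for d in range(dim)), 1e-12)
             for c in centroids]
            for x in points
        ]

    for _ in range(n_iter):
        d2 = squared_distances(update_centroids(U))
        new_U = []
        for row in d2:
            # Membership is proportional to d2 ** (-1 / (m - 1)).
            inv = [v ** (-1.0 / (m - 1.0)) for v in row]
            total = sum(inv)
            new_U.append([v / total for v in inv])
        shift = max(abs(a - b) for ra, rb in zip(new_U, U) for a, b in zip(ra, rb))
        U = new_U
        if shift < tol:
            break

    centroids = update_centroids(U)
    d2 = squared_distances(centroids)
    objective = sum(U[i][k] ** m * d2[i][k]
                    for i in range(n) for k in range(n_clusters))
    return centroids, U, objective


if __name__ == "__main__":
    # Synthetic stand-in for feature vectors: two well-separated 2-D groups.
    rng = random.Random(1)
    data = [[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(40)]
    data += [[rng.gauss(5, 0.3), rng.gauss(5, 0.3)] for _ in range(40)]
    centroids, memberships, objective = fuzzy_c_means(data, n_clusters=2)
    print(centroids, objective)
```

In a pipeline like the one the abstract outlines, centroids of this kind could plausibly serve as starting means for the Gaussian components of a GMM, but that wiring is likewise an assumption here rather than a detail confirmed by the source.
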

Bibliographic record

  • Author: Ullah, Sameeh
  • Author affiliation: University of Waterloo (Canada)
  • Degree-granting institution: University of Waterloo (Canada)
  • Subject: Engineering Computer
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 178
  • Original format: PDF
  • Language of text: English