Arabic Language WEKA-Based Dialect Classifier for Arabic Automatic Speech Recognition Transcripts

Abstract

This paper describes an Arabic dialect identification system that we developed for the Discriminating Similar Languages (DSL) 2016 shared task. We classified Arabic dialects using the Waikato Environment for Knowledge Analysis (WEKA) data analytics tool, which contains many alternative filters and classifiers for machine learning. We experimented with several classifiers; the best accuracy was achieved using the Sequential Minimal Optimization (SMO) algorithm for training and testing, applied with three different feature sets across the testing runs. Our approach achieved an accuracy of 42.85%, considerably worse than the 80-90% evaluation scores obtained on the training set and the roughly 50% accuracy obtained with a 60:40 percentage split of the training set. We observed that the Buckwalter transcripts from the Saarland Automatic Speech Recognition (ASR) system are given without short vowels, even though the Buckwalter system has notation for them. We elaborate on these observations, describe our methods, and analyse the training dataset.
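
The observation about missing short vowels can be illustrated with a minimal sketch. It assumes the standard Buckwalter convention in which lowercase `a`, `u` and `i` mark the short vowels fatha, damma and kasra. The function names and the sample strings are hypothetical and are not taken from the paper or from the shared-task data.

```python
# Buckwalter symbols for the three Arabic short vowels.
# Assumption: standard Buckwalter transliteration (case-sensitive).
SHORT_VOWELS = {"a": "fatha", "u": "damma", "i": "kasra"}


def count_short_vowels(transcript: str) -> dict:
    """Count Buckwalter short-vowel symbols in a transliterated transcript."""
    counts = {name: 0 for name in SHORT_VOWELS.values()}
    for ch in transcript:
        if ch in SHORT_VOWELS:
            counts[SHORT_VOWELS[ch]] += 1
    return counts


def is_unvowelled(transcript: str) -> bool:
    """Return True when the transcript contains no short-vowel symbols."""
    return sum(count_short_vowels(transcript).values()) == 0


# Hypothetical samples: the same word with and without short vowels.
vowelled = "kataba"
unvowelled = "ktb"

print(count_short_vowels(vowelled))
print(is_unvowelled(unvowelled))
```

A check of this kind, run over a whole transcript file, would show whether short-vowel notation is present at all. It says nothing about why the ASR output omits it.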