Conference: International Conference on Human-Computer Interaction; International Conference on Design, User Experience and Usability

Usability Evaluation of Two Chinese Segmentation Methods in Subtitles to Scaffold Chinese Novice



Abstract

The number of people learning Chinese as a Foreign Language (CFL) has increased in recent years. Newcomers, international students, and immigrant spouses all need to improve their Chinese reading fluency and listening comprehension for daily communication and work. However, not everyone has the opportunity to receive formal education in a language school, so informal learning is very important for CFL learners in Taiwan. Novice Chinese learners should first master the skill of grouping Chinese characters into meaningful chunks, i.e., Chinese word segmentation. For instance, the phrase "老師對教育的貢獻" (teachers' contribution to education), read character by character, is "老/師/對/教/育/的/貢/獻"; after word segmentation it becomes "老師 (teachers)/ 對 (P)/ 教育 (education)/ 的 (DE)/ 貢獻 (contribution)". This study therefore used two Chinese segmentation methods to highlight meaningful and important word chunks in the subtitles of Chinese videos and evaluated their usability for CFL learners. The first method adopted the top 800 and top 1600 high-frequency words from an analysis report based on the Academia Sinica Balanced Corpus of Modern Chinese to identify word boundaries in video subtitles, and analyzed its performance using the forward maximum matching method. The statistical results show that most Chinese subtitles (62.3%) remain unsegmented, which means the subtitles in the videos are not appropriately segmented based on the corpus containing the top 800 high-frequency words. With the top 1600 high-frequency words integrated into the corpus, approximately 60% of the subtitles in each video are effectively segmented, but numerous unknown words still remain. Active phrases, idioms, and short phrases in Chinese subtitles may make word segmentation difficult; moreover, the usability testing result for segmentation based on high-frequency words is not significant.
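As an illustration of forward maximum matching, the following is a minimal sketch rather than the study's implementation. The function name `forward_max_match` and the five-word toy lexicon are assumptions made for this example; the study drew its lexicon from the top 800 or 1600 high-frequency words, which are not reproduced here. Characters that match no lexicon entry fall back to single-character tokens, which is one common way to treat unknown words.

```python
def forward_max_match(text, lexicon, max_len=None):
    """Greedy left-to-right segmentation.

    At each position, take the longest substring found in `lexicon`;
    if nothing matches, emit a single character (treated as unknown).
    """
    if max_len is None:
        # Longest word in the lexicon bounds the search window.
        max_len = max((len(w) for w in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink the window.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy lexicon covering only the example phrase.
toy_lexicon = {"老師", "對", "教育", "的", "貢獻"}
print("/".join(forward_max_match("老師對教育的貢獻", toy_lexicon)))
# prints: 老師/對/教育/的/貢獻
```

With a small lexicon, any word outside the list is broken into single characters, which is consistent with the abstract's observation that many subtitles remain unsegmented when only high-frequency words are used.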
The second method used a natural language processing technique to split Chinese subtitles into separate morphemes. The study adopted the CKIP Chinese parser, a word segmentation tool for Chinese, to split subtitles according to their part-of-speech tagging (i.e., grammatical tagging). The statistical results show that 97.26% of subtitles are split, but the usability testing shows that subjective satisfaction is not good enough. To investigate further, we asked subjects to identify "improper" word segmentations. For instance, the subtitle "接受治療很久了" (treated for a long time) is split into "接受/治療/很久/了", but most novices think that the proper segmentation should be "接受/治療/很久了". The "improper" rate is about 22.30% on average. In other words, the segmentation results from a Chinese parser based on natural language processing are not the best scaffolding for Chinese novices watching videos with Chinese subtitles. The preliminary usability testing results show that the second method can provide effective scaffolding for novices, but the granularity of the chunked words may sometimes be too fine to read fluently (i.e., in less than thirty percent of the results). Consequently, an adaptation mechanism is required so that learners can reach a balance point between the scaffolding provided by the two methods. For example, Chinese function words such as 很 and 了 serve only grammatical functions (i.e., they have no meaning by themselves); such function words should not be separated out from subtitles for learning purposes. Further work is necessary to find the proper granularity for chunking words, design an adaptation mechanism for segmentation, and prevent segmentation errors in new or unknown words.
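The coarser chunking that novices preferred in the "很久了" example can be sketched as a post-processing step over parser output. This is only an illustrative assumption: the abstract leaves the adaptation mechanism to future work, and the function `merge_function_words` and the one-item particle set below are hypothetical, not part of the CKIP parser or the study.

```python
def merge_function_words(tokens, attach_to_previous):
    """Re-attach selected function words to the chunk before them.

    `attach_to_previous` is a hypothetical set of particles that should
    not stand alone; a particle at the very start is left unchanged.
    """
    merged = []
    for tok in tokens:
        if tok in attach_to_previous and merged:
            merged[-1] += tok  # glue the particle onto the previous chunk
        else:
            merged.append(tok)
    return merged


# Parser-style segmentation quoted in the abstract.
parser_tokens = ["接受", "治療", "很久", "了"]
particles = {"了"}  # assumed list, for illustration only
print("/".join(merge_function_words(parser_tokens, particles)))
# prints: 接受/治療/很久了
```

A fixed particle list is the simplest possible rule; an adaptive mechanism of the kind the abstract calls for would presumably vary the merged set by learner level.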
