Journal: Education and Information Technologies

A study of readability of texts in Bangla through machine learning approaches

Abstract

In this work, we investigate text readability in the Bangla language. Text readability indicates how suitable a given document is for a target reader group, and it therefore has a strong impact on the preparation of educational content. Advances in natural language processing have enabled the automatic identification of the reading difficulty of texts and have contributed to the design and development of suitable educational materials. Although Bangla is one of the major languages of India and the official language of Bangladesh, research on text readability in Bangla is still at a nascent stage. In this paper, we present computational models that determine the readability of Bangla text documents based on syntactic properties. Because Bangla is poor in digital resources, we had to develop a novel dataset suitable for the automatic identification of text properties. Our initial experiments showed that existing English readability metrics are inapplicable to Bangla. Accordingly, we developed new models for analyzing text readability in Bangla that take language-specific syntactic features of Bangla text into account. We identified the major structural contributors to text comprehensibility and subsequently developed readability models for Bangla texts. To this end, we used different machine-learning methods such as regression, support vector machines (SVM), and support vector regression (SVR). The performance of the individual models was compared against one another. We conducted a detailed user survey for data preparation, for identifying important structural parameters of texts, and for validating our proposed models. The work has further implications for educational research and for matching texts to readers.
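
The abstract does not spell out the features or the model settings, so the following is only a minimal sketch of what a regression-based readability model over structural features could look like. The two features (average sentence length and average word length), the function names, and the ratings in the usage lines are all illustrative assumptions, not the authors' feature set or survey data. A plain one-variable least-squares fit stands in for the regression, SVM, and SVR models mentioned above.

```python
import re

# Sentence terminators: the Bangla danda plus common Latin punctuation.
SENTENCE_END = re.compile(r"[।!?.]+")


def structural_features(text):
    """Return (average sentence length in words, average word length).

    Word length is counted in Unicode code points, so Bangla vowel signs
    count as separate characters. This is a simplifying assumption.
    """
    sentences = [s.split() for s in SENTENCE_END.split(text) if s.strip()]
    words = [w for s in sentences for w in s]
    if not words:
        raise ValueError("text contains no words")
    avg_sentence_len = len(words) / len(sentences)
    avg_word_len = sum(len(w) for w in words) / len(words)
    return avg_sentence_len, avg_word_len


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical training pairs: average sentence length vs. a made-up
# difficulty rating. These numbers are NOT from the paper's user survey.
sentence_lengths = [5, 10, 15, 20]
difficulty_ratings = [1, 2, 3, 4]
slope, intercept = fit_line(sentence_lengths, difficulty_ratings)

# Score a new text from its average sentence length alone.
avg_len, _ = structural_features("আমি ভাত খাই। সে স্কুলে যায়।")
predicted_difficulty = slope * avg_len + intercept
```

In practice, one would fit on several features at once and compare the fit against human judgments, as the abstract describes for its own models.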
