Conference: International Scientific-Technical Conference on Actual Problems of Electronics Instrument Engineering

Criteria and Algorithm for the Russian Language Text Recognition Based on the Frequency Characteristics Set

Abstract

Text language recognition is necessary for the formal analysis of texts. The paper describes an algorithm based on frequency characteristics that identifies Russian-language texts of varying size. The unigram and bigram characteristics required to solve the problem (the number of characters and of bigrams (digrams) in use, and the frequency of each character) are determined, together with the criteria based on them (the coincidence index, the conjunction index, and the comprehensiveness of the characters and bigrams used in the text). Critical values for the criteria and ways of applying them are determined. The accuracy of the algorithm is obtained experimentally for different samples. Compared to machine learning methods (such as neural networks or naive Bayes classifiers), the algorithm does not require additional information or training samples, is simple to implement, and is rather efficient computationally.
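The abstract names the frequency characteristics but gives neither their formulas nor the critical values. As a rough illustration only, the sketch below computes three such quantities for a text, assuming the textbook definition of the index of coincidence, sum of n_i(n_i - 1) divided by N(N - 1). The function names are hypothetical, the paper's own definitions may differ, and the conjunction index is left out because the abstract does not define it. No thresholds are applied, since the paper's critical values are not stated here.

```python
from collections import Counter

# The 33 lowercase letters of the modern Russian alphabet.
RUSSIAN_LETTERS = set("абвгдеёжзийклмнопрстуфхцчшщъыьэюя")


def letters_only(text: str) -> str:
    """Lowercase the text and keep only alphabetic characters."""
    return "".join(ch for ch in text.lower() if ch.isalpha())


def coincidence_index(text: str) -> float:
    """Textbook index of coincidence over the letters of the text.

    Assumption: this is the standard cryptanalytic definition, which
    may not match the exact form used in the paper.
    """
    letters = letters_only(text)
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def russian_letter_share(text: str) -> float:
    """Fraction of the text's letters that belong to the Russian alphabet."""
    letters = letters_only(text)
    if not letters:
        return 0.0
    return sum(ch in RUSSIAN_LETTERS for ch in letters) / len(letters)


def distinct_bigrams(text: str) -> int:
    """Number of distinct adjacent letter pairs, counted within words."""
    bigrams = set()
    for word in text.lower().split():
        w = letters_only(word)
        bigrams.update(w[i:i + 2] for i in range(len(w) - 1))
    return len(bigrams)
```

A classifier in the spirit of the abstract would compare values like these against critical values; those would have to be taken from the paper itself or estimated from reference texts.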
