Journal: Information Retrieval

Automatic Alphabet Recognition


Abstract

The last step of the information retrieval process is to display the retrieved documents to the user, and difficulties can arise at that point. English text is usually encoded in ASCII. Unlike English, many languages have several character sets and no single standard. This plurality of standards causes problems, especially on the web, where one may download a document whose encoding standard is unknown. This paper proposes a fully automatic way to identify the standard used by the document's author, based on the statistical distribution of letters in the language. We developed a vector-space-based method that builds frequency vectors for each letter of the language and then matches a new document's vectors against precomputed templates. The algorithm was applied to various types of corpora in Hebrew, Russian, and English, and provides an efficient solution to the stated problem in most cases.
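The abstract does not give the exact construction of the vectors, so the following is only a minimal sketch of the general idea, under stated assumptions: one byte-frequency vector per candidate encoding serves as the template, and an unknown document is matched by cosine similarity. The candidate list, the training sentence, and the names `byte_vector`, `build_templates`, and `guess_encoding` are hypothetical and not taken from the paper. Only bytes 0x80 to 0xFF are counted, since the ASCII range is shared by all four encodings and carries no distinguishing signal.

```python
import math
from collections import Counter

# Candidate encodings: an illustrative choice for Russian, not the paper's list.
CANDIDATES = ["koi8_r", "cp1251", "iso8859_5", "cp866"]

# Tiny stand-in for a training corpus (hypothetical; a real template
# would be computed from a much larger body of text).
TRAINING_TEXT = (
    "Последний шаг процесса информационного поиска состоит в том, "
    "чтобы показать найденные документы пользователю. Многие языки "
    "имеют несколько наборов символов и не имеют единого стандарта."
)


def byte_vector(data):
    """Relative frequency of each non-ASCII byte value (0x80..0xFF)."""
    counts = Counter(b for b in data if b >= 0x80)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * 128
    return [counts.get(b, 0) / total for b in range(0x80, 0x100)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_templates(sample_text, encodings=CANDIDATES):
    """Precompute one frequency vector per candidate encoding."""
    return {enc: byte_vector(sample_text.encode(enc)) for enc in encodings}


def guess_encoding(data, templates):
    """Return the candidate whose template is closest to the document."""
    doc = byte_vector(data)
    return max(templates, key=lambda enc: cosine(doc, templates[enc]))


templates = build_templates(TRAINING_TEXT)
unknown = "Этот метод автоматически определяет кодировку документа.".encode("koi8_r")
print(guess_encoding(unknown, templates))
```

Because each encoding assigns the same letters to different byte values, the language's letter distribution lines up with the template only under the correct encoding, and that is what the similarity score picks up. A document with no bytes above 0x7F yields a zero vector, so the sketch cannot distinguish encodings for pure-ASCII input.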
