虚词作为藏文文献中重要成分,对文献识别过程也造成了很大的难度.本文通过传统藏文文法和语法规则,主要研究并提出了三种藏文历史文献中大量藏文自由虚词的识别算法,同时建立了具有284条规则的藏文自由虚词消歧规则库.使文献数字化过程中快速地识别并消除藏文句子中不自由虚词的歧义问题,提高藏文文献自动识别的准确率.%Functional words, as an important component of Tibetan literature, has caused great difficulties in the process of document recognition. Based on the traditional Tibetan grammar and grammar rules, this paper mainly studies and puts forward three kinds of recognition algorithms for a large number of Tibetan free function words in Tibetan historical documents, and establishes a rule base of 284 rules for Tibetan free function words disambiguation. In the process of digitalization, the ambiguity of unfree function words in Tibetan sentences can be quickly identified and eliminated, and the accuracy of automatic identification of Tibetan documents can be improved.
展开▼