Tibetan word segmentation is a basic step in Tibetan information processing. This paper describes how we ported Segtag, an HMM-based Chinese word segmentation system, to Tibetan. The resulting system, Yangjin, achieves an F-measure above 91% on a test corpus, even though the training corpus is relatively small. Guided by error analysis, we then added further processing, including selecting which part-of-speech tags to use in training and recognizing personal names, which improved the results further.