Chinese Unknown Word Identification Based on Local Bigram Model

ZHUORAN WANG; TING LIU




Abstract

This paper presents a Chinese unknown word identification system based on a local bigram model. In general, our word segmentation system employs a statistics-based unigram model. To identify unknown words, however, we exploit their contextual information and apply a bigram model locally. We combine these two models of different dimensions by adjusting an interpolation value derived from a smoothing method. As a simplification of the full bigram model, the method is both simple and feasible: its algorithmic complexity is low and it does not require a large amount of training data. Our experimental results show that the approach is effective.
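
The combination of a unigram and a bigram model through an interpolation value can be illustrated with a minimal sketch. The abstract does not give the paper's formula, so the code below assumes plain linear interpolation, `lam * P(word | prev) + (1 - lam) * P(word)`, with maximum-likelihood estimates from a toy corpus. The function names, the `lam` parameter, and the corpus are all hypothetical, and the sketch does not model how the paper restricts the bigram to local contexts around unknown-word candidates.

```python
from collections import Counter


def train_counts(sentences):
    """Count unigrams and adjacent-word bigrams in pre-segmented sentences."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def interpolated_prob(prev, word, unigrams, bigrams, lam=0.5):
    """Assumed linear interpolation of bigram and unigram estimates.

    lam is a hypothetical interpolation weight in [0, 1]:
    lam = 0 falls back to the unigram model, lam = 1 uses the bigram alone.
    """
    total = sum(unigrams.values())
    p_uni = unigrams[word] / total if total else 0.0
    # Simplification: the denominator counts every occurrence of prev,
    # including sentence-final ones that have no following word.
    p_bi = bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
    return lam * p_bi + (1 - lam) * p_uni


# Toy, already-segmented corpus (illustrative only, not the paper's data).
corpus = [["我", "爱", "北京"], ["我", "爱", "哈尔滨"]]
unigrams, bigrams = train_counts(corpus)
p = interpolated_prob("我", "爱", unigrams, bigrams, lam=0.5)
```

In a scheme like this, raising `lam` gives more weight to the surrounding context, while lowering it keeps the score close to the unigram baseline that the segmenter uses elsewhere.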


Bibliographic details

Source
International Journal of Computer Processing of Oriental Languages | 2005, No. 3 | pp. 185-196 | 12 pages
Authors
ZHUORAN WANG; TING LIU
Affiliation

Information Retrieval Laboratory, School of Computer Science and Technology, Harbin Institute of Technology, P. O. Box 321, HIT, Harbin, P.R. China, 150001

Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords
unknown word identification; Chinese word segmentation; local bigram model

Similar literature

1. Chinese Dialect Identification Based on Genetic Algorithm for Discriminative Training of Bigram Model [J]. Wuei-He Tsai, Wen-Whei Chang. IEICE Transactions on Information and Systems, 2000, No. 12
2. A NOVEL SPACE-COMPRESSED CHINESE WORD BIGRAM BASED ON BI-CHARACTER CO-ARTICULATION FREQUENCY [J]. Zhao Yibao, Qiao Liyan, Tan Jianxun. Journal of Electronics (CHINA), 2000, No. 2
3. Discriminative training of Gaussian mixture bigram models with application to Chinese dialect identification [J]. Wuei-He Tsai, Wen-Whei Chang. Speech Communication, 2002, No. 3-4
4. Chinese unknown word identification as known word tagging [C]. Guo-Hong Fu, Kang-Kwong Luke. 2004
5. Hybrid models for Chinese unknown word resolution [D]. Lu, Xiaofei. 2006
6. Identification of the high-risk area for schistosomiasis transmission in China based on information value and machine learning: a newly data-driven modeling attempt [O]. Yan-Feng Gong, Ling-Qian Zhu, Yin-Long Li. 2021
7. The Identification and Classification of Unknown Words in Chinese: A N-Grams-Based Approach [O]. Wang Mei-Chu, Huang Chu-Ren, Chen Keh-jiann. 1995
