New Word Identification Based on an Iterative Algorithm

Zhao Xiaobao (赵小宝); Zhang Huaping (张华平)


Abstract

New word identification is an important foundation of Chinese information processing, but the strong word-formation ability of Chinese characters makes new words difficult to detect automatically. Inspired by the duality principle, this paper proposes a new word identification algorithm based on an iterative algorithm. The target corpus is first segmented and part-of-speech tagged; string frequencies are then counted in two scans, and repetitive patterns are extracted. Taking the structural characteristics of words into account, the method iteratively applies features of the repetitive patterns, namely mutual information (MI), left (right) entropy, and the right (left) average entropy of the left (right) neighbor, to identify new words and obtain a list of candidate new words. A final filtering pass over the candidate list with a Chinese word collocation library yields the final list of new words. Experimental results show that with this method P@10 reaches 100% and P@100 rises to 90%, and that the right (left) average entropy of the left (right) neighbor improves the accuracy of new word identification to some extent.
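The abstract names its statistical features but gives no formulas. The Python sketch below shows one common way to compute them over a token sequence, together with P@k. It is a hypothetical illustration, not the authors' implementation: the class `PatternStats`, all method names, the minimum-over-splits MI definition, normalizing every n-gram probability by the token count, and especially the reading of "right (left) average entropy of the left (right) neighbor" as a frequency-weighted mean are assumptions. The iteration schedule, thresholds, use of POS tags, and collocation-library filtering are not modeled.

```python
import math
from collections import Counter


def entropy(counter):
    """Shannon entropy (bits) of a Counter of neighbor frequencies."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    # p * log2(1/p) keeps the result non-negative (no -0.0).
    return sum((c / total) * math.log2(total / c) for c in counter.values())


class PatternStats:
    """Hypothetical helper: counts and neighbor statistics for all n-grams (n <= max_n)."""

    def __init__(self, tokens, max_n=4):
        self.tokens = list(tokens)
        self.total = len(self.tokens)
        self.freq = Counter()  # n-gram tuple -> occurrence count
        self.left = {}         # n-gram tuple -> Counter of preceding tokens
        self.right = {}        # n-gram tuple -> Counter of following tokens
        for n in range(1, max_n + 1):
            for i in range(self.total - n + 1):
                gram = tuple(self.tokens[i:i + n])
                self.freq[gram] += 1
                if i > 0:
                    self.left.setdefault(gram, Counter())[self.tokens[i - 1]] += 1
                if i + n < self.total:
                    self.right.setdefault(gram, Counter())[self.tokens[i + n]] += 1

    def repeated_patterns(self, min_freq=2, min_len=2):
        """Multi-token n-grams that occur at least min_freq times."""
        return [g for g, c in self.freq.items() if c >= min_freq and len(g) >= min_len]

    def prob(self, gram):
        # Assumption: every n-gram probability is normalized by the token count.
        return self.freq[gram] / self.total

    def mutual_information(self, gram):
        """Assumed definition: minimum pointwise MI over all binary splits of an observed pattern."""
        if len(gram) < 2:
            raise ValueError("MI needs a pattern of at least two tokens")
        p_whole = self.prob(gram)
        return min(
            math.log2(p_whole / (self.prob(gram[:k]) * self.prob(gram[k:])))
            for k in range(1, len(gram))
        )

    def left_entropy(self, gram):
        return entropy(self.left.get(gram, Counter()))

    def right_entropy(self, gram):
        return entropy(self.right.get(gram, Counter()))

    def left_neighbor_avg_right_entropy(self, gram):
        """Hypothetical reading: frequency-weighted mean right entropy of the
        tokens that precede the pattern (high values suggest a real left boundary)."""
        neighbors = self.left.get(gram, Counter())
        total = sum(neighbors.values())
        if total == 0:
            return 0.0
        return sum(c * self.right_entropy((a,)) for a, c in neighbors.items()) / total

    def right_neighbor_avg_left_entropy(self, gram):
        """Mirror image: weighted mean left entropy of the tokens that follow the pattern."""
        neighbors = self.right.get(gram, Counter())
        total = sum(neighbors.values())
        if total == 0:
            return 0.0
        return sum(c * self.left_entropy((a,)) for a, c in neighbors.items()) / total


def precision_at_k(ranked, gold, k):
    """P@k: fraction of the top-k ranked candidates found in the gold set (k >= 1)."""
    return sum(1 for w in ranked[:k] if w in gold) / k
```

For example, `PatternStats(["a", "b", "c", "a", "b", "d", "a", "b", "c"]).repeated_patterns()` returns the candidate patterns `("a", "b")`, `("b", "c")` and `("a", "b", "c")`, each of which can then be scored with the MI and entropy methods. Keeping raw neighbor Counters, rather than only the entropies, is what makes the neighbor-average features cheap to recompute in later iterations.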

Bibliographic Record

Source
Computer Engineering (《计算机工程》), 2014, No. 7, pp. 162-167 (6 pages)
Authors
Zhao Xiaobao (赵小宝); Zhang Huaping (张华平)
Affiliations

School of Computer Science, Beijing Institute of Technology, Beijing 100081 (both authors)

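Original format: PDF
Language: Chinese
Chinese Library Classification (CLC): Artificial intelligence theory
Keywords
duality principle; new word identification; iterative algorithm; information entropy; repetitive pattern; Chinese word collocation library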

