METHOD AND APPARATUS FOR DISCOVERING NEW WORD

Abstract

The embodiments of the present invention relate to a method and apparatus for discovering new words. The method comprises: extracting morphemes from a target text in a target text library to construct a morpheme set H, counting the occurrence frequency of each morpheme, and representing each morpheme together with its frequency as a two-tuple to form a two-tuple set T; calculating a context association degree d for each subset w of a morpheme t_i, and collecting the subsets w whose d value is greater than or equal to a preset association degree threshold to form a first candidate word set W_s; calculating a support degree and a confidence degree for each morpheme t_i, and collecting the morphemes t_i whose support degree and confidence degree are both greater than or equal to their respective minimum thresholds to form a second candidate word set W_t; and taking the intersection of W_s and W_t as a candidate new word set W_h, filtering W_h, and saving the extracted new words as a new word set W. Because the embodiments combine information entropy analysis with association rule analysis, the accuracy of new word discovery can be effectively improved.
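
The pipeline in the abstract can be illustrated with a minimal Python sketch. The abstract does not give formulas, so everything concrete here is an assumption made for illustration: a morpheme is taken to be a single character, candidates are adjacent character pairs, d is taken as the smaller of the left and right neighbor entropies, support is a pair's share of all pairs, confidence is the pair count divided by the frequency of its first character, and the final filter simply drops entries of a hypothetical `known_words` dictionary. The function name and the threshold defaults are likewise hypothetical.

```python
import math
from collections import Counter, defaultdict


def discover_new_words(texts, known_words, d_min=0.5, sup_min=0.05, conf_min=0.6):
    """Illustrative sketch only; formulas and thresholds are assumptions."""
    # Morpheme set H and two-tuple set T (assumed: morpheme = one character).
    freq = Counter(ch for text in texts for ch in text)
    H = set(freq)
    T = set(freq.items())  # {(morpheme, frequency), ...}

    # Count candidate pairs and the characters seen on either side of them.
    pair_count = Counter()
    left_ctx = defaultdict(Counter)
    right_ctx = defaultdict(Counter)
    for text in texts:
        for i in range(len(text) - 1):
            w = text[i:i + 2]
            pair_count[w] += 1
            left_ctx[w][text[i - 1] if i > 0 else "<s>"] += 1
            right_ctx[w][text[i + 2] if i + 2 < len(text) else "</s>"] += 1

    def entropy(counter):
        # Shannon entropy (in bits) of the neighbor distribution.
        total = sum(counter.values())
        return -sum(c / total * math.log2(c / total) for c in counter.values())

    total_pairs = sum(pair_count.values())

    # W_s: candidates whose (assumed) context association degree d >= d_min.
    W_s = {w for w in pair_count
           if min(entropy(left_ctx[w]), entropy(right_ctx[w])) >= d_min}

    # W_t: candidates whose (assumed) support and confidence both pass.
    W_t = {w for w, c in pair_count.items()
           if c / total_pairs >= sup_min and c / freq[w[0]] >= conf_min}

    # W_h: intersection of both candidate sets; W: after dictionary filtering.
    W_h = W_s & W_t
    W = {w for w in W_h if w not in known_words}
    return H, T, W_s, W_t, W_h, W


if __name__ == "__main__":
    texts = ["xaby", "zabq", "pabr", "kabm"]
    print(discover_new_words(texts, known_words=set())[-1])
```

In this toy corpus the pair "ab" appears in varied contexts and always follows "a", so it passes both tests, while pairs such as "xa" pass only the support/confidence test and are dropped by the intersection.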

Bibliographic data

Publication number: WO2017185674A1
Publication date: 2017-11-02
Original format: PDF
Applicant/patentee: LE HOLDINGS (BEIJING) CO. LTD.; LE SHI INTERNET INFORMATION & TECHNOLOGY CORP. BEIJING
Application number: WO2016CN102448
Inventor: KANG CHAOMING
Filing date: 2016-10-18
Classification: G06F17/27
Country: WO
Date indexed: 2022-08-21 13:29:12
