Acronym extraction system and method of identifying acronyms and extracting corresponding expansions from text

Abstract

An acronym expansion system of the present invention receives electronic documents and extracts acronyms and their corresponding expansions. A part-of-speech tagger decomposes text into string tokens or words and tags them with their part-of-speech, while an acronym identifier determines whether a word is a potential acronym based on various conditions. An expansion identifier retrieves lists of words preceding and following a potential acronym to search for the expansion. The resulting word lists are examined sequentially to identify and retrieve an expansion for the potential acronym. An expansion extractor receives the potential acronym and a processed word list to retrieve the expansion of the potential acronym from that list. The extractor may utilize information from prior search iterations, and verifies an extracted expansion against a set of rules to remove spurious expansions.
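
The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the patented method: part-of-speech tagging is omitted, and the acronym conditions, the context window size, the initial-letter matching, and the verification rule are all assumptions chosen for illustration. Every function name here is hypothetical.

```python
import re


def tokenize(text):
    """Split text into word tokens (no part-of-speech tagging in this sketch)."""
    return re.findall(r"[A-Za-z][A-Za-z\-]*", text)


def is_potential_acronym(word, min_len=2, max_len=8):
    """Assumed condition: an all-uppercase alphabetic token of bounded length."""
    return word.isalpha() and word.isupper() and min_len <= len(word) <= max_len


def context_windows(tokens, index, width):
    """Return the word lists preceding and following tokens[index]."""
    before = tokens[max(0, index - width):index]
    after = tokens[index + 1:index + 1 + width]
    return before, after


def extract_expansion(acronym, words):
    """Find consecutive words whose initials spell the acronym, else None."""
    n = len(acronym)
    for start in range(len(words) - n + 1):
        candidate = words[start:start + n]
        if all(w[0].upper() == c for w, c in zip(candidate, acronym)):
            return " ".join(candidate)
    return None


def is_plausible(acronym, expansion):
    """Assumed verification rule: reject an expansion containing the acronym itself."""
    return acronym not in expansion.split()


def find_acronyms(text):
    """Map each potential acronym to the first verified expansion found nearby."""
    tokens = tokenize(text)
    found = {}
    for i, word in enumerate(tokens):
        if not is_potential_acronym(word) or word in found:
            continue
        # Assumed window size: twice the acronym length on each side.
        before, after = context_windows(tokens, i, 2 * len(word))
        # Examine the preceding list first, then the following list.
        for words in (before, after):
            expansion = extract_expansion(word, words)
            if expansion and is_plausible(word, expansion):
                found[word] = expansion
                break
    return found


if __name__ == "__main__":
    sample = ("The part of speech (POS) tagger labels tokens. "
              "NLP, or natural language processing, is broad.")
    print(find_acronyms(sample))
```

On the sample sentence, `POS` is resolved from the words preceding it and `NLP` from the words following it, mirroring the sequential examination of the two word lists. A real system would need looser matching (skipped stop words, multiple letters taken from one word) and a richer rule set for rejecting spurious expansions.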

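Bibliographic details

  • Publication number: US7236923B1

  • Publication date: 2007-06-26

  • Original format: PDF

  • Applicant/assignee: KALYAN M GUPTA

  • Application number: US20020212914

  • Inventor: KALYAN M GUPTA

  • Filing date: 2002-08-07

  • Classification: G06F17/30; G06F17/27; G06F17/28

  • Country: US

  • Database entry time: 2022-08-21 21:01:22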
