International Journal of Innovative Computing Information and Control

UPDATING FIELD ASSOCIATION WORD DICTIONARY USING WORD ATTRIBUTES, MORPHOLOGICAL ANALYSIS, AND COMPOUND WORDS

Abstract

Document classification and summarization are important for document text retrieval. Pioneering studies have used Field Association (FA) words extracted from a text to identify its subject (the document field). However, these methods select irrelevant FA words and, as a result, retrieve large numbers of unwanted texts. To address these shortcomings, two techniques are proposed. The first uses word attributes to extract FA words and to classify the texts of a document; its key point is that attributes make it possible to recognize field-specific information while also extracting relevant FA words. The second automatically filters the FA word dictionary, deleting irrelevant words identified by morphological analysis as well as words that carry no more information than a single FA word. In experiments, the first technique (word attributes) improved Precision and Recall over the traditional method by 11-18% and 15-28%, respectively. Using its two deletion criteria, the second technique removed about 15% of the FA word candidates as irrelevant, and because the remaining dictionary words are clearer and more specific, Precision and Recall increased by 18-25% after it was applied. Finally, the new method (New_m) achieves 10-15% higher classification accuracy than all compared models, an advantage that comes from classifying FA words with extraction attributes.
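The abstract does not spell out the filtering rule or the classifier, so the following Python sketch is purely illustrative and is not the authors' method. It assumes a hypothetical dictionary that maps each FA word to a single field label, treats a multi-word candidate as carrying "no more information than a single FA word" when one of its constituent words already maps to the same field, assigns a field by a simple vote over matched FA words, and computes Precision and Recall over sets of document IDs. Morphological analysis and the paper's word attributes are not modeled, and all names (`filter_redundant_compounds`, `classify`, `precision_recall`, `EXAMPLE_FA`) are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical FA dictionary format (assumption): FA word -> field label.
FaDict = Dict[str, str]

EXAMPLE_FA: FaDict = {
    "election": "politics",
    "election campaign": "politics",
    "stock": "economics",
    "stock market": "economics",
    "home run": "baseball",
}


def filter_redundant_compounds(fa_dict: FaDict) -> FaDict:
    """Drop multi-word entries whose field is already signalled by one of
    their constituent single words (an assumed redundancy rule)."""
    filtered: FaDict = {}
    for word, field in fa_dict.items():
        parts = word.split()
        if len(parts) > 1 and any(fa_dict.get(p) == field for p in parts):
            continue  # adds no field information beyond a single FA word
        filtered[word] = field
    return filtered


def classify(tokens: List[str], fa_dict: FaDict) -> Optional[str]:
    """Return the field with the most FA-word matches over unigrams and
    bigrams, or None when no FA word occurs (simple vote, an assumption)."""
    votes: Counter = Counter()
    for n in (1, 2):
        for i in range(len(tokens) - n + 1):
            field = fa_dict.get(" ".join(tokens[i:i + n]))
            if field is not None:
                votes[field] += 1
    return votes.most_common(1)[0][0] if votes else None


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision = hits / |retrieved|, Recall = hits / |relevant|."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    fa = filter_redundant_compounds(EXAMPLE_FA)
    print(fa)
    print(classify("the home run decided the game".split(), fa))
    print(precision_recall({"d1", "d2", "d3", "d4"}, {"d1", "d2", "d5"}))
```

With `EXAMPLE_FA`, the filter removes "election campaign" and "stock market" (their fields are already signalled by "election" and "stock") but keeps "home run", since neither "home" nor "run" is an FA word on its own. Filtering also avoids double counting in `classify`: with the unfiltered dictionary, the phrase "stock market" would add two votes, one for "stock" and one for "stock market".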