Venue: International Conference on Tools with Artificial Intelligence

Semi-automatic Dictionary Curation for Domain-Specific Ontologies

Abstract

Within the broad area of information extraction, we study the problem of effective dictionary curation in an enterprise setting. Given an ontology representative of an enterprise's domain, our approach populates the attributes of the ontology's leaf nodes with instances extracted from the enterprise corpus. For an attribute of interest, given a few seed examples or indicative features for the attribute, we first obtain a ranked list of 'list pages' potentially containing additional dictionary terms. Our ranking model ranks pages from the enterprise corpus based on their 'list' content using several visual and lexical features. We gather users' judgements of the result pages, and the model continuously learns from this feedback. We compare different techniques of dictionary curation using rule-based extractors and visual features of pages. Based on a rule-writing exercise, we show the benefit of dictionaries for leaf-node attributes when writing rule-based extractors for higher-level nodes in an ontology. We have implemented a dictionary curation system based on these ideas. Experimental analysis using an academic-domain ontology and university corpora reveals, in the context of enterprise analytics, (i) the merit of dictionary support in rule-based information extraction and (ii) the viability and effectiveness of an interactive approach to dictionary creation.
