Conference: International Workshop on Resource Discovery

Building Specialized Multilingual Lexical Graphs Using Community Resources

Abstract

We describe methods for compiling domain-specific multilingual terminological data from various resources. We focus on online community users as the main source of data. Our approach therefore relies both on contributions from volunteers (the explicit approach) and on analyzing users' behavior to extract interesting patterns and facts (the implicit approach). As a generic repository for the collected multilingual terminological data, we introduce the concept of dedicated Multilingual Preterminological Graphs (MPGs), along with some automatic approaches for constructing them by analyzing the behavior of online community users. A Multilingual Preterminological Graph is a special lexical resource that contains a massive number of terms related to a specific domain. We call it preterminological because it is raw material that can be used to build a standardized terminological repository. Building such a graph with traditional approaches is difficult, as it requires a huge effort from domain specialists and terminologists. In our approach, we build the graph by analyzing the access log files of the community's website, finding the important terms that have been used to search that website and their associations with each other. We aim to make this graph a seed repository to which multilingual volunteers can contribute. We are experimenting with this approach in the Digital Silk Road (DSR) Project. We have used its access log files since the project began in 2003 and obtained an initial graph of around 116,000 terms. As an application, we used this graph to obtain a preterminological multilingual database that serves a cross-language information retrieval (CLIR) system for the DSR project.
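
The log-analysis step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how terms are extracted or how associations are scored, so everything here is an assumption for illustration only: log entries are taken to be (session id, request URL) pairs, the search string is assumed to sit in a hypothetical `q` query parameter, and two terms are linked whenever they are searched in the same session. The function names `extract_query` and `build_graph` are likewise hypothetical and are not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations
from urllib.parse import urlparse, parse_qs


def extract_query(request_url, param="q"):
    """Return the normalized search string from a request URL, or None.

    The parameter name "q" is an assumption; real sites may use another.
    """
    values = parse_qs(urlparse(request_url).query).get(param)
    if not values or not values[0].strip():
        return None
    return values[0].strip().lower()


def build_graph(log_entries, param="q"):
    """Build term counts and co-occurrence edges from (session_id, url) pairs.

    Edge weight = number of sessions in which both terms were searched.
    This weighting is an illustrative choice, not the paper's method.
    """
    term_counts = defaultdict(int)
    terms_by_session = defaultdict(list)
    for session_id, url in log_entries:
        term = extract_query(url, param)
        if term is None:
            continue  # not a search request
        term_counts[term] += 1
        if term not in terms_by_session[session_id]:
            terms_by_session[session_id].append(term)

    edge_weights = defaultdict(int)
    for terms in terms_by_session.values():
        # Sort so each undirected edge has one canonical key.
        for a, b in combinations(sorted(terms), 2):
            edge_weights[(a, b)] += 1
    return dict(term_counts), dict(edge_weights)


# Made-up sample entries, not data from the DSR logs.
sample_log = [
    ("s1", "/search?q=Silk+Road"),
    ("s1", "/search?q=route+de+la+soie"),
    ("s1", "/images/map.png"),
    ("s2", "/search?q=silk+road"),
    ("s2", "/search?q=Dunhuang"),
]
term_counts, edge_weights = build_graph(sample_log)
```

In this toy input, the English and French queries from the same session end up connected, which hints at how search behavior alone might surface candidate cross-language term pairs for volunteers to review.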
