Conference: 7th Global WordNet Conference

Building a WordNet for Sinhala

Abstract

Sinhala is one of the official languages of Sri Lanka and is used by over 19 million people. It belongs to the Indo-Aryan branch of the Indo-European languages, and its origins date back at least 2000 years. It has developed into its current form over a long period of time, with influences from a wide variety of languages including Tamil, Portuguese and English. As for any other language, a WordNet is extremely important for taking Sinhala into the digital era. This paper reports on a project to develop a WordNet for Sinhala based on the English (Princeton) WordNet. It describes how we overcame the challenges in adding Sinhala-specific characteristics, which were deemed important by Sinhala language experts, to the WordNet while keeping the structure of the original English WordNet. It also presents the details of the crowdsourcing system we developed as part of the project, consisting of a NoSQL database in the backend and a web-based frontend. We conclude by discussing the possibility of adapting this architecture for other languages and the road ahead for the Sinhala WordNet and Sinhala NLP.