Journal: Informatica: An International Journal of Computing and Informatics

*MWELex – MWE Lexica of Croatian, Slovene and Serbian Extracted from Parsed Corpora


Abstract

The paper presents *MWELex, a multilingual lexical repository of Croatian, Slovene and Serbian multiword expressions that were extracted from parsed corpora. The lexica were built with the custom-built DepMWEx tool, which uses dependency syntactic patterns to identify MWE candidates in parse trees. The extracted MWE candidates are subsequently scored by co-occurrence and organized by headwords, producing a resource of 23 to 48 thousand headwords and 3.2 to 12 million MWE candidates per language. The evaluation of the lexicon, performed on Croatian and Slovene, shows an overall precision of just over 50% for Croatian but as high as 85% for Slovene. Similarly, precision over specific syntactic patterns varies greatly: 0.167-0.859 for Croatian and 0.158-1.00 for Slovene. The possible extension of the tool is demonstrated on a simplistic distributional-based extraction of non-transparent MWEs and cross-lingual linking of the extracted lexicons.
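The pipeline described in the abstract (match dependency patterns in parse trees, score the candidates by co-occurrence, group them by headword) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from DepMWEx: the toy sentences, the two patterns, the function names, and the choice of logDice as the co-occurrence score, since the abstract does not say which measure the tool uses.

```python
import math
from collections import Counter, defaultdict

# Toy parsed sentences (illustrative data, not from the paper).
# Each token is (lemma, POS, head index, relation); head index -1 marks the root.
SENTENCES = [
    [("strong", "ADJ", 1, "amod"), ("tea", "NOUN", 2, "obj"), ("drink", "VERB", -1, "root")],
    [("strong", "ADJ", 1, "amod"), ("tea", "NOUN", 2, "obj"), ("make", "VERB", -1, "root")],
    [("hot", "ADJ", 1, "amod"), ("tea", "NOUN", 2, "obj"), ("drink", "VERB", -1, "root")],
    [("strong", "ADJ", 1, "amod"), ("wind", "NOUN", 2, "nsubj"), ("blow", "VERB", -1, "root")],
]

# Hypothetical syntactic patterns: (head POS, relation, dependent POS).
PATTERNS = {("NOUN", "amod", "ADJ"), ("VERB", "obj", "NOUN")}


def extract_candidates(sentences, patterns):
    """Count (head lemma, relation, dependent lemma) triples that match a pattern."""
    counts = Counter()
    for sentence in sentences:
        for lemma, pos, head, relation in sentence:
            if head < 0:
                continue  # the root has no head to pair with
            head_lemma, head_pos, _, _ = sentence[head]
            if (head_pos, relation, pos) in patterns:
                counts[(head_lemma, relation, lemma)] += 1
    return counts


def log_dice(counts):
    """Score each candidate with logDice: 14 + log2(2 * f_xy / (f_x + f_y)).

    logDice is an assumed stand-in for the unspecified co-occurrence measure.
    """
    head_totals = Counter()
    dependent_totals = Counter()
    for (head, relation, dependent), freq in counts.items():
        head_totals[(head, relation)] += freq
        dependent_totals[(relation, dependent)] += freq
    scores = {}
    for (head, relation, dependent), freq in counts.items():
        denominator = head_totals[(head, relation)] + dependent_totals[(relation, dependent)]
        scores[(head, relation, dependent)] = 14 + math.log2(2 * freq / denominator)
    return scores


def build_lexicon(scores):
    """Group scored candidates by headword, best-scoring candidates first."""
    lexicon = defaultdict(list)
    for (head, relation, dependent), score in scores.items():
        lexicon[head].append((relation, dependent, score))
    for entries in lexicon.values():
        entries.sort(key=lambda entry: entry[2], reverse=True)
    return dict(lexicon)
```

Calling `build_lexicon(log_dice(extract_candidates(SENTENCES, PATTERNS)))` yields one entry list per headword, which mirrors the headword-organized layout the abstract mentions. A real run would read the triples from a parsed corpus instead of the hard-coded list.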
