Venue: Annual Meeting of the Association for Computational Linguistics

Maori Loanwords: A Corpus of New Zealand English Tweets

Abstract

Maori loanwords are widely used in New Zealand English for various social functions by New Zealanders within and outside of the Maori community. Motivated by the lack of linguistic resources for studying how Maori loanwords are used in social media, we present a new corpus of New Zealand English tweets. We collected tweets containing selected Maori words that are likely to be known by New Zealanders who do not speak Maori. Since over 30% of these words turned out to be irrelevant (e.g., mana is a popular gaming term, Moana is a character from a Disney movie), we manually annotated a sample of our tweets into relevant and irrelevant categories. This data was used to train machine learning models to automatically filter out irrelevant tweets.
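
The filtering step described above can be illustrated with a small sketch. The abstract does not say which machine learning models were trained, so the multinomial Naive Bayes classifier below is only a stand-in, and the class name `NaiveBayesFilter`, the helper `tokenize`, and every example tweet are invented for illustration; none of them come from the corpus.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesFilter:
    """Hypothetical multinomial Naive Bayes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # smoothing constant
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter()            # label -> number of tweets
        self.vocab = set()

    def fit(self, tweets, labels):
        for text, label in zip(tweets, labels):
            tokens = tokenize(text)
            self.label_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.label_counts):
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(self.label_counts[label] / total_docs)
            denom = (sum(self.word_counts[label].values())
                     + self.alpha * len(self.vocab))
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy annotations (NOT from the corpus), mimicking the
# relevant/irrelevant labels described in the abstract.
train_tweets = [
    "she has great mana in our community",
    "the moana was calm at the beach this morning",
    "huge respect and mana for our elders",
    "out of mana again in this boss fight",
    "watching moana the disney movie tonight",
    "need a mana potion for the next raid",
]
train_labels = ["relevant"] * 3 + ["irrelevant"] * 3

clf = NaiveBayesFilter().fit(train_tweets, train_labels)

candidates = [
    "no mana left for the boss fight",
    "great mana from our elders",
]
# Keep only the tweets the classifier labels as relevant.
kept = [t for t in candidates if clf.predict(t) == "relevant"]
```

With only six training examples this is a toy; a real filter would be trained on the manually annotated sample and evaluated on held-out tweets before being applied to the full collection.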