Conference: Moratuwa Engineering Research Conference

Classification of Cyberbullying Sinhala Language Comments on Social Media

Abstract

Due to the technological revolution over the years, bullying, which was once confined to physical boundaries, has now moved online. Denigration, or insult, is one form of cyberbullying. According to the Sri Lanka Computer Emergency Readiness Team, cyberbullying incidents on social media are escalating. Insulting words are dynamic, and the same word can have several meanings depending on the context. A comment cannot be classified as bullying simply because it contains such a word. Hence, simple keyword-spotting techniques are inadequate for labeling comments. Work on other languages has addressed this issue using lexical databases such as WordNet, which provides synonyms and homonyms of words. Because no proper lexical database has been developed for the Sinhala language, detecting whether a word is used for bullying is a challenge. Therefore, we used rules to overcome this issue. Twitter comments containing profane words were collected, outliers were removed, and the remaining tweets were pre-processed. To determine whether a text contains an insult, five rules were used for feature extraction. Afterward, we applied Support Vector Machine (SVM), K-nearest neighbor (KNN), and Naïve Bayes algorithms. The results show that SVM with an RBF kernel performs best of the three, with an F1-score of 91%. The novelty of this research is its focus on cyberbullying detection in the Sinhala language, which has not been addressed before.
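The abstract's point that keyword spotting is inadequate, and its use of the F1-score as the evaluation metric, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the word list, the English placeholder comments, and the labels are invented and are not the paper's data, its five rules, or its classifiers. The sketch only shows how a context-blind keyword spotter produces false positives and how that shows up in the F1-score.

```python
# Toy illustration only: the lexicon, comments, and labels below are invented
# placeholders, NOT data, rules, or results from the paper.

# Hypothetical words that are insults in some contexts and harmless in others.
FLAGGED_WORDS = {"dog", "pig"}

# (comment, is_bullying) pairs with hypothetical labels.
COMMENTS = [
    ("you are a pig", True),
    ("my dog is sick today", False),
    ("shut up you dog", True),
    ("we roasted a pig at the festival", False),
    ("nobody likes you", True),
    ("nice photo", False),
]


def keyword_spot(comment, lexicon=FLAGGED_WORDS):
    """Flag a comment as bullying if any token appears in the lexicon."""
    return any(token in lexicon for token in comment.lower().split())


def f1_score(y_true, y_pred):
    """Compute the F1-score (harmonic mean of precision and recall)."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


y_true = [label for _, label in COMMENTS]
y_pred = [keyword_spot(text) for text, _ in COMMENTS]
score = f1_score(y_true, y_pred)
print(f"Keyword-spotting F1 on the toy data: {score:.3f}")
```

On this toy data the spotter flags both harmless comments that mention an animal and misses the insult that contains no listed word. That is the kind of context-dependent error the abstract gives as the motivation for rule-based feature extraction.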
