Source: Journal of digital information management

Non-words Spell Corrector of Social Media Data in Message Filtering Systems

Abstract

We develop an extended spell checker and corrector for non-word errors in social media datasets, intended for use in message filtering systems, especially for cyberbullying detection. We use dictionary techniques to check words, twelve word-level spelling-error checking and correction approaches to correct the non-word errors, and n-grams and Levenshtein distance to select the most suitable word among the candidate corrections. If the approaches return more than one candidate correction, we use n-gram techniques to choose the correct and contextually reasonable word from the words in an n-gram database. In our previous work, which used Levenshtein distance, we found that it selected the first corrected word, which was not a reasonable choice in some sentences. Therefore, we use the n-gram database in this paper.
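The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not detail the twelve correction approaches, so plain edit-distance candidate generation stands in for them here, and the dictionary, the bigram counts, and all function names are hypothetical toy data chosen for illustration. The sketch shows why picking by Levenshtein distance alone may return the first of several equally close candidates, while a bigram lookup can use the neighbouring words to choose among them.

```python
from collections import Counter

# Toy dictionary and bigram counts: invented illustrative data, not from the paper.
DICTIONARY = {"i", "hate", "hat", "have", "you", "a", "red"}
BIGRAM_COUNTS = Counter({
    ("i", "hate"): 5,
    ("i", "have"): 8,
    ("hate", "you"): 6,
    ("have", "you"): 1,
})


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def is_non_word(token: str) -> bool:
    """A token is a non-word error if it is absent from the dictionary."""
    return token.lower() not in DICTIONARY


def candidates(token: str, max_distance: int = 1) -> list[str]:
    """Dictionary words within max_distance edits, in alphabetical order.
    Stand-in for the paper's correction approaches (an assumption)."""
    return sorted(w for w in DICTIONARY
                  if levenshtein(token.lower(), w) <= max_distance)


def pick_by_distance(token: str, cands: list[str]) -> str:
    """Closest candidate; on ties min() keeps the first one in the list."""
    return min(cands, key=lambda w: levenshtein(token.lower(), w))


def pick_by_bigram(prev_word: str, next_word: str, cands: list[str]) -> str:
    """Candidate whose bigrams with its neighbours are most frequent."""
    def score(w: str) -> int:
        return BIGRAM_COUNTS[(prev_word, w)] + BIGRAM_COUNTS[(w, next_word)]
    return max(cands, key=score)


def correct(tokens: list[str]) -> list[str]:
    """Replace each non-word with the bigram-selected candidate, if any."""
    out: list[str] = []
    for i, tok in enumerate(tokens):
        if not is_non_word(tok):
            out.append(tok.lower())
            continue
        cands = candidates(tok)
        if not cands:
            out.append(tok)  # no suggestion: leave the token unchanged
            continue
        prev_word = out[-1] if out else ""
        next_word = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
        out.append(pick_by_bigram(prev_word, next_word, cands))
    return out


if __name__ == "__main__":
    cands = candidates("hae")
    print(cands)                            # three candidates at distance 1
    print(pick_by_distance("hae", cands))   # tie broken by list order
    print(correct(["i", "hae", "you"]))     # context-aware choice
```

With this toy data, "hae" is one edit away from "hat", "hate", and "have", so distance alone cannot separate them and the first candidate wins; the bigram counts for the neighbours "i" and "you" favour "hate" instead.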
