Non-words Spell Corrector of Social Media Data in Message Filtering Systems

Zar Zar Wint; Theo Ducros; Masayoshi Aritsugi


Abstract

We develop an extended spell checker and corrector that detects and corrects non-word errors in social media datasets, for use in message filtering systems, especially for cyberbullying detection. We use dictionary techniques to check words, twelve approaches to checking and correcting word spelling errors to correct the non-word errors, and n-grams and Levenshtein distance to select the most suitable word among the corrected words. If the approaches yield more than one corrected word, we use n-gram techniques to choose the correct and reasonable word from the words in the n-gram database. When we used Levenshtein distance in our previous work, we found that it selected the first corrected word, which was not a reasonable choice in some sentences. Therefore, we use the n-gram database in this paper.
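The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not reproduce their twelve correction approaches: the tiny `DICTIONARY`, the `BIGRAMS` counts, and the function names `candidates`, `pick_by_distance`, and `pick_by_ngram` are all hypothetical, chosen only to show why context from an n-gram table can resolve a tie that edit distance by itself cannot.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical toy resources (assumptions, not data from the paper).
DICTIONARY = {"such", "a", "is", "the", "loser", "lower", "water", "level"}
BIGRAMS = Counter({("a", "loser"): 5, ("is", "lower"): 4, ("such", "a"): 3})


def candidates(word: str, max_distance: int = 1) -> list[str]:
    """Dictionary words within max_distance edits of a non-word."""
    return sorted(w for w in DICTIONARY if levenshtein(word, w) <= max_distance)


def pick_by_distance(word: str, cands: list[str]) -> str:
    """Distance-only choice: on a tie, min() returns the first candidate."""
    return min(cands, key=lambda w: levenshtein(word, w))


def pick_by_ngram(prev_word: str, word: str, cands: list[str]) -> str:
    """Prefer the candidate seen most often after prev_word; break ties by distance."""
    return max(cands, key=lambda w: (BIGRAMS[(prev_word, w)], -levenshtein(word, w)))


cands = candidates("loxer")                    # both are one edit away
print(cands)                                   # ['loser', 'lower']
print(pick_by_distance("loxer", cands))        # 'loser' regardless of context
print(pick_by_ngram("a", "loxer", cands))      # 'loser'
print(pick_by_ngram("is", "loxer", cands))     # 'lower'
```

In this toy setting both candidates are one edit away from the non-word, so the distance-only choice always returns the first one, while the bigram lookup picks a different word depending on the preceding token.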


Bibliographic record

Source
Journal of Digital Information Management, 2018, Issue 2, pp. 64-75 (12 pages)
Authors
Zar Zar Wint; Theo Ducros; Masayoshi Aritsugi
Author affiliations

Computer Science and Electrical Engineering, Graduate School of Science and Technology, Kumamoto University, Kumamoto 860-8555, Japan; Department of Computer Engineering and Information Technology, Mandalay Technological University, Mandalay, Myanmar

Polytech Clermont-Ferrand, University Clermont Auvergne, Clermont-Ferrand, France

Computer Science and Electrical Engineering, Graduate School of Science and Technology, Kumamoto University, Kumamoto 860-8555, Japan

Original format: PDF
Language: English
Keywords
Non-word Error; Spell Checker; Spell Corrector; Spell Checking and Correction Approaches; N-gram

Similar literature

1. Filtering big data from social media - Building an early warning system for adverse drug reactions [J]. Yang Ming, Kiang Melody, Shang Wei. Journal of Biomedical Informatics, 2015 (issue not given).

2. Filtering Entities to Optimize Identification of Adverse Drug Reaction From Social Media: How Can the Number of Words Between Entities in the Messages Help? [J]. Redhouane Abdellaoui, Nathalie Texier, Anita Burgun. JMIR Public Health and Surveillance, 2017, Issue 2.

3. Spell corrector to social media datasets in message filtering systems [C]. Zar Zar Wint, Theo Ducros, Masayoshi Aritsugi. International Conference on Digital Information Management, 2017.

4. The effect of message framing on environmental behavior within an exploratory study about using social media as a method to collect data [D]. Ford, David K. 2010.

5. Hybrid Median Filter Background Estimator for Correcting Distortions in Microtiter Plate Data [O]. Paul J. Bushway, Behrad Azimi, Susanne Heynen-Genel (year not given).

6. Filtering big data from social media – Building an early warning system for adverse drug reactions [O]. Yang Ming, Kiang Melody, Shang Wei. 2015.
