East Indonesia Conference on Computer and Information Technology

Detection of Online Prostitution in Twitter Platform Using Machine Learning Approach

Abstract

Twitter is one of the social media platforms used for online prostitution. Machine learning is a technological approach that can detect such accounts on Twitter. This research follows the CRISP-DM method. The algorithms used are SVM, Random Forest, and Naive Bayes. Data on online prostitution accounts were collected by crawling hashtags associated with online prostitution. Data labeling produced two data set models. The first contains prostitution accounts and non-prostitution accounts that do not use prostitution hashtags. The second contains prostitution accounts and non-prostitution accounts that do use prostitution hashtags. The results show that for data set model 1, the features that distinguish prostitution accounts from non-prostitution accounts are the number of followers, the number of tweets, account age, and content (words and hashtags). For data set model 2, the features that distinguish prostitution accounts from non-prostitution accounts with prostitution hashtags are the number of tweets and content (hashtags and words). SVM has the highest accuracy for data set model 1, at 98.83%, while Random Forest has the highest accuracy for data set model 2, at 82.93%. To determine which of the two models is better, both were also tested on the same new set of 150 randomly selected data items. Data set model 2 performed better than data set model 1 because it made fewer prediction errors: 29, compared with 37 for data set model 1.
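The abstract reports the final comparison only as error counts on the 150 new data items. As a minimal sketch of the arithmetic, the snippet below converts those counts into accuracy figures. The function name `accuracy_from_errors` and the dictionary keys are illustrative assumptions, not names from the paper, and the resulting percentages are derived here rather than reported by the authors.

```python
def accuracy_from_errors(n_samples: int, n_errors: int) -> float:
    """Return the fraction of correct predictions given an error count."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    if not 0 <= n_errors <= n_samples:
        raise ValueError("n_errors must be between 0 and n_samples")
    return (n_samples - n_errors) / n_samples


# Figures taken from the abstract: 150 new data items, 37 and 29 errors.
N_NEW = 150
ERRORS = {"data set model 1": 37, "data set model 2": 29}

# Derived accuracy on the new data for each model (not stated in the abstract).
accuracies = {name: accuracy_from_errors(N_NEW, e) for name, e in ERRORS.items()}

# The model with the highest derived accuracy is the one with the fewest errors.
best = max(accuracies, key=accuracies.get)

for name, acc in accuracies.items():
    print(f"{name}: {acc:.2%}")
print("fewest errors:", best)
```

Under these counts, data set model 1 is correct on 113 of 150 items (about 75.33%) and data set model 2 on 121 of 150 (about 80.67%), which matches the abstract's conclusion that model 2 generalizes better even though model 1 has the higher accuracy on its own data set.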
