The Prediction of Diatom Abundance by Comparison of Various Machine Learning Methods

Abstract

This study takes two approaches to analyzing the occurrence of algae at Haman Weir on the Nakdong River: a traditional statistical method (logistic regression) and machine learning techniques (kNN, ANN, RF, bagging, boosting, and SVM). To compare model performance, the study measured accuracy, specificity, sensitivity, and AUC, which are representative model evaluation measures. The ROC curve plots sensitivity against (1 - specificity), and the AUC, the area under the ROC curve, summarizes both sensitivity and specificity. Compared with other evaluation measures, the AUC has two advantages. First, it is scale-invariant: it measures how well the model ranks its predictions rather than their absolute values. Second, it is classification-threshold-invariant: because the ROC curve traces the sensitivity and (1 - specificity) obtained at every threshold, the AUC does not depend on any single threshold. For these two reasons we chose the AUC as the final model evaluation measure. Variable selection was carried out with the Boruta algorithm, and each model was fitted both with and without variable selection to determine which performed better. Boruta identified PO4-P, DO, BOD, NH3-N, Susp, pH, TOC, Temp, TN, and TP as significant explanatory variables. Among the models without variable selection, RF had the highest accuracy and ANN had the highest AUC. Overall, ANN with variable selection showed the best performance among all models, with or without variable selection.
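The evaluation measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code or data: the labels and scores are made-up toy values, and the function names `confusion_rates`, `roc_points`, and `auc` are hypothetical helpers chosen for illustration. The sketch computes accuracy, sensitivity, and specificity at a given threshold, builds the ROC curve from every distinct threshold, and integrates it with the trapezoidal rule.

```python
from typing import List, Tuple


def confusion_rates(labels: List[int], scores: List[float],
                    threshold: float) -> Tuple[float, float, float]:
    """Return (accuracy, sensitivity, specificity) at one decision threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    pos = sum(labels)
    neg = len(labels) - pos
    return (tp + tn) / len(labels), tp / pos, tn / neg


def roc_points(labels: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """ROC points (1 - specificity, sensitivity), one per distinct threshold."""
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        _, sens, spec = confusion_rates(labels, scores, t)
        points.append((1.0 - spec, sens))
    return points


def auc(labels: List[int], scores: List[float]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    pts = roc_points(labels, scores)
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Toy data (assumed for illustration): 1 = bloom observed, 0 = no bloom.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]

# Sensitivity and specificity change with the threshold ...
at_050 = confusion_rates(labels, scores, 0.50)
at_032 = confusion_rates(labels, scores, 0.32)
print("threshold 0.50:", at_050)
print("threshold 0.32:", at_032)

# ... while the AUC is a single number that sweeps over all thresholds.
auc_raw = auc(labels, scores)
print("AUC:", auc_raw)

# Scale invariance: a monotone rescaling keeps the ranking, so the AUC is unchanged.
rescaled = [10.0 * s + 5.0 for s in scores]
auc_rescaled = auc(labels, rescaled)
print("AUC after rescaling:", auc_rescaled)
```

In this toy example the two thresholds give different sensitivity and specificity, whereas the AUC stays the same after the scores are rescaled, which is the behavior the abstract describes as threshold invariance and scale invariance.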

Bibliographic record

  • Source
    Mathematical Problems in Engineering | 2019, Issue 12 | 5749746.1-5749746.13 | 13 pages
  • Author affiliations

    Natl Inst Environm Res Dept Water Environm Res Incheon 22689 South Korea;

    K Water Daejeon 34045 South Korea;

    K Water Daejeon 34045 South Korea;

    K Water Daejeon 34045 South Korea;

    Chungbuk Natl Univ Dept Informat & Stat Chungbuk 28644 South Korea;

    Chungbuk Natl Univ Dept Informat & Stat Chungbuk 28644 South Korea;

    Chungbuk Natl Univ Dept Informat & Stat Chungbuk 28644 South Korea;

    Chungbuk Natl Univ Dept Informat & Stat Chungbuk 28644 South Korea;

    Korea Inst Sci Technol Informat Idea Commercializat Ctr Seoul 02456 South Korea;

    Chungbuk Natl Univ Dept Informat & Stat Chungbuk 28644 South Korea;

  • Original format: PDF
  • Text language: English