Conference: International Conference on Artificial Neural Networks

Evaluation of Domain Adaptation Approaches for Robust Classification of Heterogeneous Biological Data Sets


Abstract

Most machine learning algorithms require that training data are identically distributed to ensure effective learning. In biological studies, however, even small variations in the experimental setup can lead to substantial deviations. Domain adaptation offers tools to deal with this problem. It is particularly useful when only a small amount of training data is available in the domain of interest, while a large amount is available in a different but related domain. We investigated to what extent domain adaptation can improve prediction accuracy for complex biological data. To that end, we used simulated data and time-lapse movies of differentiating blood stem cells in different cell cycle stages from multiple experiments, and compared three commonly used domain adaptation approaches. EasyAdapt, a simple technique of structured pooling of related data sets, improved accuracy when classifying both the simulated data and cell cycle stages from microscopic images. The technique also proved robust to the negative impact on classification accuracy that is common in other techniques that build models from heterogeneous data. Despite its simple implementation, EasyAdapt consistently produced more accurate predictions than conventional techniques. Domain adaptation can therefore substantially reduce the work of creating the large amount of annotated training data that would otherwise be needed in the domain of interest whenever the domain changes even slightly. Such changes are common not only in biological experiments but in almost all data collection routines.
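The abstract describes EasyAdapt only as "structured pooling of related data sets". A minimal sketch of the feature augmentation usually associated with that name is given below, assuming the common formulation in which each feature vector is copied into a shared block and a domain-specific block. The function name `easyadapt_augment`, the domain labels, and the toy arrays are hypothetical and are not taken from the paper.

```python
import numpy as np


def easyadapt_augment(X, domain):
    """Map features into [shared, source-only, target-only] blocks.

    Assumed formulation (not spelled out in the abstract):
      source sample x -> [x, x, 0]
      target sample x -> [x, 0, x]
    """
    X = np.asarray(X, dtype=float)
    zeros = np.zeros_like(X)
    if domain == "source":
        return np.hstack([X, X, zeros])
    if domain == "target":
        return np.hstack([X, zeros, X])
    raise ValueError("domain must be 'source' or 'target'")


# Toy data: many samples would come from the related (source) domain,
# few from the domain of interest (target).
X_source = np.array([[1.0, 2.0], [3.0, 4.0]])
X_target = np.array([[5.0, 6.0]])

# Pool both domains in the augmented space; any standard classifier
# could then be trained on X_pooled with the stacked labels.
X_pooled = np.vstack([
    easyadapt_augment(X_source, "source"),
    easyadapt_augment(X_target, "target"),
])
```

Under this assumption, the shared block lets a classifier learn weights common to both domains, while the domain-specific blocks let it correct for differences between experiments.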
