Conference: Brazilian Symposium on Bioinformatics

S²FS: Single Score Feature Selection Applied to the Problem of Distinguishing Long Non-coding RNAs from Protein Coding Transcripts

Abstract

The task of distinguishing long non-coding RNAs (lncRNAs) from protein coding transcripts (PCTs) has previously been addressed with machine learning (ML) algorithms that use hundreds of features. However, a large number of features can degrade the predictive performance of these algorithms, since it can lead to problems such as overfitting, a consequence of the phenomenon known as the curse of dimensionality. To deal with these problems, dimensionality reduction techniques have been proposed, among them feature selection. This work proposes and experimentally evaluates a simple and fast feature selection technique called Single Score Feature Selection (S²FS). First, the frequencies of 2-mers, 3-mers and 4-mers were extracted from public databases of Homo sapiens PCTs and lncRNAs, resulting in a dataset with two groups of RNA sequences, one of PCTs and one of lncRNAs, and a large number of features. S²FS was then applied to the dataset to reduce the number of features. Experimental results showed that relevant features were selected and predictive accuracy was maintained, at a lower processing cost than some existing feature selection techniques.
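
The feature extraction step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `kmer_frequencies`, the use of overlapping windows, the normalization by the number of valid windows, and the skipping of ambiguous bases are all assumptions made for illustration. The sketch covers only the k-mer frequency features, not S²FS itself, since the abstract does not define its scoring rule.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(sequence, ks=(2, 3, 4)):
    """Return the relative frequency of every possible k-mer for each k in ks.

    Hypothetical helper: overlapping windows and per-k normalization are
    assumptions, not details taken from the paper.
    """
    # Treat RNA (U) and DNA (T) notation the same way.
    seq = sequence.upper().replace("U", "T")
    features = {}
    for k in ks:
        windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
        # Count only windows made purely of A, C, G, T (skips bases such as N).
        counts = Counter(w for w in windows if set(w) <= set(ALPHABET))
        total = sum(counts.values())
        # Emit every possible k-mer so all sequences share one feature layout.
        for letters in product(ALPHABET, repeat=k):
            kmer = "".join(letters)
            features[kmer] = counts[kmer] / total if total else 0.0
    return features


features = kmer_frequencies("ATGGCGTACGTTAGC")
print(len(features))  # 4**2 + 4**3 + 4**4 = 336 features per sequence
```

Under these assumptions each transcript maps to a fixed-length vector of 336 values (16 + 64 + 256), which is the kind of feature set a selection technique would then reduce.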