Journal: Database

3PFDB+: improved search protocol and update for the identification of representatives of protein sequence domain families


Abstract

Protein domain families are usually classified on the basis of amino acid sequence similarity. Selecting a single representative sequence for each family provides targets for structure determination or modeling and also enables fast sequence searches that associate new members with a family. Such a selection can be challenging because domain families vary greatly in the number of members, the average sequence length and the extent of sequence divergence within a family. We had earlier created the 3PFDB database as a repository of best representative sequences, selected from each PFAM domain family on the basis of high coverage. In this study, we have improved the database by using more efficient strategies for the initial generation of sequence profiles and by implementing two independent methods, FASSM and HMMER, for identifying family members. HMMER employs a global sequence similarity search, while FASSM relies on motif identification and matching. The improved and updated database generated in this study, 3PFDB+, provides representative sequences and profiles for PFAM families, with 13 519 family representatives having more than 90% family coverage. The representative sequence is also highlighted in a two-dimensional plot that reflects the relative divergence between family members. Low coverage is mainly associated with representatives of small families with short sequences. The set of sequences not recognized by the family representative profiles highlights several potential false or weak family associations in PFAM. Partial domains and fragments dominate such cases, along with sequences that are highly diverged or different from other family members. Some of these outliers were also predicted to have different secondary structure content, which reflects different putative structural or functional roles for these domain sequences. Database URL: http://caps.ncbs.res.in/3pfdbplus/
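
The idea of choosing a representative "on the basis of high coverage" can be illustrated with a small Python sketch. It is not the 3PFDB+ pipeline: it assumes that a profile search (for example with HMMER or FASSM) has already been run for each candidate and that the recognized members are available as plain sets. The names `best_representative`, `unrecognized`, `hits` and `members` are hypothetical and used only for illustration.

```python
from typing import Dict, Set, Tuple


def best_representative(hits: Dict[str, Set[str]],
                        members: Set[str]) -> Tuple[str, float]:
    """Pick the candidate whose profile recognizes the largest fraction
    of family members. Coverage = recognized members / all members."""
    if not members:
        raise ValueError("family has no members")
    if not hits:
        raise ValueError("no candidate search results supplied")
    best_id, best_cov = "", -1.0
    for cand in sorted(hits):  # sorted ids give deterministic tie-breaking
        cov = len(hits[cand] & members) / len(members)
        if cov > best_cov:
            best_id, best_cov = cand, cov
    return best_id, best_cov


def unrecognized(hits: Dict[str, Set[str]], members: Set[str],
                 rep: str) -> Set[str]:
    """Members the representative's profile did not recognize
    (candidate outliers, fragments or weak family associations)."""
    return members - hits[rep]


# Toy family of four sequences; hits[x] is the (assumed) set of members
# found when searching with the profile seeded from sequence x.
family = {"s1", "s2", "s3", "s4"}
hits = {
    "s1": {"s1", "s2"},
    "s2": {"s1", "s2", "s3"},
    "s3": {"s3"},
}
rep, cov = best_representative(hits, family)
missed = unrecognized(hits, family, rep)
print(rep, cov, sorted(missed))
```

In this toy input the chosen representative covers three of four members, so it would fall below a 90% coverage threshold, and the one missed sequence is the kind of case the abstract describes as a potential false or weak family association.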
