FCCC Faculty Publications
Ting D, Wang GL, Shapovalov M, Mitra R, Jordan MI, Dunbrack RL
Neighbor-Dependent Ramachandran Probability Distributions of Amino Acids Developed from a Hierarchical Dirichlet Process Model
PLoS Computational Biology. 2010 Apr;6(4):e1000763
PMID: 20442867    ISI: 000278125300038    PMCID: PMC2861699    URL: http://www.ncbi.nlm.nih.gov/pubmed/20442867
Distributions of the backbone dihedral angles of proteins have been studied for over 40 years. While many statistical analyses have been presented, only a handful of probability densities are publicly available for use in structure validation and structure prediction methods. The available distributions differ in a number of important ways, which determine their usefulness for various purposes. These include: 1) input data size and criteria for structure inclusion (resolution, R-factor, etc.); 2) filtering of suspect conformations and outliers using B-factors or other features; 3) secondary structure of input data (e.g., whether helix and sheet are included; whether beta turns are included); 4) the method used for determining probability densities, ranging from simple histograms to modern nonparametric density estimation; and 5) whether they include nearest-neighbor effects on the distribution of conformations in different regions of the Ramachandran map. In this work, Ramachandran probability distributions are presented for residues in protein loops from a high-resolution data set with filtering based on calculated electron densities. Distributions for all 20 amino acids (with cis and trans proline treated separately) have been determined, as well as 420 left-neighbor and 420 right-neighbor dependent distributions. The neighbor-independent and neighbor-dependent probability densities have been accurately estimated using Bayesian nonparametric statistical analysis based on the Dirichlet process. In particular, we used hierarchical Dirichlet process priors, which allow sharing of information between densities for a particular residue type and different neighbor residue types. The resulting distributions are tested in a loop modeling benchmark with the program Rosetta, and are shown to improve protein loop conformation prediction significantly. The distributions are available at http://dunbrack.fccc.edu/hdp.
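The information sharing that the abstract attributes to hierarchical Dirichlet process priors can be illustrated with a small sketch. This is not the authors' implementation and performs no inference on real data: it only draws one sample from a truncated prior, using the standard finite approximation in which each group's weights follow a Dirichlet distribution centred on shared top-level weights. The function names, the truncation level, the concentration values, the uniform base measure over (phi, psi), and the three neighbor labels are all assumptions made for illustration.

```python
import random

def stick_breaking(concentration, n_atoms, rng):
    """Truncated stick-breaking weights for a Dirichlet process draw."""
    weights, remaining = [], 1.0
    for _ in range(n_atoms - 1):
        frac = rng.betavariate(1.0, concentration)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)  # leftover mass goes to the last atom
    return weights

def group_weights(alpha, shared_weights, rng):
    """Finite approximation to a lower-level DP: Dirichlet(alpha * shared_weights).

    A Dirichlet draw is obtained by normalising independent gamma variates.
    The small floor on the shape parameter is an assumption that guards
    against a zero weight, for which gammavariate would raise an error.
    """
    draws = [rng.gammavariate(max(alpha * w, 1e-9), 1.0) for w in shared_weights]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(0)
n_atoms = 20  # truncation level (assumed)

# Shared atoms: hypothetical (phi, psi) component centres in degrees,
# drawn from a uniform base measure over the Ramachandran map.
atoms = [(rng.uniform(-180, 180), rng.uniform(-180, 180)) for _ in range(n_atoms)]

# Top-level weights, shared by every neighbor-dependent distribution.
shared = stick_breaking(2.0, n_atoms, rng)

# One weight vector per neighbor type; all of them reuse the same atoms,
# which is how information is shared across neighbor contexts.
per_neighbor = {nb: group_weights(5.0, shared, rng) for nb in ("ALA", "GLY", "PRO")}
```

In this sketch a larger lower-level concentration (here 5.0) pulls each neighbor-specific weight vector closer to the shared one, while a smaller value lets the neighbor contexts differ more.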
Ting, Daniel; Wang, Guoli; Shapovalov, Maxim; Mitra, Rajib; Jordan, Michael I.; Dunbrack, Roland L., Jr.
This work was supported by grants NIH P20 GM76222 (R. Dunbrack, PI) and NIH R01 GM84453 (R. Dunbrack, PI). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Public Library of Science; 185 Berry St, Ste 1300, San Francisco, CA 94107 USA