Homological Modeling
Author:
S. Dalal, S. Balasubramanian, and L. Regan
Source:
Nat. Struct. Biol. 4, 548–552
18-5-2016
Proteins that have similar amino acid sequences (primary structures) adopt the same fold, that is, the same conformation of the polypeptide backbone, and therefore have similar tertiary structures. In other words, the relationship between three-dimensional protein structure and amino acid sequence is not one-to-one but one-to-many: a single fold can be encoded by many different sequences. Therefore, when the 3-D structure of one of the proteins encoded by a gene family is known, it can be assumed that all of the other homologous members of the family adopt essentially the same fold. This is a strictly empirical observation that has held for natural proteins, and it provides the basis for homological modeling. It is not easy to give a precise numerical threshold, but two proteins whose sequences are identical at 30% or more of the amino acid residues over a span of more than 100 residues are almost certainly homologous and belong to the same family (1). Again, this is an empirical rule deduced from natural proteins, which have diverged from a common evolutionary ancestor by the accumulation of individual mutations, and it need not apply to de novo designed proteins. In fact, if the sequence is deliberately designed in one step, the fold of a protein can be switched between an alpha-helical and a predominantly beta-sheet structure while keeping the sequences 50% identical (2). There is, however, no guarantee that de novo designed proteins will fold to a unique 3-D structure.
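The empirical rule of thumb above (at least 30% identity over more than 100 residues) can be checked mechanically once two sequences have been aligned. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that identity is computed over aligned residue pairs with gap columns ignored, which is one of several conventions in use.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (percent identity, number of aligned residue pairs).

    Both strings must come from the same pairwise alignment, so they have
    equal length. Columns that contain a gap are skipped (an assumption;
    other definitions divide by the alignment or sequence length instead).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0, 0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs), len(pairs)


def probably_homologous(aligned_a, aligned_b, min_identity=30.0, min_span=100):
    """Apply the rule of thumb: identity >= 30% over more than 100 residues."""
    identity, span = percent_identity(aligned_a, aligned_b)
    return identity >= min_identity and span > min_span
```

A short, highly identical match fails this test because the span is too small, which reflects the point that the threshold is meaningful only over a sufficiently long stretch of sequence.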
Homological modeling begins with the alignment of a query sequence against the sequence of a homologous protein of known structure. The sequence alignment is best performed by a mathematical technique called dynamic programming. The two aligned sequences may contain insertions or deletions (indels), which appear as gaps in one of the sequences. As the sequence similarity decreases, the number of gaps increases, and the entire alignment becomes less certain. If the structure is modeled according to an incorrect alignment, the resulting model will also be incorrect. Thus, a sequence identity of 40 to 50% or more is usually required for accurate homological modeling. Given the sequence alignment, the query amino acid sequence is mounted onto the known structure, which supplies a template backbone, and the side chains that differ are replaced according to the alignment.
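To make the dynamic programming step concrete, here is a minimal Python sketch of global pairwise alignment in the style of Needleman and Wunsch. It is not the procedure of any particular modeling package: the scoring values (match +1, mismatch -1, linear gap penalty -2) are arbitrary assumptions, whereas practical tools use substitution matrices and separate gap-opening and gap-extension penalties.

```python
def needleman_wunsch(query, template, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences.

    Returns (aligned_query, aligned_template, score); gaps are shown as '-'.
    """
    n, m = len(query), len(template)
    # score[i][j] = best score for aligning query[:i] with template[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if query[i - 1] == template[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align the two residues
                              score[i - 1][j] + gap,      # gap in the template
                              score[i][j - 1] + gap)      # gap in the query

    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_q, aligned_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if query[i - 1] == template[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                aligned_q.append(query[i - 1])
                aligned_t.append(template[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_q.append(query[i - 1])
            aligned_t.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_t.append(template[j - 1])
            j -= 1
    return "".join(reversed(aligned_q)), "".join(reversed(aligned_t)), score[n][m]
```

For example, calling needleman_wunsch("MKTAYIAK", "MKTYIAK") places a single gap in the template opposite the extra alanine of the query. In a modeling context, such a gap marks an inserted residue for which the template supplies no backbone coordinates.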
Once a suitable protein of known structure has been found for a query sequence, homological modeling poses two main problems. The first is to fill in any missing polypeptide backbone by generating an appropriate loop structure. The second is to place all of the new side chains in the correct orientation. If the query sequence has residues inserted relative to the template, there is no template for that part of the sequence. The loop-generating procedure should produce additional polypeptide backbone that joins smoothly to the template structure at both ends and also has an energetically favorable conformation. Indels generally occur at the protein surface, so there are few interactions or steric constraints to guide the structural design. The orientations of the new side chains of interior residues are determined by the packing of all of the atoms within the protein interior. A simple way to incorporate the new side chains is to adjust their conformations against the fixed conformations of the unsubstituted side chains and the polypeptide backbone. The conformations can also be selected from those observed most frequently in known protein structures, collected in “rotamer libraries,” and from those calculated to have the most favorable energies. A more advanced approach is the “dead-end elimination” algorithm (3), which discards rotamers that cannot belong to the lowest-energy combination. Alternatively, and more automatically, the simulated annealing method (4) can be applied to the entire model structure, allowing even the backbone conformation to vary, while seeking the most stable conformation of the model as a whole. Several computer packages for homological modeling are commercially available.
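The idea behind dead-end elimination can be sketched in a few lines of Python. The sketch below applies the basic elimination criterion: a rotamer r at a position is discarded if some other rotamer t at the same position has a worst-case energy that is still lower than the best-case energy of r. The data layout, the function names, and the toy energy values are all assumptions made for illustration; real energies would come from a force field evaluated on rotamers drawn from a rotamer library.

```python
def dead_end_elimination(self_energy, pair_energy):
    """Prune rotamers that cannot be part of the lowest-energy combination.

    self_energy[i][r]         : energy of rotamer r at position i against the
                                fixed backbone and unsubstituted side chains
    pair_energy[(i, j)][r][s] : interaction energy of rotamer r at position i
                                with rotamer s at position j, stored for i < j
    Returns a list with one set of surviving rotamer indices per position.
    """
    n = len(self_energy)
    alive = [set(range(len(e))) for e in self_energy]

    def pair(i, r, j, s):
        return pair_energy[(i, j)][r][s] if i < j else pair_energy[(j, i)][s][r]

    changed = True
    while changed:  # repeat until a full pass removes nothing
        changed = False
        for i in range(n):
            others = [j for j in range(n) if j != i]
            for r in sorted(alive[i]):
                # Best energy that r could reach with any surviving neighbours.
                best_r = self_energy[i][r] + sum(
                    min(pair(i, r, j, s) for s in alive[j]) for j in others)
                for t in alive[i] - {r}:
                    # Worst energy that the competitor t could be forced into.
                    worst_t = self_energy[i][t] + sum(
                        max(pair(i, t, j, s) for s in alive[j]) for j in others)
                    if best_r > worst_t:
                        alive[i].discard(r)  # r is a "dead end"
                        changed = True
                        break
    return alive


def total_energy(assignment, self_energy, pair_energy):
    """Energy of one complete choice of rotamers (one index per position)."""
    n = len(assignment)
    energy = sum(self_energy[i][assignment[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            energy += pair_energy[(i, j)][assignment[i]][assignment[j]]
    return energy


# Toy numbers invented for illustration: three positions with 3, 2 and 2 rotamers.
TOY_SELF = [[0.0, 5.0, 1.0], [0.0, 4.0], [2.0, 0.0]]
TOY_PAIR = {
    (0, 1): [[0.25, -0.25], [0.0, 0.5], [-0.5, 0.25]],
    (0, 2): [[0.25, 0.0], [-0.25, 0.25], [0.5, -0.5]],
    (1, 2): [[-0.5, 0.5], [0.25, 0.0]],
}
```

Calling dead_end_elimination(TOY_SELF, TOY_PAIR) on this toy problem leaves a single rotamer per position, so no further search is needed. In general several rotamers survive at some positions, and the remaining combinations must be searched by another method.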
References
1. C. Sander and R. Schneider (1991) Proteins: Struct. Funct. Genet. 9, 56–68.
2. S. Dalal, S. Balasubramanian, and L. Regan (1997) Nat. Struct. Biol. 4, 548–552.
3. J. Desmet, M. De Maeyer, B. Hazes, and I. Lasters (1992) Nature 356, 539–542.
4. A. Sali and T. L. Blundell (1990) J. Mol. Biol. 212, 403–428.