In Silico Study of Secondary Structure of Hemoglobin Protein

 

Roma Chandra

Assistant Professor, Department of Biotechnology, IILM College of Engineering & Technology,

Greater Noida, Uttar Pradesh, India.

*Corresponding Author E-mail: roma.chandra@iilm.edu

 

ABSTRACT:

Protein structure prediction is an important goal in bioinformatics and biotechnology. Prediction methods cover both the secondary and the tertiary structure of proteins. Secondary structure prediction identifies the helices, sheets and coils present in a polypeptide chain, whereas tertiary structure prediction describes the three-dimensional structure of the protein. Secondary structures are recurring motifs, namely alpha helices, beta strands and coils, that are predicted as patterns from the primary protein sequence. Secondary structure prediction is useful because it provides information about the structure and function of an unknown protein sequence. Various methods are used to predict helices, sheets and coils, and a number of prediction tools are based on these methods. This study predicts the secondary structure of hemoglobin using several such tools. The results report the percentage of amino acids that form helices, sheets and coils. Of all the tools used, PHD and DSC produced the best results.

 

KEYWORDS: Protein, hemoglobin, secondary structure prediction.

 

 


INTRODUCTION:

Proteins are biomolecules and an important dietary component for humans. A protein is a polypeptide made up of amino acids and exists in nature in a three-dimensional arrangement. Protein structure is studied at four levels, primary, secondary, tertiary and quaternary, which are described below:

 

Primary structure:

It is the first level of protein structure and consists of the amino acid sequence of the polypeptide chain. During protein biosynthesis, amino acids are joined by peptide bonds. Based on the free groups at the two extremities of the sequence, the protein has two ends: the carboxyl terminal (C-terminus) end and the amino terminal (N-terminus) end. The primary structure of a protein is determined by the gene from which it is translated.

 

According to the central dogma, DNA is transcribed into mRNA, which is then translated into protein. The primary structure of a protein can be studied using protein databases. Annotated information for a protein sequence can be retrieved from databases such as UNIPROT and PDB. Primary sequences can be downloaded from these databases in FASTA format and used for secondary as well as tertiary structure prediction.
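As an illustration of how a sequence in FASTA format can be read before it is passed to a prediction tool, the following is a minimal sketch in Python. The function name `parse_fasta` and the toy record are assumptions made for this example only; the record is not a real database entry.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping header -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]        # drop the leading '>'
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {h: "".join(parts) for h, parts in records.items()}


# Hypothetical toy record used only for illustration (not a real entry).
example = ">demo|TEST|toy peptide\nMKTAYIAK\nQRQISFVK\n"
records = parse_fasta(example)
```

The header line is kept as the dictionary key, and the sequence lines that follow it are joined into a single string.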

 

Secondary structure:

It is the second level of protein structure and is represented by alpha helices, beta sheets and turns or coils. The backbone of a protein consists of helices and sheets that are connected by turns or coils. Thus, connections such as helix-helix, sheet-sheet and helix-sheet are seen in the backbone of the protein structure. Secondary structure prediction indicates which amino acids participate in each type of secondary structure, represented as H, S and C (helix, sheet and coil). The secondary structure of a protein can be predicted using methods such as Chou-Fasman, GOR and Artificial Neural Networks.

 

Tertiary structure:

It is the third level of protein structure and represents the three-dimensional conformation of the protein, that is, the arrangement of amino acids into helices, sheets and turns, with the backbone arrangement governed by the psi, phi and omega angles. The Ramachandran plot shows the allowed and disallowed regions of the backbone angles for the amino acids that form the three-dimensional structure. The polypeptide chain folds as a result of interactions between the R groups of the participating amino acids. The interactions seen in tertiary structure include hydrogen bonds, hydrophobic interactions, van der Waals interactions and disulphide bonds. Tertiary structure databases such as PDB provide annotated information on the three-dimensional structures of proteins. X-ray crystallography and NMR spectroscopy are the experimental techniques used to determine the three-dimensional structures of proteins. In silico methods can also predict three-dimensional protein structures; these include ab initio methods, homology modeling and threading.

 

Quaternary structure:

It is the fourth level of protein structure and describes how multiple polypeptide chains associate to form a single protein. For example, hemoglobin is made up of four subunits, two of the alpha type and two of the beta type.

 

Protein Secondary Structure Prediction

Secondary structure prediction1 is an important method in bioinformatics. Its aim is to predict the secondary structure of a protein from its amino acid sequence. It indicates which parts of the amino acid sequence form alpha helices, beta strands or turns. The following six methods are used to predict helices, sheets and coils:

 

Chou Fasman Method:

The Chou-Fasman algorithm is widely used to predict the secondary structure of proteins. It assigns each amino acid a conformational parameter that reflects how frequently that amino acid occurs in each type of secondary structure. The conformational parameters are calculated for the 20 amino acids from a set of proteins of known structure and are written as P(α), P(β) and P(turn) for helices, sheets and turns. The algorithm begins by assigning these parameters to every residue of the protein to be predicted. In subsequent steps, windows of six residues are examined for helices, five residues for sheets and four residues for turns. This method shows 50-60% accuracy for secondary structure prediction.2
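The six-residue helix step can be sketched as follows. This is a simplified illustration, not the full Chou-Fasman algorithm: it only flags candidate helix nucleation windows and omits extension, sheet and turn prediction. The P(α) values are those commonly tabulated for the method and should be checked against the original publication; the rule of at least four helix formers (P(α) > 1.0) in a window of six, and the name `helix_nuclei`, are assumptions of this sketch.

```python
# Helix propensities P(alpha) as commonly tabulated for the Chou-Fasman
# method. Treat them as illustrative and verify against the original paper.
P_ALPHA = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16,
    "F": 1.13, "Q": 1.11, "W": 1.08, "I": 1.08, "V": 1.06,
    "D": 1.01, "H": 1.00, "R": 0.98, "T": 0.83, "S": 0.77,
    "C": 0.70, "Y": 0.69, "N": 0.67, "P": 0.57, "G": 0.57,
}


def helix_nuclei(seq, window=6, min_formers=4, threshold=1.0):
    """Return start indices of windows that look like helix nucleation sites.

    A window qualifies when at least `min_formers` of its residues have
    P(alpha) above `threshold` (an assumed, simplified nucleation rule).
    """
    seq = seq.upper()
    starts = []
    for i in range(len(seq) - window + 1):
        formers = sum(
            1 for aa in seq[i:i + window] if P_ALPHA.get(aa, 0.0) > threshold
        )
        if formers >= min_formers:
            starts.append(i)
    return starts


print(helix_nuclei("AELMGGPG"))  # -> [0]
```

In the toy peptide, only the first window contains four helix formers (A, E, L, M), so a single nucleation site is reported at index 0.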

 

Nearest neighbor method:

The nearest neighbor method is also known as the homologous method, memory-based method or exemplar-based method, because it rests on the hypothesis that short homologous segments of polypeptide chains adopt similar secondary structures. This method uses structural databases as a source of proteins with known structure. The sequence is scanned with a sliding window of short fragments. For every window, the secondary structure of the central residue is predicted from the most similar windows in the training dataset. The same process is repeated for every other residue in the protein to be predicted.3

 

HMM (Hidden Markov model):

The hidden Markov model is another method used to predict protein secondary structure; it is based on the Markov model. The output probabilities associated with the helix, sheet and coil states are used to predict the secondary structure of the protein of interest.4, 5

 

GOR (Garnier-Osguthorpe-Robson):

GOR is another secondary structure prediction method; it is based on information theory. It predicts helices, β sheets, turns and random coils. The method performs better for helices than for sheets, because sheets depend on long-range interactions between non-adjacent amino acid residues. A sliding window of 17 amino acid residues is used to predict the secondary structure of the central residue, classifying each amino acid of the polypeptide chain as helix, sheet or coil. The method shows 64% accuracy in classifying residues as sheet, helix or coil.6, 7

 

Artificial Neural Network:

ANN is modeled on biological neural networks and predicts the secondary structure of proteins after training on datasets of proteins with known structure. It uses classification to assign amino acid residues to helices, sheets and coils. The primary protein sequence is given as input to the ANN tool, whose weights are trained and updated according to the output produced, and the trained network predicts the presence of helix, sheet and coil. The method shows 63% accuracy in classifying residues as sheet, helix or coil.8, 9

 

Self-optimized prediction method (SOPMA):

SOPMA is a secondary structure prediction method that predicts helices, sheets and coils from multiple alignments using a self-optimization procedure. The method shows 63% accuracy in classifying residues as sheet, helix or coil.10


 

Protein Secondary Structure Prediction Tools:

Various tools based on these secondary structure prediction methods are available, including the following:

· AGADIR (http://agadir.crg.es/)

· APSSP (http://crdd.osdd.net/raghava/apssp/)

· CFSSP (http://www.biogem.org/tool/chou-fasman/)

· GOR (https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_gor4.html)

· HHPRED (http://toolkit.tuebingen.mpg.de/hhpred)

· JPRED (http://www.compbio.dundee.ac.uk/www-jpred/)

· PROF (https://www.aber.ac.uk/~phiwww/prof/)

· PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred/)

· SOPMA (https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_sopma.html)

· STRAP (http://www.bioinformatics.org/strap/Scripting.html)

· TOPMATCH (https://bio.tools/topmatch)

· SPIDER2 (http://sparks-lab.org/yueyang/server/SPIDER2/)

· SYMPRED (http://www.ibi.vu.nl/programs/sympredwww/)

· YASSPP (http://glaros.dtc.umn.edu/yasspp/)

· PSSPRED (http://bioinf.cs.ucl.ac.uk/psipred/)

· FRAG1D (http://frag1d.bioshu.se/)

· RAPTORX-SS8 (http://raptorx.uchicago.edu/)

 

MATERIALS AND METHODOLOGY:

The main aim of this research work is the comparative analysis of various secondary structure prediction tools. The primary sequence of hemoglobin was used to study the secondary structure results. For this analysis, the Hemoglobin subunit gamma-2 protein sequence was retrieved from the UNIPROT database in FASTA format (https://www.uniprot.org/uniprot/P69892.fasta). The tools used for the comparative analysis were:


·       CFSSP

·       GOR

·       PHD

·       SOPMA

·       DSC

·       MLRC


 

RESULTS:

Secondary structure was predicted using the hemoglobin sequence taken from the UNIPROT database. The prediction tools report results as the percentage of amino acids assigned to helices, sheets and coils. Comparative analysis of the tools revealed that the amino acids of hemoglobin form helices to a greater extent than sheets or coils. The analysis also shows that the tools produce varying results, of which those from PHD and DSC were closest to the actual secondary structure of the protein as recorded in the Protein Data Bank.
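The percentage figures reported by such tools can be reproduced from a per-residue state string. The sketch below assumes the H, S and C letters used in this paper for helix, sheet and coil; some tools print other letters (for example E for strands), in which case the `alphabet` argument would need to be changed. The function name `state_percentages` and the toy input string are hypothetical.

```python
from collections import Counter


def state_percentages(states, alphabet="HSC"):
    """Return the percentage of residues assigned to each state letter."""
    total = len(states)
    if total == 0:
        raise ValueError("empty prediction string")
    counts = Counter(states)
    return {s: round(100.0 * counts.get(s, 0) / total, 2) for s in alphabet}


# Hypothetical 10-residue prediction used only to show the arithmetic.
print(state_percentages("HHHHSSCCCC"))  # -> {'H': 40.0, 'S': 20.0, 'C': 40.0}
```

Letters outside the chosen alphabet still count towards the total, so the reported percentages may sum to less than 100 when a tool emits additional states such as turns.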

 

The results of the six tools under study are shown in Fig. 1.1 to Fig. 1.6, as follows:

 

Figure 1.1 : Result of CFSSP

 

Figure 1.2 : Result of DSC

 

Figure 1.3 : Result of GOR4

 

Figure 1.4 : Result of MLRC

 

Figure 1.5 : Result of PHD

 

Figure 1.6 : Result of SOPMA

 

Table 1: Comparative analysis of prediction tools used

| Name | GOR4 | PHD | SOPMA | DSC | CFSSP | MLRC |
| --- | --- | --- | --- | --- | --- | --- |
| Helix | 27.89% | 75.51% | 66.67% | 75.51% | 83.7% | 58.50% |
| Sheets | 28.57% | 2.04% | 7.48% | 0% | 59.2% | 2.72% |
| Coils | 43.54% | 22.45% | 19.73% | 24.49% | 12.2% | 38.78% |
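The comparison in Table 1 can be made explicit by ranking the tools by how close their predicted helix content lies to the literature value of approximately 75% quoted in the Conclusion. The sketch below uses the helix percentages from Table 1; the exact reference value of 75.0 and the function name `rank_by_closeness` are assumptions of this example.

```python
# Helix percentages taken from Table 1.
HELIX_PERCENT = {
    "GOR4": 27.89, "PHD": 75.51, "SOPMA": 66.67,
    "DSC": 75.51, "CFSSP": 83.7, "MLRC": 58.50,
}

# Approximate literature value for hemoglobin helix content (assumed exact
# here for the purpose of ranking).
REFERENCE_HELIX = 75.0


def rank_by_closeness(predicted, reference):
    """Sort tool names by absolute deviation from the reference percentage.

    Ties are broken alphabetically so that the ordering is deterministic.
    """
    return sorted(predicted, key=lambda name: (abs(predicted[name] - reference), name))


ranking = rank_by_closeness(HELIX_PERCENT, REFERENCE_HELIX)
print(ranking)  # -> ['DSC', 'PHD', 'SOPMA', 'CFSSP', 'MLRC', 'GOR4']
```

DSC and PHD deviate from the reference by the same amount (0.51 percentage points) and therefore share first place; the alphabetical tie-break only fixes their printed order.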

 


CONCLUSION:

The results produced by the CFSSP, GOR, PHD, SOPMA, DSC and MLRC tools differed, with variation in the predicted helices, strands and coils. Of these tools, PHD and DSC predicted a helix content of approximately 75%. Literature studies indicate that almost 75% of the amino acids in hemoglobin participate in the formation of helical structures. Thus, we conclude that PHD and DSC produced the best results. This will be verified by predicting the three-dimensional structure of hemoglobin and comparing it with entries in tertiary structure databases. Future work will include the study of three-dimensional structures of hemoglobin and of the helices, sheets and coils predicted from the secondary structure.

 

REFERENCES:

1. Rost B, Sander C, Schneider R. Redefining the goals of protein secondary structure prediction. J Mol Biol. 1994; 235:13–26.

2. Chen H, Gu F, Huang Z. Improved Chou-Fasman method for protein secondary structure prediction. BMC Bioinformatics. 2006; 7(Suppl 4):S14.

3. Salzberg S, Cost S. Predicting protein secondary structure with a nearest-neighbor algorithm. J Mol Biol. 1992; 227:371–374.

4. Martin J, Gibrat JF, Rodolphe F. Analysis of an optimal hidden Markov model for secondary structure prediction. BMC Struct Biol. 2006; 6:25.

5. Won KJ, Hamelryck T, Prügel-Bennett A, Krogh A. An evolutionary method for learning HMM structure: prediction of protein secondary structure. BMC Bioinformatics. 2007; 8:357.

6. Garnier J, Gibrat JF, Robson B. GOR method for predicting protein secondary structure from amino acid sequence. Methods Enzymol. 1996; 266:540–53.

7. Sen TZ, Jernigan RL, Garnier J, Kloczkowski A. GOR V server for protein secondary structure prediction. Bioinformatics. 2005; 21:2787–8.

8. Lin K, Simossis VA, Taylor WR, Heringa J. A simple and fast secondary structure prediction method using hidden neural networks. Bioinformatics. 2005; 21:152–9.

9. Cuff JA, Clamp ME, Barton GJ. JPred: a consensus secondary structure prediction server. Bioinformatics. 1998; 14:892–893.

10. Geourjon C, Deléage G. SOPMA: significant improvements in protein secondary structure prediction by consensus prediction from multiple sequences. Comput Appl Biosci. 1995; 11:681–684.

 

 

 

 

Received on 03.03.2020            Modified on 26.09.2020

Accepted on 13.12.2020           © RJPT All right reserved

Research J. Pharm. and Tech 2021; 14(12):6245-6249.

DOI: 10.52711/0974-360X.2021.01080