Phyre 2 and I-Tasser web portal for Protein modeling, Prediction and Validation of gel Q and gel K genes from gellan gum producing bacterial strain Sphingomonas paucimobilis ATCC 31461

 

Manjusha CM1, Santhiagu A1*, Soumiya S1, Adarsh VK1, Jaya Prakash S2

1School of Biotechnology, National Institute of Technology, Calicut-673601, Kerala, India

2Department of Pharmaceutical Biotechnology, MNR College of Pharmacy, Sangareddy-502294

*Corresponding Author E-mail: manjuabijith@gmail.com, asanthiagu@nitc.ac.in

 

ABSTRACT:

Gellan gum, an anionic, high-molecular-weight hetero-exopolysaccharide produced by Sphingomonas paucimobilis ATCC 31461, has potential applications in the food and pharmaceutical industries as a gelling agent, a highly viscous biogum, a stabilizing agent, etc. The three-dimensional structure of a protein encoded by a gene can help to identify the function of that gene. This study investigates the 3-D structure prediction of the gel Q and gel K proteins of Sphingomonas paucimobilis ATCC 31461 using two protein modeling tools, Phyre2 and I-Tasser. Amplified gel Q and gel K genes were sequenced, and template protein structures were identified using BLASTp against the PDB. Superposition of the model with the template was analyzed with PyMol. The structure validation servers RAMPAGE, ProSA, Verify 3D, ERRAT and QMEAN supported the 3D structures. The Phyre2 and I-Tasser models of gel Q protein showed the best conformation, with 94.3% and 80.7% of residues respectively in the core region of the Ramachandran plot, indicating greater accuracy of model prediction than for gel K protein, with 92.3% and 78.6% of residues for the Phyre2 and I-Tasser models respectively. The Phyre2 server generated a finer prediction, analysis and validation of the gel Q and gel K protein structures than the I-Tasser server. The results generated using the ExPASy tool also suggested that the glycosyltransferase proteins encoded by the genes gel Q and gel K in the gel cluster may be directly involved in the addition of the fourth sugar of the repeating unit and in the incorporation of GlcA from UDP-glucuronic acid into the glucosyl-α-pyrophosphorylpolyprenol intermediate, respectively. For the development of a recombinant Sphingomonas paucimobilis ATCC 31461, the gel Q and gel K genes might be promising candidates for overexpression to enhance gellan gum production.

 

KEYWORDS: Sphingomonas paucimobilis ATCC 31461; gel Q; gel K; Phyre2; I-Tasser; RAMPAGE.

 

 


INTRODUCTION:

Gellan gum, an anionic, high-molecular-mass polysaccharide, is produced by the Gram-negative bacterium Sphingomonas paucimobilis (formerly known as Pseudomonas elodea). The biopolymer gellan consists of the monosaccharides β-D-glucose, α-(1-4)-L-rhamnose and β-D-glucuronic acid in the molar ratio of 2:1:1, linked together to form a linear structure.1

 

 

The major characteristic of gellan is its ability to form a thermoreversible gel when heated and cooled, which makes it well known as a stabilizing, texturing, emulsifying and gelling agent. Gellan plays a major role in ophthalmic preparations due to its viscous nature, and in various pharmaceutical preparations.2 The organization of the gellan gene cluster is apparently similar to the sphingan S88 biosynthetic cluster present in Sphingomonas S88.3 There are 18 genes in the gel cluster, which consists of gel I, K, Q, L, J, F, D, E, C, M, N, B, rml A, B, C, D and atr D, B. These genes encode the enzymes that synthesize dTDP-L-Rha, the glycosyltransferases, and the proteins required for gellan export and polymerization.4,5 The glycosyltransferases involved in the assembly of the tetrasaccharide unit are believed to be encoded in the gel cluster by gel K and gel Q, which are required for the addition of GlcA from UDP-glucuronic acid to the glucosyl-α-pyrophosphorylpolyprenol intermediate and for the incorporation of the fourth sugar of the repeat unit, respectively.6

 

Three-dimensional protein structures are of great interest for the functional prediction of proteins, structure-based discovery of specific inhibitors and site-directed mutagenesis.7 The most favorable approach for predicting the structure of a protein involves detecting a homolog of known three-dimensional (3D) structure, known as fold recognition or template-based homology modeling. These methods rely on the fact that there is a limited number of folds in nature and that many distinct, remotely homologous protein sequences adopt remarkably similar structures.8 Homology modeling has many applications, such as determination of the function of proteins, virtual screening and rationalizing the effects of sequence variations.9 The four steps in building a homology model are structural template identification, alignment of the target sequence with the template structure, model building and quality evaluation of the model.7 Protein structure homology modeling depends on the evolutionary relationship between the template and target proteins.10

 

The most widely used web servers for protein modeling include Phyre2, I-Tasser, Swiss-Model, HHpred, PSI-BLAST–based secondary structure prediction (PSIPRED), Robetta and Raptor. Phyre2 generates a 3D model of a protein sequence in four stages: gathering homologous sequences, fold library scanning, loop modeling, and multiple template modeling with Poing followed by side-chain placement.11 The I-Tasser suite pipeline comprises four steps: identification of threading templates, iterative structure assembly simulation, model selection and refinement, and structure-based function annotation.12 In the present study, we use the Phyre2 and I-Tasser online servers to perform comparative homology modeling of the proteins encoded by the gel Q and gel K genes, components of the gellan gum biosynthetic gene cluster, in order to establish the probable function of these genes.

 

MATERIALS AND METHODS:

Isolation of genomic DNA from Sphingomonas paucimobilis ATCC 31461:

Sphingomonas paucimobilis ATCC 31461 was purchased from ATCC and maintained as a pure culture on YPG agar slants. Genomic DNA was isolated from Sphingomonas paucimobilis ATCC 31461 using the phenol-chloroform method.13 The concentration and purity of the DNA were also determined.

 

PCR amplification:

The desired genes (gel K and gel Q) were amplified from genomic DNA using specific primers designed with the Primer Premier software tool; the primers carried restriction enzyme sites chosen on the basis of the vector (pBBR122) used for cloning. The amplification reaction contained the following reagents: template DNA, the two primers flanking the region to be amplified (gel K: forward primer 5′-ACCCGAATTCATGGCAGAAGCGACCGAGG-3′, reverse primer 5′-ACCCCCATGGTCATCGCTTCGCCCCCCAT-3′; gel Q: forward primer 5′-ACCCGAATTCATGACCGACCAGACCCTGC-3′, reverse primer 5′-ACCCCCATGGTCACTTCTTGGCGGGATATC-3′), nucleotides, buffer and Pfu DNA polymerase. The PCR products were purified and sequenced, and the gene sequences of gel K and gel Q were submitted to the National Center for Biotechnology Information (NCBI) under accession numbers KY996484 and KY979104 respectively. The nucleotide sequences were converted to protein sequences using the online tool ‘ExPASy’, and the proteins encoded by gel Q and gel K were also identified.
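The nucleotide-to-protein conversion described above can be illustrated with a minimal sketch. It assumes the standard genetic code and a reading frame that starts at the first base supplied; it is not the ExPASy implementation, and the function name `translate` is hypothetical. The two test fragments are the gene-specific 5′ ends of the forward primers listed above, starting at their ATG codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (assumption: the standard table is adequate for this illustration).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


# Gene-specific starts taken from the forward primers in the text.
gel_k_start = "ATGGCAGAAGCGACCGAG"
gel_q_start = "ATGACCGACCAGACCCTG"
print(translate(gel_k_start))
print(translate(gel_q_start))
```

A full-length gene sequence would be passed to the same function; incomplete trailing codons are ignored.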

 

Template Identification:

Protein structure prediction can be classified into three categories: comparative modeling, threading, and ab initio folding. The comparative modeling and threading approaches build protein models by aligning query sequences onto solved template structures. When closely similar templates are identified, high-resolution models can be built using template-based methods. If no template is available in the Protein Data Bank library, the models must be built by ab initio folding, which is the most difficult category of protein structure prediction.14 Sequence similarity to entries in the PDB database was assessed with the protein BLAST program.15 The protein sequences of gel K and gel Q were submitted to BLASTp for template identification.

 

Homology Modeling:

Swiss-Model:

Swiss-Model, a server for the automated comparative modeling of three-dimensional protein structures,16 was used to generate a model of each target protein. However, the templates covered only part of the sequence length, so the models obtained were inadequate. Therefore, two other tools that combine homology, threading and ab initio modeling, namely Phyre2 and I-Tasser, were selected. Reliable homology modeling usually requires a query sequence with at least 30% sequence identity to the template structure for each domain. Domain rearrangements and the lack of domain structures decrease the effectiveness of homology modeling for the entire protein structure.17

Phyre 2:

The primary objective of Phyre2, which helps to predict and analyze the structure and function of proteins, is to provide a user-friendly interface to cutting-edge bioinformatics methods. Advanced facilities of Phyre2 include BackPhyre, for searching a structure against a range of genomes; batch submission of a number of protein sequences for modeling; one-to-one threading of a user sequence onto a user structure; PhyreAlarm, for the automatic weekly scan of proteins that are difficult to model; and Phyre Investigator, for in-depth analysis of model quality, function and mutations.11 The gel K and gel Q protein sequences were submitted to the Phyre2 server to generate protein models, and the models were compared to determine the most suitable one for each of the two proteins.

 

I-Tasser:

I-Tasser, an integrated platform for automated protein structure and function prediction,18 creates full-length protein models by excising continuous fragments from threading alignments and reassembling them using replica-exchange Monte Carlo simulations. Structural quality is estimated by the C-score (confidence score), TM-score and RMSD values.19 The gel K and gel Q proteins modeled with Phyre2 were also modeled using I-Tasser, and the resulting models were compared. Secondary structure prediction was also performed using the I-Tasser server.

 

Model Validation:

Loop regions of the predicted models of gel Q and gel K were refined using Modeller 9.V13. The stereochemical quality and reliability of the models were evaluated with RAMPAGE20 by Ramachandran plot analysis. The best model was determined from the number of residues in the favoured region. The models were further analyzed using VERIFY 3D,21 ERRAT,22 ProSA23 and QMEAN score values.24 The proteins were finally visualized with PyMol.

 

RESULT AND DISCUSSION:

Isolation of genomic DNA from Sphingomonas paucimobilis ATCC 31461:

The concentration of DNA isolated from Sphingomonas paucimobilis ATCC 31461 was 1160 μg/ml and the estimated purity, expressed as the A260/A280 absorbance ratio, was 1.89; the accepted ratio for pure DNA is generally ~1.8.25 Using the ExPASy tool, the proteins encoded by the genes gel Q and gel K were identified as glycosyltransferases. The genomic DNA obtained was analysed by agarose gel electrophoresis and is shown in Fig 1.

 

 

 

 

Fig 1. Genomic DNA of Sphingomonas paucimobilis ATCC 31461 (Lane 1: Genomic DNA; Lane 2: 1 kb DNA marker)

 

Primer design and PCR amplification:

The Primer Premier software tool26 was used to design the forward and reverse primers for gel K and gel Q. The sizes of gel Q and gel K are 942 bp and 1047 bp respectively. The optimized PCR conditions for gel Q amplification comprised 32 cycles of strand denaturation at 95 °C (2 min 30 s), primer annealing at 55 °C (45 s) and primer extension at 72 °C (13 min). The cycling protocol for gel K amplification likewise consisted of 32 cycles: strand denaturation at 95 °C (2 min 30 s), primer annealing at 58 °C (45 s) and primer extension at 72 °C (13 min). Fig 2 and Fig 3 show the PCR-amplified DNA of gel Q and gel K respectively. The amplified PCR products of gel Q and gel K were sequenced, and the data are shown in Fig 4 and Fig 5 respectively.
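As a small illustration of the primer design, the sketch below scans the four primers given in the Materials and Methods for restriction sites. The enzyme names EcoRI (GAATTC) and NcoI (CCATGG) are an assumption added here from the recognition sequences visible in the primers; the text itself does not name the enzymes. The helper `find_sites` is hypothetical.

```python
# Recognition sequences (assumption: these are the sites intended by the
# authors; the paper only states that restriction sites were included).
SITES = {"EcoRI": "GAATTC", "NcoI": "CCATGG"}

# Primers as listed in the PCR amplification section, written 5' to 3'.
PRIMERS = {
    "gel K forward": "ACCCGAATTCATGGCAGAAGCGACCGAGG",
    "gel K reverse": "ACCCCCATGGTCATCGCTTCGCCCCCCAT",
    "gel Q forward": "ACCCGAATTCATGACCGACCAGACCCTGC",
    "gel Q reverse": "ACCCCCATGGTCACTTCTTGGCGGGATATC",
}


def find_sites(primer: str) -> dict:
    """Return {enzyme: 0-based position} for every site present in the primer."""
    return {
        enzyme: primer.find(site)
        for enzyme, site in SITES.items()
        if site in primer
    }


for name, sequence in PRIMERS.items():
    print(name, find_sites(sequence))
```

In this reading, both forward primers carry the GAATTC site and both reverse primers carry the CCATGG site, each preceded by a four-base 5′ overhang.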

 

Fig 2.  PCR amplified DNA of gel Q [Lane 1: PCR amplified gel Q (942 bp), Lane 2: DNA Marker (1 kb)]

 

Fig 3.  PCR amplified DNA of gel K [Lane 1: DNA Marker (1 kb), Lane 2: PCR amplified gel K (1047 bp)]

 

>Gel.K_Gel.K.Forward_2696-3_P0339,Trimmed Sequence (948 bp) GTGCCTCGCAGCGTCCGGCGGGGGCCATCTGCGGCAGATCCCGGATCTGGAGTCGGTCTGGCGCGAACACGATTACTTCTTCGTCACCGAAGATACGGCCCTTGGCCGCAGTCTCGCGGAGAAACATCCGGTGGAGCTGGTCGGCCATTATGCGCTTGGCCAGGCCCGGCTGGGCCACCCCTTCAAGATGCTCGGCGGCGCCCTGCGCAACCTGCGGCAGAGCCTCGCCATCGTCCGCAGGCACAAGCCGGACGTCGTAATCTCGACAGGTGCCGGCGCGGTCTATTTCACCGCGCTGTTCGCCAAGCTGTTCGGCGCGAAGTTCATCCATATCGAAAGCTTCGCCCGCTTCGATCACCCCTCCGCCTTCGGCAAGATGGTGAAGGGCATCGCCACGATCTCGATCGTCCAGTCGCCGGCGCTGAAGCAGATCTGGCCCGACGCCGAACTTTTCGATCCGTTCCGGATGCTGGACACGCCGCGCCCGCCCAAGCAGGCGCTGACCTTCGCCACGGTGGGCGCCACCCTGCCCTTCCCGCGACTGGTGCAGGCGGTGCTCGACCTGAAGCGTGCGGGCGGCCTGCCGGGCAAGCTGATCCTGCAATATGGCGATCAGGCCCTGACCGATCCCGGCATCCCCGACGTCGAGATCCGCCCCACCATCCCGTTCGACGAATTGCAGCTGATGCTGCGCGACGCCGACATCGTGATCTGCCACGGCGGCACCGGCTCGCTGGTTACCGCGCTGCGTGCCGGCTGCCGGGTGATCGCCTTCCCGCGCCGCTTCGACCTCGGCGAACATTATGACGATCACCAGGAAGAGATCGCCCAGACCTTCGCCGACCGCGGCCTGCTCCAGGCGGTGCGCGACGAACGAGAACTCGGCGCGGCGGTCGCCGCCGCCAAAGGCGACGGAGCCCAGGCTCGCCACCACCGATCACACCGCGC

Fig 4. Sequence of amplified PCR product of gel Q in Fasta format

 

>Gel.K_Gel.K.Forward_2696-3_P0339,Trimmed Sequence (948 bp) GTGCCTCGCAGCGTCCGGCGGGGGCCATCTGCGGCAGATCCCGGATCTGGAGTCGGTCTGGCGCGAACACGATTACTTCTTCGTCACCGAAGATACGGCCCTTGGCCGCAGTCTCGCGGAGAAACATCCGGTGGAGCTGGTCGGCCATTATGCGCTTGGCCAGGCCCGGCTGGGCCACCCCTTCAAGATGCTCGGCGGCGCCCTGCGCAACCTGCGGCAGAGCCTCGCCATCGTCCGCAGGCACAAGCCGGACGTCGTAATCTCGACAGGTGCCGGCGCGGTCTATTTCACCGCGCTGTTCGCCAAGCTGTTCGGCGCGAAGTTCATCCATATCGAAAGCTTCGCCCGCTTCGATCACCCCTCCGCCTTCGGCAAGATGGTGAAGGGCATCGCCACGATCTCGATCGTCCAGTCGCCGGCGCTGAAGCAGATCTGGCCCGACGCCGAACTTTTCGATCCGTTCCGGATGCTGGACACGCCGCGCCCGCCCAAGCAGGCGCTGACCTTCGCCACGGTGGGCGCCACCCTGCCCTTCCCGCGACTGGTGCAGGCGGTGCTCGACCTGAAGCGTGCGGGCGGCCTGCCGGGCAAGCTGATCCTGCAATATGGCGATCAGGCCCTGACCGATCCCGGCATCCCCGACGTCGAGATCCGCCCCACCATCCCGTTCGACGAATTGCAGCTGATGCTGCGCGACGCCGACATCGTGATCTGCCACGGCGGCACCGGCTCGCTGGTTACCGCGCTGCGTGCCGGCTGCCGGGTGATCGCCTTCCCGCGCCGCTTCGACCTCGGCGAACATTATGACGATCACCAGGAAGAGATCGCCCAGACCTTCGCCGACCGCGGCCTGCTCCAGGCGGTGCGCGACGAACGAGAACTCGGCGCGGCGGTCGCCGCCGCCAAAGGCGACGGAGCCCAGGCTCGCCACCACCGATCACACCGCGC

Fig 5. Sequence of amplified PCR product of gel K in Fasta format

 

Template identification:

Pair-wise sequence alignment of the gel Q protein against sequences of known PDB structures identified only poor homologs: 23% identity with 35% query coverage to Chain A, native (magnesium-containing) SpsA from Bacillus subtilis; 22% identity with 35% query coverage to Chain A, crystal structure of a putative sugar phosphate isomerase/epimerase (ava4194) from Anabaena variabilis ATCC 29413 at 1.78 Å resolution; and 36% identity with 14% query coverage to Chain A, crystal structure of methanol dehydrogenase from P. denitrificans. For the gel K protein, similarly low identity and query coverage were obtained for all hits. Examples showing 27%, 27% and 28% identity with 43%, 43% and 32% query coverage are Chain A (NMR solution structure of Alg13), Chain A (NMR solution structure of Alg13: the sugar donor subunit of a yeast N-acetylglucosamine transferase, Northeast Structural Genomics Consortium target Yg1) and Chain A (crystal structure of CalG2, calicheamicin glycosyltransferase, TDP and calicheamicin T0 bound form) respectively. Because no suitable template with adequate query coverage and identity was available in the PDB, two programs, Phyre2 and I-Tasser, which combine homology modeling, ab initio and threading methods, were used to predict the protein models. Models are always selected by heuristics to maximize both confidence and coverage of the query sequence.11 Fig 6 shows the protein BLAST results for the gel Q and gel K protein sequences.
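The template screening argument can be sketched as a simple filter over the BLAST hits reported above. The 30% identity cut-off is the one quoted in the Materials and Methods; the 50% query-coverage cut-off is an illustrative assumption, not a value from this study, and the `Hit` class and `usable_templates` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    query: str
    template: str
    identity: float   # percent identity reported by protein BLAST
    coverage: float   # percent query coverage reported by protein BLAST


# Hits as reported in the text for gel Q and gel K.
HITS = [
    Hit("gel Q", "SpsA, Bacillus subtilis", 23, 35),
    Hit("gel Q", "putative sugar phosphate isomerase/epimerase, A. variabilis", 22, 35),
    Hit("gel Q", "methanol dehydrogenase, P. denitrificans", 36, 14),
    Hit("gel K", "Alg13 (NMR solution structure)", 27, 43),
    Hit("gel K", "Alg13 sugar donor subunit (NMR solution structure)", 27, 43),
    Hit("gel K", "CalG2 calicheamicin glycosyltransferase", 28, 32),
]


def usable_templates(hits, min_identity=30.0, min_coverage=50.0):
    """Keep hits that satisfy both thresholds (coverage threshold is assumed)."""
    return [
        h for h in hits
        if h.identity >= min_identity and h.coverage >= min_coverage
    ]


print(usable_templates(HITS))  # no hit satisfies both criteria
```

Under these thresholds no hit qualifies, which is consistent with the decision to move from plain template-based modeling to Phyre2 and I-Tasser.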

 

Fig 6. p-BLAST result for gel Q (left) and gel K (right) protein sequences with partial query coverage

Homology modeling using Swiss Model:

The Swiss-Model search for gel Q protein returned a total of 302 templates matching the target sequence; the modeled structure, a glycosyltransferase based on template 5tzi.1.A (resolution 2.3 Å), showed only 16.7% identity, covering residues 80–648, with a QMEAN Z-score of −3.44. For gel K protein, 795 templates matched the target sequence, and the modeled structure, based on EspG2 glycosyltransferase (5du2.1.A, 2.7 Å) and covering residues 105–419, showed only 19.42% identity with a QMEAN Z-score of −5.2. The models generated by the Swiss-Model server for gel Q and gel K were therefore inadequate, as the templates covered only part of the sequence length. Model reliability decreases as sequence identity decreases, and template-target pairs sharing <50% sequence identity may require manual adjustment of the sequence alignment.16 PyMol visualization of the gel Q and gel K protein models generated using Swiss-Model is shown in Fig 7.

 

Fig 7. PyMol visualization of gel Q (left) and gel K (right) protein models generated using Swiss-Model server

 

Phyre2 modeling:

Modeling with the Phyre2 server showed that 201 and 258 residues were modeled with 100.0% confidence for the gel Q and gel K proteins respectively by the single highest-scoring template. The predicted models of gel Q and gel K attained 100.0% confidence with 82% and 88% coverage and 14% and 19% identity to the templates c2ffuA (PDB header: transferase; Chain A; PDB molecule: polypeptide N-acetylgalactosaminyltransferase 2; PDB title: crystal structure of human ppGalNAcT-2 complexed with UDP and EA2) and c3s2uA (PDB header: transferase; Chain A; PDB molecule: UDP-N-acetylglucosamine--N-acetylmuramyl-pentapeptide; PDB title: crystal structure of the Pseudomonas aeruginosa MurG:UDP-GlcNAc substrate complex) respectively. PyMol visualization of the gel Q and gel K protein models generated using the Phyre2 server is shown in Fig 8.

 

Fig 8. PyMol visualization of gel Q (left) and gel K (right) protein model generated using Phyre2 server

 

I-Tasser modeling:

Five models were predicted for each protein using I-Tasser. Among the five, the model with the best estimated C-score, TM-score and RMSD values was selected. The selected models showed C-scores of 0.19 and −0.16 for gel Q and gel K respectively, well within the range of −5 to 2, and TM-scores of 0.74±0.11 and 0.69±0.12 respectively. A TM-score greater than 0.5 indicates a model of correct topology, whereas a TM-score lower than 0.17 indicates random similarity.19 The RMSD values for the first and best-quality models of gel Q and gel K generated using I-Tasser were 5.9±3.7 Å and 6.9±4.1 Å respectively, which were also within the acceptable limit. The sequence-based secondary structure prediction of the gel Q and gel K proteins using I-Tasser was used to analyze the structural and functional aspects of the proteins. The C-score reflected high confidence in the quality of the predicted structure and correlated with the TM-score and RMSD, which in turn corroborated the quality of the predicted structures of gel Q and gel K. A higher score means a more confident prediction of the secondary structure.27 PyMol visualization of the gel Q and gel K protein models generated using the I-Tasser server is shown in Fig 9. The sequence-based secondary structure predictions for gel Q and gel K generated using I-Tasser are shown in Fig 10 and Fig 11 respectively.
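The interpretation rules quoted above can be written down as a short check. This is a minimal sketch: the thresholds (TM-score above 0.5, below 0.17, C-score between −5 and 2) are those stated in the text, while the function names and the "intermediate" label for the gap between the two TM-score thresholds are assumptions made for illustration.

```python
# Best I-Tasser model per protein, values as reported in the text.
MODELS = {
    "gel Q": {"c_score": 0.19, "tm_score": 0.74, "rmsd": 5.9},
    "gel K": {"c_score": -0.16, "tm_score": 0.69, "rmsd": 6.9},
}


def interpret_tm_score(tm_score: float) -> str:
    """Classify a TM-score using the thresholds quoted in the text."""
    if tm_score > 0.5:
        return "correct topology"
    if tm_score < 0.17:
        return "random similarity"
    return "intermediate"  # label assumed here; the text gives no name


def c_score_in_range(c_score: float) -> bool:
    """C-scores are reported on a scale from -5 to 2."""
    return -5.0 <= c_score <= 2.0


for protein, scores in MODELS.items():
    print(protein,
          interpret_tm_score(scores["tm_score"]),
          c_score_in_range(scores["c_score"]))
```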

 

Fig 9. PyMol visualization of gel Q (left) and gel K (right) protein models generated using I-Tasser

 

 

 

 


Fig 10. The sequence-based prediction of secondary structure of gel Q protein generated using I-Tasser

 

Fig 11. The sequence-based prediction of secondary structure of gel K protein generated using I-Tasser

 


The protein models of gel Q and gel K generated using Phyre2 and I-Tasser showed very similar structures, and the Phyre2 and I-Tasser models were therefore superimposed with PyMol. The result suggested that both servers had modeled the gel Q and gel K proteins successfully. The superimposed Phyre2 and I-Tasser models of the gel Q and gel K proteins are shown in Fig 12.

 

 

Fig 12. Superimposed Phyre2 (green) and I-Tasser (red) protein models of gel Q (left) and gel K (right) protein models

Model validation:

RAMPAGE:

RAMPAGE analysis28 revealed that the gel Q protein model obtained from Phyre2 had 94.3% of residues in the core (favoured) region of the Ramachandran plot, indicating higher accuracy of model prediction than the I-Tasser model (80.7%). For gel K, the Phyre2 model likewise placed more residues in the favoured region (92.3%) than the I-Tasser model (78.6%). The results indicated that the Phyre2 models of gel Q and gel K are the better models, and that the gel Q protein models were more accurate and reliable than the gel K protein models. The RAMPAGE results for the Phyre2 and I-Tasser models of gel Q and gel K are shown in Fig 13 and Fig 14 respectively.
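A Ramachandran plot places each residue according to its backbone torsion angles phi and psi. As background, the sketch below shows the standard way a torsion angle is computed from four atomic positions; it is a generic illustration, not RAMPAGE's code, and the coordinates used are artificial test points rather than atoms from the gel Q or gel K models.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees defined by four points.

    For a residue i, phi uses C(i-1), N(i), CA(i), C(i) and
    psi uses N(i), CA(i), C(i), N(i+1).
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = _dot(b2, _cross(n1, n2))
    x = math.sqrt(_dot(b2, b2)) * _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Artificial points: the fourth atom is rotated a quarter turn about the
# central bond relative to the first atom.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, -1, 1)))
```

Applying such a function to every residue of a model yields the (phi, psi) pairs that a validation server then assigns to favoured, allowed or outlier regions.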

 


 

      

Fig 13: Ramachandran plot analysis for gel Q (Phyre2: left; I-Tasser : right)

 

    

 

Fig 14: Ramachandran plot analysis for gel K (Phyre2: left; I-Tasser: right)


PROSA:

In ProSA, the Z-score indicates overall model quality and measures the deviation of the total energy of the structure from an energy distribution derived from random conformations.23 The ProSA-web validation analysis gave negative Z-score values for all models.29 In general, positive values correspond to problematic or erroneous parts of a model.30 The Z-scores obtained from ProSA for the gel Q and gel K proteins were −5.1 and −8.03, and −4.19 and −6.03, for Phyre2 and I-Tasser respectively.31 The scores fall within the range typically found for proteins of similar size, indicating reliable structures. The energy plot shows the local model quality by plotting energies as a function of amino acid sequence position.30
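The idea behind such a Z-score can be sketched in a few lines: the energy of the model is compared with the mean and spread of energies from a reference set of random conformations. The numbers below are invented purely to exercise the function and are not ProSA energies; the use of the population standard deviation is also an assumption of this sketch.

```python
from statistics import mean, pstdev


def z_score(model_energy: float, reference_energies: list) -> float:
    """Deviation of the model energy from the reference distribution,
    in units of that distribution's standard deviation."""
    mu = mean(reference_energies)
    sigma = pstdev(reference_energies)  # population SD (assumed choice)
    return (model_energy - mu) / sigma


# Hypothetical energies, for illustration only.
random_conformation_energies = [-1.0, 1.0]
print(z_score(-5.0, random_conformation_energies))
```

A more negative value means the model's energy lies further below that of random conformations, which is why negative Z-scores are read as a sign of a plausible fold.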

 

Fig 15.  Z-score and energy plot for gel Q using ProSA [Phyre2 (left) and I-Tasser models (right) respectively]

 

 

Fig 16.  Z-score and energy plot for gel K using ProSA [Phyre2 (left) and I-Tasser models (right) respectively]

 

The residue energies, consisting of pair energy, combined energy and surface energy, were all negative and showed the same surface energy tendency as the template.29 The Z-scores and energy plots obtained using ProSA for the Phyre2 and I-Tasser models of gel Q and gel K are shown in Fig 15 and Fig 16 respectively.

 

ERRAT:

ERRAT, a protein structure verification algorithm, is used for evaluating the progress of crystallographic model building and refinement. Error values are plotted as a function of the position of a sliding 9-residue window. The error function is based on the statistics of non-bonded atom-atom interactions in the reported structure.22 In the ERRAT histogram, the correct regions are depicted in black and the incorrect regions in grey. Good models score above 70% in ERRAT evaluation.32 ERRAT analysis showed that the overall quality factors for the gel Q Phyre2 and I-Tasser models were 88.53 and 58.23 respectively, whereas for gel K the overall quality factors were 84.54 and 33.3 for Phyre2 and I-Tasser respectively. This suggested that the Phyre2 model is more reliable than the I-Tasser model for both gel Q and gel K. The gel Q models also showed better quality factor values than the gel K models, indicating that the gel Q protein model is the finer of the two. The overall quality factor estimates from ERRAT for the Phyre2 and I-Tasser models of gel Q and gel K are shown in Fig 17 and Fig 18 respectively.
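The comparison above amounts to checking each overall quality factor against the 70% guideline. A minimal sketch using the values reported in the text (the variable and function names are hypothetical):

```python
ERRAT_THRESHOLD = 70.0  # guideline quoted in the text for good models

# Overall quality factors reported for each protein and server.
QUALITY_FACTOR = {
    ("gel Q", "Phyre2"): 88.53,
    ("gel Q", "I-Tasser"): 58.23,
    ("gel K", "Phyre2"): 84.54,
    ("gel K", "I-Tasser"): 33.3,
}


def acceptable_models(quality_factors, threshold=ERRAT_THRESHOLD):
    """Return the (protein, server) pairs scoring above the threshold."""
    return sorted(
        key for key, value in quality_factors.items() if value > threshold
    )


print(acceptable_models(QUALITY_FACTOR))
```

Only the two Phyre2 models exceed the guideline, matching the conclusion drawn in the text.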

 

Fig 17. The overall quality factor estimation using ERRAT for gel Q Phyre2 (left) and I-Tasser (right) protein models respectively

Fig 18. The overall quality factor estimation using ERRAT for gel K Phyre2 (left) and I-Tasser (right) protein models respectively

 

VERIFY 3D:

Verify3D analyzes the compatibility of a 3D atomic model with its own amino acid sequence. The accuracy of a 3D model may be assessed by its 3D profile, regardless of whether the model has been produced by NMR, X-ray or computational procedures.33 In Verify 3D, 90.7% and 82.11%, and 79.89% and 74.43%, of the residues of the gel Q and gel K proteins had an averaged 3D-1D score ≥ 0.2 for Phyre2 and I-Tasser respectively. As no residue had a negative compatibility score, the models can be considered compatible with their sequences.34

 

 

Fig 19. The Verify3D curve between residue numbers and 3-1 dimensions score of gel Q phyre2 and I-Tasser protein models respectively

The results suggested that Phyre2 produced more reliable models than I-Tasser for the gel Q and gel K proteins. Based on the Verify 3D results, the gel Q protein model may also be regarded as finer than the gel K protein model. Fig 19 and Fig 20 show the Verify3D curves of 3D-1D score against residue number for the Phyre2 and I-Tasser models of gel Q and gel K respectively.

 

 

Fig 20. The Verify3D curve between residue numbers and 3-1 dimensions score of gel K phyre2 and I-Tasser protein models respectively

 

 

Fig 21. Plot showing Z-score value for gel Q Phyre2 and I-Tasser models respectively

 

Q-MEAN SCORE:

QMEAN is a composite scoring function that derives both global (for the entire structure) and local (per residue) absolute quality estimates from a single model. The two global scores are QMEAN4 and QMEAN6, both of which lie mainly in the range (0, 1).35 The QMEAN4 values obtained were −3.86 and −7.89 for gel Q protein and −5.59 and −9.41 for gel K protein for Phyre2 and I-Tasser respectively, indicating that the Phyre2 models, whose values were less negative and therefore closer to this range, were of better quality than the I-Tasser models.31 Plots showing the Z-score values for the Phyre2 and I-Tasser models of gel Q and gel K are given in Fig 21 and Fig 22 respectively.

 

 

Fig 22. Plot showing Z-score value for gel K Phyre2 and I-Tasser models respectively

 

CONCLUSIONS:

Homology modeling plays a crucial role in the determination of protein structure and in enabling functional prediction, and it is becoming more important as its reliability and accuracy improve. The three-dimensional structure prediction of the gel Q and gel K proteins of Sphingomonas paucimobilis ATCC 31461 was performed using two protein modeling tools, Phyre2 and I-Tasser. The structure validation servers RAMPAGE, ProSA, Verify 3D, ERRAT and QMEAN were used to confirm the reliability of the models. The Phyre2 and I-Tasser models of gel Q showed good stereochemical properties, with 94.3% and 80.7% of residues in the core region of the Ramachandran plot, demonstrating more accurate model prediction than for gel K protein, with 92.3% of residues for the Phyre2 model and 78.6% for the I-Tasser model. Phyre2 may therefore be considered a better modeling tool than I-Tasser for the gel Q and gel K proteins, and gel Q generated a finer protein model than gel K. The results generated using the ExPASy tool also suggested that the glycosyltransferases required for tetrasaccharide unit assembly might be encoded by the genes gel K and gel Q of the gel cluster, which are probably involved in the addition of GlcA from UDP-glucuronic acid to the glucosyl-α-pyrophosphorylpolyprenol intermediate and in the incorporation of the fourth sugar of the repeating unit, respectively. This study is the first report on the homology modeling of the proteins encoded by the gel Q and gel K genes of Sphingomonas paucimobilis ATCC 31461. The work suggests that the gel Q and gel K genes might be good targets for developing a recombinant Sphingomonas paucimobilis ATCC 31461 in which overexpression of these genes increases gellan gum production.

 

ACKNOWLEDGEMENT:

The authors are thankful for the financial support provided by the Kerala State Council for Science, Technology and Environment (KSCSTE), Govt. of Kerala.

 

REFERENCES:

1.       Ioannis Giavasis, Linda Harvey M and Brian McNeil. The effect of agitation and aeration on the synthesis and molecular weight of gellan in batch cultures of Sphingomonas paucimobilis. Enzyme and Microbial Technology. 38; 2006:101–108.

2.       Santhiagu Arockiasamy and Rathindra Mohan Banik. Optimization of gellan gum production by Sphingomonas paucimobilis ATCC 31461 with nonionic surfactants using Central Composite Design. Journal of Bioscience and Bioengineering. 105; 2008: 204–210.

3.       Paula Videira, Arsenio Fialho, Roberto Geremia A, Christelle Breton and Isabel SA. Biochemical characterization of the β-1,4-glucuronosyltransferase GelK in the gellan gum-producing strain Sphingomonas paucimobilis A.T.C.C. 31461. Biochem. J. 358; 2001: 457-464.

4.       Sa-Correia, Fialho AM, Videira P, Moreira LM, Marques AR and Albano H. Gellan gum biosynthesis in Sphingomonas paucimobilis ATCC 31461: genes, enzymes and exopolysaccharide production engineering. Journal of Industrial Microbiology and Biotechnology. 29; 2002: 170-176.

5.       Harding NE, Patel YN and Coleman RJ. Organization of genes required for gellan polysaccharide biosynthesis in Sphingomonas elodea ATCC 31461. Journal of Industrial Microbiology and Biotechnology. 31; 2004:70-82.

6.       Arsenio Fialho M, Leonilde Moreira M, Ana Teresa Granja, Karen Hoffmann, Alma Popescu and Isabel Sa-Correia M. Seabra Pereira.  Biotechnology of the bacterial gellan gum: genes and enzymes of the biosynthetic pathway. M. Seabra Pereira (ed.), A Portrait of State-of-the-Art Research at the Technical University of Lisbon. 2007; pp.233–250.

7.       Konstantin Arnold, Lorenza Bordoli, Jurgen Kopp and Torsten Schwede. The Swiss-model workspace: a web-based environment for protein structure homology modeling. Structural Bioinformatics. 22; 2007: 195–201.

8.       Lawrence Kelley A and Michael Sternberg JE. Protein structure prediction on the Web: a case study using the Phyre server. Nature Protocols. 4; 2009: 363-371.

9.       Hillisch A. Utility of homology models in the drug discovery process. Drug Discov. Today. 9; 2004: 659–669.

10.     Lorenza Bordoli, Florian Kiefer, Konstantin Arnold, Pascal Benkert, James Battey and Torsten Schwede. Protein structure homology modeling using SWISS-MODEL workspace. Nature Protocols. 4; 2009: 1-13.

11.     Lawrence Kelley A, Stefans Mezulis, Christopher Yates M, Mark Wass N and Michael Sternberg JE. The Phyre2 web portal for protein modeling, prediction and analysis. Nat Protoc. 10; 2015: 845–858.

12.     Yang J, Roy A and Zhang Y. Protein-ligand binding site recognition using complementary binding-specific substructure comparison and sequence profile alignment. Bioinformatics. 29; 2013: 2588–2595.

13.     Sambrook J and Russell DW. Molecular Cloning: A Laboratory Manual. New York: Cold Spring Harbor, 2001.

14.     Sitao Wu, Jeffrey Skolnick and Yang Zhang.  Ab initio modeling of small proteins by iterative TASSER simulations. BMC Biology. 5; 2007: 1-10.

15.     Altschul SF, Gish W, Miller W, Myers E.W and Lipman DJ. Basic local alignment search tool. J Mol Biol.  215; 1990: 403-410.

16.     Schwede T, Kopp J, Guex N and Peitsch MC. SWISS-MODEL: An automated protein homology-modeling server. Nucleic Acids Res. 31; 2003: 3381-3385.

17.     Lei Xie and Philip E. Bourne Functional Coverage of the Human Genome by Existing Structures, Structural Genomics Targets and Homology Models Functional and Structural Space. PLOS Computational biology. 1; 2005: 0222-0229.

18.     Ambrish Roy, Alper Kucukural and Yang Zhang.  I-TASSER: a unified platform for automated protein structure and function prediction. Nat Protoc. 5; 2010: 725–738.

19.     Muhammad Ramzan Manwar Hussain, Noor Ahmad Shaik, Jumana Yousuf Al-Aama, Hani Z. Asfour, Fatima Subhani Khan, Tariq Ahmad Masoodi, Muhammad Akhtar Khan and Nazia Sultana Shaik. In silico analysis of Single Nucleotide Polymorphisms (SNPs) in human BRAF gene. Gene. 508; 2012: 188-196.

20.     Lovell SC, Davis IW, Arendall WB, De Bakker PIW, Word JM, Prisant MG, Richardson JS and Richardson DC. Structure validation by Calpha geometry: phi,psi and Cbeta deviation. Proteins. 50; 2002: 437-450.

21.     David Eisenberg, Roland Luethy and James Bowie U. VERIFY3D: Assessment of Protein Models with Three-Dimensional Profiles. Methods in Enzymology. 277; 1997: 396-404.

22.     Colovos C and Yeates TO. Verification of protein structures: Patterns of non-bonded atomic interactions. Protein Science. 2; 1993: 1511-1519.

23.     Markus Wiederstein and Manfred Sippl J. ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins. Nucleic Acids Research. 35; 2007: W407–W410.

24.     Benkert, P. QMEAN server for protein model quality estimation. Nucleic Acids Res. 37; 2009: W510-W514.

25.     William Wilfinger W, Karol Mackey and Piotr Chomczynski. Effect of pH and Ionic Strength on the Spectrophotometric Assessment of Nucleic Acid Purity. BioTechniques. 22; 1997: 474-481.

26.     Kamel Abd Elsalam A. Bioinformatic tools and guideline for PCR primer design. African Journal of Biotechnology. 2; 2003: 91-95.

27.     Yang Zhang. I-TASSER: Fully automated protein structure prediction in CASP8. Proteins. 77; 2009: 100-113.

28.     Wei Wang, Minxuan Xia, Jie Chen, Fenni Deng, Rui Yuan, Xiaopei Zhang and Fafu Shen. Data set for phylogenetic tree and RAMPAGE Ramachandran plot analysis of SODs in Gossypium raimondii and G. arboreum. Data in Brief. 9; 2016: 345-348.

29.     Satish Kumar, Lingaraja Jena, Vidya Bhomale W and Sangeeta Daf. In silico structural prediction of E6 and E7 Proteins of Human Papillomavirus Strains by Comparative Modeling. Int. J. Bioautomation. 16; 2012: 101-110.

30.     Vinita Hooda, Prasada babu Gundala and Paramageetham Chinthala. Sequence analysis and homology modeling of peroxidase from Medicago sativa.  Bioinformation. 8; 2012: 974–979.

31.     Piyush Agrawal, Zoozeal Thakur and Mahesh Kulharia. Homology modeling and structural validation of tissue factor pathway inhibitor. Bioinformation. 9; 2013: 808–812.

32.     Hatem R, Pierre B and Elie E. Structural and functional analysis of the C-terminal STAS (sulfate transporter and anti-sigma antagonist) domain of the Arabidopsis thaliana sulfate transporter SULTR. The Journal of Biological Chemistry. 280; 2005: 15976-15983.

33.     Elham Mahgoub O. and Ahmed Bolad. Correctness and accuracy of template-based modeled single chain fragment variable (scFv) protein anti-breast cancer cell line (MCF-7). Open Journal of Genetics. 3; 2013: 183-194.

34.     Siavoush Dastmalchi and Maryam Hamzeh Mivehrod. Molecular modeling of human aldehyde oxidase and identification of the key interactions in the enzyme-substrate complex. DARU Journal of Pharmaceutical Sciences. 13; 2005: 82-93.

35.     Studer G, Biasini M and Schwede T. Assessing the local structural quality of transmembrane protein models using statistical potentials (QMEANBrane). Bioinformatics. 30; 2014: i505-i511.

 

 

 

 

 

Received on 02.05.2018          Modified on 14.10.2018

Accepted on 30.11.2018        © RJPT All right reserved

Research J. Pharm. and Tech 2019; 12(1): 27-36.

DOI: 10.5958/0974-360X.2019.00006.4