Bio Python Application for Comparative Analysis of COVID-19 Virus Genome with Other Calamitous Genome

 

Narendra Kumar Dewangan1, RPS Chauhan2, Naveen Jain3

1Dept. of CSE(AI&ML), SSIPMT Raipur, India.

2Dept. of CSE(AI&ML), SSIPMT Raipur, India.

3Dept. of Mechanical Engg., SSIPMT Raipur, India.

*Corresponding Author E-mail: narendra.nic@gmail.com, chauhanrudra72@gmail.com, naveenjainbit@gmail.com

 

ABSTRACT:

Since the last decade, mankind has been adversely affected by prominent viruses such as Ebola, SARS-CoV and MERS. Recently, COVID-19 has created a worldwide pandemic resulting in a huge loss of human lives. To safeguard humans and prevent economic losses, a deep understanding of the genetic structure of the virus is essential for developing proper medicines and vaccines. The present work analyses and compares the genomic structure of the COVID-19 virus with that of other calamitous viruses through pairwise local and global alignment of DNA sequences and by measuring DNA sequence length, Hamming distance and GC content. Bio python has been used for the study, and the results are presented in the form of 3D structures and dot plots. The results help in understanding the relationships among the viruses. It has been observed that COVID-19 and SARS have an 89% similarity, which means both are of the same genus and belong to the same family; COVID-19 and MERS have a 71% similarity, COVID-19 and Ebola 58%, COVID-19 and HIV 61%, and COVID-19 and swine flu 62%. Since Ebola, HIV and swine flu have a lower percentage of similarity with COVID-19, they belong to different families of viruses. This research emphasises finding the similarities among the viruses and helps scientists to develop appropriate medicine.

 

KEYWORDS:  COVID-19, Pairwise Alignment, Dot plot, GC contents, Hamming Distance, Bio Python.

 

 


1. INTRODUCTION: 

The COVID-19 virus spreads through large respiratory droplets released when an infected person coughs, sneezes, speaks, sings or breathes, and through objects and surfaces contaminated by the virus. Generally, the infection shows mild symptoms such as fever, fatigue, dyspnoea and cough. In elderly people the virus can cause pneumonia and Acute Respiratory Distress Syndrome (ARDS), resulting in multi-organ failure, hindering the breathing ability of the patient and finally leading to ventilator support and sometimes to death.1

 

The spread of the COVID-19 virus can be prevented by practising social distancing, wearing a mask and using sanitizer. Another method of preventing the spread of the COVID-19 virus is the development of a vaccine, which not only prevents the spreading of the virus but also proves to be a permanent and reliable solution against it. However, to develop an effective vaccine against any virus, the foremost requirement is a thorough knowledge of the characteristics of that virus, which requires genomic analysis. Genomic analysis of the viruses under consideration reveals the similarity content among them and helps in determining their identical characteristics2.

 

Further, with the help of genomic analysis, viruses that originated from bats, i.e. SARS, MERS and CoV-1, were found to be phylogenetically alike in characteristics to the SARS-CoV-2 virus. Based on this study, it has been confirmed that bats are the primary source of these viruses 3.

To address these issues, this paper uses Bio python, an open-source tool for DNA sequence analysis, for determining the sequences of individual genes and full chromosomes 4.

 

DNA sequencing has emerged as a promising and efficient route to RNA and protein sequence analysis. An efficient DNA sequence analysis requires a clear understanding of the source of the data and of the experimental methods used to perform the analysis5.

 

Analytic strategies based on the genomic, transcriptomic or proteomic sequence need to be followed for sequencing. The enormous volume of data warehoused in databases on these biomolecules needs to be verified first in order to determine the presence of similarity between the sequences and to carry out functional investigations 6.

 

To understand the mapping from genome to protein, DNA, RNA and protein sequence analyses are performed with the help of Bio python 7.

 

Here, Bio python is used to read the DNA FASTA sequence data from the dataset and to extract the DNA sequence from the FASTA file in order to disseminate the biological information[8].

 

Bio python is also used to determine the DNA sequence length of each virus under consideration, to calculate the GC content for assessing the heat stability of the different viruses, and to calculate the Hamming distance for quantifying the similarity of DNA sequences among the viruses. The objective of this research is to find the similarity of DNA and protein sequences among the viruses in order to support the discovery of appropriate medicine.

In this work, a graphical analysis in the form of individual protein analysis and pairwise local and global alignment of DNA sequences is performed. Further, the DNA sequences are translated into protein sequences to understand the functions necessary for life. The Hamming distance between two sequences is computed to quantify the similarity among all the viruses. To identify the percentage of nitrogenous bases and the heat stability of the viruses under consideration, the GC content is calculated: the higher the GC content, the more stable the virus is with respect to temperature. The similarity of the viruses can also be supported by obtaining a dot plot of the COVID-19 virus against another virus. A diagonal line on the dot plot shows that the two virus genomes have a large number of similar characteristics.

The proposed analysis shows that the COVID-19 virus has a high degree of similarity with the SARS and MERS viruses, which shows that these viruses belong to the same family. When the likeness reaches a higher percentage, the purpose of the function can be surmised with some reliability. This is verified by the experimental observation that COVID-19 and SARS have the maximum percentage of global DNA sequence alignment and the minimum Hamming distance, i.e. the number of corresponding positions at which the two strings differ. A comparative study has been done among the pairs of sequences to explore alike sequences that share the same symbols in the same order. Here, the objective is to examine the roots of the differences between pairs of sequences that share a common ancestor9,10.

 

Any similarities in the characteristics and behaviour of genome sequences will help in drug assessment and vaccine preparation. The remainder of this paper is organised as follows. Section 2 presents the literature review that defines the objective of the research, section 3 discusses the proposed methodology, section 4 discusses the application of the proposed analysis, section 5 presents the results and discussion, section 6 gives the conclusion and section 7 outlines the future scope for further research.

 

2. LITERATURE SURVEY:

A well-known fact about viruses is that they cannot replicate without a host cell. When a virus comes in contact with a host cell, it injects its genetic material into the targeted host and takes complete control over the cell's functionality regarding division and multiplication. Bio-medical fields require informative data in the form of DNA classification, virus pattern dissemination, pairwise sequence analysis, GC content and Hamming distance. These data are useful for deriving logical classifications for medical analysis. Such analysis helps to represent the relationships among different viruses and thereby to find remedies for viral diseases. In the following section, we review the various methods of DNA sequence analysis presented by previous authors.

 

A next-generation DNA sequencing method has been presented that detects the DNA sequence patterns of various viruses that have adversely affected mankind for decades. The outcome of that research is that DNA sequence analysis helps medical practitioners to find medical disorders. It is essential to specify the biological sequences under observation; DNA and protein sequences are most frequently chosen for sequence alignment. Protein sequences play a very important role in sequence annotation, i.e. in attributing function to sequences and to parts of sequences, whereas DNA sequence alignment supports annotation as well as applications in phylogenetic analysis. Sequence alignments are broadly classified as global and local alignments 11.

Further, it was suggested that statistical interpretation plays a major role in the evaluation of DNA sequence alignments. Bio python contains a module named Bio.pairwise2 which implements the dynamic programming algorithms. When performing pairwise alignment with the Bio.pairwise2 module 12,13, functions such as ‘global’ and ‘local’ are available, depending on the nature of the alignment. It has also been proposed that the Illumina Genome Analyzer (IGA) is a well-recognized platform for genetic analysis and functional genomics. High-throughput sequencing is possible through a mathematical and probabilistic model of the IGA that provides efficient information from datasets that are complex in nature 14.

 

A new approach has been proposed for calculating gene frequency counts and local and global pairwise alignments of DNA sequences to extract meaningful insights from genome data15,16.

The Human Genome Project was introduced with the aim of advancing DNA sequencing methods 17.

 

A metagenomic investigation of the protein content of amino acid sequences has been carried out on SARS-CoV-2 genomes. The result of the analysis shows that the resulting mutations have significant implications for physical structure 18.

 

Structural analysis and simulation methods have been used to identify appropriate drugs against the COVID-19 and SARS viruses. This comparative study of COVID-19 and SARS amino acid sequences presents a useful solution 19.

 

The literature review shows that pairwise local and global alignment of the DNA and protein sequences of viruses provides good insight into the similar characteristics and behaviour of the viruses. GC content indicates the stability of a virus with respect to temperature. Other parameters, such as the length of DNA sequences, the Hamming distance and the visual representation given by dot plots, are also found suitable for analysing the characteristics and behaviour of viruses, which helps in discovering appropriate medicine against them. Hence there is a need for a technique that can handle voluminous data and perform the relevant analysis efficiently and accurately. Bio python proves to be a promising solution. In this work, Bio python has therefore been applied to sequence analysis to show the similarities in the characteristics and behaviour of the genome sequences of various viruses. The similarities have been found by calculating pairwise local and global alignment, GC content and Hamming distance, which helps in drug assessment and vaccine preparation.

 

3. PROPOSED METHODOLOGY:

This work has been carried out in a Jupyter notebook. Bioinformatics makes wide use of Bio python, an open-source Python library, which is used here to perform sequence analysis, pairwise2 alignment of DNA sequences, calculation of Hamming distance and GC content, and display of dot plots and 3D structures of the viruses under consideration20.

 

The Bio python Seq object is used to hold the DNA, mRNA and protein sequences, which are then aligned between the viruses by arranging the sequences in a specific manner to identify close similarity between them21.

 

Meaningful insights from the sequence analysis, such as similar regions in the DNA sequences of the viruses, give relative information about the virus species, their genetic closeness and how the species evolved. The DNA sequence is obtained by reading the FASTA file with the SeqIO module of Bio python, which is imported with the statement “from Bio import SeqIO” 21.

 

Pairwise2 alignment is calculated by using the Bio python statements “from Bio import pairwise2” and “from Bio.pairwise2 import format_alignment”. Python's built-in len() function is used to calculate the length of each sequence. 22

 

To find the heat stability of a virus, the GC content is calculated with the Bio python function imported by “from Bio.SeqUtils import GC”. Bio.AlignIO is a Bio python module used for reading and writing sequence alignments 23.

 

3.1 Algorithm of Proposed Methodology:

Input: FASTA file of COVID-19 and other Calamitous viruses.

Output: Display DNA Sequence, Protein Sequence, Hamming Distance, Length of Sequences, GC content, 3D structure.

The following steps are used in the experimental analysis of the proposed method:

1)    Read the FASTA file of each virus with the Bio python SeqIO.read function.

2)    Use the seq attribute of each record (covid.seq, sars.seq, mers.seq, ebola.seq, hiv.seq, swine_flu.seq) to get the sequence of each virus.

3)    Perform pairwise alignment between viruses using the pairwise2.align.globalxx and pairwise2.align.localxx functions to find the similarity percentage.

4)    Calculate the Hamming distance between the sequence of COVID-19 and that of each other virus.

5)    Calculate the length of the sequence of each virus using the Python len() function.

6)    Calculate the GC content to find out which virus is more heat stable.

7)    Translate each DNA sequence into protein using the Bio python translate() function.

8)    Calculate the frequency count of amino acids in the protein sequence.

9)    Distribute the protein sequence of each virus and obtain the dot plot to find the similarity.

10)   Display the 3-D structure of each virus by using the nglview library.
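Steps 1, 2 and 5 above can be illustrated with a minimal, dependency-free sketch. The paper itself uses the Bio python SeqIO.read function; the read_fasta helper and the toy record below are hypothetical stand-ins introduced only for illustration and are not the authors' code or data.

```python
from io import StringIO


def read_fasta(handle):
    """Return (header, sequence) for the first record found in a FASTA handle."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; only the first is read
            header = line[1:]
        elif header is not None:
            chunks.append(line.upper())
    if header is None:
        raise ValueError("no FASTA record found")
    return header, "".join(chunks)


# Toy record invented for this example (not a real viral genome).
toy_fasta = StringIO(">toy_virus hypothetical example record\nATGGCCATTG\nTAATGGGCCG\n")
header, sequence = read_fasta(toy_fasta)
print(header)
print(sequence, len(sequence))  # len() gives the sequence length used in step 5
```

With a real file, the handle would come from open() on the downloaded FASTA file instead of StringIO.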

 

The complete flow diagram of the comparative sequence analysis is depicted in figure 1. The DNA sequence in FASTA format is loaded first, followed by genome alignment-based analysis 24.

 

In the proposed method, the data are first read from the NCBI repository with the help of the Bio python SeqIO.read function. The DNA sequence is extracted from the FASTA record through its seq attribute, the length of the DNA sequence is calculated, the heat stability is assessed with the help of the GC content, and the Hamming distance is calculated to find the mismatches between two sequences. Translation from DNA to protein is performed with the Bio python translate function. Graphical analysis is carried out through individual protein analysis and pairwise analysis of the virus sequences. This analysis helps to find more insights in terms of numerical data and visual representation through dot plots and 3D structures of the different viruses.

 

 

Figure 1: Flow chart to perform comparative sequence analysis.

 

 

3.2 The Sequence Alignment:

The analysis of sequence alignment of DNA and protein of the COVID-19, MERS, SARS, Ebola, HIV and Swine Flu viruses is presented as a flow chart in figure 2 28.

 

If the comparative global and local protein sequence alignments have a maximum number of similarities, then the viruses belong to the same family.

 

 

Figure 2: Flowchart of sequence alignment of DNA and Protein of COVID-19, MERS, SARS, EBOLA, HIV, and Swine Flu Viruses

 

3.3 Comparing Sequences and Sequence Alignment of various viruses:

Changes in one or more nucleotides arise through biological mutation 25. However, nucleotides can also be added or removed, which makes the comparison of sequences difficult, because the sequences can no longer be compared position by position. This difficulty can be overcome by aligning the sequences. The alignment method generally inserts typographical characters, called gaps, between the characters of a sequence so that shared characters line up 26.

 

Here, after an appropriate sequence alignment procedure, the sequences are arranged so that a maximum number of identical characters coincide 27.

 

Comparison between all the characters of two sequences is achieved by performing pairwise2 global alignment; local alignment, on the other hand, is used to find the relation between sub-sequences that align. Global and local alignments of protein and DNA sequences are illustrated in figure 3, where hyphen symbols represent gaps between two characters.

 

Figure 3: Global and Local alignment of Protein and DNA sequences
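The scoring behind global alignment can be sketched without Bio python. In Bio.pairwise2, the "xx" suffix of globalxx means that a match scores 1 and that mismatches and gaps carry no penalty; under that scoring the best global score equals the length of the longest common subsequence. The function names below are hypothetical, and the percentage normalisation (dividing by the longer sequence) is an assumption, since the paper does not state how its similarity percentage is derived.

```python
def global_xx_score(seq_a, seq_b):
    """Best global alignment score with match = 1, mismatch = 0, gap = 0."""
    previous = [0] * (len(seq_b) + 1)  # scores for the previous row of the table
    for char_a in seq_a:
        current = [0]
        for j, char_b in enumerate(seq_b, start=1):
            diagonal = previous[j - 1] + (1 if char_a == char_b else 0)
            # Either align the two characters or place a (free) gap in one sequence.
            current.append(max(diagonal, previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def similarity_percent(seq_a, seq_b):
    """Assumed normalisation: score divided by the length of the longer sequence."""
    longest = max(len(seq_a), len(seq_b))
    if longest == 0:
        return 0.0
    return 100.0 * global_xx_score(seq_a, seq_b) / longest


print(global_xx_score("ACCGT", "ACG"))
print(similarity_percent("ACCGT", "ACG"))
```

The table has one cell per pair of positions, so this pure-Python version is only practical for short sequences; for whole genomes a compiled implementation is the more realistic choice.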

 

3.4 Visual Alignments with Dot Plots:

With Bio python, a dot plot is a useful tool for the visual alignment of two sequences. A dot plot represents one sequence on the x-axis and the other on the y-axis, and it shows a diagonal line where the similarity between the two sequences is high. The flow diagram for visual alignment of all the viruses with respect to the COVID-19 virus is illustrated in figure 4 14.

 

Visual alignment between the virus sequences is generally performed with a dot plot to determine the similarity among the viruses. The presence of a diagonal line on the dot plot shows that the two sequences are alike in nature and have common characteristics 29.

 

 

Figure 4: Flowchart for visual alignment of all the viruses with respect to the COVID-19 virus.

 

A dot plot is the simplest method for the graphical representation of two sequences. Where the residues of the two sequences match at a given pair of positions, the corresponding position on the plot is filled with a dot. The following figure illustrates the dot plot between two sequences.

 

Figure 5: Dot plot between Sequence1 and Sequence2
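The construction of a dot plot can be sketched in a few lines of plain Python. The function names and the text rendering are hypothetical choices made for this illustration; the figures in this paper were produced by the authors' own plotting code, which is not shown.

```python
def dot_matrix(seq_x, seq_y):
    """Rows follow seq_y, columns follow seq_x; a cell is True where residues match."""
    return [[x == y for x in seq_x] for y in seq_y]


def render(matrix):
    """Draw the matrix as text, using '*' for a dot and '.' for an empty cell."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in matrix)


# Comparing a toy sequence with itself fills the main diagonal with dots,
# which is the pattern expected for two highly similar sequences.
matrix = dot_matrix("ACGTAC", "ACGTAC")
print(render(matrix))
```

Dots away from the diagonal mark repeated residues rather than overall similarity, which is why practical dot plots usually compare short windows instead of single residues.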

 

3.5 Hamming distance:

Hamming distance is defined as the number of positions at which the corresponding symbols of two strings of the same length differ 15.

 

It can also be expressed as the minimum number of substitutions required to transform one string into another. Although it originates in error detection and error correction, it is a useful parameter for quantifying the similarity of DNA sequences: the lower the Hamming distance, the higher the similarity of the DNA sequences. The Hamming distance between COVID-19 and SARS is found to be 30, which is less than that of the other pairs shown in Table 4. Hence, it is clear that the DNA sequences of COVID-19 and SARS have a high degree of similarity.
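The definition above translates directly into code. The following is a minimal sketch with a hypothetical function name; because the Hamming distance is defined only for strings of equal length, the sketch rejects unequal inputs rather than guessing how they should be trimmed or padded, a step this paper does not describe.

```python
def hamming_distance(seq_a, seq_b):
    """Count the positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("Hamming distance is only defined for sequences of equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


# Toy sequences that differ at the third and sixth positions.
print(hamming_distance("GATTACA", "GACTATA"))
```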

 

3.6 GC Contents in DNA:

GC-content represents the percentage of nitrogenous bases in a DNA or RNA molecule that are either guanine (G) or cytosine (C); the remaining bases are adenine (A) and thymine (T) in DNA, with uracil (U) replacing thymine in RNA. GC content is used to determine the annealing temperature of the DNA template and is significant in the polymerase chain reaction.

A high percentage of GC-content usually corresponds to a high melting temperature 30.

This shows that DNA with a lower percentage of GC-content is less stable than DNA with a higher percentage of GC-content 31.

 

The percentage of GC-content is obtained by the following formula:

$$\text{GC-content} = \frac{\text{Guanine} + \text{Cytosine}}{\text{Adenine} + \text{Thymine} + \text{Guanine} + \text{Cytosine}} \times 100\%$$

 

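The Adenine-Thymine/Guanine-Cytosine (AT/GC) ratio is calculated as:

$$\text{AT/GC ratio} = \frac{\text{Adenine} + \text{Thymine}}{\text{Guanine} + \text{Cytosine}}$$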
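Both quantities can be computed by counting bases. The sketch below uses hypothetical function names and counts only A, C, G and T, so ambiguous symbols such as N are ignored; that handling is an assumption of this example. The paper itself uses the GC function of Bio.SeqUtils; recent Bio python releases provide gc_fraction in the same module, which returns a fraction rather than a percentage.

```python
def gc_percent(sequence):
    """Percentage of G and C among the A, C, G and T bases of a DNA sequence."""
    sequence = sequence.upper()
    counts = {base: sequence.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no countable bases
    return 100.0 * (counts["G"] + counts["C"]) / total


def at_gc_ratio(sequence):
    """Ratio of (A + T) to (G + C) for a DNA sequence."""
    sequence = sequence.upper()
    gc = sequence.count("G") + sequence.count("C")
    at = sequence.count("A") + sequence.count("T")
    if gc == 0:
        raise ValueError("AT/GC ratio is undefined when the sequence has no G or C")
    return at / gc


print(gc_percent("ATGC"))
print(at_gc_ratio("AATTGC"))
```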

 

 

4. APPLICATION:

This analysis helps to provide a numeric and visual comparison among all the viruses considered in this research 32.

 

Any similarities in the characteristics and behaviour of genome sequences will help in drug assessment and vaccine preparation 33.

 

5. RESULT AND DISCUSSION:

The standard codon table is illustrated in figure 6. The end of a protein sequence is signalled by one of three stop codons. The beginning of a protein is identified on the codon table by a single start codon, “AUG”, which corresponds to the amino acid methionine 28.

 

The primary objective of this work is to perform a comparative analysis using Bio python, an open-source Python library. Bio python supports various functions and libraries useful for computational bioinformatics analysis of genomes. The complete workflow and the analysis of the virus sequences are illustrated by the algorithm presented in section 3. To establish a relationship between the DNA sequence and the protein sequence, the DNA sequence is transformed into a protein sequence by using the Bio python translate() function. The standard codon table maps each DNA trinucleotide (codon) to an amino acid34.

 

Figure 6: DNA Standard Codon Table
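The codon table lookup can be sketched in plain Python. The paper uses the Bio python translate() function; the translate function and constants below are hypothetical stand-ins that build the standard genetic code from its usual compact listing (bases in the order T, C, A, G). Marking unknown codons with "X" and dropping a trailing partial codon are assumptions of this sketch.

```python
BASES = "TCAG"
# Standard genetic code, listed for codons TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA (or RNA) string codon by codon; '*' marks a stop codon."""
    dna = dna.upper().replace("U", "T")  # accept RNA input such as AUG
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


print(translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"))
```

The start codon ATG (AUG in RNA) maps to M, methionine, and exactly three codons map to the stop symbol, matching the description of the codon table above.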

 

 

The sequence length of each virus is listed in Table 1. Sequence length indicates the extent of the genome and of the amino acid sequence it encodes, which offers relative information regarding genetic variants that lead to certain diseases. The physical length of a sequence can be found by multiplying the number of base pairs by the distance between adjoining base pairs 35.

 

 

 

 

Table 1: Sequence length of different viruses

| Virus used for analysis | Corresponding length of virus sequence |
|---|---|
| COVID-19 | 29903 |
| SARS | 29751 |
| MERS | 30119 |
| Ebola | 18959 |
| HIV | 9181 |
| Swine Flu | 982 |

 

The stability of a virus with respect to temperature is assessed by calculating the percentage of GC content. The GC content values in percent are shown in Table 2. Swine Flu is the most stable, with a GC content of 47.04, followed by HIV, while COVID-19 is the least stable, with the lowest GC content of 37.97.

 

Table 2: Percentage value of GC content of different viruses

| Virus used for analysis | Percentage value of GC content |
|---|---|
| COVID-19_Seq | 37.97 |
| SARS_Seq | 40.76 |
| MERS_Seq | 41.23 |
| Ebola_Seq | 41.07 |
| HIV_Seq | 42.11 |
| Swine Flu_seq | 47.04 |

 

Figure 7: Pairwise2 global sequence alignment of COVID-19 DNA Sequence with different virus DNA sequences.

 

Figure 7 above illustrates the comparative global and local protein sequence alignments of the SARS, MERS, Ebola, HIV and Swine Flu viruses with the COVID-19 virus. It has been observed that COVID-19 and SARS have 51 similar characters, COVID-19 and MERS have 41 similar characters, COVID-19 and Swine Flu have 39 similar characters, COVID-19 and HIV have 33 similar characters and COVID-19 and Ebola have 32 similar characters in the protein sequence. Hence, COVID-19 and the SARS virus belong to the same family.

The pairwise global alignment percentages of the DNA sequences of the various viruses are given in Table 3. Pairwise sequence alignment is used to determine regions of similarity between two sequences, which can be nucleic acid or protein sequences 36.

 

Global alignment refers to the alignment of all the characters of the entire sequences, covering the full length of both the query and the target sequence37.

 

The COVID-19 and SARS pair has the maximum pairwise global alignment, 89 percent, because the two viruses belong to the same family.

 

Table 3: Global pairwise2 alignment percentage value of virus pairs used for the analysis

| Virus pairs used for analysis | Pairwise global alignment percentage |
|---|---|
| COVID-19 and SARS | 89.0 |
| COVID-19 and MERS | 71.0 |
| COVID-19 and Ebola | 58.0 |
| COVID-19 and HIV | 61.0 |
| COVID-19 and Swine Flu | 62.0 |

 

In terms of DNA sequence analysis, the Hamming distance is used to calculate the number of bases by which two sequences of equal length differ 38.

 

Table 4 gives the Hamming distance calculated for the virus pairs experimentally using Bio python. As the Hamming distance is found to be minimum for the COVID-19 and SARS virus sequences, both viruses belong to a similar family.

 

Table 4: Hamming Distance of virus pairs

| Virus pairs used for analysis | Hamming distance |
|---|---|
| COVID-19 and SARS | 30 |
| COVID-19 and MERS | 38 |
| COVID-19 and Ebola | 47 |
| COVID-19 and HIV | 42 |
| COVID-19 and Swine Flu | 42 |

 

A bar plot is frequently used to visualize data and to find meaningful insights in datasets.

 

Each bar in a bar plot shows the value corresponding to a discrete level; the longer the bar, the higher the value. The bar plots of the protein frequency values of the viruses used for analysis are illustrated in figure 8.

 

 

Figure 8: Bar plots of the amino acid frequency values of the protein sequences of the viruses used for analysis: (a) COVID-19, (b) SARS, (c) MERS, (d) Ebola, (e) HIV, (f) Swine Flu.
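The frequency values behind such bar plots can be obtained with a simple counter. The sketch below uses hypothetical function names, a toy protein string rather than a viral sequence, and a text rendering in place of the plotting library used for the figure; excluding the stop symbol "*" from the counts is an assumption of this example.

```python
from collections import Counter


def amino_acid_frequencies(protein):
    """Return (residue, count) pairs, most frequent first, ignoring stop symbols."""
    counts = Counter(protein.replace("*", ""))
    return counts.most_common()


def text_bar_plot(frequencies):
    """Render one line per residue; the bar length equals the count."""
    return "\n".join(f"{residue} {'#' * count} {count}" for residue, count in frequencies)


# Toy protein string invented for this example.
frequencies = amino_acid_frequencies("MAIVMGR*KGAR*")
print(text_bar_plot(frequencies))
```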

 

In a dot plot, when two viruses are similar a diagonal line is present, as obtained in the case of COVID-19 and the SARS virus. Hence, COVID-19 and the SARS virus have similar characteristics and belong to the same family.

 

 

Figure 9: Comparison of all viruses with COVID-19 by dot plot: (a) COVID-19 and SARS, (b) COVID-19 and MERS, (c) COVID-19 and Ebola, (d) COVID-19 and HIV, (e) COVID-19 and Swine Flu.

Regarding the dot plots in figure 9, a diagonal line is present in the dot plot of COVID-19 and SARS shown in figure 9(a), which means that the similarity percentage between the two sequences is high, namely 89%, while in the dot plots of figure 9(b), (c), (d) and (e) it is absent because of the lower percentage of similarity between the sequences.

 

 

Figure 10: Visualisation of the 3D structure of each virus: (a) COVID-19, (b) SARS, (c) MERS, (d) Ebola, (e) HIV, (f) Swine Flu.

 

6. CONCLUSION:

An experimental analysis of COVID-19 and other calamitous viruses has been made using Bio python. This research aims to find the differences between the gene structures of various viruses and helps to discover appropriate medicines. The paper addresses the comparative analysis of COVID-19 against other viruses in terms of DNA length, GC content39, pairwise global alignment and Hamming distance. The GC content of the Swine Flu virus is 47.04, which is the maximum among the viruses studied; hence it is the most stable, while COVID-19 is the least stable, with a low GC content of 37.97. Hence, the COVID-19 virus behaves differently with the differing climates of different continents and countries. Identical behaviour of the COVID-19 virus and the SARS virus has been observed on the basis of DNA and protein sequences, Hamming distance and dot plots. The genomic structure of the COVID-19 virus shows that this virus can spread more readily than the other viruses. Hence, proper care has to be taken in the form of social distancing, wearing a mask, using a sanitizer, etc.

 

Experimentally, it has been observed that COVID-19 and SARS have an 89% similarity, which means both are of the same genus and belong to the same family; COVID-19 and MERS have a 71% similarity, COVID-19 and Ebola 58%, COVID-19 and HIV 61%, and COVID-19 and swine flu 62%. Since Ebola, HIV and swine flu have a lower percentage of similarity with COVID-19, they belong to different families of viruses.

 

7. FUTURE WORK:

Since the COVID-19 virus spreads more readily than the other viruses used in the study, deeper analysis using machine learning and deep learning methods is required for close observation of its genomic structure in order to suggest better remedies.

 

8. REFERENCES:

1.      Stadnytskyi, V., Anfinrud, P., and Bax, A. Breathing, speaking, coughing or sneezing: What drives transmission of SARS-CoV-2? Journal of Internal Medicine. 2021; 290(5): 1010–1027.

2.      Nakagawa, S., and Miyazawa, T. Genome evolution of SARS-CoV-2 and its virological characteristics. Inflammation and Regeneration. 2020; 40(1): 17.

3.      Gopalan, H. S., and Misra, A. COVID-19 pandemic and challenges for socio-economic issues, healthcare and National Health Programs in India. Diabetes and Metabolic Syndrome: Clinical Research and Reviews. 2020; 14(5): 757–759.

4.      Pereira, F., Azevedo, F., Carvalho, Â., Ribeiro, G. F., Budde, M. W., and Johansson, B. Pydna: a simulation and documentation tool for DNA assembly strategies using python. BMC Bioinformatics. 2015; 16: 1–10.

5.      Knight, R., Vrbanac, A., Taylor, B. C., Aksenov, A., Callewaert, C., Debelius, J., Gonzalez, A., Kosciolek, T., McCall, L.-I., and McDonald, D. Best practices for analysing microbiomes. Nature Reviews Microbiology. 2018; 16(7): 410–422.

6.      Cannataro, M., Guzzi, P. H., and Sarica, A. Data mining and life sciences applications on the grid. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 2013; 3(3): 216–238.

7.      Chapman, B., and Chang, J. Biopython: Python tools for computational biology. ACM Sigbio Newsletter. 2000; 20(2): 15–19.

8.      Gauthier, J., Vincent, A. T., Charette, S. J., and Derome, N. A brief history of bioinformatics. Briefings in Bioinformatics. 2019; 20(6): 1981–1996.

9.      Wang, X. H. Pair-wise and Multiple Sequence Alignment. In Data Analysis in Molecular Biology and Evolution. 1999: 33–39. Kluwer Academic Publishers. https://doi.org/10.1007/0-306-46893-X_5

10.   Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. Basic local alignment search tool. Journal of Molecular Biology. 1990; 215(3): 403–410. https://doi.org/10.1016/S0022-2836(05)80360-2

11.   Sonnhammer, E. L. L., and Durbin, R. A dot-matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis. Gene. 1995; 167(1–2): GC1–GC10. https://doi.org/10.1016/0378-1119(95)00714-8

12.   Koyutürk, M., Kim, Y., Topkara, U., Subramaniam, S., Szpankowski, W., and Grama, A. Pairwise Alignment of Protein Interaction Networks. Journal of Computational Biology. 2006; 13(2): 182–199. https://doi.org/10.1089/cmb.2006.13.182

13.   Wijaya, H. A., Syaifudin, and Siswanto, T. Visualization of corona virus disease 2019 deoxyribonucleic acid data analysis. AIP Conference Proceedings. 2022; 2659(1): 90005.

14.   Seibt, K. M., Schmidt, T., and Heitkam, T. FlexiDot: highly customizable, ambiguity-aware dotplots for visual sequence analyses. Bioinformatics. 2018; 34(20): 3575–3577. https://doi.org/10.1093/bioinformatics/bty395

15.   Pinheiro, H. P., de Souza Pinheiro, A., and Sen, P. K. Comparison of genomic sequences using the Hamming distance. Journal of Statistical Planning and Inference. 2005; 130(1–2): 325–339. https://doi.org/10.1016/j.jspi.2003.03.002

16.   Apostolico, A., Guerra, C., and Pizzi, C. Alignment free sequence similarity with bounded hamming distance. 2014 Data Compression Conference. 2014: 183–192.

17.   Shampo, M. A., and Kyle, R. A. J. Craig Venter—The Human Genome Project. Mayo Clinic Proceedings. 2011; 86(4): e26–e27.

18.   Rizwan, T., Kothidar, A., Meghwani, H., Sharma, V., Shobhawat, R., Saini, R., Vaishnav, H. K., Singh, V., Pratap, M., Sihag, H., Kumar, S., Dey, J. K., and Dey, S. K. Comparative analysis of SARS-CoV-2 envelope viroporin mutations from COVID-19 deceased and surviving patients revealed implications on its ion-channel activities and correlation with patient mortality. Journal of Biomolecular Structure and Dynamics. 2022; 40(20): 10454–10469. https://doi.org/10.1080/07391102.2021.1944319

19.   Jiang, Y., Liu, L., Manning, M., Bonahoom, M., Lotvola, A., Yang, Z., and Yang, Z.-Q. Structural analysis, virtual screening and molecular simulation to identify potential inhibitors targeting 2’-O-ribose methyltransferase of SARS-CoV-2 coronavirus. Journal of Biomolecular Structure and Dynamics. 2022; 40(3): 1331–1346. https://doi.org/10.1080/07391102.2020.1828172

20.   Talevich, E., Invergo, B. M., Cock, P. J. A., and Chapman, B. A. Bio.Phylo: a unified toolkit for processing, analyzing and visualizing phylogenetic trees in Biopython. BMC Bioinformatics. 2012; 13: 1–9.

21.   Batzoglou, S. The many faces of sequence alignment. Briefings in Bioinformatics. 2005; 6(1): 6–22.

22.   Ryu, T.-W. Benchmarking of BioPerl, Perl, BioJava, Java, BioPython, and Python for primitive bioinformatics tasks and choosing a suitable language. International Journal of Contents. 2009; 5(2): 6–15.

23.   Cock, P. J. A., Antao, T., Chang, J. T., Chapman, B. A., Cox, C. J., Dalke, A., Friedberg, I., Hamelryck, T., Kauff, F., and Wilczynski, B. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009; 25(11): 1422–1423.

24.   Pearson, W. R. Using the FASTA program to search protein and DNA sequence databases. Computer Analysis of Sequence Data: Part I. 1994: 307–331.

25.   Dayhoff, M., Schwartz, R., and Orcutt, B. A model of evolutionary change in proteins. Atlas of Protein Sequence and Structure. 1978; 5: 345–352.

26.   Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., and Tesconi, M. DNA-inspired online behavioral modeling and its application to spambot detection. IEEE Intelligent Systems. 2016; 31(5): 58–64.

27.   Needleman, S. B., and Wunsch, C. D. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology. 1970; 48(3): 443–453.

28.   Purohit, S., Satapathy, S. C., Sibi Chakkaravarthy, S., and Zhang, Y.-D. Correlation-Based Analysis of COVID-19 Virus Genome Versus Other Fatal Virus Genomes. Arabian Journal for Science and Engineering. 2020: 1–13.

29.   Jacobson, A. B., and Zuker, M. Structural analysis by energy dot plot of a large mRNA. Journal of Molecular Biology. 1993; 233(2): 261–269.

30.   Piovesan, A., Pelleri, M. C., Antonaros, F., Strippoli, P., Caracausi, M., and Vitale, L. On the length, weight and GC content of the human genome. BMC Research Notes. 2019; 12(1): 1–7.

31.   Karimi, K., Wuitchik, D. M., Oldach, M. J., and Vize, P. D. Distinguishing species using GC contents in mixed DNA or RNA sequences. Evolutionary Bioinformatics. 2018; 14: 1176934318788866.

32.   Ghosh, A., and Nandy, A. Graphical representation and mathematical characterization of protein sequences and applications to viral proteins. Advances in Protein Chemistry and Structural Biology. 2011; 83: 1–42.

33.   Yang, Y., Peng, F., Wang, R., Guan, K., Jiang, T., Xu, G., Sun, J., and Chang, C. The deadly coronaviruses: The 2003 SARS pandemic and the 2020 novel coronavirus epidemic in China. Journal of Autoimmunity. 2020; 109: 102434.

34.   Randic, M., Zupan, J., and Balaban, A. T. Unique graphical representation of protein sequences based on nucleotide triplet codons. Chemical Physics Letters. 2004; 397(1–3): 247–252.

35.   Richmond, T. J., and Davey, C. A. The structure of DNA in the nucleosome core. Nature. 2003; 423(6936): 145–150.

36.   Chiaromonte, F., Yap, V. B., and Miller, W. Scoring pairwise genomic sequence alignments. Biocomputing 2002. World Scientific. 2002: 115–126.

37.   Poullet, M., and Orlando, L. Assessing DNA sequence alignment methods for characterizing ancient genomes and methylomes. Frontiers in Ecology and Evolution. 2020; 8: 105.

38.   Mohammadi-Kambs, M., Hölz, K., Somoza, M. M., and Ott, A. Hamming distance as a concept in DNA molecular recognition. ACS Omega. 2017; 2(4): 1302–1308.

39.   Gong, Y., Wen, G., Jiang, J., and Xie, F. Codon bias analysis may be insufficient for identifying host(s) of a novel virus. Journal of Medical Virology. 2020; 92(9): 1434.

40.   Anderson, D. E., Sivalingam, V., Kang, A. E. Z., Ananthanarayanan, A., Arumugam, H., Jenkins, T. M., Hadjiat, Y., and Eggers, M. Povidone-iodine demonstrates rapid in vitro virucidal activity against SARS-CoV-2, the virus causing COVID-19 disease. Infectious Diseases and Therapy. 2020; 9(3): 669–675.

41.   Gao, F., and Zhang, C.-T. GC-Profile: a web-based tool for visualizing and analyzing the variation of GC content in genomic sequences. Nucleic Acids Research. 2006; 34(suppl_2): W686–W691.

42.   Li, Y., Yang, X., Wang, N., Wang, H., Yin, B., Yang, X., and Jiang, W. GC usage of SARS-CoV-2 genes might adapt to the environment of human lung expressed genes. Molecular Genetics and Genomics. 2020; 295(6): 1537–1546. https://doi.org/10.1007/s00438-020-01719-0

43.   Shereen, M. A., Khan, S., Kazmi, A., Bashir, N., and Siddique, R. COVID-19 infection: Emergence, transmission, and characteristics of human coronaviruses. Journal of Advanced Research. 2020; 24: 91–98.

44.   Wu, C., Chen, X., Cai, Y., Zhou, X., Xu, S., Huang, H., Zhang, L., Zhou, X., Du, C., and Zhang, Y. Risk factors associated with acute respiratory distress syndrome and death in patients with coronavirus disease 2019 pneumonia in Wuhan, China. JAMA Internal Medicine. 2020; 180(7): 934–943.

Received on 17.06.2024      Revised on 20.09.2024

Accepted on 28.12.2024      Published on 28.01.2025

Available online from February 27, 2025

Research J. Pharmacy and Technology. 2025;18(2):502-512.

DOI: 10.52711/0974-360X.2025.00076

© RJPT All rights reserved

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.