Application of Molecular Descriptors in Modern Computational Drug Design: An Overview
Venkatesh Kamath1, Aravinda Pai2*
1Department of Pharmaceutical Biotechnology, Manipal College of Pharmaceutical Sciences, Manipal University, Manipal, Karnataka, India
2Department of Pharmaceutical Chemistry, Manipal College of Pharmaceutical Sciences, Manipal University, Manipal, Karnataka, India
*Corresponding Author E-mail: pai.aravind@gmail.com, aravind.pai@manipal.edu
ABSTRACT:
The last decade has witnessed numerous studies aimed at correlating the structure of chemical entities with their biological activities, toxicities and other molecular properties. Molecular descriptors are mathematical representations of chemical species, generated by specified algorithms. They play an extremely important role in chemistry, pharmacy, environmental toxicology and health research. To date, more than 5000 molecular descriptors have been reported; they are calculated mainly with dedicated software. Molecular descriptors are central to quantitative structure–activity relationship (QSAR) and quantitative structure–property relationship (QSPR) studies. Progress in chemoinformatics has opened new ways of identifying key links between molecular structure and biological properties. In the present review, some pharmaceutically important molecular descriptors and their applications are presented.
KEYWORDS: Molecular descriptors, QSAR, QSPR, modelling.
INTRODUCTION:
Molecular descriptors are mathematical representations of a chemical structure, obtained using a specific algorithm. In other words, a molecular descriptor is the final result of converting the information encoded in a chemical structure into a numerical value that can be correlated with a biological or physicochemical property1. Molecular descriptors draw on the principles of quantum chemistry, mechanistic organic chemistry, graph theory and related fields. They fall into two main categories: experimentally derived descriptors and theoretically derived descriptors.
Important examples of experimentally derived descriptors are log P (a lipophilicity descriptor), polar surface area and dielectric constant. Theoretically derived descriptors include those based on distance and connectivity matrices, topological descriptors and the like. Theoretical descriptors carry no experimental noise and therefore have low statistical error. Those derived from physicochemical theory overlap to some degree with experimental measurements: surface areas, volume descriptors and quantum chemical descriptors, for example, have experimental counterparts. The main advantages of theoretical over experimental descriptors are that they take less time and cost less to obtain.
Until the 1970s, molecular modeling centered on mathematical relationships among experimentally measured quantities. The trend has since shifted towards relationships between a measured property and descriptors capable of capturing structural chemical information. The advent of these descriptors has not only opened a new window for exploring mathematical models but has also steadily increased research interest in the area. It is now possible to examine the correlation between experimental data and theoretical information derived from molecular structure. The simplest molecular descriptors are obtained by counting the atom types or fragments present in the molecule, or from simple properties such as the number of hydrogen-bond donors or acceptors, the molecular weight and the number of hydroxyl groups. 2D-descriptors are generated by algorithms applied to a topological representation. Geometrical or 3D-descriptors are obtained from the spatial coordinates (x, y, z) of the atoms of the molecule. 4D-descriptors are obtained from the interaction energies between a probe and the molecule embedded in a grid. 3D- and 4D-descriptors carry more information than the simple ones, so it is tempting to use the most information-rich descriptors in every modeling procedure. This reasoning is flawed: the most suitable descriptors are those whose information content is comparable with that of the response being modeled. Information in the independent variables in excess of what the response contains acts as noise and gives rise to an unstable or non-predictive model. In general, molecular descriptors should satisfy some basic requirements beyond trivial invariance properties (for example, invariance to atom numbering).
It follows that no single descriptor is the most reliable and effective for all problems.
Stepwise process for the generation and interpretation of a molecular descriptor:
Representation of chemical structures: Molecular representation is an important preparatory step before a molecular descriptor is generated. The kind of representation depends primarily on how much chemical information it carries about the molecule2,3. The most basic molecular representation is the chemical (molecular) formula. For example, 4-bromotoluene, C7H7Br, contains 8 non-hydrogen atoms, with NC = 7 (number of carbon atoms), NH = 7 (number of hydrogen atoms) and NBr = 1 (number of bromine atoms). This representation gives no information about the actual molecular structure; molecular descriptors generated solely from the chemical formula are therefore known as 0-dimensional (0D) descriptors.
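As an illustration of 0D descriptors, the atom counts quoted above can be recovered from the formula string alone. The following is a minimal Python sketch; the function name `atom_counts` is hypothetical, and the parser assumes a simple formula without parentheses, charges or isotope labels.

```python
import re
from collections import Counter

def atom_counts(formula):
    """Count atoms in a simple molecular formula such as 'C7H7Br'.

    Only element symbols followed by optional integer counts are handled.
    """
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts

counts = atom_counts("C7H7Br")
# Number of non-hydrogen atoms, a typical 0D descriptor.
n_heavy = sum(n for element, n in counts.items() if element != "H")
print(dict(counts), n_heavy)
```

Any isomer of C7H7Br gives exactly the same counts, which is why such descriptors say nothing about structure.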
Atoms in a molecule are usually characterized by their atomic properties, which also carry chemical information about the molecular structure. Van der Waals radii, atomic masses, charges, electronegativities and atomic polarizabilities are the atomic properties most often used in calculating molecular descriptors. Atoms can also be characterized by local vertex invariants (LOVIs) derived from graph theory. A one-dimensional representation of a molecule is a list of its fragments or substructures. The descriptors obtained from such a representation are 1D-molecular descriptors, typically bit-strings and/or holographic vectors. 0D and 1D descriptors are easy to calculate and to interpret, and they do not require optimization of the molecular structure.
The two-dimensional representation describes how the atoms in a molecule are connected and the nature of the chemical bonds between them. When a molecule is represented by a molecular graph, the representation is called topological. Such graphs give information on the connectivity of the atoms in a molecule without taking into account parameters such as torsion angles, bond angles and interatomic distances.
In the 3D representation, a molecule is considered a rigid geometrical object in space, described by the nature and connectivity of its atoms as well as its overall spatial configuration; this is referred to as the geometrical representation. It defines a molecule by the types of its atoms and the set of x, y and z coordinates associated with each atom.
Lattice representations of molecules are linked to properties derived from the electron distribution and from the interaction of the molecule with a given probe that samples the surrounding space; they are characteristic of grid-based QSAR techniques. Descriptors at this level can be regarded as 4D-molecular descriptors, defined by a scalar field associated with the 3D molecular geometry.
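The topological representation can be made concrete with a small sketch. The Python example below encodes the hydrogen-depleted graph of 4-bromotoluene as an atom list and a bond list; the atom numbering is an arbitrary choice made for this illustration, and the helper name `adjacency_matrix` is hypothetical.

```python
# Hydrogen-depleted graph of 4-bromotoluene.
# Index 0 is the methyl carbon, 1-6 are the ring carbons, 7 is bromine.
atoms = ["C", "C", "C", "C", "C", "C", "C", "Br"]
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1), (4, 7)]

def adjacency_matrix(n_atoms, bonds):
    """Square matrix with 1 where two atoms are bonded and 0 elsewhere."""
    matrix = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        matrix[i][j] = matrix[j][i] = 1
    return matrix

adj = adjacency_matrix(len(atoms), bonds)
degrees = [sum(row) for row in adj]               # number of neighbors per atom
cyclomatic_number = len(bonds) - len(atoms) + 1   # number of independent rings
print(degrees, cyclomatic_number)
```

Only connectivity is stored here: nothing in `atoms` or `bonds` fixes a bond length, bond angle or torsion angle.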
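A geometrical representation can be sketched as a list of atom types with coordinates, from which a matrix of interatomic distances follows directly. In the Python example below the coordinates are made-up illustrative values, not an optimized geometry, and the name `geometry_matrix` is hypothetical.

```python
import math

# Illustrative coordinates in angstroms (invented for this example).
coordinates = [
    ("C", (0.0, 0.0, 0.0)),
    ("C", (1.5, 0.0, 0.0)),
    ("O", (1.5, 1.4, 0.0)),
]

def geometry_matrix(coordinates):
    """Matrix of Euclidean distances between all pairs of atoms."""
    positions = [xyz for _, xyz in coordinates]
    return [[math.dist(p, q) for q in positions] for p in positions]

G = geometry_matrix(coordinates)
for row in G:
    print(["%.3f" % value for value in row])
```

Unlike a bond-count distance, each entry of `G` changes when the conformation changes, which is why 3D descriptors depend on the geometry supplied.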
Generation and application of some important descriptors used in molecular design
a) Topological indexes:
Topological indexes (TIs) are based on the distances between atoms, counted as the number of intervening bonds, and are therefore regarded as through-bond indexes4,5. Because topological indexes are not unique, a molecule cannot be reconstructed from them. Simple TIs consist of counts of specific graph elements; examples are Kier shape descriptors, path counts, the Hosoya Z index, self-returning walk counts and path/walk shape indexes[6,7,8,9,10]. Most other common TIs, including the Wiener index, Harary indexes and spectral indexes, are obtained by applying graph operators to graph-theoretical matrices [11,12,13]; the Wiener operator, for example, is the half-sum of the matrix elements. Efforts have also been made in the past to develop algorithms that operate on the molecular graph14.
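The Wiener index is a convenient worked example of a through-bond index: it is the half-sum of the elements of the topological distance matrix. The Python sketch below computes it by breadth-first search on a hydrogen-depleted graph; the function names are hypothetical, and a connected graph is assumed.

```python
from collections import deque

def topological_distances(n_atoms, bonds):
    """All-pairs shortest path lengths, counted in bonds (connected graph assumed)."""
    neighbors = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    distances = []
    for start in range(n_atoms):
        dist = [None] * n_atoms
        dist[start] = 0
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in neighbors[current]:
                if dist[nxt] is None:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
        distances.append(dist)
    return distances

def wiener_index(n_atoms, bonds):
    """Half-sum of the elements of the topological distance matrix."""
    d = topological_distances(n_atoms, bonds)
    return sum(sum(row) for row in d) // 2

n_butane = [(0, 1), (1, 2), (2, 3)]
isobutane = [(0, 1), (0, 2), (0, 3)]
print(wiener_index(4, n_butane), wiener_index(4, isobutane))  # 10 9
```

The two C4H10 isomers receive different values, yet many different graphs can share one value, which illustrates why a structure cannot be rebuilt from the index.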
b) Graph theoretical matrices15:
Structural information about molecules is commonly encoded with mathematical tools such as molecular matrices. Graph-theoretical matrices can be vertex matrices or edge matrices. In a vertex matrix, the rows and columns correspond to graph vertices (atoms) and the matrix elements encode properties of pairs of vertices; in an edge matrix, the rows and columns correspond to graph edges (bonds) and the matrix elements encode properties of pairs of edges. Both are square: vertex matrices have dimension A × A and edge matrices B × B, where A is the number of atoms and B the number of bonds. Incidence matrices are other important graph-theoretical matrices used to describe a molecular graph.
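The difference between vertex, edge and incidence matrices is easiest to see on a small graph. The following Python sketch builds all three for hydrogen-depleted n-butane (A = 4, B = 3); the function names are hypothetical, and the simplest 0/1 adjacency forms are used as examples of each matrix type.

```python
def vertex_adjacency(n_atoms, bonds):
    """A x A matrix: 1 where two atoms are bonded."""
    matrix = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        matrix[i][j] = matrix[j][i] = 1
    return matrix

def edge_adjacency(bonds):
    """B x B matrix: 1 where two distinct bonds share an atom."""
    n_bonds = len(bonds)
    return [[1 if p != q and set(bonds[p]) & set(bonds[q]) else 0
             for q in range(n_bonds)] for p in range(n_bonds)]

def incidence(n_atoms, bonds):
    """A x B matrix: 1 where an atom is an end point of a bond."""
    return [[1 if atom in bond else 0 for bond in bonds] for atom in range(n_atoms)]

bonds = [(0, 1), (1, 2), (2, 3)]  # hydrogen-depleted n-butane
V = vertex_adjacency(4, bonds)
E = edge_adjacency(bonds)
I = incidence(4, bonds)
print(V, E, I, sep="\n")
```

Note that the incidence matrix is rectangular (A × B), whereas the vertex and edge matrices are square.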
c) Autocorrelation descriptors:
These molecular descriptors are based on the autocorrelation function ACl.
Autocorrelation16 measures the strength of the correlation between observations as a function of their separation (the lag l) in space or time; here f(x) is a function of the variable x, for example of time.
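The defining equation does not appear in the text above. In its usual general form, which is assumed here, the autocorrelation function for lag l is the integral of the product of the function with a copy of itself shifted by l; the integration limits a and b are symbols introduced for this sketch and denote the interval over which f(x) is studied.

```latex
AC_l = \int_a^b f(x)\, f(x + l)\, \mathrm{d}x
```

For a discrete sequence of observations the integral becomes a sum over all pairs of observations separated by the lag l.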
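d) Moreau-Broto autocorrelation17:
The most widely applied autocorrelation descriptor defined on a molecular graph. Atomic properties play the role of the function, and the topological distance between atoms plays the role of the lag.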
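A minimal Python sketch of an autocorrelation on a molecular graph is given below. It sums products of atomic weights over atom pairs separated by a given topological distance. The function names are hypothetical, and the counting convention (each unordered pair counted once, lag 0 equal to the sum of squared weights) is an assumption of this sketch; published implementations differ on this point.

```python
def topological_distances(n_atoms, bonds):
    """All-pairs bond-count distances by the Floyd-Warshall algorithm."""
    inf = float("inf")
    d = [[0 if i == j else inf for j in range(n_atoms)] for i in range(n_atoms)]
    for i, j in bonds:
        d[i][j] = d[j][i] = 1
    for k in range(n_atoms):
        for i in range(n_atoms):
            for j in range(n_atoms):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def moreau_broto(weights, bonds, max_lag):
    """Sum of w_i * w_j over atom pairs at each topological distance 0..max_lag."""
    n_atoms = len(weights)
    d = topological_distances(n_atoms, bonds)
    ats = [0.0] * (max_lag + 1)
    for i in range(n_atoms):
        for j in range(i, n_atoms):       # each unordered pair once
            lag = d[i][j]
            if lag <= max_lag:            # unreachable pairs (inf) are skipped
                ats[int(lag)] += weights[i] * weights[j]
    return ats

chain = [(0, 1), (1, 2), (2, 3)]          # four-atom chain, e.g. C-C-C-O
print(moreau_broto([1.0, 1.0, 1.0, 1.0], chain, 3))      # pair counts per lag
print(moreau_broto([2.55, 2.55, 2.55, 3.44], chain, 3))  # Pauling electronegativities as weights
```

With unit weights the result simply counts the atom pairs at each distance; replacing the weights with an atomic property such as electronegativity turns the same counts into a property-weighted descriptor vector.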
e) Nucleophilic superdelocalizability:
A measure of the availability of room for additional electron density on the a-th atom, calculated as a sum over all the unoccupied molecular orbitals (NMONOCC). If the transition state is controlled by the frontier orbitals, the nucleophilic superdelocalizability is calculated on the lowest unoccupied molecular orbital (LUMO) only.
f) Composite nuclear potential:
The descriptor defines the composite nuclear potential18 for a given nuclear configuration of a molecule. It is expressed in terms of Za, the nuclear charge located at the position Ra.
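The expression itself is missing from the text. The Python sketch below assumes the bare-nucleus Coulomb form, in which the potential at a point r is the sum of Za / |r - Ra| over all nuclei, in atomic units. This form is an assumption consistent with the symbols defined above, not a formula confirmed by the source, and the function name is hypothetical.

```python
import math

def composite_nuclear_potential(point, nuclei):
    """Sum of Z_a / |r - R_a| over all nuclei (assumed form, atomic units).

    point: (x, y, z) position r at which the potential is evaluated.
    nuclei: list of (Z_a, (x, y, z)) pairs, nuclear charge and position R_a.
    """
    total = 0.0
    for charge, position in nuclei:
        distance = math.dist(point, position)
        if distance == 0.0:
            raise ValueError("potential is singular at a nuclear position")
        total += charge / distance
    return total

# Hypothetical two-nucleus arrangement, coordinates in bohr.
nuclei = [(6, (0.0, 0.0, 0.0)), (1, (4.0, 0.0, 0.0))]
print(composite_nuclear_potential((2.0, 0.0, 0.0), nuclei))  # 3.5
```

Because the potential depends only on nuclear charges and positions, it can be evaluated for any nuclear configuration without an electronic structure calculation.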
g) Quantum chemical descriptors:
These descriptors are mainly based on the Schrödinger equation, in which H is the Hamiltonian operator, Ei is the energy of the i-th electronic state and ψi is the corresponding wave function. The spectrum of molecular electronic energy levels itself serves as a set of molecular descriptors.
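The equation referred to above does not appear in the text. In its time-independent form, which matches the symbols defined above, it reads:

```latex
\hat{H}\,\psi_i = E_i\,\psi_i
```

Solving this eigenvalue problem, in practice only approximately, yields the energies E_i (for example the HOMO and LUMO energies) and the wave functions from which charges, dipole moments and other quantum chemical descriptors are derived.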
h) Bertz-Herndon relative complexity index (CBH): This descriptor assesses the structural complexity of a molecule from its molecular graph G19, in comparison with its parent molecular graph.
Here K(G) denotes the total number of connected subgraphs of the graph G.
i) Minoli complexity index:
The descriptor measures the complexity of a molecular graph20 based on the incremental addition of edges and vertices.
In its definition, A is the number of vertices, B is the number of edges and L is the length of the longest path in the graph.
j) Maximum nuclear repulsion C-H bond index: The descriptor corresponds to the maximum nuclear repulsion energy between a carbon atom and a hydrogen atom bonded to it.21
In its definition, Z denotes the atomic numbers, rCH is the C-H bond length and k runs over the pairs of bonded carbon and hydrogen atoms.
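The formula for this index is missing from the text. The Python sketch below assumes the simple Coulomb form ZC ZH / rCH for each bonded C-H pair, taken in atomic units, and returns the maximum over all pairs; both this form and the function name are assumptions made for illustration.

```python
def max_ch_nuclear_repulsion(ch_bond_lengths, z_carbon=6, z_hydrogen=1):
    """Largest Z_C * Z_H / r_CH over the bonded C-H pairs (assumed form).

    Bond lengths are given in bohr, so the result is in hartree.
    """
    if not ch_bond_lengths:
        raise ValueError("molecule has no C-H bonds")
    return max(z_carbon * z_hydrogen / r for r in ch_bond_lengths)

# Hypothetical C-H bond lengths in bohr.
print(max_ch_nuclear_repulsion([2.0, 2.5]))  # 3.0
```

Since the atomic numbers are fixed, the maximum is always set by the shortest C-H bond in the molecule.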
k) ALPHA descriptor:
A vector-based molecular descriptor22 calculated from trajectories obtained by molecular dynamics simulations, using Gaussian smoothing.
In its defining equation, a and s are the mean and standard deviation of the Gaussian functions.
l) Amphiphilic moments:
Amphiphilicity is defined as the difference in the free energy of transfer from the aqueous phase to the aqueous-air interface. It is quantified by relative surface tension measurements.23
In its defining expression, d stands for the distance.
m) Atomic solvation parameter:
An empirical molecular descriptor used to calculate the solvation free energy of a chemical species.24
In its defining expression, SA stands for the solvent-accessible surface area.
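The expression is missing from the text. A common way of using atomic solvation parameters, assumed in the Python sketch below, is to sum the product of each atom's solvation parameter and its solvent-accessible surface area. The function name is hypothetical, and the parameter and area values are placeholders chosen only to make the arithmetic visible; they are not fitted or literature values.

```python
def solvation_free_energy(atoms, asp):
    """Sum of (atomic solvation parameter x solvent-accessible area) over atoms.

    atoms: list of (atom_type, accessible_area) pairs, area in square angstroms.
    asp: mapping atom_type -> solvation parameter, energy per square angstrom.
    """
    return sum(asp[atom_type] * area for atom_type, area in atoms)

# Placeholder values for illustration only.
example_asp = {"C": 16.0, "O": -6.0}
example_atoms = [("C", 10.0), ("O", 5.0)]
delta_g = solvation_free_energy(example_atoms, example_asp)
print(delta_g)  # 130.0
```

The sign of each parameter determines whether exposing that atom type to solvent is counted as favorable or unfavorable.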
n) Molecular electronegativity edge vector:
A modified version of the Moreau-Broto autocorrelation, defined by applying the reciprocal of topological distances in conjunction with atom electronegativities.
In the autocorrelation calculated for the k-th lag, dij refers to the topological distance between the i-th and j-th atoms.
o) Balaban-like indices:
Distance connectivity descriptors calculated by applying the formula of the Balaban distance connectivity index J.25
p) Bio descriptors:
Numerical quantities that encode information about biochemical systems and complex biological macromolecules. They provide a clear link between graph theory and topological parameters on the one hand and molecular biology on the other.
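For reference, the Balaban distance connectivity index J of an unweighted graph is J = B / (C + 1) × Σ (si sj)^(-1/2), where the sum runs over the bonds, si is the distance sum of atom i (a row sum of the topological distance matrix), B is the number of bonds and C = B - A + 1 is the number of rings. The Python sketch below implements this plain form, without heteroatom or bond-order corrections; the function names are hypothetical and a connected graph is assumed.

```python
import math
from collections import deque

def distance_sums(n_atoms, bonds):
    """Row sums of the topological distance matrix (connected graph assumed)."""
    neighbors = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    sums = []
    for start in range(n_atoms):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in neighbors[current]:
                if nxt not in dist:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
        sums.append(sum(dist.values()))
    return sums

def balaban_j(n_atoms, bonds):
    """J = B / (C + 1) * sum over bonds of (s_i * s_j) ** -0.5."""
    s = distance_sums(n_atoms, bonds)
    n_bonds = len(bonds)
    cyclomatic = n_bonds - n_atoms + 1
    edge_sum = sum(1.0 / math.sqrt(s[i] * s[j]) for i, j in bonds)
    return n_bonds / (cyclomatic + 1) * edge_sum

n_butane = [(0, 1), (1, 2), (2, 3)]
print(round(balaban_j(4, n_butane), 4))  # 1.9747
```

A Balaban-like index is obtained by keeping `balaban_j` unchanged and replacing the distance sums with the row sums of another graph-theoretical matrix.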
q) Amino acid descriptors:
Because of the wide importance and complexity of proteins, some descriptors have been defined to represent amino acid side chains, which are responsible for the packing of the regular elements of secondary structure and hence for the tertiary structure of a protein. As a consequence, the structure of a protein can be expressed quantitatively by means of amino acid side-chain properties. Starting from the pioneering work of Sneath, who described peptide sequences by semi-quantitative experimental parameters of the 20 coded amino acids26, several amino acid descriptors have been proposed that contain information about the properties of amino acid side chains.
r) Pharmacological indices:
The effective dose (ED) is the minimal dose that produces the desired effect of a drug; it is usually determined by analyzing the dose–response relationship specific to the drug. The dose that produces the desired effect in half of the test population is referred to as the median effective dose, ED50.
s) Therapeutic index:
A comparison of the amount of a therapeutic agent that causes the therapeutic effect with the amount that causes toxic effects. Quantitatively, it is the ratio of the dose required to produce the toxic effect to the therapeutic dose. A commonly used measure of the therapeutic index is the lethal dose of a drug for 50% of the population (LD50) divided by the effective dose for 50% of the population (ED50).
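The ratio described above is simple enough to state as a one-line calculation. The Python sketch below uses a hypothetical function name and hypothetical dose values; the only requirement is that both doses are expressed in the same units.

```python
def therapeutic_index(ld50, ed50):
    """Return LD50 / ED50; both doses must be in the same units."""
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("doses must be positive")
    return ld50 / ed50

# Hypothetical doses in mg/kg.
print(therapeutic_index(500.0, 5.0))  # 100.0
```

A larger ratio indicates a wider margin between the effective and the lethal dose.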
t) Median Inhibitory Concentration (IC50):
A measure of the concentration required to produce 50% inhibition of a biological activity (e.g., an enzyme reaction, cell growth or reproduction). In simpler terms, it measures how much of a particular substance is needed to inhibit a given biological process by 50%. IC50 is commonly used as a measure of potency and, under defined assay conditions, as a proxy for drug-receptor binding affinity.
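As a rough illustration of how an IC50 can be read from inhibition data and turned into a model response, the Python sketch below interpolates the 50% point linearly on a log-concentration scale and converts it to pIC50 = -log10(IC50 in mol/L), a form widely used as the dependent variable in QSAR. The function names and data are hypothetical, and linear interpolation stands in for the curve fitting normally used in practice.

```python
import math

def interpolate_ic50(concentrations, inhibitions):
    """Estimate IC50 by linear interpolation of % inhibition against log10(concentration).

    concentrations: molar concentrations in ascending order.
    inhibitions: percent inhibition measured at each concentration.
    """
    points = list(zip(concentrations, inhibitions))
    for (c_low, y_low), (c_high, y_high) in zip(points, points[1:]):
        if y_low <= 50.0 <= y_high and y_low != y_high:
            fraction = (50.0 - y_low) / (y_high - y_low)
            log_low, log_high = math.log10(c_low), math.log10(c_high)
            return 10.0 ** (log_low + fraction * (log_high - log_low))
    raise ValueError("50% inhibition is not bracketed by the data")

def pic50(ic50_molar):
    """Negative decadic logarithm of the IC50 expressed in mol/L."""
    return -math.log10(ic50_molar)

# Hypothetical measurements: 25% inhibition at 10 nM, 75% at 1 uM.
ic50 = interpolate_ic50([1e-8, 1e-6], [25.0, 75.0])
print(ic50, pic50(ic50))
```

On the pIC50 scale a more potent compound has a larger value, which makes the response easier to model than the raw concentration.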
CONCLUSION:
Present-day researchers are showing a deepening interest in the field of QSAR, and many chemoinformatics methods were conceived specifically to solve QSAR problems. This has helped to meet the demand for in-depth knowledge of chemical systems and their relationships with biological systems. The growing likelihood of dealing with biological systems described by peptide/protein or DNA sequences, of describing proteomics maps, and of giving effective answers to ecological and health problems has opened new frontiers in which mathematics, statistics, chemistry and biology, together with their inter-relationships, may produce new and useful knowledge. In recent years, many molecular descriptors have been proposed, which underlines both the great interest of the scientific community in theoretical approaches to gathering information about chemical compounds and the need for more specialized and sophisticated molecular descriptors for developing predictive QSAR/QSPR models.
REFERENCES:
1. Todeschini R, Consonni V. Handbook of molecular descriptors. Wiley-VCH, Weinheim 2000.
2. Vedani A, Dobler M. 5D-QSAR: The key for simulating induced fit? J Med Chem 45; 2002: 2139–2149.
3. Vedani A, Dobler M, Lill MA. Combining protein modeling and 6D-QSAR. Simulating the binding of structurally diverse ligands to the estrogen receptor. J Med Chem 48; 2005: 3700–3703.
4. Diudea MV, Horvath D, Graovac A. Molecular topology. 15. 3D distance matrices and related topological indices. J Chem Inf Comput Sci 35; 1995: 129–135.
5. Balaban AT. From chemical graphs to 3D molecular modeling. In: Balaban AT (ed) From chemical topology to three-dimensional geometry. Plenum Press, New York 1998.
6. Hosoya H. Topological index. A newly proposed quantity characterizing the topological nature of structural isomers of saturated hydrocarbons. Bull Chem Soc Jap 44; 1971: 2332–2339.
7. Randić M, Wilkins CL. Graph theoretical ordering of structures as a basis for systematic searches for regularities in molecular data. J Phys Chem 83; 1979: 1525–1540.
8. Harary F. Graph theory. Addison-Wesley, Reading MA 1969.
9. Wolohan P, Reichert DE. CoMFA and docking study of novel estrogen receptor subtype selective ligands. J Comput Aided Mol Des 17; 2003: 313–328.
10. Ai N, DeLisle RK, Yu SJ et al. Computational models for predicting the binding affinities of ligands for the wild-type androgen receptor and a mutated variant associated with human prostate cancer. Chem Res Toxicol 16; 2003: 1652–1660.
11. Moro S, Braiuca P, Deflorian F et al. Combined target-based and ligand-based drug design approach as a tool to define a novel 3D-pharmacophore model of human A3 adenosine receptor antagonists: Pyrazolo[4,3-e]1,2,4-triazolo[1,5-c]pyrimidine derivatives as a key study. J Med Chem 48; 2005: 152–162.
12. Tervo AJ, Nyroenen TH, Ronkko T et al. A structure–activity relationship study of catechol-O-methyltransferase inhibitors combining molecular docking and 3D QSAR methods. J Comput Aided Mol Des 17; 2003: 797–810.
13. Medina-Franco JL, Rodríguez-Morales S, Juarez-Gordiano CA et al. Docking-based CoMFA and CoMSIA studies of non-nucleoside reverse transcriptase inhibitors of the pyridinone derivative type. J Comput Aided Mol Des 18; 2004: 345–360.
14. Thaimattam R, Daga P, Rajjak SA et al. 3D QSAR CoMFA, CoMSIA studies on substituted ureas as Raf-1 kinase inhibitors and its confirmation with structure-based studies. Bioorg Med Chem 12; 2004: 6415–6425.
15. Trinajstic N. Chemical graph theory. CRC press; 1992.
16. Harary F, Hedetniemi ST, Robinson RW. Uniquely colorable graphs. Journal of Combinatorial Theory. 6(3); 1969: 264-70.
17. Moreau G, Broto P. The autocorrelation of a topological structure: a new molecular descriptor. Nouveau Journal De Chimie-New Journal of Chemistry. 4(6); 1980: 359-60.
18. Iczkowski RP, Margrave JL. Electronegativity. Journal of the American Chemical Society. 83(17); 1961: 3547-51.
19. Bertz, Steven H., and William C. Herndon. "The similarity of graphs and molecules.", 1986: 169-175.
20. Algeri S, Cerletti C, Curcio M, Bonollo L, Buniva G, Minazzi M, Minoli G. Effect of anticholinergic drugs on gastro-intestinal absorption of L-dopa in rats and in man. European Journal of Pharmacology. 35(2); 1976: 293-9.
21. Katritzky AR, Karelson M, Sild S, Krygowski TM, Jug K. Aromaticity as a quantitative concept. 7. Aromaticity reaffirmed as a multidimensional characteristic. The Journal of Organic Chemistry. 63(15); 1998: 5228-31.
22. Tuppurainen K, Viisas M, Peräkylä M, Laatikainen R. Ligand intramolecular motions in ligand-protein interaction: ALPHA, a novel dynamic descriptor and a QSAR study with extended steroid benchmark dataset. Journal of Computer-Aided Molecular Design. 18(3); 2004: 175-87.
23. Kansy M, Fischer H, Kratzat K, Senner F, Wagner B, Parrilla I. High-throughput artificial membrane permeability studies in early lead discovery and development. Testa, B.; van de Waterbeemd, H.; Folkers, G. 26; 2001: 447-64.
24. Eisenberg, David, and Andrew D. McLachlan. "Solvation energy in protein folding and binding.".1986: 199-203.
25. Diudea MV, Ivanciuc O, Nikolic S, Trinajstic N. Matrices of reciprocal distance, polynomials and derived numbers. MATCH Commun. Math. Comput. Chem. 35; 1997: 41-64.
26. Sneath PH. Relations between chemical structure and biological activity in peptides. Journal of Theoretical Biology. 12(2); 1966: 157-95.
Received on 17.06.2017 Modified on 14.07.2017
Accepted on 10.08.2017 © RJPT All rights reserved
Research J. Pharm. and Tech. 2017; 10(9): 3237-3241.
DOI: 10.5958/0974-360X.2017.00574.1