INTRODUCTION:

The chemical structure representation or three-dimensional structure of a complex molecule contains information related to its biological activity, reactivity, and various other physicochemical properties¹,². Molecular similarity is a concept applied extensively in the modern drug discovery process³. Much of the research on chemical and pharmaceutical entities depends mainly on small molecules. Various molecular similarity measures have been studied and proposed to predict the properties or functionality of molecules. According to the similar property principle, molecules that show structural similarity are likely to present comparable properties⁴, meaning that the properties of chemical molecules can be predicted by establishing and comparing their molecular structures.

Molecular similarity measures can be categorized into two classes: graph-based and fingerprint-based. In view of the lock and key model of substrate-receptor steric complementarity, a drug and the substrate, or a known lead drug and others under development, should have appropriate similarity.

Three-dimensional geometry and similarity indices defined in terms of electron density distributions have been used quite successfully for comparing molecules⁵. Similarities and differences between molecules also need to be evaluated at the level of their interactions with the appropriate receptor.

Graph-based Molecular Similarity:

Graph-based measures rely on the intuitive graph representation of the molecular atom-bond structure, in which single atoms or aromatic rings are represented by nodes and the chemical bonds between atoms by edges. Every node and edge label can carry specific properties of the atom and bond, respectively. The graph as a whole can also carry a label describing various properties or characteristics of the respective molecule.
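As a minimal sketch of such a labelled graph, assuming a plain-Python representation (the `MolGraph` class and its field names are illustrative and do not come from any cheminformatics package), ethanol can be encoded with element labels on the nodes and bond orders on the edges:

```python
from dataclasses import dataclass, field


@dataclass
class MolGraph:
    """Hypothetical labelled graph: nodes are atoms, edges are bonds."""
    atoms: dict = field(default_factory=dict)   # atom index -> element symbol
    bonds: dict = field(default_factory=dict)   # frozenset({i, j}) -> bond order

    def add_atom(self, idx, element):
        self.atoms[idx] = element

    def add_bond(self, i, j, order=1):
        self.bonds[frozenset((i, j))] = order

    def neighbors(self, idx):
        # Every atom that shares a bond (edge) with atom `idx`.
        return sorted(j for pair in self.bonds if idx in pair
                      for j in pair if j != idx)


# Ethanol, CH3-CH2-OH, with hydrogens left implicit.
ethanol = MolGraph()
for idx, element in enumerate(["C", "C", "O"]):
    ethanol.add_atom(idx, element)
ethanol.add_bond(0, 1, order=1)
ethanol.add_bond(1, 2, order=1)

print(ethanol.neighbors(1))  # atoms bonded to the central carbon
```

Further atom or bond properties (charge, aromaticity, ring membership) could be attached in the same way by storing richer labels instead of a single symbol or bond order.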

Fingerprint-based Molecular Similarity:

The second class of measures uses a vector-based representation called a fingerprint, a conventional concept in chemical informatics and related fields. Fingerprints are binary vectors that represent particular substructures of a molecule: each bit takes the value 1 or 0, signifying whether the molecule contains, with a certain probability, the associated substructure. Many measures have been established to quantify the resemblance between fingerprint-based representations of molecules, such as the Tanimoto, Cosine and Dice coefficients and the Euclidean distance.
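The following sketch illustrates how two binary fingerprints might be compared with the Dice, Cosine and Euclidean measures named above. The 8-bit fingerprints are invented for illustration, the function names are hypothetical, and the formulas are the usual binary forms of these coefficients written in terms of the bit counts; the Tanimoto coefficient is treated separately under the similarity measurements.

```python
import math


def to_bits(on_bits, n_bits):
    """Expand a set of 'on' positions into a 0/1 list of length n_bits."""
    return [1 if i in on_bits else 0 for i in range(n_bits)]


def dice(y, z):
    # 2c / (a + b): shared on-bits relative to the mean number of on-bits.
    c = sum(p & q for p, q in zip(y, z))
    return 2 * c / (sum(y) + sum(z))


def cosine(y, z):
    # c / sqrt(a * b) for binary vectors.
    c = sum(p & q for p, q in zip(y, z))
    return c / math.sqrt(sum(y) * sum(z))


def euclidean_distance(y, z):
    # A distance, not a similarity: 0 means identical fingerprints.
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(y, z)))


# Two made-up 8-bit fingerprints (a = 4 and b = 5 bits set, c = 3 shared).
y = to_bits({0, 2, 3, 5}, 8)
z = to_bits({0, 3, 5, 6, 7}, 8)
print(dice(y, z), cosine(y, z), euclidean_distance(y, z))
```

The sketch assumes that neither fingerprint is empty; a production implementation would need to decide how to score fingerprints with no bits set.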

MATERIAL AND METHODS:

Molecular similarity measurements:

A variety of molecular similarity coefficients are used in similarity measurements, including Tanimoto, Cosine, Dice, and Euclidean distance. The Tanimoto coefficient (Tc), also known as the Jaccard coefficient, is among the most commonly used similarity measures for binary fingerprints, and it is the only one discussed in detail here.

The Tc for two molecular fingerprints Y and Z is given by:

$$T_c(Y, Z) = \frac{c}{a + b - c}$$

Here, a and b stand for the number of bits set on in the fingerprints Y and Z, respectively, and c is the number of bits set on in both fingerprints. Tc thus relates the intersection of the fingerprint features to the union of all features present in the two compound fingerprints. Tc values range from 0 to 1, where 0 indicates minimal and 1 maximal similarity.
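A small worked sketch of this calculation is given below. Each fingerprint is represented by the set of its 'on' bit positions, so that a, b and c are simply set sizes; the bit positions are invented, and the choice to return 0 for two empty fingerprints is an assumption rather than part of the definition.

```python
def tanimoto(y_on, z_on):
    """Binary Tanimoto coefficient from the sets of 'on' bit positions."""
    a = len(y_on)            # bits set in Y
    b = len(z_on)            # bits set in Z
    c = len(y_on & z_on)     # bits set in both
    if a + b - c == 0:
        return 0.0           # assumption: two empty fingerprints score 0
    return c / (a + b - c)


y_on = {1, 4, 7, 9}
z_on = {1, 4, 8, 9, 12}
print(tanimoto(y_on, z_on))  # a = 4, b = 5, c = 3, so 3 / (4 + 5 - 3) = 0.5
```

Identical fingerprints give a value of 1 and fingerprints with no common bits give 0, matching the range stated above.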

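The Tc can also be used on non-binary fingerprints, in which case Tc calculates the fingerprint overlap by

$$T_c(Y, Z) = \frac{\sum_{i=1}^{n} a_i b_i}{\sum_{i=1}^{n} a_i^2 + \sum_{i=1}^{n} b_i^2 - \sum_{i=1}^{n} a_i b_i}$$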

Here, the fingerprints have the form Y = (a1, a2, ..., an) and Z = (b1, b2, ..., bn) with a length of n. The variables ai and bi symbolize the ith position in the fingerprints Y and Z, respectively, with aibi being their product⁶. The non-binary Tc can take values from -1/3 (approximately -0.333) to 1. The various molecular similarity coefficients are listed in Table 1.
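The non-binary case can be sketched as follows, assuming the commonly used continuous form of the coefficient: the sum of the products aibi divided by the sum of the squared ai plus the sum of the squared bi minus the sum of the products. The function name and the example vectors are illustrative only.

```python
def tanimoto_continuous(y, z):
    """Tanimoto coefficient for real-valued fingerprints of equal length."""
    if len(y) != len(z):
        raise ValueError("fingerprints must have the same length")
    dot = sum(a * b for a, b in zip(y, z))
    denom = sum(a * a for a in y) + sum(b * b for b in z) - dot
    if denom == 0:
        return 0.0  # assumption: two all-zero fingerprints score 0
    return dot / denom


counts_y = [2, 0, 1, 3]   # e.g. feature counts instead of 0/1 bits
counts_z = [1, 1, 0, 3]
print(tanimoto_continuous(counts_y, counts_z))        # 11 / 14
print(tanimoto_continuous([1.0, -2.0], [-1.0, 2.0]))  # opposite vectors: -1/3
```

The second call shows where the lower bound comes from: for Z = -Y the numerator is the negative of the squared length of Y and the denominator is three times that squared length.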

Table 1: List of molecular similarity coefficients

The similarity coefficients mentioned above can also be applied to electron density distributions obtained by the application of quantum mechanics, in which case they are termed quantum similarity measurements. Quantum molecular similarity⁷ involves complex calculations for larger molecules, and it has been solved only for the simplest molecules such as hydrogen. Figure 1 represents the quantum similarity applied to the hydrogen molecule.

Molecular fingerprints used in similarity measurements:

Depending on the way in which the molecular structure is transformed into a bit string, many types of molecular fingerprints are recognized. Most are 2D fingerprints that use only the 2D molecular graph, but some are able to store 3D information, the most notable being pharmacophore fingerprints. The major approaches include substructure keys-based, topological (path-based) and circular fingerprints.

Fingerprints based on Substructure keys:

The structural keys, or bits of the bit string, are set according to the substructures or features present in a chemical moiety. The number of bits is determined by the number of keys, and each bit relates to the presence or absence of one specific chemical feature in a molecule, whereas in other (hashed) kinds of fingerprints this one-to-one correspondence between bit and feature does not hold. The prominent substructure keys-based fingerprints are listed below.
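The idea of a key dictionary can be sketched as below. The molecule format, the five keys and the `key_fingerprint` helper are all hypothetical and far simpler than the SMARTS-based definitions used by real key sets; the point is only that each bit position answers one predefined question.

```python
# Toy molecule format (hypothetical): element symbols plus (i, j, bond_order) bonds.
acetamide = {
    "atoms": ["C", "C", "O", "N"],
    "bonds": [(0, 1, 1), (1, 2, 2), (1, 3, 1)],
}
chloroethane = {
    "atoms": ["C", "C", "Cl"],
    "bonds": [(0, 1, 1), (1, 2, 1)],
}

# Illustrative key dictionary: one predefined question per bit position.
KEYS = [
    ("contains N", lambda m: "N" in m["atoms"]),
    ("contains O", lambda m: "O" in m["atoms"]),
    ("contains halogen",
     lambda m: any(e in {"F", "Cl", "Br", "I"} for e in m["atoms"])),
    ("has double bond", lambda m: any(order == 2 for _, _, order in m["bonds"])),
    ("more than 3 heavy atoms", lambda m: len(m["atoms"]) > 3),
]


def key_fingerprint(mol):
    """Set bit k to 1 when the molecule satisfies key k, else 0."""
    return [int(test(mol)) for _, test in KEYS]


print(key_fingerprint(acetamide))     # [1, 1, 0, 1, 1]
print(key_fingerprint(chloroethane))  # [0, 0, 1, 0, 0]
```

Because every bit has a fixed meaning, such fingerprints are easy to interpret, but they can only describe features that the dictionary anticipates.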

MACCS Fingerprint ⁸:

It exists in two variants, with 960 and 166 structural keys, respectively, defined on the basis of SMARTS patterns. The shorter variant (166 bits) is the one applied most extensively. Although relatively short, it covers most of the structural features of interest in small molecule drug discovery, and it is available in commonly used software packages.

PubChem fingerprint⁹:

This fingerprint utilizes 881 substructure keys and is used mainly for similarity searching. The PubChem fingerprint is also implemented in ChemFP and in the CDK.

BCI fingerprints¹⁰:

These fingerprints can be generated with varying numbers of bits, which the user can modify according to the problem under consideration. The standard substructure dictionary of the BCI fingerprint consists of 1052 keys.

TGD¹¹, TGT fingerprints:

These belong to the group of pharmacophoric fingerprints and are calculated from the 2D molecular graph; they contain 735 and 13824 bits, respectively. TGD encodes atom-pair descriptors using seven atom features and graph distances of up to 15 bonds. TGT encodes atom-triplet features using the three graph distances between the atoms, each assigned to one of 6 distance ranges.
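The atom-pair idea behind such two-point descriptors can be sketched as follows. This is not the TGD algorithm itself: the feature labels, the adjacency format and both function names are assumptions made for illustration, and the mapping of each descriptor to a bit position is omitted.

```python
from collections import deque


def bond_distances(adjacency, start):
    """Breadth-first search: number of bonds from `start` to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in adjacency[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist


def atom_pair_descriptors(features, adjacency, max_distance=15):
    """Collect (feature, feature, distance) triples for all atom pairs."""
    pairs = set()
    atoms = sorted(features)
    for i in atoms:
        dist = bond_distances(adjacency, i)
        for j in atoms:
            if j > i and j in dist and dist[j] <= max_distance:
                f1, f2 = sorted((features[i], features[j]))
                pairs.add((f1, f2, dist[j]))
    return pairs


# 2-aminoethanol, N-C-C-O, with made-up pharmacophoric feature labels.
features = {0: "donor", 1: "hydrophobe", 2: "hydrophobe", 3: "donor"}
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(sorted(atom_pair_descriptors(features, adjacency)))
```

A triplet-based variant would follow the same pattern with three atoms and three pairwise distances, each binned into a distance range.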

Topological or path-based fingerprints:

These fingerprints work by enumerating the linear paths through a molecule and then generating a hash code for each resulting substructure, which determines the bits to be set (Table 2). The following are types of topological fingerprints.
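A simplified sketch of this path-and-hash scheme is shown below. It is not the Daylight algorithm: bond orders are ignored, each path sets a single bit, and SHA-1 is used only as a convenient deterministic hash. The function names and the molecule format are assumptions.

```python
import hashlib


def linear_paths(atoms, adjacency, max_bonds):
    """Enumerate all simple paths of up to `max_bonds` bonds as element strings."""
    paths = set()

    def walk(path):
        label = "-".join(atoms[i] for i in path)
        # A path and its reverse describe the same fragment; keep one form.
        paths.add(min(label, "-".join(atoms[i] for i in reversed(path))))
        if len(path) - 1 < max_bonds:
            for nbr in adjacency[path[-1]]:
                if nbr not in path:
                    walk(path + [nbr])

    for start in atoms:
        walk([start])
    return paths


def path_fingerprint(atoms, adjacency, max_bonds=3, n_bits=2048):
    """Hash each path to one bit position (collisions are possible)."""
    on_bits = set()
    for path in linear_paths(atoms, adjacency, max_bonds):
        digest = hashlib.sha1(path.encode()).hexdigest()
        on_bits.add(int(digest, 16) % n_bits)
    return on_bits


# Ethanol heavy atoms, C-C-O (bond orders ignored in this toy version).
atoms = {0: "C", 1: "C", 2: "O"}
adjacency = {0: [1], 1: [0, 2], 2: [1]}
print(sorted(linear_paths(atoms, adjacency, max_bonds=3)))
print(len(path_fingerprint(atoms, adjacency)))
```

Unlike structural keys, no bit has a predefined meaning here, and different paths may collide on the same bit, which is why a set bit indicates a substructure only with a certain probability.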

Table 2: List of topological indices

The Daylight fingerprint¹² is the most prominent of the topological fingerprints. It is made up of 2048 bits that encode all the connectivity paths through a molecule up to a user-defined length. Most software packages provide this kind of fingerprint. Some fingerprints, such as OpenEye's Tree fingerprints, are based on nonlinear paths. A list of topological indices is given in Table 2.

Circular Fingerprints:

Circular fingerprints are classified under hashed topological fingerprints. Instead of looking for linear paths, they record the circular environment of each atom up to a predetermined radius. They are used extensively in full structure similarity searching.

Molprint2D¹³:

Molprint2D uses atom environments to encode the molecular connectivity; these environments are represented by strings of varying size.

ECFP:

Extended Connectivity Fingerprints (ECFPs) are the de facto standard circular fingerprints and are based on the Morgan algorithm¹⁴. They are designed especially for use in structure-activity relationship modeling.
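The iterative neighbourhood update that underlies such circular fingerprints can be sketched as follows. This is a deliberately reduced illustration and not the ECFP algorithm: atoms start from the element symbol only, bond orders are ignored, and no duplicate-environment removal is performed. The function names, the SHA-1 based hash and the 1024-bit folding are all assumptions.

```python
import hashlib


def stable_hash(text):
    """Deterministic integer hash (Python's built-in hash() is salted per run)."""
    return int(hashlib.sha1(text.encode()).hexdigest(), 16) % (2 ** 32)


def circular_fingerprint(atoms, adjacency, radius=2, n_bits=1024):
    """Morgan-style sketch: grow each atom's identifier one bond per iteration."""
    # Iteration 0: the identifier depends only on the atom itself.
    ids = {i: stable_hash(element) for i, element in atoms.items()}
    features = set(ids.values())
    for _ in range(radius):
        new_ids = {}
        for i in atoms:
            # Combine the atom's identifier with its neighbours' sorted identifiers.
            nbr_ids = sorted(ids[j] for j in adjacency[i])
            new_ids[i] = stable_hash(f"{ids[i]}|{nbr_ids}")
        ids = new_ids
        features.update(ids.values())
    # Fold the collected identifiers into a fixed-length bit set.
    return {f % n_bits for f in features}


# Ethanol (C-C-O) and propane (C-C-C), heavy atoms only.
ethanol = ({0: "C", 1: "C", 2: "O"}, {0: [1], 1: [0, 2], 2: [1]})
propane = ({0: "C", 1: "C", 2: "C"}, {0: [1], 1: [0, 2], 2: [1]})
fp_e = circular_fingerprint(*ethanol)
fp_p = circular_fingerprint(*propane)
print(len(fp_e & fp_p), len(fp_e | fp_p))
```

The two molecules share the bits arising from their common carbon environments and differ in those that involve the oxygen, so the resulting bit sets can be compared with any of the coefficients discussed earlier.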

FCFP (Functional-Class Fingerprints):

These are a variation of ECFP that is more abstract in that, instead of indexing a particular atom in the environment, they index the role of that atom. Thus, different atoms or groups with a similar function are not differentiated by the fingerprint, which enables these fingerprints to be used as pharmacophoric fingerprints. The major software packages that support ECFP fingerprints also support this variation.

CONCLUSION:

Fingerprint-based molecular similarity measurements are gaining importance in the drug design process. They can be adopted as a virtual screening technique like other QSAR methods. The cost involved in synthesizing drugs and screening them against their biological targets has prompted the search for alternative methods for quick and cost-effective screening. The right combination of similarity measure and fingerprint must be selected for reliable prediction. This can be achieved only with experience and with sophisticated software to run the simulations.

REFERENCES:

1. Wipke WT, Heller S, Feldman, Hyde E (eds). Computer representation and manipulation of chemical information. Wiley, New York; 1974.

2. Ash JE, Hyde E (eds). Chemical information systems. Ellis Horwood, Chichester, England; 1975.

3. Muddukrishna BS, Pai V, Lobo R, et al. Mol Divers. 2017. https://doi.org/10.1007/s11030-017-9793-0

4. Johnson MA, Maggiora GM. Concepts and applications of molecular similarity. Wiley, New York; 1990.

5. Eshera MA, Fu KS. An image understanding system using attributed symbolic representation and inexact graph-matching. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1986; 8: 604-618.

6. Rohrer SG, Baumann K. Maximum Unbiased Validation (MUV) Data Sets for Virtual Screening Based on PubChem Bioactivity Data. J. Chem. Inf. Model. 2009; 49: 169-184.

7. Carbo R, Besalu E. In: Carbo R (ed). Molecular Similarity and Reactivity: From Quantum Chemical to Phenomenological Approaches. Kluwer, Dordrecht; 1995: pp. 3-30.

8. Durant JL, Leland BA, Henry DR, Nourse JG. Reoptimization of MDL Keys for Use in Drug Discovery. J. Chem. Inf. Model. 2002; 42: 1273-1280. doi:10.1021/ci010132r

9. Bolton EE, Wang Y, Thiessen PA, Bryant SH. Chapter 12 PubChem: Integrated Platform of Small Molecules and Biological Activities. Annu. Rep. Comput. Chem. 2008; 4: 217-241. doi:10.1016/S1574-1400(08)00012-1

10. Barnard JM, Downs GM. Chemical Fragment Generation and Clustering Software. J. Chem. Inf. Model. 1997; 37: 141-142. doi:10.1021/ci960090k

11. Sheridan RP, Miller MD, Underwood DJ, Kearsley SK. Chemical Similarity Using Geometric Atom Pair Descriptors. J. Chem. Inf. Model. 1996; 36: 128-136. doi:10.1021/ci950275b

12. Bender A, Mussa HY, Glen RC, Reiling S. Molecular similarity searching using atom environments, information-based feature selection, and a naïve Bayesian classifier. J. Chem. Inf. Comput. Sci. 2003; 44: 170-178. doi:10.1021/ci034207y

13. Bender A, Mussa HY, Glen RC, Reiling S. Similarity searching of chemical databases using atom environment descriptors (MOLPRINT 2D): evaluation of performance. J. Chem. Inf. Comput. Sci. 2004; 44: 1708-1718. doi:10.1021/ci0498719

14. Morgan HL. The Generation of a Unique Machine Description for Chemical Structures-A Technique Developed at Chemical Abstracts Service. J. Chem. Doc. 1965; 5: 107-113. doi:10.1021/c160017a018

Received on 10.02.2018 Modified on 29.03.2018

Research J. Pharm. and Tech 2018; 11(4): 1375-1377.

DOI: 10.5958/0974-360X.2018.00256.1