Physiochemical Properties of Medicines – Using Graph Eccentricity and Multiple Regression

 

Shivam Gupta, M. Yamuna

VIT, Vellore

*Corresponding Author E-mail: myamuna@vit.ac.in

 

ABSTRACT:

Medicines have many predictable properties, such as water solubility, melting point and boiling point, that are related to their molecular structure. These molecular structures can be represented as graphs. In this paper we propose a method of estimating such properties using graph eccentricity and multiple regression.

 

KEYWORDS: Graph, Regression, Eccentricity, Melting Point, Boiling Point.

 

 


INTRODUCTION:

Graph theory is the study of graphs, which are mathematical structures used to model pairwise relations between objects. Graphs are important in many fields of science and in sociology. In chemistry, graph theory is widely used to study molecules and to determine their properties.1

 

A knowledge of pKa values is important for the quantitative treatment of systems involving acid–base equilibria in solution. Many applications exist in biochemistry; for example, the pKa values of proteins and amino acid side chains are of major importance for the activity of enzymes and the stability of proteins.2

 

Solubility is often said to be one of the "characteristic properties of a substance", which means that solubility is commonly used to describe the substance, to indicate a substance's polarity, to help to distinguish it from other substances, and as a guide to applications of the substance. For example, indigo is described as "insoluble in water, alcohol, or ether but soluble in chloroform, nitrobenzene, or concentrated sulfuric acid".3

 

 

A drug's distribution coefficient strongly affects how easily the drug can reach its intended target in the body, how strong an effect it will have once it reaches its target, and how long it will remain in the body in an active form. Hence, the log P of a molecule is one criterion used in decision-making by medicinal chemists in pre-clinical drug discovery, for example, in the assessment of drug likeness of drug candidates.4

 

Aqueous solubility is of fundamental interest owing to the vital biological and transportation functions played by water. The ability to accurately predict a molecule's solubility represents potentially large financial savings in many chemical product development processes, such as pharmaceuticals.3

 

Graph theory has applications in many domains. In Ref. 5, the DNA gap penalty is determined using directed graphs. In Ref. 6, directed graphs are used to represent chemical equations. Graph theory also contributes to the study of topological indices. In Ref. 7, a constructive method of determining the Wiener index of a larger tree from a subtree is given. In Ref. 8, the molecular topological indices of a tree with the same number of vertices as a given tree are determined. Ref. 9 gives a brief review of the contribution of graph theory to chemistry. In Ref. 10, a correlation between boiling point and Wiener index is determined. In Ref. 11, eccentricity-related indices are studied, including the reverse eccentric connectivity index for the V-phenylenic nanotorus. Graph theory can thus be used in various fields of research. This paper focuses on determining the physiochemical properties of drugs using graph theory and multiple regression.

 

Preliminaries:

This section lists the basic definitions needed to carry out the calculations.

 

pKa Value of Drugs:

An acid dissociation constant, Ka (also known as the acidity constant or acid-ionization constant), is a quantitative measure of the strength of an acid in solution. It is the equilibrium constant for a chemical reaction known as dissociation in the context of acid–base reactions. For many practical purposes it is more convenient to discuss the logarithmic constant,

 

$$\mathrm{p}K_{\mathrm{a}} = -\log_{10}\left(K_{\mathrm{a}}\right)$$

 

The more positive the value of pKa, the smaller the extent of dissociation at any given pH.2
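As a small numerical illustration of the definition above, the following sketch converts a dissociation constant to a pKa value. The function name `pka_from_ka` and the input Ka = 1.8e-5 (roughly the value for acetic acid at room temperature) are illustrative assumptions and are not data from this paper.

```python
import math


def pka_from_ka(ka: float) -> float:
    """Return pKa = -log10(Ka) for a positive dissociation constant."""
    if ka <= 0:
        raise ValueError("Ka must be positive")
    return -math.log10(ka)


# Illustrative value only (approximately acetic acid); not taken from the paper.
weak_acid_pka = pka_from_ka(1.8e-5)

# A 100-fold smaller Ka (less dissociation) raises pKa by 2 units.
weaker_acid_pka = pka_from_ka(1.8e-7)

print(round(weak_acid_pka, 2), round(weaker_acid_pka, 2))
```

The second call shows the inverse relation stated above: a smaller Ka, meaning a smaller extent of dissociation, corresponds to a larger pKa.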

 

Solubility Value:

Solubility is the property of a solid, liquid, or gaseous chemical substance called solute to dissolve in a solid, liquid, or gaseous solvent. The solubility of a substance fundamentally depends on the physical and chemical properties of the solute and solvent as well as on temperature, pressure and the pH of the solution. The extent of the solubility of a substance in a specific solvent is measured as the saturation concentration.3

 

Partition Coefficient (Log P):

The partition coefficient, abbreviated P, is defined as a particular ratio of the concentrations of a solute between the two solvents (a biphase of liquid phases), specifically for un-ionized solutes, and the logarithm of the ratio is thus log P. When one of the solvents is water and the other is a non-polar solvent, the log P value is a measure of lipophilicity or hydrophobicity.4

 

Aqueous Solubility (Log S):

Methods found in physical theory tend to use thermodynamic cycles, a concept from classical thermodynamics. The two common thermodynamic cycles used involve either the calculation of the free energy of sublimation (solid to gas without going through a liquid state) and the free energy of solvating a gaseous molecule (gas to solution), or the free energy of fusion (solid to a molten phase) and the free energy of mixing (molten to solution). The use of these cycles enables the calculation of the solvation free energy indirectly via either a gas (in the sublimation cycle) or a melt (in the fusion cycle).

 

 

 

The free energy of solvation can be converted to a solubility value using:3

$$\log S(V_m) = \frac{\Delta G_{\text{solvation}}}{-2.303\,RT}$$
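The conversion from solvation free energy to log S can be sketched numerically as follows. The units (joules per mole for the free energy, kelvin for the temperature), the default temperature of 298.15 K and the example free energy of 20 kJ/mol are assumptions made for illustration; none of them comes from the paper.

```python
R_GAS = 8.314  # gas constant in J/(mol*K)


def log_s_from_solvation_energy(delta_g_j_per_mol: float,
                                temperature_k: float = 298.15) -> float:
    """Evaluate log S = dG_solvation / (-2.303 * R * T).

    delta_g_j_per_mol is assumed to be in J/mol, temperature_k in kelvin.
    """
    if temperature_k <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    return delta_g_j_per_mol / (-2.303 * R_GAS * temperature_k)


# Hypothetical solvation free energy of +20 kJ/mol at 298.15 K.
example_log_s = log_s_from_solvation_energy(20_000.0)
print(round(example_log_s, 2))
```

With this sign convention a positive solvation free energy gives a negative log S, and a more positive free energy gives a lower solubility.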

 

Melting Point:

The melting point (or, rarely, liquefaction point) of a solid is the temperature at which it changes state from solid to liquid at atmospheric pressure. At the melting point the solid and liquid phase exist in equilibrium. The melting point of a substance depends on pressure and is usually specified at standard pressure.

 

Drug Class:

A drug class is a set of medications that have similar chemical structures, the same mechanism of action (i.e., bind to the same biological target), a related mode of action, and/or are used to treat the same disease.12

 

Graph:

A graph is an ordered pair G = (V, E) comprising a set V of vertices (also called nodes or points) together with a set E of edges (also called arcs or lines), which are 2-element subsets of V (i.e. an edge is associated with two vertices, and that association takes the form of the unordered pair comprising those two vertices).1 A weighted graph is a graph in which each branch is given a numerical weight. A weighted graph is therefore a special type of labeled graph in which the labels are numbers (which are usually taken to be positive).13 Snapshot 1 (Ref. 14) provides an example of a weighted graph.

 

 

Snapshot 1: Weighted Graph

 

 

Molecular Graph:

In chemical graph theory and in mathematical chemistry, a molecular graph or chemical graph is a representation of the structural formula of a chemical compound in terms of graph theory. A chemical graph is a labeled graph whose vertices correspond to the atoms of the compound and edges correspond to chemical bonds.15

 

 

Snapshot 2 (Ref. 16) shows the molecular graph of benzene.

 

Snapshot 2: Molecular Graph of Benzene

 

Path:

In graph theory, a path in a graph is a finite or infinite sequence of edges which connect a sequence of vertices which, by most definitions, are all distinct from one another.17

 

 

Fig. 1: Path

 

 

 

The length of a path is the number of edges in the path. A path with n vertices has length n – 1.

 

Eccentricity:

The eccentricity of a graph vertex v in a connected graph is the maximum graph distance between v and any other vertex of the graph.18

The maximum eccentricity is the graph diameter. The minimum graph eccentricity is called the graph radius.18

 

 

Snapshot 3: Eccentricity

 

For example, for the graph in Snapshot 3 (Ref. 18), the eccentricity of each vertex, i.e. its distance to the farthest vertex, is shown.
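The definitions of eccentricity, radius and diameter can be checked on a small example. The sketch below uses breadth-first search on an unweighted, connected graph stored as an adjacency dictionary; the five-vertex path graph and the function names are illustrative assumptions and do not correspond to any snapshot in this paper.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distances (in edges) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def eccentricities(adj):
    """Eccentricity of each vertex: the largest distance to any other vertex."""
    ecc = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        if len(dist) != len(adj):
            raise ValueError("eccentricity is defined here only for connected graphs")
        ecc[v] = max(dist.values())
    return ecc


# Illustrative path graph 1 - 2 - 3 - 4 - 5.
path_graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

ecc = eccentricities(path_graph)
radius = min(ecc.values())    # minimum eccentricity
diameter = max(ecc.values())  # maximum eccentricity
print(ecc, radius, diameter)
```

For this path the end vertices have eccentricity 4 and the middle vertex has eccentricity 2, so the radius is 2 and the diameter is 4.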


 

 

Proposed Method:

Graph Eccentricity:

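Table 1: Anti-depression drugs used in Regression

| S. No | Drug | Molecular structure | Drug graph with eccentricity values | Total Eccentricity |
|---|---|---|---|---|
| 1. | Amitriptyline | (figure) | (figure) | 344 |
| 2. | Amoxapine | (figure) | (figure) | 382 |
| 3. | Clomipramine | (figure) | (figure) | 362 |
| 4. | Desipramine | (figure) | (figure) | 325 |
| 5. | Imipramine | (figure) | (figure) | 344 |
| 6. | Nortriptyline | (figure) | (figure) | 325 |
| 7. | Protriptyline | (figure) | (figure) | 325 |
| 8. | Trimipramine | (figure) | (figure) | 361 |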

 


Table-1 provides the molecular structure of medicines that are prescribed as anti-depression drugs. A careful observation of these molecular structures reveals that they have some common sub-structure.

 

 

Fig. 2: Common sub-structure

 

The sub-structure in Fig. 2 is repeated in almost all the medicines. Medicines prescribed for a particular kind of disease thus share some common structure, and hence there should be some relation between their eccentricity values. Consider any drug graph and determine the eccentricity of each of its vertices. The sum of the eccentricities of all the vertices is defined as the eccentricity of the graph. Fig. 3 shows the graph structure of the anti-depression drug Amitriptyline, with the vertices labeled by their respective eccentricity values. Here the sum of the eccentricities is 344.

 

 

Fig.3: Eccentricity calculation
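The total eccentricity described above can be sketched in a few lines. The example graph is a hypothetical toy skeleton (a six-membered ring with one pendant vertex), not one of the drug graphs in Table 1; the paper does not state how its drug graphs are built (for example, whether hydrogen atoms are included as vertices), so the sketch only illustrates the summation step.

```python
from collections import deque


def total_eccentricity(edges):
    """Sum of vertex eccentricities of a connected, unweighted graph.

    edges is an iterable of (u, v) pairs, one per bond.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    total = 0
    for source in adj:
        # Breadth-first search gives shortest-path distances from source.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) != len(adj):
            raise ValueError("graph must be connected")
        total += max(dist.values())  # eccentricity of source
    return total


# Hypothetical toy skeleton: ring 0-1-2-3-4-5-0 with pendant vertex 6 on vertex 0.
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
toy_total = total_eccentricity(toy_edges)
print(toy_total)
```

In this toy graph the pendant vertex and the ring vertex opposite its attachment point each have eccentricity 4, the other five vertices have eccentricity 3, and the total is 23.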

 

Regression Analysis:

For a given set of data Y, X1, X2, …, Xn, a multiple linear regression model is an equation of the form

$$Y = a_0 + a_1 X_1 + a_2 X_2 + \dots + a_n X_n \qquad \text{(Equation 1)}$$
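To make the form of the model concrete, the sketch below fits the coefficients a0, a1, …, an by ordinary least squares, solving the normal equations with Gaussian elimination. It is a minimal standard-library illustration and not the R code used in this paper; the function name and the synthetic data (generated exactly from Y = 1 + 2·X1 + 3·X2) are assumptions chosen so that the correct answer is known in advance.

```python
def fit_multiple_regression(rows, y):
    """Least-squares coefficients [a0, a1, ..., an] for Y = a0 + sum(ai * Xi)."""
    design = [[1.0] + [float(x) for x in row] for row in rows]  # intercept column
    p = len(design[0])

    # Normal equations (X^T X) a = X^T y, stored as an augmented matrix.
    aug = []
    for i in range(p):
        lhs = [sum(d[i] * d[j] for d in design) for j in range(p)]
        rhs = sum(d[i] * yi for d, yi in zip(design, y))
        aug.append(lhs + [rhs])

    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]

    # Back substitution.
    coeffs = [0.0] * p
    for i in range(p - 1, -1, -1):
        known = sum(aug[i][j] * coeffs[j] for j in range(i + 1, p))
        coeffs[i] = (aug[i][p] - known) / aug[i][i]
    return coeffs


# Synthetic data generated from Y = 1 + 2*X1 + 3*X2 (no noise).
predictors = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
responses = [1 + 2 * x1 + 3 * x2 for x1, x2 in predictors]

coefficients = fit_multiple_regression(predictors, responses)
print([round(c, 6) for c in coefficients])
```

The normal equations are used here only because they are short to write down; they can lose accuracy when predictors differ greatly in scale, and statistical packages generally use more stable factorizations.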

 

Medicines have many physiochemical properties. For our discussion we use the pKa value, log S value, log P value, water solubility and melting point. For known medicines these values are already available in databases; for details related to these values refer to Drugbank.20

Whenever a new medicine for anti-depression is discovered, its chemical composition, and hence its molecular formula, will be known. For any new medicine an approximate physiochemical property can be determined using regression. Table 2 lists 8 randomly chosen anti-depression drugs with their pKa values, log S values, log P values, water solubility, melting point and the corresponding graph eccentricities.


Table 2: Data obtained for each drug using Drugbank

| S. No | Name | pKa | log S | log P | Water Solubility | Melting Point | Eccentricity |
|---|---|---|---|---|---|---|---|
| 1. | Amitriptyline | 9.4 | -4.46 | 4.92 | 9.71 | 188.25 | 344 |
| 2. | Amoxapine | 8.83 | -3.3 | 3.4 | 171 | 175.5 | 382 |
| 3. | Clomipramine | 9.2 | -4.3 | 5.19 | 0.294 | 191.75 | 362 |
| 4. | Desipramine | 10.4 | -3.66 | 4.9 | 58.6 | 216 | 325 |
| 5. | Imipramine | 9.4 | -4.19 | 4.8 | 18.2 | 174.5 | 344 |
| 6. | Nortriptyline | 10.1 | -5.5 | 4.51 | 0.874 | 214 | 325 |
| 7. | Protriptyline | 10.54 | -6.1 | 4.7 | 1.04 | 170 | 325 |
| 8. | Trimipramine | 9.42 | -4 | 4.2 | 26 | 45 | 361 |

 

Snapshot 4: R code used

 


Multiple regression Fit:

Table 2 provides the pKa values, log S values, log P values, water solubility and melting point of the drugs, together with their corresponding eccentricities. Using R software, we determine the regression equations relating eccentricity, pKa, log S, log P, water solubility and melting point for these anti-depression drugs. Knowing the regression equations for the randomly chosen anti-depression drugs, we can estimate the pKa value, log S value, log P value, water solubility or melting point of any other anti-depression drug that has more or less the same molecular structure. Snapshot 4 provides the R program used to fit a linear regression to the data in Table 2.

 

Hence the linear regression equation for the 8 randomly chosen anti-depression drugs is determined as

pKa = 16.237946 – 0.327624(log S) + 0.698458(log P) + 0.011168(water solubility) – 0.005328(melting point) – 0.030957(eccentricity)

In the same way, the regression equations for log S, log P, water solubility and melting point are calculated as follows:

log S = 17.97676 – 1.57924 (pKa) + 1.992681(log P) + 0.028724(water solubility) – 0.012196(melting point) – 0.043956(eccentricity)

log P = -7.04424 + 0.640929 (pKa) + 0.379344 (log S) - 0.013145 (water solubility) + 0.005764 (melting point) + 0.019052 (eccentricity)

water solubility = -701.718+ 54.0175 (pKa) + 28.8239 (log S) - 69.2877 (log P) + 0.4368 (melting point) + 1.6924 (eccentricity)

melting point = 1626.114 - 112.695 (pKa) - 53.5152 (log S) + 132.8647 (log P) + 1.9099 (water solubility) - 3.6976 (eccentricity)

Equations 1: Regression equations for calculating pKa, log S, log P, water solubility and melting point
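As a check on how Equations 1 are applied, the sketch below evaluates the pKa equation for Doxepin, using its eccentricity and the actual values of its other four properties as listed in Table 3. The function and variable names are illustrative assumptions; the coefficients are copied from the pKa equation above.

```python
# Coefficients of the pKa regression equation in Equations 1.
PKA_INTERCEPT = 16.237946
PKA_COEFFS = {
    "log_s": -0.327624,
    "log_p": 0.698458,
    "water_solubility": 0.011168,
    "melting_point": -0.005328,
    "eccentricity": -0.030957,
}


def predict_pka(log_s, log_p, water_solubility, melting_point, eccentricity):
    """Evaluate the fitted pKa equation for one drug."""
    values = {
        "log_s": log_s,
        "log_p": log_p,
        "water_solubility": water_solubility,
        "melting_point": melting_point,
        "eccentricity": eccentricity,
    }
    return PKA_INTERCEPT + sum(PKA_COEFFS[name] * values[name] for name in PKA_COEFFS)


# Doxepin inputs: Actual Value entries and eccentricity from Table 3.
doxepin_pka = predict_pka(
    log_s=-3.4,
    log_p=4.29,
    water_solubility=31.6,
    melting_point=25,
    eccentricity=344,
)
print(round(doxepin_pka, 4))
```

The result is about 9.9188, which agrees with the calculated pKa reported for Doxepin in Table 3 to within about 1e-5. Note that each equation takes the other four properties as inputs, so estimating one property of a new drug in this way requires values for the remaining four in addition to the eccentricity.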

 

RESULTS AND DISCUSSIONS:

Let us now consider 3 new anti-depression drugs: Doxepin, Dimetacrine and Demexiptiline. The results generated for these 3 medicines using Equations 1 are summarized in Table 3. The Calculated Value columns contain the data obtained using Equations 1. We observe that in many cases the calculated values are close to the original values. In general, the more data used to fit a regression line, the better the results. For discussion purposes we have picked only 8 medicines. As the number of data points increases, we believe that closer values can be generated for all the variables.


 

 

 

 

 

 

 

 

Table 3: Actual and Calculated values for various random drugs

| Test Drug | Doxepin: Actual Value | Doxepin: Calculated Value | Dimetacrine: Actual Value | Dimetacrine: Calculated Value | Demexiptiline: Actual Value | Demexiptiline: Calculated Value |
|---|---|---|---|---|---|---|
| Eccentricity | 344 | | 340 | | 325 | |
| pKa | 9.76 | 9.91875622 | 9.2 | 9.63204536 | 9.09 | 9.27577564 |
| Log S | -3.4 | -3.40612503 | -3.9 | -3.60090018 | -5 | -5.80706292 |
| Log P | 4.29 | 4.20406644 | 4.42 | 4.2959767 | 3.82 | 4.38018016 |
| Water Solubility | 31.6 | 23.352807 | 34.3 | 19.916456 | 2.81 | 32.088461 |
| Melting Point | 25 | 66.531759 | 155.5 | 193.618044 | 232.5 | 180.483632 |
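One way to quantify how close the calculated values are to the actual values is the absolute percentage error. The sketch below applies it to the pKa and melting point rows of Table 3; the choice of this error measure and the names used are assumptions made for illustration, not part of the paper's analysis.

```python
def percent_error(actual, calculated):
    """Absolute percentage error of a calculated value relative to the actual value."""
    if actual == 0:
        raise ValueError("percentage error is undefined for an actual value of zero")
    return abs(calculated - actual) / abs(actual) * 100.0


# (actual, calculated) pairs copied from Table 3.
pka_pairs = {
    "Doxepin": (9.76, 9.91875622),
    "Dimetacrine": (9.2, 9.63204536),
    "Demexiptiline": (9.09, 9.27577564),
}
melting_point_pairs = {
    "Doxepin": (25, 66.531759),
    "Dimetacrine": (155.5, 193.618044),
    "Demexiptiline": (232.5, 180.483632),
}

pka_errors = {drug: percent_error(a, c) for drug, (a, c) in pka_pairs.items()}
melting_point_errors = {
    drug: percent_error(a, c) for drug, (a, c) in melting_point_pairs.items()
}

for drug in pka_pairs:
    print(drug, round(pka_errors[drug], 2), round(melting_point_errors[drug], 2))
```

On these numbers the pKa estimates are within about 5% of the actual values for all three drugs, while the melting point estimates deviate by more than 20%.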

 


CONCLUSION:

The proposed method estimates the pKa value, log S value, log P value, water solubility and melting point of a drug using multiple regression equations. The regression equations were fitted to 8 randomly chosen anti-depression drugs together with their graph eccentricities, and the fitted equations are then used to estimate values for new drugs. The physiochemical properties estimated in this way are approximations to the original values. The regression fit can thus be used to estimate pKa, log S, log P, water solubility and melting point, which are physiochemical properties of chemical compounds or drugs. The method can also be used to estimate other physiochemical properties, such as boiling point, using the graph property eccentricity.

 

REFERENCES:

1.       https://en.wikipedia.org/wiki/Graph_theory           

2.       https://en.wikipedia.org/wiki/Acid_dissociation_constant

3.       https://en.wikipedia.org/wiki/Solubility

4.       https://en.wikipedia.org/wiki/Partition_coefficient

5.       Yamuna. M. DNA gap penalty using directed graph. Der Pharmacia Lettre. 7(12); 2015: 392-398.

6.       Yamuna. M, Elakkiya. A. Chemical equation representation as directed graph. Der Pharma Chemica. 7(9); 2015: 49-55.

7.       Yamuna. M. Wiener index of chemical trees from its subtree. Der Pharma Chemica. 6(5); 2014: 235-242.

8.       Yamuna. M, Divya, T. Molecular topological index of tree with equal number of vertices of a given tree. IOP conference series: Materials science and Engineering. 263(4);2017:042119.

9.       Yamuna. M. A brief review- on contribution of graph theory and Wiener index to chemistry, International Journal of Pharmacy and Technology 8(1);2016: 11182-11192.

10.     https://books.google.co.in/books?id=c8EQ4HL4V1MC&printsec=frontcover&dq=introduction+topology&hl=en&sa=X&ved=0ahUKEwj4iMjZhuDVAhXMKo8KHSjHAqYQ6AEIRTAF#v=onepage&q=introduction%20topology&f=false.

11.     Jianzhang, Wu. Mohammad Reza, Farahani. Xiao Yu. & Wei Gao. Physical-chemical properties studying of molecular structures via topological index calculating. Open Phys. 15; 2017: 261–269.

12.     https://en.wikipedia.org/wiki/Drug_class

13.     http://mathworld.wolfram.com/WeightedGraph.html

14.     https://study.com/academy/lesson/assessing-weighted-complete-graphs-for-hamilton-circuits.html

15.     https://en.wikipedia.org/wiki/Molecular_graph

16.     http://www.orientjchem.org/vol29no4/development-of-qsar-model-of-substituted-benzene-sulphonamide-using-multiple-regression-analysis/

17.     https://en.wikipedia.org/wiki/Path_(graph_theory)

18.     https://statistics.laerd.com/spss-tutorials/multiple-regression-using-spss-statistics.php

19.     http://www.statsoft.com/Textbook/Multiple-Regression

20.     https://www.drugbank.ca

 

 

 

 

Received on 25.03.2018           Modified on 23.05.2018

Accepted on 05.06.2018          © RJPT All right reserved

Research J. Pharm. and Tech 2018; 11(9): 4112-4118.

DOI: 10.5958/0974-360X.2018.00756.4