Study of Residue Closeness Centrality and its Significance in Predicting CDR Regions in Antibody Light Chains

 

Shubhangi Swaroop, Isaac Arnold Emerson*

Department of Biotechnology, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore-14, India

*Corresponding Author E-mail: i_arnoldemerson@yahoo.com

 

ABSTRACT:

Antibodies occupy a pivotal role in immunology because they are protein molecules that recognize and bind foreign molecules, or antigens. As the essence of the immune response, antibodies are heavily studied, and Human Antibody Light Chain Variable Domains (HALCVD) were the focus of this study. We represented antibody structures as residue interaction networks, and our goal was to determine how useful closeness centrality is in identifying the residues of the CDR regions. The study started from 120 antibodies, both free and bound, from which 28 non-redundant light chain sequences were retained. Only 4% of the statistically significant central residues belonged to the CDR L1, CDR L2, and CDR L3 regions, while almost 60% fell in the Fr regions flanking the CDR regions. Moreover, 61% of the statistically significant peripheral residues belonged to the CDR regions. We also saw that some of the residues flanking CDR L1 and CDR L3 are centrally conserved, even though they are not part of the CDR regions.

 

KEYWORDS: Amino acids, CDR regions, Centrality.

 

 


1. INTRODUCTION:

Many biological functions and processes arise from interactions between amino acid residues. These interactions can be represented as a graph whose vertices are amino acids and whose edges are interactions, which together form an amino acid residue interaction network. Such a representation helps in visualizing and understanding these highly clustered, small-world networks(1). An edge is placed between two amino acid residues only when the distance between them is 8 Å or less; otherwise the two residues are considered disconnected, or at an infinite distance from each other. Network analysis offers several measures of the centrality of a node, each of which helps determine the relative importance of that node within the network. One of them is closeness centrality: the closeness centrality of a node is the inverse of the average farness of that node from all the other nodes in the network(2).

 

Residues with a high closeness centrality value play a vital role in transmitting information to other amino acid residues in the protein structure and thus fulfil essential roles in the network's communication. In the 2006 study by del Sol, Fujihashi and colleagues, active site residues located in clefts or cavities of enzymes were observed to have high closeness centrality values, whereas binding site residues on flatter surfaces of specific proteins did not. That study suggested that the efficiency of closeness centrality in identifying functionally important residues is related to the shape of the site(2). Closeness centrality can therefore be used to identify cavities or clefts containing residues that are significant for protein function. Statistically significant central residues that are also conserved are called centrally conserved residues. Previous detailed analysis of the centrally conserved residues and their location in the network showed that most of them cluster in surface cavities or clefts, and the closeness centrality criterion helps ensure that the cavities or clefts identified contain residues essential for protein function.

In antibodies, the six CDRs, three each on the light and heavy chains, form a large cleft for antigen binding(3). Moreover, the only function of the variable domain is to identify the specific epitope and bind to it, forming a reversible antigen-antibody complex(4). On this basis, the CDR regions can be regarded as the only functionally significant parts of the variable domain(5), and their closeness centrality values were therefore expected to be the highest(6). However, when we calculated the Z scores of all the residues in the antibodies of the dataset, the results did not show the expected outcome.

 

2. METHODS:

2.1   Dataset compilation:

To identify functionally important residues in the Human Antibody Light Chain Variable Domain (HALCVD) and to determine their role in antigen-antibody interaction, a dataset of 28 non-redundant antibody sequences was created. Since the CDR regions are located only in the variable domains of both antibody chains, and because our study was restricted to light chains, we retrieved the sequences of the light chain variable domains of all the human antibodies under study. The steps were as follows.

 

2.1.1 HALCVD sequence dataset creation:

PDB IDs of 120 human antibody structures, both free and bound, were retrieved from the ABG database and SAbDab respectively(7). Each PDB ID was then searched individually in the PDB database to collect the sequence of only the variable domain of the light chain in FASTA format(8). This dataset was created with the help of annotation from the Structural Classification of Proteins (SCOP), which helps differentiate between the sequences of the variable and constant domains on the same chain.

 

2.1.2 Non-redundant HALCVD sequence compilation:

It was essential for our study that the dataset included only non-redundant HALCVD sequences, so protein sequences that exceeded a chosen similarity threshold had to be removed. Since the CDR regions are the source of antibody hypervariability and comprise approximately 30% of the variable domain, the threshold was set at 70% similarity, and HALCVD sequences with similarity higher than 70% were removed. A dataset of 28 non-redundant HALCVD sequences was thus generated with the help of the CD-HIT Suite program(9).
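The filtering step can be pictured as a greedy procedure: sequences are visited from longest to shortest, and a sequence is kept only if it is not more than 70% similar to any sequence already kept. The sketch below illustrates that idea only and is not CD-HIT itself. It uses the ratio from Python's `difflib.SequenceMatcher` as a crude stand-in for sequence identity, and the function names and toy sequences are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Crude stand-in for sequence identity; NOT the measure CD-HIT uses.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def filter_redundant(seqs, threshold=0.70):
    """Greedy filter: keep a sequence only if its similarity to every
    sequence kept so far does not exceed the threshold."""
    kept = {}
    # Longest sequences are considered first.
    for name, seq in sorted(seqs.items(), key=lambda kv: len(kv[1]), reverse=True):
        if all(similarity(seq, rep) <= threshold for rep in kept.values()):
            kept[name] = seq
    return kept


# Hypothetical toy sequences: ab2 differs from ab1 by a single residue.
toy = {
    "ab1": "DIQMTQSPSSLSASVGDRVT",
    "ab2": "DIQMTQSPSSLSASVGDRVS",
    "ab3": "QSALTQPPSASGSPGQSVTI",
}
kept = filter_redundant(toy)
print(list(kept))
```

With these toy inputs the near-duplicate `ab2` is dropped and the other two sequences are retained.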

 

2.2   CDR region marking:

Sequences of the CDR L1, CDR L2, and CDR L3 regions of all the antibodies in the dataset were identified individually using SAbDab's non-redundant CDR database. A document was created denoting the location of these CDR regions on each HALCVD sequence of the dataset by highlighting the CDR L1 (red), CDR L2 (blue) and CDR L3 (green) sequences for all the antibody light chains under study. For example:

>1DCL:L

 

PSALTQPPSASGSLGQSVTISCTGTSSNVGGYNYVSWYQQHAGKAPKVIIYEVNKRPSGVPDRFSGSKSGNTASLTVSGLQAEDEADYYCSSYEGSDNFVFGTGTKVTVLG
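As a minimal sketch, CDR locations of this kind can be recorded as 1-based position ranges on the sequence. The sequence below is the 1DCL light chain shown above. The three CDR strings are an assumption (Kabat-style boundaries chosen for illustration), because the colour highlighting of the original document is not reproduced here, and `locate_region` is a hypothetical helper.

```python
# 1DCL light chain variable domain, copied from the example above.
SEQ_1DCL = (
    "PSALTQPPSASGSLGQSVTISCTGTSSNVGGYNYVSWYQQHAGKAPKVIIYEVNKRPSGVPDRFSGSKSG"
    "NTASLTVSGLQAEDEADYYCSSYEGSDNFVFGTGTKVTVLG"
)

# ASSUMPTION: Kabat-style CDR strings, used only for illustration.
ASSUMED_CDRS = {
    "CDR L1": "TGTSSNVGGYNYVS",
    "CDR L2": "EVNKRPS",
    "CDR L3": "SSYEGSDNFV",
}


def locate_region(seq, motif):
    """Return the 1-based (start, end) of the first occurrence of motif,
    or None if the motif is absent."""
    start = seq.find(motif)
    if start < 0:
        return None
    return start + 1, start + len(motif)


cdr_positions = {name: locate_region(SEQ_1DCL, m) for name, m in ASSUMED_CDRS.items()}
print(cdr_positions)
```

Under these assumed boundaries the regions fall at positions 23 to 36, 52 to 58 and 91 to 100 of the 111-residue sequence.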

 

2.3   Multiple sequence alignment:

Multiple Sequence Alignment (MSA) was performed with the Multiple Alignment using Fast Fourier Transform (MAFFT) program to find the conserved residues in the HALCVD sequences of all the antibodies in the dataset. The MSA result was annotated to show the CDR L1, CDR L2 and CDR L3 sequences of all antibodies. This result was later used to highlight centrally conserved residues (statistically significant central residues that are also conserved), as shown in Figure 4.
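Once an alignment is available, conserved positions can be read off column by column. The sketch below assumes the alignment is given as equal-length strings with `-` for gaps. The conservation threshold `min_fraction` is an assumption, since the text does not state the criterion used, and the toy alignment is hypothetical.

```python
from collections import Counter


def conserved_columns(alignment, min_fraction=1.0):
    """Return 0-based column indices where the most common residue
    (gaps ignored) occurs in at least min_fraction of the sequences."""
    n_seqs = len(alignment)
    conserved = []
    for col, residues in enumerate(zip(*alignment)):
        counts = Counter(r for r in residues if r != "-")
        if counts and counts.most_common(1)[0][1] / n_seqs >= min_fraction:
            conserved.append(col)
    return conserved


# Hypothetical toy alignment of three short fragments.
toy_alignment = ["QSALTQP", "QSVLTQP", "Q-ALTQP"]
cols = conserved_columns(toy_alignment)
print(cols)
```

Lowering `min_fraction` (for example to 0.6) relaxes the criterion so that columns with a clear majority residue also count as conserved.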

 

2.4   Network representation of antibody structures:

Each antibody structure in the dataset was represented as a network, where amino acids were nodes or vertices, and the interaction or contact between them formed the edges. In such networks, an edge between two amino acid residues was considered only when the distance between them was 8 Å or less; otherwise the two residues were considered to be disconnected or at an infinite distance from each other.
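A minimal sketch of this construction follows. It assumes one representative coordinate per residue (for example the C-alpha atom), which is an assumption because the text does not say which atoms define the distance. The function name and toy coordinates are hypothetical.

```python
import math


def contact_network(coords, cutoff=8.0):
    """Build an undirected residue contact network.

    coords: list of (x, y, z) tuples in angstroms, one per residue.
    Returns an adjacency dict mapping residue index to a set of neighbours.
    """
    n = len(coords)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            # An edge exists only when the two residues are within the cutoff.
            if math.dist(coords[i], coords[j]) <= cutoff:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


# Hypothetical toy chain: four residues spaced 3.8 angstroms apart on a line.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
network = contact_network(toy_coords)
print(network)
```

In the toy chain the first and last residues are 11.4 Å apart and therefore share no edge, while every other pair is in contact.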

 

2.5   Closeness centrality study:

Network analysis offers several measures of the centrality of a node, each of which helps determine the relative importance of that node within the network. One of them is closeness centrality: the closeness centrality of a node is the inverse of the average farness of that node from all the other nodes in the network.

 

Therefore, the closeness centrality value Ci for an amino acid residue i can be represented as:

$$C_i = \frac{n - 1}{\sum_{k \neq i} D(k, i)}$$

where D(k, i) is the shortest distance between residue i and residue k, and n is the total number of amino acid residues in the sequence(2).
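The definition can be sketched with a breadth-first search, which yields shortest path lengths in an unweighted network. The code below computes (n - 1) divided by the sum of shortest distances from residue i to every other residue. Returning 0.0 for a residue that cannot reach all others is an assumed convention (an infinite distance makes the sum infinite), and the function name and toy network are hypothetical.

```python
from collections import deque


def closeness_centrality(adjacency, i):
    """Closeness of node i: (n - 1) / sum of shortest path lengths to all
    other nodes. adjacency maps each node to a set of neighbours."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    # Breadth-first search gives shortest path lengths in an unweighted graph.
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if len(dist) < n:
        # ASSUMED convention: an unreachable node means infinite farness.
        return 0.0
    return (n - 1) / sum(dist.values())


# Hypothetical toy network: a path of three residues, 0 - 1 - 2.
toy_network = {0: {1}, 1: {0, 2}, 2: {1}}
c_middle = closeness_centrality(toy_network, 1)
c_end = closeness_centrality(toy_network, 0)
print(c_middle, c_end)
```

The middle residue of the toy path is one step from both neighbours and scores 1.0, while an end residue scores 2/3.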

 

With the help of a program, the closeness centrality values (Ci) of all the residues of the antibodies in the dataset were determined, one antibody at a time. The same program then used these values to determine the Z scores of all the residues in a given antibody. The Z score is given by:

$$Z_i = \frac{C_i - \bar{C}}{\sigma}$$

where $Z_i$ is the Z score of residue i, $C_i$ is the closeness centrality value of residue i, $\bar{C}$ is the average closeness centrality value over all residues of an antibody, and $\sigma$ is the standard deviation.
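A short sketch of the Z score calculation for one antibody is given below. It uses the population standard deviation, which is an assumption because the text does not say whether the population or the sample form was used. The function name and toy values are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(closeness_values):
    """Standardize closeness centrality values within one antibody."""
    mu = mean(closeness_values)
    # ASSUMPTION: population standard deviation.
    sigma = pstdev(closeness_values)
    if sigma == 0:
        # All residues equally central: no residue stands out.
        return [0.0 for _ in closeness_values]
    return [(c - mu) / sigma for c in closeness_values]


# Hypothetical closeness centrality values for three residues.
toy_closeness = [0.10, 0.20, 0.30]
z = z_scores(toy_closeness)
print([round(v, 3) for v in z])
```

Because the scores are computed per antibody, a residue's Z score expresses how central it is relative to the other residues of the same structure.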

 

 

The Z scores of all the residues were obtained, one antibody at a time. They were then used to divide the residues of each antibody into two categories:

·      Residues with Z scores >= 1, or statistically significant central residues

·      Residues with Z scores <= -1, or statistically significant peripheral residues
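The two categories amount to a simple threshold on the Z score, as the sketch below shows. The four Z scores for positions 7, 22, 57 and 98 are copied from Table 1 (PDB ID: 1DCL). The entry for position 50 is a hypothetical filler value added to show a residue that falls in neither category, and the function name is hypothetical.

```python
def classify_residues(z_by_position, cutoff=1.0):
    """Split residues into statistically significant central (Z >= cutoff)
    and peripheral (Z <= -cutoff) groups."""
    central = [pos for pos, z in z_by_position.items() if z >= cutoff]
    peripheral = [pos for pos, z in z_by_position.items() if z <= -cutoff]
    return central, peripheral


z_1dcl = {
    7: 2.46,    # PRO, from Table 1
    22: 1.61,   # CYS, from Table 1
    50: 0.10,   # hypothetical filler value, not from the source
    57: -1.14,  # PRO, from Table 1
    98: -1.15,  # ASN, from Table 1
}
central, peripheral = classify_residues(z_1dcl)
print(central, peripheral)
```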

 

Table 1 shows the statistically significant central and peripheral residues of one HALCVD sequence of the dataset.


Table 1: Statistically significant central residues and peripheral residues of PDB ID: 1DCL

Network parameters                Values
Number of amino acids             216
Average degree of the network     9.86
Clustering coefficient (C)        0.56
Assortativity value               0.32
Shortest path length (L)          4.94
Closeness centrality value        0.21

Structurally and functionally important amino acids:

Amino acid   Chain   Position   Z-score >= 1
THR          A       5          1.23
GLN          A       6          1.7
PRO          A       7          2.46
PRO          A       8          2.35
SER          A       9          2.12
ALA          A       10         1.66
SER          A       11         1.28
ILE          A       20         1.61
SER          A       21         1.47
CYS          A       22         1.61
TYR          A       88         1.09
TYR          A       89         1.03
THR          A       103        1.54
GLY          A       104        1.6
THR          A       105        1.88
LYS          A       106        1.67
VAL          A       107        1.43
THR          A       108        1.29

Amino acid   Chain   Position   Z-score <= -1
PRO          A       57         -1.14
SER          A       58         -1.57
GLY          A       59         -1.59
PRO          A       61         -1.06
ASP          A       62         -1.05
GLY          A       95         -1.22
SER          A       96         -1.22
ASP          A       97         -1.26
ASN          A       98         -1.15

 


Since the statistically significant central residues in all the HALCVD sequences were now known, the MSA output was modified to show the statistically significant central residues that were also conserved. These residues were called centrally conserved residues.
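Combining the two results requires mapping each residue position to its alignment column, since gaps shift the numbering. The sketch below assumes the aligned sequence uses `-` for gaps and that conserved columns are given as 0-based indices. All names and toy values are hypothetical.

```python
def residue_to_column(aligned_seq):
    """Map 1-based residue positions to 0-based alignment columns,
    skipping gap characters."""
    mapping = {}
    position = 0
    for col, ch in enumerate(aligned_seq):
        if ch != "-":
            position += 1
            mapping[position] = col
    return mapping


def centrally_conserved(aligned_seq, central_positions, conserved_cols):
    """Central residues whose alignment column is also conserved."""
    col_of = residue_to_column(aligned_seq)
    return [p for p in central_positions if col_of[p] in conserved_cols]


# Hypothetical toy data: one aligned fragment with a gap at column 1.
aligned = "Q-ALTQP"
central_positions = [2, 3, 6]        # 1-based positions in the ungapped sequence
conserved_cols = {0, 3, 4, 5, 6}     # 0-based conserved alignment columns
result = centrally_conserved(aligned, central_positions, conserved_cols)
print(result)
```

In the toy example position 2 maps to a non-conserved column and is dropped, while positions 3 and 6 are reported as centrally conserved.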

 

After obtaining the Z scores as described above, the following values were calculated and tabulated for the HALCVD sequence of each antibody:

I.          Total percentage of statistically significant central residues that lie in the CDR regions.

II.          Percentage of statistically significant central residues that lie in the regions flanking CDR L1, CDR L2, and CDR L3.

III.          Percentage of statistically significant peripheral residues that lie in the CDR L1, CDR L2, and CDR L3 regions.

IV.          Total percentage of statistically significant peripheral residues that lie in the CDR regions.
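As a worked illustration of items III and IV, the sketch below takes the nine peripheral positions listed for 1DCL in Table 1 and computes the share that falls inside each CDR. The CDR position ranges are an assumption (Kabat-style boundaries on the 1DCL sequence, chosen for illustration and not taken from the source), and the helper name is hypothetical.

```python
# ASSUMPTION: Kabat-style 1-based CDR ranges for 1DCL (end exclusive in range()).
ASSUMED_CDR_RANGES = {
    "CDR L1": range(23, 37),
    "CDR L2": range(52, 59),
    "CDR L3": range(91, 101),
}

# Peripheral residues (Z <= -1) of 1DCL, positions copied from Table 1.
peripheral_1dcl = [57, 58, 59, 61, 62, 95, 96, 97, 98]


def percent_in_region(positions, region):
    """Percentage of the given positions that fall inside the region."""
    return 100.0 * sum(p in region for p in positions) / len(positions)


per_cdr = {
    name: round(percent_in_region(peripheral_1dcl, region), 1)
    for name, region in ASSUMED_CDR_RANGES.items()
}
total_cdr = round(
    sum(percent_in_region(peripheral_1dcl, r) for r in ASSUMED_CDR_RANGES.values()), 1
)
print(per_cdr, total_cdr)
```

Under these assumed boundaries, two of the nine peripheral residues fall in CDR L2 and four in CDR L3, giving 22.2% and 44.4% and a total of 66.7%.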

 

3. RESULTS AND DISCUSSION:

3.1   Multiple sequence alignment:

MSA tools are used to determine regions of conserved residues in closely or distantly related sequences. In our study, we used MSA to determine conserved regions in the 28 HALCVD sequences of the dataset, using the Multiple Alignment using Fast Fourier Transform (MAFFT) program(10).

 

 

Figure 1: MSA output through MAFFT

 

The variable domain of an antibody comprises three CDR regions separated by four framework (Fr) regions. The MSA output shown in Figure 1 clearly reflects that the greatest variability is present in the three CDR regions. These hypervariable regions showed a high proportion of different amino acids at a given position. These amino acids were found to be related to the most common amino acid found at that position. This result supports the view that the CDR regions contribute to the antibody's specificity for binding antigens(11,12).

 

Conserved residues were localized to the Fr regions, which form the β sheet scaffold that holds the CDR residues in place for effective interaction with the corresponding epitope on the antigen. Thus most of the residues of the Fr regions are stable amino acids that are also conserved.

3.2   Statistically significant central residues and peripheral residues

 

The CDR regions are regarded as the only functionally significant parts of the variable domains(13,14), so their closeness centrality values were expected to be the highest. However, when we calculated the Z scores of all the residues in the antibodies of the dataset, the results did not show the expected outcome. The actual observations are tabulated in Table 2.


 

Table 2: Percentages of statistically significant central residues and peripheral residues that lie in various regions of the HALCVD sequences in the dataset (F = regions flanking the CDR)

              Z score >= 1 (%)                    Z score <= -1 (%)
No.  PDB ID   CDR      CDR1 F   CDR2 F   CDR3 F   CDR      CDR1     CDR2     CDR3
1    1ADQ     0.0      0.0      0.0      58.8     72.7     18.2     18.2     36.4
2    1AXS     0.0      15.0     0.0      50.0     62.5     31.3     6.3      25.0
3    1BJ4     0.0      0.0      0.0      57.1     46.2     7.7      0.0      38.5
4    1BBJ     0.0      14.3     0.0      61.9     64.7     35.3     5.9      23.6
5    1C5C     0.0      0.0      0.0      62.5     69.2     23.1     7.7      38.5
6    1DCL     0.0      0.0      0.0      44.4     66.7     0.0      22.2     44.4
7    1DL7     35.0     10.0     0.0      15.0     15.8     0.0      0.0      15.8
8    1GC1     5.9      23.5     0.0      41.2     54.5     9.1      0.0      45.5
9    1IGM     25.0     6.3      0.0      25.0     50.0     0.0      16.7     33.3
10   1IKF     0.0      25.0     0.0      50.0     64.7     35.3     5.9      23.5
11   1MCW     0.0      23.5     0.0      70.6     72.2     38.9     5.6      33.3
12   1PW3     15.8     26.3     10.5     15.8     31.6     10.5     5.3      15.8
13   1YY8     0.0      16.6     0.0      61.0     62.5     31.3     6.3      25.0
14   2FEE     0.0      19.4     0.0      57.1     64.7     35.3     5.9      23.5
15   2FL5     0.0      0.0      0.0      61.1     100.0    25.0     25.0     50.0
16   2RHE     31.5     5.3      0.0      21.0     31.6     0.0      5.3      26.3
17   3LMJ     0.0      0.0      0.0      60.0     83.3     38.9     5.6      38.9
18   3MLR     0.0      0.0      0.0      42.9     83.3     16.7     8.3      58.3
19   3MO1     0.0      0.0      0.0      60.0     58.8     29.4     5.9      23.5
20   3QCU     0.0      0.0      0.0      52.9     71.4     42.9     14.3     14.3
21   3S35     0.0      0.0      0.0      53.3     84.6     53.8     7.7      23.1
22   3SE8     0.0      0.0      0.0      56.3     56.3     37.5     12.5     6.3
23   3SE9     0.0      0.0      0.0      57.1     41.2     31.3     6.3      6.3
24   3U4B     0.0      0.0      0.0      58.8     55.6     11.1     11.1     33.3
25   4G5Z     0.0      23.8     0.0      52.4     68.8     37.5     6.3      25.0
26   4JY5     0.0      33.3     0.0      46.7     78.6     28.6     7.1      42.9
27   4LST     0.0      10.0     0.0      55.0     46.7     26.7     6.7      13.3
28   4LSU     0.0      0.0      0.0      50.0     66.7     27.8     22.2     16.7
AVG           4.0      9.0      0.4      49.9     61.6     24.4     8.9      28.6

 


 

3.2.1         Statistically significant central residues

 

Figure 2: Location of statistically significant central residues in the HALCVD sequences

 

All residues of the HALCVD sequences with a Z score greater than or equal to 1 were considered statistically significant central residues. Given the structure of the antibody paratope, most of these residues were expected to belong to the CDR regions. However, this expectation was contradicted: only 4% of the statistically significant central residues belonged to the CDR L1, CDR L2, and CDR L3 regions, while almost 60% fell in the Fr regions flanking the CDR regions. When this figure was broken down further, almost 9% of the statistically significant central residues were found flanking CDR L1, nearly 50% flanking CDR L3, and only 0.38% flanking CDR L2. The same is represented as a graph in Figure 2.

 

3.2.2 Statistically significant peripheral residues

To study the relevance of closeness centrality in determining the residues of the CDR regions, all residues of the HALCVD sequences with a Z score less than or equal to -1 were considered statistically significant peripheral residues. Theoretically, these would comprise the surface residues of the antibody network.

 

Filtering the residues of the HALCVD sequences with Z scores equal to or below -1 showed that 61% of the statistically significant peripheral residues belonged to the CDR regions, with almost 23% in CDR L1, 29% in CDR L3 and only 9% in CDR L2. The same is depicted in Figure 3.

 

 

Figure 3: Location of statistically significant peripheral residues in the HALCVD sequences

 

From this outcome, we understand that even though CDR L1, CDR L2, CDR L3, CDR H1, CDR H2 and CDR H3 together form a large cleft for antigen binding, the residues in the CDR regions of the light chain do not interact with the residues in the CDR regions of the heavy chain. As a result, the residues in the CDR regions do not appear as part of a large cleft in the antibody network. Instead, CDR L1, CDR L2, and CDR L3 appear as three consecutive bulges on the surface of the L chain, because they form variable loops on the β sheets created by the Fr regions; the same applies to the CDR regions of the heavy chain. From these findings, we can say that even though the CDR regions are the major functional parts of HALCVD sequences, closeness centrality cannot be used to identify the residues of the CDR regions because of their shape.

 

However, the question of why some of the CDR flanking residues were recognized as statistically significant central residues remained unanswered. To find a valid explanation, the results of the MSA and closeness centrality studies were merged to give the centrally conserved residues.

 

 

3.3 Centrally conserved residues

Statistically significant central residues that are also conserved are called centrally conserved residues. From previous analysis of centrally conserved residues and their location in the network, it is known that most of them cluster in surface cavities or clefts, and the closeness centrality criterion helps ensure that these cavities or clefts contain residues essential for protein function(2).

 

From our study of the closeness centrality values of the residues in HALCVD sequences, we observed that 60% of the statistically significant central residues belong to the Fr regions flanking the CDR regions. Combining this information with the MSA result gave the alignment depicted in Figure 4. In this figure, the residues of the CDR regions are marked with CDR L1 in red, CDR L2 in blue and CDR L3 in green, and the area shaded in grey contains all the centrally conserved residues.

 

 

Figure 4: Multiple sequence alignment of HALCVD sequences showing CDR L1, CDR L2, CDR L3 and centrally conserved residues

 

On observing Figure 4, we saw that the residues flanking CDR L1 and CDR L3 are centrally conserved; these residues are therefore functionally important in the antibody network.

 

These central residues are capable of efficiently transmitting information, in physical or chemical form, to the other amino acid residues in the antibody, even though they are not part of the CDR regions. Since these centrally conserved residues of the Fr region are part of the variable domain, it is reasonable to assume that their function is also related to the formation of the antigen-antibody complex.

Since these centrally conserved residues belong to the Fr regions that form the β sheet scaffold to hold the CDR regions in place, they are embedded in the paratopic cleft and exposed like CDR residues in the form of consecutive bulges(15). However, because they are the residues that flank the CDR residues, they are not very deeply buried in the β sheet scaffold. Therefore, they too are capable of interacting with the antigen epitope, when the antibody encounters the antigen.

 

Functionally, these CDR flanking centrally conserved residues (CDRFCCR) may therefore play the following roles:

 

·      They trigger a conformational change when the CDR regions (paratope) recognize the specific epitope on the antigen. This conformational change enables proper docking of the antibody on the antigen. (ball in glove)

·      On the formation of the antigen-antibody complex, they trigger signals through the antibody that enables the constant domain to execute a suitable immune response.

 

4. CONCLUSION:

The goal of this study was to determine how useful closeness centrality is in identifying the residues of the CDR regions. According to our study, the efficiency of closeness centrality in identifying functionally important residues is related to the shape of the site. In antibodies, the six CDRs, three each on the light and heavy chains, form a large cleft for antigen binding. Moreover, the only function of the variable domain is to identify the specific epitope and bind to it, forming a reversible antigen-antibody complex. On this basis, it was assumed that the only functionally significant parts of the variable domain are the CDR regions, and their closeness centrality values were therefore expected to be the highest. However, when we calculated the Z scores of all the residues in the antibodies of the dataset, the results did not show the expected outcome. Only 4% of the statistically significant central residues belonged to the CDR L1, CDR L2, and CDR L3 regions, while almost 60% fell in the Fr regions flanking the CDR regions. Moreover, 61% of the statistically significant peripheral residues belonged to the CDR regions. From these findings, we concluded that even though the CDR regions are the major functional parts of HALCVD sequences, closeness centrality cannot be used to identify the residues of the CDR regions, because these residues do not appear as part of a large cleft in the antibody network; instead, CDR L1, CDR L2, and CDR L3 appear as three consecutive bulges on the surface of the L chain.

 

5. ACKNOWLEDGEMENT:

The authors thank the Vellore Institute of Technology for providing the necessary computational equipment to carry out the research work.

 

6. REFERENCES:

1.     Boginski V, Commander CW. Identifying critical nodes in protein-protein interaction networks. In: Clustering challenges in biological networks. World Scientific; 2009. p. 153–67.

2.     del Sol A, Fujihashi H, Amoros D, Nussinov R. Residue centrality, functionally important residues, and active site shape: Analysis of enzyme and non-enzyme families. Protein Sci. 2006 Sep;15(9):2120–8.

3.     Adolf-Bryfogle J, Xu Q, North B, Lehmann A, Dunbrack RL. PyigClassify: A database of antibody CDR structural classifications. Nucleic Acids Res. 2015;

4.     Kabat EA, Wu TT, Bilofsky H. Unusual distributions of amino acids in complementarity determining (hypervariable) segments of heavy and light chains of immunoglobulins and their possible roles in specificity of antibody-combining sites. J Biol Chem. 1977;

5.     Kabat EA, Wu TT. Attempts to locate complementarity-determining residues in the variable positions of light and heavy chains. Ann N Y Acad Sci. 1971;

6.     Estrada E. Virtual identification of essential proteins within the protein interaction network of yeast. Proteomics. 2006;

7.     Dunbar J, Krawczyk K, Leem J, Baker T, Fuchs A, Georges G, et al. SAbDab: The structural antibody database. Nucleic Acids Res. 2014;

8.     Sheriff S, Silverton EW, Padlan EA, Cohen GH, Smith-Gill SJ, Finzel BC, et al. Three-dimensional structure of an antibody-antigen complex. Proc Natl Acad Sci. 1987;

9.     Huang Y, Niu B, Gao Y, Fu L, Li W. CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics. 2010;26(5):680–2.

10.   Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002;

11.   Ofran Y, Schlessinger A, Rost B. Automated Identification of Complementarity Determining Regions (CDRs) Reveals Peculiar Characteristics of CDRs and B Cell Epitopes. J Immunol. 2008;

12.   Wu TT, Kabat EA. An analysis of the sequences of the variable regions of Bence Jones proteins and myeloma light chains and their implications for antibody complementarity. J Exp Med. 1970;

13.   Mirsky A, Kazandjian L, Anisimova M. Antibody-specific model of amino acid substitution for immunological inferences from alignments of antibody sequences. Mol Biol Evol. 2015;

14.   Wang W, Singh S, Zeng DL, King K, Nema S. Antibody structure, instability, and formulation. Journal of Pharmaceutical Sciences. 2007.

15.   Krawczyk K, Baker T, Shi J, Deane CM. Antibody i-Patch prediction of the antibody binding site improves rigid local antibody-antigen docking. Protein Eng Des Sel. 2013;

 

 

Received on 04.09.2018          Modified on 25.09.2018

Accepted on 13.10.2018        © RJPT All rights reserved

Research J. Pharm. and Tech 2018; 11(12): 5569-5575.

DOI: 10.5958/0974-360X.2018.01013.2