The therapeutic targeting of the immune system, for example in vaccinology and cancer treatment, is a challenging task and the subject of active research. Several in silico tools used for predicting immunogenicity are based on the analysis of peptide sequences binding to the Major Histocompatibility Complex (pMHC). However, few of these bioinformatics tools take into account the pMHC three-dimensional structure. Here, we describe a new bioinformatics tool, MatchTope, developed for predicting peptide similarity, which can trigger cross-reactivity events, by computing and analyzing the electrostatic potentials of pMHC complexes. We validated MatchTope by using previously published data from in vitro assays. We thereby demonstrate the strength of MatchTope for similarity prediction between targets derived from several pathogens as well as for indicating possible cross responses between self and tumor peptides. Our results suggest that MatchTope can enhance and speed up future studies in the fields of vaccinology and cancer immunotherapy.
The Immune System (IS) is the primary defense of an organism against a wide range of exogenous pathogens, such as viruses, bacteria, and fungi, as well as endogenous pathological conditions, such as tumor cells. However, an inappropriate immune response against healthy self cells or self-peptides is undesirable, as it can lead to autoimmune diseases. Several cell types and molecules, such as cell receptors, chemokines, and interleukins, are involved in the immune response, and the complex interactions between these components drive the human immune system.
The first step for the IS to mount an immune response and defend the organism is to recognize potentially harmful pathogens. One of the ways the human IS accomplishes this task is by loading the Major Histocompatibility Complex (MHC) with a peptide (pMHC) and presenting it to immune cells. The presented epitope can be derived from a self-protein, a protein from a pathogen, or a tumor cell protein. There are two main MHC types - MHC class I (MHC-I) and MHC class II (MHC-II) - which differ mainly in the cells that express them and in the immune cells that recognize them. In humans, the MHC loci are called human leukocyte antigens (HLA).
The cells responsible for pMHC interaction are the T lymphocytes. Among the different T lymphocyte subtypes, two subpopulations coordinate the immune response: the CD8+, or cytotoxic, T cells and the CD4+, or helper, T cells. While CD4+ T cells bind to MHC-II, which is expressed by Antigen Presenting Cells (APCs), CD8+ T cells bind to MHC-I. MHC-I, the focus of the current work, is expressed by virtually all nucleated cells and is the central player in presenting the peptides produced inside these cells. The presented epitope can be recognized either as self or non-self. If the epitope is recognized as non-self, a signaling cascade is triggered, leading to the apoptosis of the infected or tumor cell. However, this recognition is not strictly specific: the T-cell receptor (TCR) recognizes not only an exact match of the epitope but also similar ones. This latter event is called cross-reactivity and can lead to unwanted immune responses. Broader recognition has a positive side, since it reduces the number of TCRs required. However, an epitope derived from a viral protein can mimic a self-epitope and thus trigger an autoimmune disease. Furthermore, cross-reactivity is a major limitation of the immune response to tumors: the high similarity between proteins from normal and tumor cells makes an appropriate response difficult for CD8+ cells.
The triggering of an immune response depends on the protein interaction between the TCR and the pMHC, in which interface complementarity is a pivotal element. Several physicochemical properties govern this event, among them the electrostatic potential distribution, whose central role in protein interactions in intracellular and extracellular environments has already been described in several works. Beyond that, charge complementarity has an additional function: it guides the anchoring of the interacting proteins more than any other factor.
Cross-reactivity becomes particularly important in vaccine development. It is crucial to check whether the vaccine will be effective against all subtypes of a given pathogen (as in the case of dengue viruses, where cross-reactivity between subtypes can lead to hemorrhagic fever). Likewise, when developing a new immunotherapeutic approach, it is necessary to ensure that the target will not trigger cross-reactivity with a self-protein. Given that testing all possible pMHCs in vitro is impossible, in silico analyses can be helpful. Some cross-reactivity predictors are available; they mainly use linear peptide sequences as input and were primarily designed to predict allergic processes. However, it is already known that some epitopes show cross-response despite sharing fewer than 50% of the amino acid residues in their linear sequences, which makes it difficult for sequence-based predictors to identify cross-reactivity correctly. For this reason, we developed a new cross-reactivity prediction tool, MatchTope, which uses protein structural information to predict similarities between pMHC-I complexes, facilitating the development of new vaccines and immunotherapies. Using several available datasets, we verified that MatchTope achieves excellent agreement with experimental results, indicating that this tool can significantly improve vaccine development for several diseases and cancer immunotherapeutic treatments.
MatchTope calculates the molecular electrostatic potentials (MEPs) of MHC class I molecules loaded with different peptides and then clusters the resulting peptide-MHC class I (pMHC) complexes based on the similarity of their MEPs. The application of MEP differences as a measure of pMHC class I similarity was previously described by our research team.
The steps involved in our analysis are displayed in Figure 1. Prior to the analysis, the user should provide a set of pMHC class I files in PDB format (a minimum of three files is required). Since only a few crystallographic complexes exist to date, the input PDB files will often stem from a modeling approach. A PDB file contains three columns holding the 3D coordinates of each protein atom, together with columns holding additional information, such as occupancy, temperature factor, element name, charge, radius, or other properties, depending on the source. Since some columns of non-standard PDB files for modeled complexes were found to cause problems during the MEP calculation, these columns were deleted in a pre-processing step using a bash script.
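As an illustration of this kind of pre-processing, the following is a minimal Python sketch (the original step used a bash script, which is not reproduced here). It assumes that the problematic columns are those following the coordinates, which the source does not specify. The function name `clean_pdb_lines` and the default occupancy and temperature factor values are hypothetical choices made for the example.

```python
def clean_pdb_lines(lines):
    """Keep atom records and reset every field after the coordinates.

    Assumption (not stated in the source): the non-standard content sits
    after column 54, i.e. after the x, y, z fields of the fixed-width
    PDB format.
    """
    cleaned = []
    for line in lines:
        record = line[:6].strip()
        if record in ("ATOM", "HETATM"):
            # Columns 1-54: record name, atom and residue identifiers, x, y, z.
            core = line.rstrip("\n")[:54].ljust(54)
            # Columns 55-66: occupancy and temperature factor, set to
            # placeholder defaults (hypothetical choice for this sketch).
            cleaned.append(core + "  1.00  0.00")
        elif record in ("TER", "END"):
            # Keep chain terminators and the end marker, drop anything else.
            cleaned.append(record)
    return cleaned
```

Resetting the trailing fields rather than trying to interpret them keeps the sketch independent of whichever modeling program produced the file.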
The next step repositions all input complexes in 3D space by superimposing them. This process is important to ensure that the same electrostatic regions are compared across different pMHCs. To achieve this, we use a Python script to call the PyMOL ‘Fitting’ function. This function superimposes each input PDB structure onto a predefined model pMHC structure so that all inputs share the same position and orientation.
After the fitting process, MatchTope calculates the electrostatic similarity of the complexes using the standalone version of PIPSA. The PIPSA (Protein Interaction Property Similarity Analysis) software is an established tool for analyzing protein electrostatic interaction similarities. We modified PIPSA to adapt it for pMHC analysis, accounting for the typical elongated shape of the pMHC binding cleft, which differs from the globular protein shapes to which PIPSA has mostly been applied; these modifications are available in PIPSA version 3.2 or later. PIPSA first calculates the MEP using the University of Houston Brownian Dynamics (UHBD) program. PIPSA then creates a ‘skin’ around each pMHC, and the MEPs of the pMHC complexes are compared within these skins. Besides calculating overall electrostatic similarities for the full proteins in the complete skins, the algorithm also allows similarities to be calculated in a focused region. For this study, a cylinder in the pMHC cleft was considered, and only the regions of the protein skins residing within this cylinder were used for computing similarity indices, as shown in Figure 2. Using this focused region, we can reduce the noise caused by identical surroundings and thereby avoid erroneous clustering of the results.
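The idea of restricting the comparison to a cylinder can be sketched as follows. This is not PIPSA code: it assumes that both complexes have MEP values sampled at the same set of skin points, and it assumes a Hodgkin-type similarity index, since the text does not state which index was used. The function names, the cylinder end points, and the radius are hypothetical.

```python
import math

def in_cylinder(point, axis_start, axis_end, radius):
    """True if `point` lies within `radius` of the segment axis_start -> axis_end."""
    axis = [e - s for s, e in zip(axis_start, axis_end)]
    rel = [p - s for s, p in zip(axis_start, point)]
    length_sq = sum(a * a for a in axis)
    # Fractional position of the point's projection along the cylinder axis.
    t = sum(a * r for a, r in zip(axis, rel)) / length_sq
    if t < 0.0 or t > 1.0:
        return False
    closest = [s + t * a for s, a in zip(axis_start, axis)]
    return math.dist(point, closest) <= radius

def hodgkin_index(pot_a, pot_b):
    """Hodgkin-type index: 2(a.b) / (a.a + b.b), ranging from -1 to 1."""
    numerator = 2.0 * sum(a * b for a, b in zip(pot_a, pot_b))
    denominator = sum(a * a for a in pot_a) + sum(b * b for b in pot_b)
    return numerator / denominator if denominator else 0.0

def focused_similarity(points, pot_a, pot_b, axis_start, axis_end, radius):
    """Similarity index computed only over the points inside the cylinder."""
    keep = [i for i, p in enumerate(points)
            if in_cylinder(p, axis_start, axis_end, radius)]
    return hodgkin_index([pot_a[i] for i in keep], [pot_b[i] for i in keep])
```

In this toy form, points outside the cylinder simply do not contribute to the index, which is the noise-reduction effect described above.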
The final part of the analysis, the clustering process, uses the similarity indices calculated during the PIPSA run as input. To group electrostatically similar pMHCs in the same cluster, MatchTope uses an R package called ‘pvclust’, which performs hierarchical clustering combined with bootstrap resampling of the input data to assess the support for each cluster branch. The pvclust package requires some user-defined arguments. We used the “correlation method” to calculate the distance between branches and the “complete method” as the cluster method. Of all the options tested, these two settings yielded the best correlation with the in vitro results.
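To make the two settings concrete, the sketch below implements correlation distance (1 minus the Pearson correlation) with complete linkage in plain Python. It is an illustration only, not the pvclust implementation, and it omits the bootstrap step entirely. It assumes that each pMHC is described by its row of the similarity matrix; the function names and the returned merge format are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def complete_linkage(names, profiles):
    """Agglomerative clustering; returns merges as (members_a, members_b, height)."""
    n = len(names)
    # Correlation distance between every pair of profiles.
    dist = {(i, j): 1.0 - pearson(profiles[i], profiles[j])
            for i in range(n) for j in range(i + 1, n)}
    clusters = {i: (i,) for i in range(n)}  # cluster id -> member indices

    def linkage(a, b):
        # Complete linkage: the largest pairwise distance between members.
        return max(dist[(min(i, j), max(i, j))]
                   for i in clusters[a] for j in clusters[b])

    merges, next_id = [], n
    while len(clusters) > 1:
        ids = sorted(clusters)
        pairs = [(p, q) for k, p in enumerate(ids) for q in ids[k + 1:]]
        a, b = min(pairs, key=lambda pair: linkage(*pair))
        height = linkage(a, b)
        merges.append((tuple(sorted(names[i] for i in clusters[a])),
                       tuple(sorted(names[i] for i in clusters[b])),
                       height))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges
```

With a toy similarity matrix in which the first two entries resemble each other and the third does not, the first merge joins the two similar entries at a low height, and the third entry joins only at the top of the tree.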
To validate MatchTope, we used four data sets obtained from previously published articles. A list of all considered epitopes, also stating which of them trigger in vitro cross-reactivity, is shown in Supplementary Table 1, and data on input superposition and model PDB structures are shown in Supplementary Table 2. The low average RMSD obtained (0.019 Ångströms, considering all protein atoms) indicates that all MHC structures were well superimposed.
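For reference, the RMSD between two already superimposed structures can be computed as in the minimal sketch below. It assumes that the two coordinate lists contain the same atoms in the same order; the function name `rmsd` is hypothetical, and the sketch does not perform the superposition itself.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared distances between matching atoms.
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))
```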
The first data set used for MatchTope validation came from a study testing a Hepatitis E Virus (HEV)-specific T cell receptor against epitopes derived from RNA-dependent RNA polymerase (HEV.1527), non-muscle Myosin Heavy Chain 9 (MYH9.478), and other proteins. The in vitro assays show cross-reactivity between HEV.1527 and MYH9.478 and no cross-recognition between HEV.1527 and ACTB.266. Figure 3 presents the results obtained with MatchTope during the validation process. Cluster A depicts two groupings: 1 and 2. Grouping 1 clustered HEV.1527 and MYH9.478 together, matching the in vitro results, while ACTB.266 was placed on the branch most distant from grouping 1. Grouping 2 clustered different epitopes, but no experimental information regarding their potential cross-reactivity was available in the original publication.
In the second data set, six epitopes derived from throat cancer were used. In this study, two major clusters of epitopes are presented, which trigger responses from different TCRs. Within each