Immunodominant T-cell epitopes from the SARS-CoV-2 spike antigen reveal robust pre-existing T-cell immunity in unexposed individuals

Introduction

Uncovering the immunological responses to COVID-19 infection will help in designing and developing next-generation therapies and in managing the treatment of critically ill COVID-19 patients. Many host factors associated with mild or severe disease symptoms have been reported. For example, leukopenia, exhausted CD8 T-cells, higher levels of TH2 cytokines in serum, a high titer of neutralizing antibodies, a blunted interferon response, dysregulation of the myeloid cell compartment, activated NK cells, and the size of the naïve T-cell compartment are all associated with critical illness. This wide range of variable factors shares a common immunological underpinning: a systemic dysregulation in immune homeostasis caused by the failure of the host immune system to clear the virus during the early stages of the infection. Animal and human studies have demonstrated that susceptibility to respiratory virus infections is associated with compromised CD8 T-cell immunity. A delay in the activation of CD8 T-cells and a lack of early IFN-γ production by the innate immune arm lead to an increase in viral load. This triggers overactivation of the innate and adaptive arms of the immune system, leading to a loss of immune homeostasis and a severe disease phenotype, including death. Therefore, an early wave of strong CD8 T-cell response may delay viral titer build-up, allowing rapid clearance of the virus by the immune system without perturbing immune homeostasis.

Healthy humans not exposed to COVID-19 show pre-existing CD4 and CD8 T-cell immunity to SARS-CoV-2 antigens. This pre-existing CD4 and CD8 T-cell immunity was detected against structural and non-structural SARS-CoV-2 proteins using overlapping 15-mer peptide pools. The pool of SARS-CoV-2-reactive T-cells in unexposed individuals is thought to arise from coronaviruses that cause the common cold. Whether pre-existing immunity provides any protection against SARS-CoV-2 infection or contributes to a faster recovery from infection remains speculative. It is also unclear whether pre-existing immunity involving CD4 T-cells, CD8 T-cells, or both is required for maximal protection. Identifying robust pre-existing immunity against SARS-CoV-2 in the healthy population can serve as a measure to assess the mode of recovery and also viral spread in the global population.

In this study, we identified strong CD8 T-cell-activating epitopes from the SARS-CoV-2 spike protein by a combination of epitope prediction and T-cell activation assays in healthy donors unexposed to SARS-CoV-2. The rationale for identifying epitopes that favor CD8 T-cell activation was twofold. First, robust CD8 T-cell-activating epitopes can be formulated as second-generation vaccines for short- and long-term protection against viral infection. Second, detection of pre-existing immunity in healthy donors using epitopes that favor CD8 T-cell activation may provide a framework to understand the complex immune responses observed in clinical settings. It may also shed light on the differences in morbidity and mortality in different population groups across the globe.

We developed a proprietary algorithm, OncoPeptVAC, to predict CD8 T-cell-activating epitopes across the SARS-CoV-2 proteome. OncoPeptVAC predicts binding of the HLA-peptide complex to the T-cell receptor (TCR). We selected a cocktail of eleven 15-mer peptides with broad class-I and class-II coverage and favorable TCR engagement as predicted by the algorithm. The cocktail of peptides was tested for T-cell activation in healthy donors from the USA and India unexposed to COVID-19. We observed higher CD8 T-cell activation by the 11-peptide pool than by the overlapping 15-mer peptide pools from the spike S1 and S2 proteins. Homology analysis against other coronavirus spike proteins showed no significant amino acid identity with any of the 11 peptides, suggesting that one or more peptides in the pool engage cross-reactive TCRs raised against other viruses, not specifically coronaviruses. Bulk and single-cell TCR analysis revealed expanded clonotypes recognizing epitopes from CMV, Influenza-A, and other viruses to which most people are exposed. Taken together, our findings support the view that strong pre-existing CD8 T-cell immunity in unexposed donors is contributed by cross-reactive TCRs from other viruses. Significantly, we discovered multiple immunodominant epitopes in our predicted pool of peptides that favored CD8 T-cell activation. Finally, we show that our cocktail of 11 peptides induced a robust immune response in convalescent patients, demonstrating that these peptides are recognized by infected patients. Overall, our study uncovered strong pre-existing CD8 T-cell immunity against SARS-CoV-2 using a small set of 11 epitopes that engaged cross-reactive TCRs recognizing epitopes from other viruses, not necessarily common cold viruses belonging to the coronavirus family as hypothesized by other studies. Additionally, our findings raise the possibility that many individuals carrying antigen-experienced T-cells against other viruses may be naturally protected against COVID-19 without prior SARS-CoV-2 infection.
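The homology comparison mentioned above can be illustrated with a minimal sketch. The text does not state which tool or identity measure was used, so the ungapped sliding-window identity below is an assumption chosen for simplicity, and the function name `best_window_identity` is hypothetical. The sequences in the usage note are arbitrary placeholders, not peptides from the study.

```python
def best_window_identity(peptide: str, protein: str) -> float:
    """Return the highest fraction of identical residues over all
    ungapped placements of `peptide` along `protein`.

    Assumption: a simple ungapped comparison; a real analysis would
    more likely use a gapped aligner such as BLAST.
    """
    n = len(peptide)
    if n == 0 or len(protein) < n:
        raise ValueError("peptide must be non-empty and no longer than protein")
    best = 0.0
    for i in range(len(protein) - n + 1):
        window = protein[i:i + n]
        # Count positions where the two residues are identical.
        matches = sum(a == b for a, b in zip(peptide, window))
        best = max(best, matches / n)
    return best
```

For example, `best_window_identity("ACDEF", "GGACDKFGG")` compares the 5-mer against every 5-residue window and reports the best match fraction. A low value across all reference spike proteins would correspond to the "lack of significant amino acid identity" described in the text.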

Results

Prediction of immunogenic epitopes favoring CD8 T-cell activation

A deep CNN model, OncoPeptVAC, was implemented to predict the immunogenicity of peptides based only on the peptide and HLA sequences. A total of 8870 immunogenic and non-immunogenic peptide-HLA pairs were obtained from the IEDB. BLOSUM encoding was used to represent the peptide and HLA molecules; the BLOSUM substitution scores encode evolutionary and physicochemical properties of the amino acids. In addition, hydrophobicity indices and predicted HLA binding scores were used as input features alongside the encoded peptide and HLA sequences.
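As an illustration of the BLOSUM encoding described above, the sketch below turns a peptide into a fixed-size numeric matrix, one row of substitution scores per residue. Several details are assumptions: the text does not say which BLOSUM variant was used (BLOSUM62 is taken here), the zero-padding to a maximum length of 11 is a guess at how variable-length 9–11-mers might be handled, and `encode_peptide` is a hypothetical name. The matrix values are transcribed from the standard BLOSUM62 table and should be checked against an authoritative copy before any real use.

```python
# Row/column order of the 20 standard amino acids.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# BLOSUM62 scores, one row per amino acid in the order above (assumed variant).
_BLOSUM62_ROWS = """
 4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0
-1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3
-2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3
-2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3
 0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1
-1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2
-1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2
 0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3
-2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3
-1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3
-1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1
-1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2
-1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1
-2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1
-1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2
 1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2
 0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0
-3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3
-2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1
 0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
"""

# Map each amino acid to its row of 20 substitution scores.
BLOSUM62 = {
    aa: [int(x) for x in row.split()]
    for aa, row in zip(AMINO_ACIDS, _BLOSUM62_ROWS.strip().splitlines())
}


def encode_peptide(peptide: str, max_len: int = 11) -> list[list[int]]:
    """Encode a peptide as a max_len x 20 matrix of BLOSUM62 scores.

    Shorter peptides are zero-padded at the end (an assumed convention).
    Raises KeyError for residues outside the 20 standard amino acids.
    """
    if len(peptide) > max_len:
        raise ValueError(f"peptide longer than max_len={max_len}")
    rows = [list(BLOSUM62[aa]) for aa in peptide.upper()]
    rows += [[0] * len(AMINO_ACIDS) for _ in range(max_len - len(peptide))]
    return rows
```

Calling `encode_peptide("ACDEFGHIK")` on this arbitrary 9-mer yields an 11 x 20 matrix whose last two rows are padding. A matrix of this kind, possibly stacked with a similarly encoded HLA sequence and extra feature channels, is the sort of 2D input a convolutional layer can consume.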

OncoPeptVAC used a CNN with multiple 2D convolutional layers combined with max-pooling. Model versions with different input features were compared to confirm the additive effect of those features on model performance, and all versions were trained using fivefold cross-validation. The AUC of the final model was 0.87 on a blind test dataset (Fig. 1A). The prediction algorithm showed a sensitivity of 0.64 and a specificity of 0.84 at a score cut-off of 0.2 (Fig. 1B). Increasing the cut-off score to 0.5 raised the specificity further to 0.968, with a concomitant loss in sensitivity. OncoPeptVAC significantly reduced the number of false positives compared to the HLA-binding rank (compare Fig. 1B and C), reducing by 30% the number of epitopes that needed to be screened in a T-cell activation assay to identify true immunogenic epitopes. For example, to identify 50% (119 out of 238) of the immunogenic peptide-HLA pairs present in the blind test dataset, the top 256 peptide-HLA pairs from the OncoPeptVAC prediction needed to be screened, compared to the top 753 peptide-HLA pairs predicted by netMHCpan-4.0.
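The two kinds of figures quoted above (sensitivity and specificity at a score cut-off, and the number of top-ranked pairs that must be screened to recover a given share of immunogenic pairs) can be computed as in the sketch below. The function names and the toy scores are hypothetical and serve only to show the arithmetic; they are not the blind test data.

```python
import math


def sensitivity_specificity(scores, labels, cutoff):
    """Return (sensitivity, specificity) when a pair is called
    immunogenic if its score is strictly above `cutoff`.

    labels: 1 = immunogenic, 0 = non-immunogenic.
    Assumes both classes are present (otherwise a division by zero occurs).
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def screening_depth(scores, labels, fraction=0.5):
    """Return how many top-scoring pairs must be screened to recover
    at least `fraction` of all immunogenic pairs."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("no immunogenic pairs in the dataset")
    target = math.ceil(fraction * total_pos)
    # Rank pairs from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    found = 0
    for depth, (_, y) in enumerate(ranked, start=1):
        found += y
        if found >= target:
            return depth
    raise ValueError("target not reachable")  # defensive; not expected


# Toy data (hypothetical): 4 immunogenic and 6 non-immunogenic pairs.
toy_scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.25, 0.15, 0.1, 0.05, 0.02]
toy_labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
```

On the toy data, raising the cut-off from 0.2 to 0.5 trades sensitivity for specificity, mirroring the trend reported in the text. To apply `screening_depth` to a predictor where lower values are better, such as an HLA-binding percentile rank, the ranks would be negated before being passed in as scores.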

Figure 1

Identification of immunogenic epitopes from SARS-CoV-2 by OncoPeptVAC. (A) ROC curves of OncoPeptVAC TCR-binding and netMHCpan-4.1 HLA binding algorithms. A blind dataset of non-immunogenic or immunogenic HLA class-I binding T-cell epitopes from IEDB was used to assess the performance of OncoPeptVAC (cyan). The HLA-binding affinity of the epitopes expressed as percentile rank < 1% was used to assess the performance of netMHCpan-4.1 in predicting true immunogenic epitopes (orange). (B) Separation of immunogenic from non-immunogenic epitopes by OncoPeptVAC score. (C) Separation of immunogenic from non-immunogenic epitopes by HLA-binding percentile rank. (D) Schematic showing the steps used to identify immunogenic epitopes from SARS-CoV-2 proteome. (E) Number of immunogenic epitopes identified in different SARS-CoV-2 antigens. (F) HLA-A, B and C-restricted epitopes from SARS-CoV-2 proteome.

The prediction algorithm was applied to the SARS-CoV-2 proteome, which was screened against 23 class-I HLAs covering over 98% of the world population. A schematic of the in silico screening approach is shown in Fig. 1D. Briefly, 9–11-mer peptides from the SARS-CoV-2 proteome were screened for TCR binding against the 23 HLAs, and peptides with an OncoPeptVAC score > 0.2 were analyzed for class-I HLA binding. Peptide-HLA pairs with a high predicted binding affinity (< 1 percentile rank) were selected, extended to 15-mers, and screened for class-II HLA binding (see “Methods” for details). Peptides with favorable TCR binding and class-I/II HLA binding features were selected for further validation. The number of predicted immunogenic epitopes from SARS-CoV-2 protein-coding genes is shown in Fig. 1E. The distribution of OncoPeptVAC scores across different class-I HLA genes indicates a higher number of favorable TCR-binding peptides for HLA-B and HLA-C than for HLA-A (Fig. 1F). Natural biases in HLA restriction have also been reported for immunogenic HIV epitopes.
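The screening steps described above can be sketched as a small pipeline. This is only an outline under stated assumptions: `tcr_score` and `hla_rank` are hypothetical callables standing in for the OncoPeptVAC score and a class-I binding percentile rank, neither of which is reproduced here. The way a hit is extended to a 15-mer (adding flanking residues on both sides, clipped at the protein ends) is also an assumption, since the text defers that detail to the Methods. The class-II binding screen of the 15-mers is left out.

```python
def kmers(protein, lengths=(9, 10, 11)):
    """Yield (start, peptide) for every window of the given lengths."""
    for k in lengths:
        for i in range(len(protein) - k + 1):
            yield i, protein[i:i + k]


def extend_to_15mer(protein, start, length, target=15):
    """Extend a peptide to `target` residues using flanking sequence.

    Assumed convention: roughly centre the original peptide, and shift
    the window inward when it would run past either end of the protein.
    """
    left = (target - length) // 2
    begin = max(0, start - left)
    end = begin + target
    if end > len(protein):
        end = len(protein)
        begin = max(0, end - target)
    return protein[begin:end]


def screen(protein, hlas, tcr_score, hla_rank,
           score_cutoff=0.2, rank_cutoff=1.0):
    """Return (peptide, hla, extended_15mer) for pairs passing both filters.

    tcr_score(peptide, hla) -> float  (hypothetical; higher is better)
    hla_rank(peptide, hla)  -> float  (hypothetical percentile rank; lower is better)
    """
    hits = []
    for start, pep in kmers(protein):
        for hla in hlas:
            # Step 1: keep pairs with a favourable predicted TCR-binding score.
            if tcr_score(pep, hla) <= score_cutoff:
                continue
            # Step 2: keep pairs predicted to bind class-I HLA strongly.
            if hla_rank(pep, hla) >= rank_cutoff:
                continue
            hits.append((pep, hla, extend_to_15mer(protein, start, len(pep))))
    return hits
```

With real predictors plugged in for the two callables, the returned 15-mers would be the candidates passed on to a class-II binding screen. The cut-offs of 0.2 and 1 percentile are the ones quoted in the text.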

T-cells from unexposed donors respond to OncoPeptVAC-predicted peptides

We performed a T-cell activation assay in unexposed donors using a set of 11 prioritized epitopes from the SARS-CoV-2 spike antigen (Table 1). We selected epitopes from the spike antigen because it is highly immunogenic and generates strong B- and T-cell responses. In addition, strongly immunogenic epitopes from the spike antigen can become useful reagents for studying mechanisms of immune toxicity and for long-term immune monitoring studies in naïve and vaccinated populations. The 15-mer peptides contained in the 11 selected epitopes cover different segments of the RBD and the non-RBD regions of the spike antigen, and a few peptides harbor A