-
To identify CD8+ T cell-related immune genes, we obtained immune-related gene data from the ImmPort (https://www.immport.org/home) and InnateDB (https://www.innatedb.ca/) databases. We then accessed single-cell data of TNBC from the TISCH database (https://tisch.comp-genomics.org) to extract the expression profile of CD8+ T cells.
In the GSE110686 dataset, we divided the cells into 13 different subpopulations based on the expression of marker genes (Figure 1A). Subsequently, we performed cell annotation and identified six cell types: CD4+ Tconv, CD8+ T, CD8+ Tex, Mono, Tprolif, and Treg cells (Figure 1B). The pie chart shows that the sample contains 1,681 CD8+ T cells (Figure 1C).
Figure 1. Identification of CD8+ T cell-related immune genes. (A, B) UMAP visualization of the cluster analysis of triple-negative breast cancer single-cell sequencing data; the 13 clusters were annotated as six cell types. (C) Number of cells of each type in the single-cell sequencing data. (D) Venn diagram identifying the 67 intersecting genes.
To identify CD8+ T cell genes specifically related to immunity, we analyzed differential gene expression within the CD8+ T cells from the GSE110686 dataset. A total of 160 differentially expressed genes were identified (P < 0.05). To narrow the list down to immune-related genes, we intersected the differentially expressed genes with the immune-related genes and generated a Venn diagram. This yielded 67 CD8+ T cell-related immune genes (Figure 1D). These genes represent key targets for further investigation of the immune response in TNBC.
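The intersection step described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `intersect_genes` and the two toy gene lists are assumptions made for the example, not the actual 160 differentially expressed genes or the ImmPort/InnateDB lists used in the study.

```python
def intersect_genes(deg_genes, immune_genes):
    """Return the sorted gene symbols that appear in both collections."""
    deg = {g.strip() for g in deg_genes}
    immune = {g.strip() for g in immune_genes}
    return sorted(deg & immune)


# Hypothetical toy inputs (NOT the study's gene lists).
deg_genes = ["CD8A", "GZMB", "CXCL13", "ACTB", "MKI67"]
immune_genes = ["CD8A", "STAT1", "GZMB", "CXCL13", "PDCD1"]

shared = intersect_genes(deg_genes, immune_genes)
print(shared)
```

Using sets makes the overlap independent of the order and of any duplicates in the input lists, which is the same information a two-set Venn diagram conveys.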
-
To obtain relevant core genes for further analysis, we constructed a PPI network comprising the 67 genes using the STRING database (Figure 2A). The number of nodes connected to each gene was determined (Figure 2B). Notably, CD8A and STAT1 emerged as pivotal contributors within the network, being connected to 41 and 39 nodes, respectively. To identify genes suitable for model construction, a univariate analysis of all genes was performed (Figure 2C), which identified eight key prognostic genes: XCL1, CXCL13, CXCR6, STAT1, GBP2, PDCD1, GZMB, and FASLG (P < 0.1).
Figure 2. Construction of the PPI network and acquisition of core genes. (A) PPI network of the intersecting genes. (B) Histogram showing the number of nodes connected to each gene in the network; a larger number means that more genes are associated with that gene. (C) Screening of eight prognosis-related differentially expressed genes.
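As an illustration of how the per-gene connection counts in a PPI network can be obtained, the following minimal sketch counts distinct interaction partners from an edge list. The edge list is a hypothetical toy example, not the STRING output for the 67 genes, and `node_degrees` is a name introduced here for illustration only.

```python
from collections import defaultdict


def node_degrees(edges):
    """Count the distinct interaction partners of each gene in an undirected edge list."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {gene: len(partners) for gene, partners in neighbors.items()}


# Hypothetical toy edges (NOT the study's network); the last edge repeats
# an earlier interaction in reverse order and is counted only once.
edges = [
    ("CD8A", "STAT1"),
    ("CD8A", "GZMB"),
    ("CD8A", "PDCD1"),
    ("STAT1", "GBP2"),
    ("STAT1", "GZMB"),
    ("GZMB", "CD8A"),
]

degrees = node_degrees(edges)
# Rank genes by number of partners (descending), then alphabetically.
ranked = sorted(degrees.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranked)
```

Storing partners in sets means that duplicated or reversed edges do not inflate a gene's count.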
-
To mitigate the risk of overfitting the prognostic features, LASSO regression was performed on these genes. By selecting the value of log(λ) corresponding to the minimum deviance, three genes associated with the prognosis of TNBC were extracted (Figure 3A–B). Subsequently, a risk scoring formula was developed as follows:
$$ \text{riskscore}=\sum _{k=1}^{n}\text{coef}\left(\text{gene}_{k}\right)\times \text{expr}\left(\text{gene}_{k}\right), $$
where "coef" represents the regression coefficient of gene k, "expr" represents the expression level of that gene in each sample, and n is the number of genes in the model. The prognostic statuses of the high- and low-risk groups were compared using the risk scoring formula in TCGA and validated using the METABRIC sample (Figure 3C–D). The results demonstrated that the model effectively predicted patient prognosis (P < 0.1), with a cutoff value of 40%. Moreover, prognostic analysis was conducted on the three genes (CXCL13, GBP2, and GZMB), which revealed significant differences (Figure 3E–G).
Figure 3. Construction and validation of a prognostic model. (A, B) LASSO regression used to determine the optimal value of λ. (C, D) Validation of the model in the TCGA and METABRIC samples, respectively; both the training and validation sets showed significant differences (P < 0.05). (E–G) Prognostic analysis of the genes used to construct the model in the high- and low-risk groups. (H, I) Univariate and multivariate analyses of the model. (J, K) Validation of the predictive ability of the model.
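To make the risk scoring formula concrete, the sketch below computes the weighted sum for a few samples and splits them into high- and low-risk groups. All numbers are hypothetical: the coefficients are placeholders rather than the fitted LASSO coefficients, the expression values are invented, and the median split is an assumption of this example that stands in for the cutoff used in the study.

```python
import statistics


def risk_score(coefs, expr):
    """Sum of coefficient * expression over the genes in the model."""
    return sum(coefs[gene] * expr[gene] for gene in coefs)


def assign_groups(scores, cutoff):
    """Label samples 'high' when their score exceeds the cutoff, else 'low'."""
    return {s: ("high" if v > cutoff else "low") for s, v in scores.items()}


# Hypothetical coefficients (NOT the fitted values from the study).
coefs = {"CXCL13": -0.10, "GBP2": -0.20, "GZMB": -0.05}

# Hypothetical expression values for four samples.
samples = {
    "S1": {"CXCL13": 2.0, "GBP2": 1.0, "GZMB": 4.0},
    "S2": {"CXCL13": 0.5, "GBP2": 0.5, "GZMB": 1.0},
    "S3": {"CXCL13": 4.0, "GBP2": 3.0, "GZMB": 2.0},
    "S4": {"CXCL13": 1.0, "GBP2": 0.0, "GZMB": 0.0},
}

scores = {name: risk_score(coefs, expr) for name, expr in samples.items()}
cutoff = statistics.median(scores.values())  # assumed median split
groups = assign_groups(scores, cutoff)
print(scores)
print(groups)
```

With negative placeholder coefficients, samples with higher expression of the three genes receive lower risk scores; whether this matches the sign of the fitted coefficients is not stated in the text and is assumed here only for illustration.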
Univariate and multivariate analyses were performed to verify the predictive efficacy of this model for clinical traits (Figure 3H–I). The results indicated significant differences in grading and risk scores (P < 0.05). Furthermore, the model prediction analysis demonstrated the model's ability to effectively predict the 1-year, 3-year, and 5-year survival rates of patients (Figure 3J). Additionally, the ROC curves revealed that grading and risk score had better predictive performance, whereas the predictive performance based on age was less satisfactory (Figure 3K).
-
KEGG pathway analysis was conducted on samples from the high- and low-risk groups to explore the crucial biological functions and pathways associated with the identified core genes. The results revealed that the high-risk group was primarily enriched in cardiac muscle contraction, drug metabolism by cytochrome P450, extracellular matrix (ECM)-receptor interaction, metabolism of xenobiotics by cytochrome P450, and porphyrin and chlorophyll metabolism pathways. Conversely, the low-risk group was enriched in the chemokine signaling pathway, cytokine-cytokine receptor interaction, hematopoietic cell lineage, natural killer cell-mediated cytotoxicity, and T cell receptor signaling pathway (Figure 4A–B).
Figure 4. KEGG, mutation, and immune checkpoint analysis. (A–B) KEGG analysis in the high- and low-risk groups. (C–D) Gene mutation frequency analysis in the high- and low-risk groups. (E) Differential expression of CD274 in the high- and low-risk groups. (F) Correlation analysis between CD274 and risk score.
Furthermore, gene mutation analysis was performed in both risk groups, identifying TP53 and TTN as the genes with the highest mutation frequencies (Figure 4C–D). Immune checkpoint analysis revealed a significant difference in CD274 expression between the high- and low-risk groups (Figure 4E), and CD274 expression exhibited a strong negative correlation with the risk score (Figure 4F).
-
Immune cells play crucial roles in the immune microenvironment. To gain insight into the distribution of different immune cells in the high- and low-risk groups, we conducted immune cell infiltration (Figure 5A) and immune functional analyses (Figure 5B). Immune infiltration analysis revealed higher levels of infiltration by M0 and M2 macrophages in the high-risk group, whereas memory B cells, CD8+ T cells, M1 macrophages, and other cells showed higher infiltration in the low-risk group. Immune functional analysis indicated that the low-risk group exhibited higher scores for all immune pathways, which may account for the higher survival rates observed in this group.
Figure 5. Immune landscape of the high- and low-risk groups. (A) Immune infiltration analysis of the high- and low-risk groups. (B) Immune functional analysis of the high- and low-risk groups. (C) Differences in immune scores among the classifications; the expression of CD8+ T cell-related genes also differed significantly across all subtypes. (D) Prognostic analysis of the high- and low-risk groups. (E) Correlation between immune scores and immune treatment responses. (F) ROC curve; the area under the curve (AUC) was used to verify prediction accuracy.
To assess the applicability of this model across all samples, we divided the 185 TNBC samples into four subtypes based on the expression of immune genes; significant differences were observed among the four subtypes (Figure 5C). Immune subtype data were retrieved from UCSC Xena (https://xenabrowser.net/datapage). Additionally, significant differences in survival status were observed between the high- and low-risk groups (Figure 5D), further validating the reliability of this grouping.
Furthermore, to evaluate the significance of this grouping for clinical treatment, we analyzed the correlation between different immunotherapies and risk scores (Figure 5E). The results indicated differences in the immune treatment response between the two groups, with the group with lower scores displaying stronger responsiveness. The ROC curve exhibited an AUC of 0.604, indicating reliable prediction of the immunotherapeutic effect in patients through immune scoring (Figure 5F).
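For readers less familiar with the AUC, the following sketch shows one standard way to compute it: as the fraction of responder/non-responder pairs in which the responder receives the higher score, with ties counted as one half. The scores and labels are hypothetical toy values, not the study's data, and `roc_auc` is a name introduced here for illustration; it assumes that a higher score indicates a predicted response.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical predicted scores and response labels (1 = responder).
toy_scores = [0.9, 0.4, 0.6, 0.3, 0.5]
toy_labels = [1, 1, 0, 0, 1]

auc = roc_auc(toy_scores, toy_labels)
print(round(auc, 3))
```

On this scale, 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of responders from non-responders.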
Predicting the Prognosis and Immunotherapeutic Response of Triple-Negative Breast Cancer by Constructing a Prognostic Model Based on CD8+ T Cell-Related Immune Genes
doi: 10.3967/bes2024.065
- Received Date: 2023-08-30
- Accepted Date: 2023-11-24
-
Key words:
- Breast Cancer
- Immunotherapy
- Prognosis
- CD8+ T cells
- PD-L1
Abstract:
The authors declare that this study was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.
&These authors contributed equally to this work.
Citation: Nani Li, Xiaoting Qiu, Jingsong Xue, Limu Yi, Mulan Chen, Zhijian Huang. Predicting the Prognosis and Immunotherapeutic Response of Triple-Negative Breast Cancer by Constructing a Prognostic Model Based on CD8+ T Cell-Related Immune Genes[J]. Biomedical and Environmental Sciences, 2024, 37(6): 581-593. doi: 10.3967/bes2024.065