Integrated analysis across studies improves statistical power by extending sample size
The aim of this study was to substantiate previous scRNA-seq results from CSF cells by integrating existing datasets from RRMS patients and controls across studies. We first obtained and analytically integrated two datasets including 4 RRMS vs 4 control patients (dataset 1) [11], and 11 RRMS vs 2 control patients (dataset 3) [13]. In addition, we incorporated 5 treatment-naïve RRMS vs 5 control patients (dataset 2, partially published in [17]) thus achieving a total of 20 RRMS and 11 control patients (Fig. 1A).
Inclusion criteria were largely comparable between studies which enrolled both CIS and RRMS patients in relapse and diagnosed according to the 2017 McDonald criteria (Methods). None of the patients had received immunomodulatory treatment. The diagnosis was either confirmed by follow-up within 4 weeks [11] or definitively diagnosed at study entry [13]. The study by Schafflick et al. [11] explicitly excluded concomitant autoimmune diseases. Control patients notably differed between studies and encompassed 9 patients with idiopathic intracranial hypertension (IIH; dataset 1 + 2) and 2 healthy controls (dataset 3) (Additional file 5: Tab. S1). We collected available clinical meta-data from all studies and found no significant differences in the distribution of age and sex across the cohorts (Additional file 5: Tab. S1, Additional file 1: Fig. S1A). Specifically, the median age of the MS patients in cohort one was 38 years, in the second cohort was 44 years and in the third cohort was 44 years (p = 0.66). The median age of the IIH patients in cohort one was 28.5 years, in cohort two 35 years and the median age of the healthy controls in cohort three was 30 years (p = 0.82). We then analytically integrated all available single cell data resulting in 80,187 RRMS-derived single-cell transcriptomes (subsequently denoted as ‘cells’ for simplicity) and 25,764 control cells (Additional file 1: Fig. S1B; Additional file 5: Tab. S1). Data were processed with the Cell Ranger / Seurat v4.0 bioinformatics pipeline [18]. The scRNA-seq chemistry for the first and second cohort was 10 × 3' and for the third cohort was mixed between 10 × 3′ and 10 × 5′ (Additional file 5: Tab. S1). We first tested for gross systematic technical bias between the datasets. The median cell number per sample in cohort one was 2817 (IQR25–75 = 1338–3739), in the second cohort was 3,555 (IQR25–75 = 1553–5096) and in the third cohort was 3269 (IQR25–75 = 2364–4844) (Additional file 1: Fig. S1C). Mean genes detected per cell in cohort one was 1005, in the second cohort was 853 and in the third cohort was 1077 (Additional file 5: Tab. S1, Additional file 1: Fig. S1C). Batch effects were removed using SCTranformation (Additional file 2: Fig S2).
We then performed a principal component analysis (PCA) using all genes detected in all cells across all samples (‘pseudo-bulk’). We found that the separation between samples did not systematically differ across different cohorts although there were two individual outliers (Additional file 3: Fig. S3). These two outliers from dataset 2 were not characterized by apparent differences in clinical terms and were therefore included in further analyses. This argues against major systematic technical bias across cohorts and scRNA-seq chemistries.
Accessible bioinformatic tool allows querying cellular and transcriptional patterns in the CSF in MS
Following this basic technical and clinical validation, we generated a freely accessible visualization and analysis tool to facilitate investigating CSF cells in MS vs control without prior bioinformatic expertise (http://CSFinMS.bxgenomics.com/). Specifically, this tool provides two levels of cell type annotation based on Multimodal Reference Mapping (Seurat v4) inferred from a CITE-seq reference dataset (162,000 blood cells, 228 antibodies) [18]: (i) Level 1 annotation of 9 general cell types and (ii) Level 2 annotation with 31 individual subclusters and thus more detailed annotation of cell subsets. On each level, marker genes and genes differentially expressed between RRMS vs controls can be interactively queried and visualized (Methods).
The combined dataset including 105,951 CSF single-cell transcriptomes was projected onto a latent space and visualized using Uniform Manifold Approximation and Projection (UMAP) plots defined by the reference dataset [18]. Using the first annotation level (Fig. 1B, Additional file 6: Tab. S2), we identified nine main clusters (named level 1 clusters) annotated as B cells (Bc; n = 2125), CD4 T cells (CD4 Tc; n = 55,814), CD8 T cells (CD8 Tc; n = 18,488), dendritic cells (DC; n = 3440), monocytes (Mono, n = 9082), natural killer cells (NK; n = 2059) and other T cells (n = 1909). Cells lacking definitive assignment (other n = 1074, unassigned n = 11,960) were not further considered in the analysis. Annotation was congruent with the expression of canonical cell type marker genes (Fig. 1C) in these level 1 clusters (Bc: CD79A, MS4A1; CD4 Tc: IL7R; CD8 Tc: CCL5, CD8B; DC: HLA-DBP1, CD74, CD1C; Mono: HLA-DBP1, CD74, CD14; NK: GNLY) supporting the adequacy of our annotation. In accordance with previous studies of the CSF [10, 12, 14], all detected cells in this combined CSF dataset were of hematopoietic origin and T cells were the most abundant cell type with a preponderance of CD4 T cells outnumbering myeloid and B lineage cells (Fig. 1D).
Next, we aimed to identify gross MS-associated changes and therefore tested for differential level 1 cluster abundance between RRMS patients and controls (Additional file 7: Tab. S3). Already in a simple qualitative comparison, the frequency of B cells was higher in MS vs control samples (Fig. 1D) as described previously [11, 13]. The proportion of B cells in the integrated data set was 3% in RRMS patients vs 0% in healthy controls (Fig. 1E). Quantification also identified a general myeloid-to-lymphoid shift with proportions of B cells, T cells and NK cells significantly expanded at the expense of the Mono cluster (Fig. 1E, F). These proportional alterations in MS-derived CSF cells had also been described previously [11, 13].
Integrating datasets allows characterizing rare CSF cell populations through higher resolution clustering
We next asked whether higher resolution clustering — facilitated by increased cell numbers after integration — would reveal previously unrecognized MS-associated changes. We therefore clustered and annotated all 105,951 CSF single-cell transcriptomes using deeper level 2 clustering (Fig. 2A) and annotation from a reference dataset [18]. This considerably increased cellular granularity to 31 clusters (Fig. 2A, Additional file 8: Tab. S4). For example, we identified 4 level 2 subsets of cells ascribed to the B cell lineage: B memory (Bc MEM), B intermediate (Bc Int), B naïve cells (Bc Naïve), and antibody secreting cells; referred to as plasmacells/-blasts (plasma) for simplicity. Myeloid lineage cells separated into 5 subclusters (CD14 Mono, CD16 Mono, cDC2 (myeloid/conventional DC1), ASDC (AXL+ SIGLEC6+ DCs), cDC1) based on the expression of subset marker genes (Fig. 2A, Additional file 8: Tab. S4). CD4 T cells separated into 6 subclusters (CD4 Naive, Treg, CD4 TCM (central memory), CD4 TEM (effector memory), CD4 CTL (T cells with cytotoxic activity), CD4 Prolif (proliferating)) and CD8 T cells into 4 individual clusters (CD8 Naive, CD8 TCM, CD8 TEM, CD8 Prolif). There was a small subcluster of double-negative T cells (dnT) in the CD8-cluster. Several smaller clusters separated from the larger clusters (HSPC, MAIT, platelet). Notably, smaller clusters of such identity had not previously been detected in CSF datasets [11, 13]. We thus replicated cell types known from previous CSF scRNA-seq studies, but also demonstrated the potential of joint analyses to identify rare CSF cell populations.
Joint analysis of single CSF cell transcriptomes helps confirming and refuting previous findings in CSF in MS
We next quantified disease-associated compositional and transcriptional changes at level 2 clustering (Fig. 2B, Additional file 9: Tab. S5). Again, the increase of B lineage clusters annotated as memory and / or class-switched was the most pronounced alteration among MS-derived cells (Fig. 2B) but the low number of these clusters in control patients obviated calculating differential gene expression. The plasma cluster and other B cell clusters were almost exclusively detected in the CSF of MS patients (Fig. 2C) (plasma: n = 443 in MS vs n = 0 in control). These changes had been described previously [11, 19] and were thus replicated at higher resolution in the present integrated dataset.
B lineage cells are nearly absent from non-diseased CSF [10]. In our joint analysis including 11 control patients, we annotated 12 total cells as B cells in the combined control dataset compared to 1346 B cells among MS CSF cells. Aiming to better understand B cells in the CSF in MS, we next plotted B cell-associated gene sets across the respective clusters. In accordance with our annotation, naïve B cells expressed IGHM and IGHD while the plasma cluster expressed JCHAIN, CXCR3, PRDM1, CD38, and class-switched immunoglobulin chain genes (e.g., IGHG), and lacked MS4A1 (Fig. 2D). Antigen presentation-associated genes (e.g., HLA-DRB1, HLA-DOB) and MS4A1/CD20 were expressed across all non-plasma B cell clusters and upregulation of CD24 and downregulation of IGHM characterized memory B cells (Fig. 2D). Notably IGHA gene expressing plasma cells recently described in MS [20] were also detected. When performing a detailed transcriptional characterization of B cells clusters, we observed a phenotype indicative of antibody secreting cells (Fig. 2D). Our integrated transcriptional characterization thus identifies B cells across a developmental continuum specifically in the CSF in MS.
We next focused on compositional changes described previously, but not replicated in our integrated dataset; this was true for γδ T cells (gdT cells). The merged dataset showed a non-significant increase in gdT cells in the CSF of MS patients (751 cells in MS vs 317 cells in control patients (Fig. 2B)) while a reduction had been described previously [13]. Differences in cell annotation may account for some of the differences. GNLY, CCL5 and CST7 were all downregulated in the gdT cells of the integrated MS data (Fig. 2E) indicating a potential reduction in cytotoxic potential. Upregulation of HLA-C in this cluster in MS was in line with previous results [13], while genes IL12RB1, HIF1A, and IRF3 were only downregulated and STAT5A, STAT1, and CD5 were only upregulated in the integrated data (Fig. 2F). Our joint analysis thus enables a deeper transcriptional characterization with increased confidence and replicates some, but not all previously described changes.
Deep characterization of rare cell types facilitated by integrated CSF scRNA-seq data analysis
We next aimed to better characterize the cell types with profound differential abundance in MS vs control comparisons with specific focus on MS-related changes not previously reported in single cell studies.
Specifically, an expansion of cell clusters annotated as innate lymphoid cells (ILC) and double-negative (dnT; i.e., CD4−CD8−) T cells was newly identified (Fig. 2B). The ILC cluster was more abundant in MS patients than in controls (Ctrl n = 13 vs MS n = 82 cells) and expressed MTRNR2L12 and MT-ND4L (Figs. 2B, 3A). ILCs differ from NK cells in their transcriptional regulation by IL7R, ID2, TOX, ETS1, and GATA3, whereas NK cells depend on the transcription factors TBX21 and EOMES [21] (Fig. 3A). BCL11B, ETS1, GATA3, IL17R, NFIL3, and ID3 were upregulated in ILCs of MS patients, whereas TOX, RUNX3, ID2, and AHR were downregulated (Fig. 3B) potentially indicating loss of regulatory mechanisms.
Differential abundance of Tregs had previously been only marginally significant [11]. The merged dataset showed that the Treg cluster was more abundant in MS patients (cell count Ctrl n = 288 vs MS n = 2086) (Fig. 3C) and was characterized by FOXP3, CD4, TNFRSF18, and CTLA4 [22] (Fig. 3D). Expression of IKZF2, IRF4, CCR4, CCR6, and CXCR3 was indicative of induced Tregs (iTreg) [23], and further confirmed by the absence of the natural Treg (nTreg) markers PECAM1, CD101, and NRP1. nTregs are unstable and transit into a Th17-like phenotype under inflammatory conditions and the presence of IL-6 [24]. In contrast, iTregs retain at least temporarily their immunoregulatory capacity despite an autoreactive environment [25]. In our integrative dataset, Treg cells did not express genes of anti-inflammatory cytokines (e.g., TGFB3, IL10, IL12A, EBI3) or the transcription factor for the Th1-lineage T-bet (TBX21), but exposed the transcription factor for Th17 lineage STAT3 (Fig. 3D), indicating a Th17-like rather than Th1-like phenotype [26,27,28]. Despite the higher abundance of Treg cells in the CSF of MS patients, none of the genes were significantly differentially expressed (Fig. 3E). Considering the potential of iTreg to suppress ongoing autoimmune response [29], Treg expansion may reflect local regulatory mechanisms in the CSF in MS. These mechanisms are likely exhausted by persistent autoreactive mechanisms and thus depicts a potential therapeutic approach.
Next, we characterized the clusters annotated as mucosal associated invariant T cells (MAIT) and dnT (Fig. 3F). MAIT expressed KLRB1, IL7R, SLC4A10, and DPP4 and dnT lacked expression of CD4 and CD8. We detected 373 dnT cells which were preferentially MS-derived (Ctrl n = 52 vs MS n = 321 dnT cells) (Fig. 3G). Despite the higher frequency of dnT cells in MS, no gene was upregulated while 42 genes were downregulated (e.g., SET, MAPKAPK5-AS1, NCOA3, AKAP13, ASXL1, TRPS1, GLRX5) (Fig. 3H). The MAIT cluster consisted of 251 cells, again with preferential detection in the MS CSF (Ctrl n = 56 vs MS n = 195 MAIT cells) (Fig. 3D). No genes were significantly differentially expressed (Fig. 3I). Overall, this set of differentially abundant clusters can be summarized as cells with both innate and cytotoxic phenotypes but also regulatory function expanding in the CSF in MS across studies. An expansion of cytotoxic phenotype CD4 T cells (albeit not ILC, MAIT, dnTc) had been described previously [11] and this may reflect similar changes annotated differently.
Integrated analysis reveals inflammatory phenotype of CD16+ monocytes
Overall, clusters annotated as monocytic cells (level 1) showed the greatest number of differentially expressed genes in MS vs control comparison (Fig. 4A) across several studies [11, 13] indicating preferential phenotypic alterations in myeloid lineage cells in the CSF in MS. In the level 2 clustering, this was especially pronounced in CD16+ monocytes, which upregulated 156 genes in MS and were relatively more abundant in MS (Fig. 4A). We therefore next aimed to capitalize on the potential of our integrated dataset to better understand how MS affects CSF cell types of such phenotype.
We first characterized the total Mono cluster (level 1) classified by genes like CSF1R, LYZ, MAFB, MSR1, and CD300E (Fig. 4B) [30]. Genes associated with lipid metabolism like ALOX15B and LPL or involved in immunoregulation like FKBP5 and CD163 were among the 73 upregulated genes in the overall Mono cluster of MS patients (Fig. 4C). Genes like IGLC2, MZB1, JCHAIN, and CD79A, which are known B cell markers, were upregulated in the Mono cluster which could represent remnant mis-annotated B cells in the Mono cluster (Fig. 4C).
Next, we analyzed the subclusters of the CD14 and CD16 monocytes (level 2). In the past, monocytes and their activation markers were mainly studied in PBMC [31,32,33]. CD16+ and CD14+ monocytes — especially in the CSF of MS patients — are not yet well characterized.
In our joint analysis of RRMS and IIH/control CSF of three independent datasets, the number of CD14+ monocytes was higher than the number of CD16+ monocytes in both control and MS (Fig. 4D). While CD14+ monocytes are known to decrease in the blood of MS patients [34], the total number of CD14+ monocytes was increased in the CSF of MS patients compared to controls supporting the independence of blood and CSF compartments. The CD14+ monocytes in the MS samples were characterized by an upregulation of genes associated with phagocytosis and lipid metabolism like LPL, APOE, and AXL [35], an observation reminiscent of the phenotypes displayed by microglia at the rims of the chronic active lesions [36, 37] (Fig. 4E). Furthermore, we found a downregulation of trafficking-associated transcripts in the CD14 Mono cluster (IL1B, CCL4, CCL3, CCL2) (Fig. 4E). We interpret this downregulation as supportive of the classification of CD14+ monocytes as rather “CSF-derived” as opposed to “periphery-derived” as described by Schafflick et al. [11].
CD16+ monocytes in CSF were more abundant in MS than in the controls (Fig. 4D). GPR34, STAB1, P2RY12, and LYVE1 previously described in so-called border-associated macrophages (BAMs) [38], were significantly upregulated in CD16+ monocytes in MS. Furthermore, the CD16 Mono cluster upregulated mitochondrial (e.g., ATP5MD) and cell cycle-associated transcripts (EIF1, EEF1D, EEF2, SAP18, BTF3, HCLS1) indicative of proliferation (Fig. 4F).
Comparing CD14+ and CD16+ monocytes in detail, both CD14+ and CD16+ monocytes expressed activation markers (CD86, CD40) and anti-inflammatory cytokines (IL10) which were upregulated in the CSF of MS patients (Fig. 4G). Additionally, CD16+ and CD14+ monocytes showed a CNS-associated macrophage phenotype (Mrc1, Lyve1, CD163, Siglec1) [39,40,41], with CD16+ monocytes appearing to be more differentiated and specialized than CD14+ monocytes (Fig. 4H). CCR2, a chemokine receptor critical for crossing the blood–brain barrier [42], was significantly upregulated in the CD16+ monocytes but not in the CD14+ monocytes in MS (Fig. 4G), and may thus locally drive inflammation in MS. Similar to previous findings in the periphery of MS patients [31,32,33], our study identified CD16+ monocytes as active inducers of inflammation and considered them as one of the first cells crossing the blood–brain barrier during MS [31].
After the in-depth characterization of the individual clusters on level 1 and level 2, we were interested in further speculating on the identity of the 11,960 cells that we had defined as unassigned cells in the level 1 clustering. Based on the marker genes from level 2 clustering (Figs. 2E, 3A,D,F, 4B), the unassigned level 1 cluster showed mainly characteristics of gdT cells (CCL5, CST7, GZMA, CD74), ILC (MTRNR2L12, MT-CD4L, ID2, ETS1, BCL11B, IGKC), MAIT (KLRB1, NKG7), and dnT (GZMK) cells and less of naive CD4, Tre,g and Mono (Additional file 4: Fig. S4). This underscores that lower resolution scRNA-seq are unlikely to capture these smaller cell populations and highlights the importance of our joint analysis.
Overall, an integrated data analysis facilitates deeply characterizing disease-associated transcriptional changes in both rare and abundant cell types for a better understanding of mechanisms of MS.