Exome sequencing study in patients with multiple sclerosis reveals variants associated with disease course

Background It remains unclear whether disease course in multiple sclerosis (MS) is influenced by genetic polymorphisms. Here, we aimed to identify genetic variants associated with benign and aggressive disease courses in MS patients. Methods MS patients were classified into benign and aggressive phenotypes according to clinical criteria. We performed exome sequencing in a discovery cohort, which included 20 MS patients, 10 with benign and 10 with aggressive disease course, and genotyping in 2 independent validation cohorts. The first validation cohort encompassed 194 MS patients, 107 with benign and 87 with aggressive phenotypes. The second validation cohort comprised 257 patients, of whom 224 patients had benign phenotypes and 33 aggressive disease courses. Brain immunohistochemistries were performed using disease course associated genes antibodies. Results By means of single-nucleotide polymorphism (SNP) detection and comparison of allele frequencies between patients with benign and aggressive phenotypes, a total of 16 SNPs were selected for validation from the exome sequencing data in the discovery cohort. Meta-analysis of genotyping results in two validation cohorts revealed two polymorphisms, rs28469012 and rs10894768, significantly associated with disease course. SNP rs28469012 is located in CPXM2 (carboxypeptidase X, M14 family, member 2) and was associated with aggressive disease course (uncorrected p value < 0.05). SNP rs10894768, which is positioned in IGSF9B (immunoglobulin superfamily member 9B) was associated with benign phenotype (uncorrected p value < 0.05). In addition, a trend for association with benign phenotype was observed for a third SNP, rs10423927, in NLRP9 (NLR family pyrin domain containing 9). Brain immunohistochemistries in chronic active lesions from MS patients revealed expression of IGSF9B in astrocytes and macrophages/microglial cells, and expression of CPXM2 and NLRP9 restricted to brain macrophages/microglia. Conclusions Genetic variants located in CPXM2, IGSF9B, and NLRP9 have the potential to modulate disease course in MS patients and may be used as disease activity biomarkers to identify patients with divergent disease courses. Altogether, the reported results from this study support the influence of genetic factors in MS disease course and may help to better understand the complex molecular mechanisms underlying disease pathogenesis. Electronic supplementary material The online version of this article (10.1186/s12974-018-1307-1) contains supplementary material, which is available to authorized users.


Background
Multiple sclerosis (MS [MIM: 126200]) is a common disease of the central nervous system (CNS) of complex pathogenesis. Although the etiology of MS is unknown, it is assumed that both genetic and environmental factors, such as cigarette smoking, vitamin D deficiency, viral infections, or obesity, influence disease phenotype [1][2][3]. Additionally, there is increasing evidence that inherited epigenetic variation complements the role of the genetic predisposition in the susceptibility to develop MS [4]. Over the last years, genome-wide association studies (GWAS) have significantly contributed to the characterization of the MS genetic component, with the identification of more than 200 genetic variants outside the major histocompatibility complex (MHC) that influence the risk of developing MS [2][3][4][5]. Despite the growing knowledge of MS risk genes, still little is known about the disease-modifying genes that modulate MS course. Disease course in MS is extremely variable and patients may have relapse-onset or progressive clinical forms, and face benign or severe disease courses. Although the underlying cause of this disease variability remains elusive, evidence exists that it may be influenced by genetic factors [6,7].
In the present study, we aimed to identify genes associated with MS disease course by first applying an exome sequencing approach to a discovery cohort of MS patients with benign and aggressive disease courses, followed by the validation of selected genetic variants in two independent cohorts of patients with divergent disease courses.

Discovery cohort
The discovery cohort comprised 20 MS patients classified according to their disease course into benign and aggressive phenotypes. Patients with benign phenotypes (n = 10) were defined as having an Expanded Disability Status Scale (EDSS) equal or lower than 3.0 after 15 or more years from disease onset [8] and never received MS therapies. Patients with aggressive disease courses (n = 10) reached an EDSS score equal or higher than 6.0 within the first 5 years after disease onset, regardless of treatment [9]. All patients included in the discovery cohort were recruited at the Centre d'Esclerosi Múltiple de Catalunya (Cemcat). Additional file 1: Table S1 summarizes demographic and main clinical characteristics of the discovery cohort.

Exome sequencing
Genomic DNA was extracted from peripheral blood using standard methods. An exome sequencing approach was applied to the discovery cohort in order to identify genes associated with benign and aggressive disease courses. Exome sequencing was based on an Illumina HiSeq2000 sequencing platform and an Agilent's SureSelect Target Enrichment System for 51 Mb. Sequencing was done with a 50× of coverage and reads were aligned against the human reference genome (GRCh37/hg19 assembly) using the Burrows-Wheeler Alignment tool (BWA) [10]. After reads mapping, low-quality reads and sequences flagged as PCR duplicates were removed from the BAM file using the Sequence Alignment/Map (SAM) [10] and Picard Tools. Unmasked variants were annotated considering all possible transcripts for each target gene, and in some cases variants located within a coding sequence when considering one isoform could be positioned within a non-coding region when considering another isoform, thus resulting also in the identification of intronic variants. Exome sequencing was performed in Sistemas Genómicos (Valencia, Spain).

Selection of candidate single-nucleotide polymorphisms for validation
For the variant calling process, different algorithms were applied, including VarScan [11] and the Genome Analysis Toolkit (GATK) [12]. Python scripts were developed to combine variants. Variants annotation was based on Ensembl and NCBI databases. For the selection of significant variants, a Fisher exact test was applied to the benign and aggressive phenotypes. For prioritization and selection of the most promising variants, the following criteria were applied: (i) presence of two or more statistically significant variants per gene; (ii) odds ratio difference of the prevalence for the variant between disease phenotypes equal or higher than 2; (iii) absence of the variant in one disease phenotype and presence of the variant in ≥ 50% of patients belonging to the counterpart phenotype; (iv) missense variants, splice region variants, and variants reported as possible deleterious mutations; and (v) biological and functional relevance of the target genes to MS, as reported in the literature. A total of 16 independent variants satisfying 2 or more of the aforementioned criteria were selected for validation.

Validation cohorts
Two independent cohorts with benign and aggressive disease courses were included in the study in order to validate the selected variants from the exome sequencing approach.
The second validation cohort consisted of 257 MS patients from Canada, 224 patients with benign phenotypes and 33 with aggressive disease courses. MS patients were ascertained through the Canadian Collaborative Project on the Genetic Susceptibility to Multiple Sclerosis (CCPGSMS) [13].
Clinical criteria to classify patients into benign and aggressive disease courses were the same as those applied to the discovery cohort, except for the second validation cohort in which treatment information on patients with benign disease course was not available. Similar to the discovery cohort, patients with benign phenotypes from the first validation cohort never received MS therapies. A summary of demographic and clinical characteristics of the first and second validation cohorts is shown in Additional file 1: Table S1.
The study was approved by the corresponding local ethics committees, and all participants provided informed consent.

TaqMan OpenArray genotyping
Genotyping of selected variants in the first validation cohort was performed using an OpenArray technology (Thermo Fisher Scientific, Massachusetts, USA) and following the manufacturer's instructions. Briefly, DNA samples were loaded into custom designed arrays using an OpenArray® AccuFill System (Thermo Fisher Scientific). QuantStudio™ 12K Flex system (Thermo Fisher Scientific) was used for sample amplification and fluorescent data collection. Hapmap samples with known genotype were included as internal controls of the process. Genotype was assigned using Taqman Genotyper Software (Thermo Fisher Scientific). Genotyping was performed by the Human Genotyping laboratory of the Spanish National Cancer Research Centre (CNIO).

Sequenom MassARRAY genotyping
In the second validation cohort, selected variants were genotyped using a MassArray iPLEX platform (Sequenom, San Diego, CA, USA) as previously described [14].

CPXM2, IGSF9B, and NLRP9 expression analysis in peripheral blood cells
Gene expression levels for CPMX2, NLRP9, and IGSF9B were determined by real-time PCR in peripheral blood mononuclear cells (PBMC) available from a subgroup of untreated MS patients from the first validation cohort. In order to avoid a confounding effect of disease course in the expression levels for these genes, analysis was restricted to the group of patients with aggressive disease course (n = 8 for CPXM2; n = 9 for NLRP9; n = 7 for IGSF9B). Briefly, PBMC were isolated by Ficoll-Isopaque density gradient centrifugation (Gibco BRL, Life Technologies LTD, Paisley, UK) and stored in liquid nitrogen until used. Total RNA was extracted from PBMC using TRIzol® reagent (Invitrogen, Carlsbad, CA) and cDNA synthesized using the High Capacity cDNA Archive kit (Applied Biosystems, Foster City, CA, USA). Messenger RNA expression levels for CPMX2, NLRP9, and IGSF9B were determined by real-time PCR using TaqMan® probes specific for each gene (Applied Biosystems, Foster City, CA, USA). The housekeeping gene glyceraldehyde-3-phosphate dehydrogenase (GAPDH) was used as an endogenous control (Applied Biosystems). Assays were run on the ABI PRISM® 7900HT system (Applied Biosystems) and data were analyzed using the 2 −ΔΔCT method [15]. Results were expressed as fold change in gene expression in MS patients carrying the risk allele relative to non-carrier patients.
Immunohistochemistry for CPXM2, IGSF9B, and NLRP9 in MS brain tissue Paraffin-embedded brain samples of chronic active lesions from four MS patients were provided by the UK Multiple Sclerosis Tissue Bank and stained with hematoxylin and eosin (HE) and Klüver-Barrera (KB) for inflammation and demyelination assessment.
Four-micrometer-thick, paraffin-embedded serial sections were deparaffined in xylene and rehydrated in alcohol. Endogenous peroxidase activity was blocked with hydrogen peroxide (2%), methanol (70%), and PBS for 20 min. Antigen retrieval was performed in TE buffer (1 M TrismaBase and 1 mM EDTA) (pH = 9) in the microwave. Non-specific protein binding was blocked with 0.2% of bovine albumin (BSA) in PBS. Sections were incubated overnight at 4°C with the following primary antibodies: rabbit anti-CPXM2 (Biorbyt), rabbit anti-NLRP9 (Abcam), and rabbit anti-IGSG9B (Abcam). Samples were incubated for 1 h at room temperature with goat-anti rabbit HRP secondary antibody (Dakocytomation) and stainings were visualized with 3,3′diaminobenzidine (Sigma, St Louis, MO, USA) as a chromogenic substrate.

Results
Exome sequencing in the discovery cohort of patients with benign and aggressive disease courses and selection of candidate genetic variants A flowchart summarizing the main steps of the study design and analysis is depicted in Fig. 1. In order to identify genetic variants associated with disease course, we first performed exome sequencing in an initial cohort of 20 MS patients classified according to their disease course into benign (n = 10) and aggressive (n = 10). A total of 915 single-nucleotide polymorphisms (SNPs) were differentially distributed between MS patients with benign and aggressive disease courses (uncorrected p values < 0.05; data not shown). From the list of differentially distributed SNPs and after applying the selection criteria described in the "Methods" section, 16 SNPs were chosen for validation in two validation cohorts. Table 1 shows a description of the 16 selected SNPs from the discovery cohort, and Table 2 provides the results of the exome sequencing analysis for these 16 SNPs.
Validation of selected SNPs in two independent cohorts of patients with benign and aggressive disease courses Selected SNPs were genotyped in two independent validation cohorts composed of MS patients classified according to their disease course. A total of 194 MS patients were included in the first validation cohort and 257 MS patients in the second validation cohort. Meta-analysis of genotyping results in both cohorts identified two SNPs, rs28469012 and rs10894768, significantly associated with MS disease course. As shown in Fig. 2, the minor allele for rs28469012, an intronic SNP positioned in CPXM2 (carboxypeptidase X, M14 family, member 2), was nominally significantly associated with more aggressive disease course [p-value; odds ratio (95% confidence interval): p = 0.04; 1.81 (1.03-3.18)] (Fig. 2). In this case, a modest degree of heterogeneity was found between both validation cohorts.
The minor allele for rs10894768, which corresponds to a synonymous SNP located in IGSF9B (immunoglobulin superfamily member 9B), was nominally significantly associated with benign disease course [p = 0.04; 0.70 (0.49-0.99)]. Meta-analysis of this variant supported the originally identified association with homogeneous outcomes for both cohorts (I 2 = 0).
A trend for significance was observed in a third SNP, rs10423927, an intronic variant located in NLRP9 (NLR family pyrin domain containing 9), with the minor allele associated with benign disease course [p = 0.09; 0.58 (0.31-1.09)]. Similar to the IGSF9B variant, meta-analysis for rs10423927 also supported the association in the discovery cohort and resulted in similar outcomes in both validation cohorts (I 2 = 0).
It should be noted that rs2374639, a synonymous variant positioned in VPS50 (VPS50, EARP/GARPII complex subunit), showed a significant association in the second validation cohort with the same protective effect observed in the discovery cohort [p = 0.04; 0.47 (0.23-0.95)]. However, the heterogeneity found between both validation cohorts did not allow to perform a joint analysis (Table 2; I 2 = 80%).
Genotyping frequencies and results of statistical analysis in the two validation cohorts for all selected variants are provided in Table 2.
Tissue-specific expression of IGSF9B, CPXM2, and NLRP9 In order to explore the functional consequences of the polymorphisms associated with MS disease course, we first investigated whether mRNA expression levels for IGSF9B, CPXM2, and NLRP9 in Fig. 1 Flowchart showing the study design and analysis. Patients were classified according to their disease course into benign and aggressive MS, as described in the Methods. By means of exome sequencing, a total of 915 single-nucleotide polymorphisms (SNPs) were identified from 10 MS patients with benign and 10 with aggressive disease courses as being differentially distributed between both groups (discovery cohort). After applying several selection criteria on the list of 915 SNPs including odds ratio difference, phenotype prevalence, number of statistically significant SNPs per gene, type and variant effects on the predicted protein, and relevance of target genes to MS, a total of 16 SNPs were chosen for further validation in two independent cohorts of patients also classified into benign and aggressive phenotypes. The first validation cohort comprised 194 MS patients, 107 with benign and 87 with aggressive disease courses, and genotyping was conducting using an OpenArray technology. The second validation cohort consisted of 257 MS patients, 224 with benign and 33 with aggressive disease courses, and genotyping was performed on a MassArray iPLEX platform. Finally, a meta-analysis was performed in the two validation cohorts PBMC differed between MS patients carrying the minor allele associated with disease course and non-carrier patients. CPXM2 and NLRP9 expression was not detected in PBMC from MS patients (data not shown). IGSF9B expression was detected in PBMC from MS patients, although no differences were observed between carriers and non-carriers of the minor allele associated with benign disease course (Additional file 1: Figure S1).
In order to determine whether the selected genetic variants associated with disease course could be modifying the expression of nearby genes in the region, we analyzed cis-expression quantitative trait loci (eQTLs) in 48 tissues from the Genotype-Tissue Expression (GTEx) project [16], lymphoblastoid cell lines in 465 individuals from the Geuvadis project [17], and trans-eQTLs from expression data of 9196 tumor samples in 33 cancer types from the Pan-canQTL study [18]. We observed correlation of the rs10894768 genotypes with IGSF9B expression in pancreatic tissue (p = 3.6 × 10 − 6 ) from 171 individuals and thyroid tissue (p = 5.6 × 10 − 11 ) from 323 individuals from the GTEx study [16]. The minor allele for this polymorphism, which was more represented in patients with benign disease course, was associated with lower IGSF9B expression (Additional file 1: Figure  S2). Of note, IGSF9B was highly expressed in cerebellar hemisphere, cerebellum and also in cortex, hypothalamus, and spinal cord; however, no eQTLs were described for these tissues most likely due to the lower number of samples analyzed in brain tissue (105 samples from cerebellar hemisphere and 125 from cerebellum).

IGSF9B, CPXM2, and NLRP9 are expressed in brain chronic active lesions from MS patients
Based on the negative results observed for IGSF9B, CPXM2, and NLRP9 expression in peripheral blood, we wondered whether selected genes associated with MS disease course could play more specific roles at the CNS level by investigating their expression in brain tissue from patients. Figure 3 depicts the results of IGSF9B, CPXM2, and NLRP9 immunohistochemistries in brain tissue sections from chronic active lesions of MS patients. IGSF9B showed a diffuse neutrophil staining in both the gray and the white matter. The most intense IGSF9B staining was detected in the cytoplasm of microglia/macrophages and astrocytes located at the margins of MS lesions, where inflammatory activity is highest. IGSF9B immunostaining was also observed in less active areas of the lesions, albeit to a lesser degree compared to more active areas. CPXM2 and NLRP9 exhibited a similar and very discrete immunostaining. Their expression was restricted to the cytoplasm of few microglia/macrophages located at the margins of MS lesions showing the highest inflammatory activity (Fig. 3). By definition, a splice region variant is a sequence variant in which a change has occurred within the region of the splice site, either within 1-3 bases of the exon or 3-8 bases of the intron [30] Discussion Exome sequencing has significantly contributed to the characterization of the genetic component of a number of common complex diseases [19]. In the present study, we aimed to identify genetic variants associated with MS disease course by applying, as a first step, an exome sequencing approach to a small discovery cohort of patients stratified according to benign and aggressive phenotypes. This initial approach led to the identification of a ranked list of candidate polymorphisms associated either with benign or aggressive MS disease courses. Despite the statistically significant associations of a large number of SNPs with MS disease course, in small discovery cohorts only strong statistically significant associations are likely to be real, and hence original findings should be better replicated in additional cohorts to eliminate the chances of false positive results. Based on these observations, in our study selected SNPs identified in the discovery cohort were further genotyped in two independent validation cohorts of patients classified according to similar criteria into benign and aggressive phenotypes.
Meta-analysis in the two validation cohorts revealed two polymorphisms, rs28469012 and rs10894768, as potential MS phenotype modifiers. The SNP rs28469012 is an intronic variant located in the CPXM2 gene whose minor allele was associated with worse disease evolution. CPXM2 codes for a member of the metallocarboxypeptidase family with potential roles in synaptic integrity [20]. Previous studies have associated the CPXM2 gene to Alzheimer disease [21], Parkinson's disease [20], and schizophrenia [22]. Although no studies of CPXM2 have been reported thus far in MS, it is interesting to mention that experimental autoimmune encephalomyelitis (EAE) mice deficient for another metallocarboxypeptidase that shares protein homology with CPXM2, carboxipeptidase N, had attenuated EAE disease course and reduced spinal cord inflammation and demyelination [23], data that indirectly support the association observed in our study between CPXM2 and aggressive MS phenotypes.
The synonymous exonic variant rs10894768 is positioned in the IGSF9B gene, and the minor allele for this polymorphism was more represented in MS patients with benign disease course. IGSF9B encodes a transmembrane immunoglobulin that has been reported to be highly expressed in GABAergic interneurons, where it may play a role promoting inhibitory synaptic development via the formation of a ternary complex with the postsynaptic scaffolding protein S-SCAM and the neuronal cell surface protein neuroligin 2 [24]. Similar to CPXM2, the role of IGSF9B in MS is unknown. However, considering that the GABAergic system is dysregulated in both MS and EAE and a selective loss of GABAergic interneurons has been reported in EAE [25], it is tempting to speculate that the finding of a higher frequency of genetic variants located in a gene that promotes maintenance of inhibitory synapses may result in more benign disease outcomes of MS patients. Noteworthy, MS brain tissue immunohistochemistry revealed IGSF9B expression in astrocytes, cells that are known to be involved in the formation and control of neuronal synapses [26]. Although rs28469012 and rs10894768 were the only polymorphisms whose association with MS disease course was validated, a trend for association with benign phenotypes was also observed for rs10423927, an Meta-analysis for rs28469012 (CPXM2), rs10894768 (IGSF9B), and rs10423927 (NLRP9) in the two validation cohorts of patients with aggressive and benign disease courses. The figure depicts joint analyses for the first and second validation cohorts in each SNP, with homogeneity tests (I 2 ) and tests for overall effects. The squares and horizontal lines correspond to the study specific odds ratios (ORs) and 95% confidence intervals (CI) respectively. The area of the squares reflects the study specific weight (inverse of the variance). The diamond represents the pooled ORs and 95% CI. M-H Mantel-Haenszel intronic variant located in the NLRP9 gene. Despite that little evidence exists in the literature regarding its function, NLRP9 belongs to the NOD-like receptor (NLR) family of inflammasomes, which are known to play critical roles both in innate and adaptive immunity and whose dysfunction has strongly been linked to autoimmune diseases [27]. Interestingly, a missense variant located in another member of the NLR family of inflammasomes, NLRP5, was recently found to be associated with higher disease severity scores, suggesting a role of NLR inflammasomes in MS disease course [28].
In an attempt to investigate the functional consequences of the genetic variants associated with benign and aggressive phenotypes, expression of IGSF9B, CPXM2, and NLRP9 was investigated at the gene and protein expression levels in PBMC and brain tissue respectively. Although not proven in the study, the negative results obtained in peripheral blood, with lack of expression of CPXM2 and NLRP9 in PBMC and no evidence of differences in IGSF9B expression between minor allele carriers and non-carriers for rs10894768, suggest that the genetic variants associated with disease course in MS may act by modulating the function of CNS cells such as macrophages/microglia and astrocytes, as supported by the immunohistochemistry studies in MS brain tissue showing expression for these genes in these particular cell types. Unfortunately, postmortem brain studies are not suitable for patient stratification to explore allele-specific gene expression differences. Furthermore, it could be possible that the MS course-associated allele of our reported SNPs confers increased ability to interact with certain environmental risk factors or impacts on chromatin structure by affecting epigenetic marks, including DNA methylation or histone modifications [29].
Finally, the finding that the minor allele of rs10894768, which is more represented in MS patients with benign outcomes, was associated with lower expression of IGSF9B in thyroid and pancreatic tissues supports the view that gene expression may be markedly different across tissues. Fig. 3 IGSF9B, CPXM2, and NLRP9 expression in MS brain tissue. Immunostainings for IGSF9B, CPXM2, and NLRP9 in the margins of MS brain chronic active lesions, where inflammatory activity is highest. IGSF9B expression was observed in astrocytes (arrows) and macrophages/microglia (arrow heads), whereas CPMX2 and NLRP9 immunostaining was only detected in macrophages/microglia (arrow heads). Photos were taken at × 20 and × 40