The discovery cohort comprised 20 MS patients classified according to their disease course into benign and aggressive phenotypes. Patients with benign phenotypes (n = 10) were defined as having an Expanded Disability Status Scale (EDSS) score equal to or lower than 3.0 after 15 or more years from disease onset and having never received MS therapies. Patients with aggressive disease courses (n = 10) reached an EDSS score equal to or higher than 6.0 within the first 5 years after disease onset, regardless of treatment. All patients included in the discovery cohort were recruited at the Centre d’Esclerosi Múltiple de Catalunya (Cemcat). Additional file 1: Table S1 summarizes the demographic and main clinical characteristics of the discovery cohort.
Genomic DNA was extracted from peripheral blood using standard methods. Exome sequencing was applied to the discovery cohort to identify genes associated with benign and aggressive disease courses. Sequencing was performed on an Illumina HiSeq2000 platform using the Agilent SureSelect Target Enrichment System for 51 Mb. Samples were sequenced at 50× coverage, and reads were aligned against the human reference genome (GRCh37/hg19 assembly) using the Burrows-Wheeler Alignment tool (BWA). After read mapping, low-quality reads and sequences flagged as PCR duplicates were removed from the BAM file using the Sequence Alignment/Map (SAM) tools and Picard Tools. Unmasked variants were annotated considering all possible transcripts for each target gene. In some cases, a variant located within a coding sequence in one isoform fell within a non-coding region in another isoform, which also led to the identification of intronic variants. Exome sequencing was performed at Sistemas Genómicos (Valencia, Spain).
Selection of candidate single-nucleotide polymorphisms for validation
For variant calling, different algorithms were applied, including VarScan and the Genome Analysis Toolkit (GATK). Python scripts were developed to combine variants. Variant annotation was based on the Ensembl and NCBI databases. To select significant variants, a Fisher exact test was applied to compare the benign and aggressive phenotypes. To prioritize and select the most promising variants, the following criteria were applied: (i) presence of two or more statistically significant variants per gene; (ii) a difference in variant prevalence between disease phenotypes corresponding to an odds ratio equal to or higher than 2; (iii) absence of the variant in one disease phenotype and presence of the variant in ≥ 50% of patients belonging to the counterpart phenotype; (iv) missense variants, splice region variants, and variants reported as possibly deleterious mutations; and (v) biological and functional relevance of the target genes to MS, as reported in the literature. A total of 16 independent variants satisfying 2 or more of the aforementioned criteria were selected for validation.
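As an illustration of the statistical filter and of criterion (iii), the following minimal Python sketch computes a two-sided Fisher exact test on a 2 × 2 table of carriers and non-carriers in the two phenotypes. It is not the pipeline used in the study: the function names, the use of carrier counts rather than allele counts, and the example counts are all assumptions made for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical, not taken from the study):
    rows are phenotypes, columns are carriers / non-carriers.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in row 1 with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def absent_in_one_common_in_other(carriers_a, n_a, carriers_b, n_b, min_fraction=0.5):
    """Criterion (iii): variant absent in one phenotype and carried by at
    least `min_fraction` of patients in the counterpart phenotype."""
    return (carriers_a == 0 and carriers_b / n_b >= min_fraction) or (
        carriers_b == 0 and carriers_a / n_a >= min_fraction
    )


# Hypothetical counts: 6 of 10 aggressive patients and 0 of 10 benign
# patients carry the variant.
p_value = fisher_exact_two_sided(6, 4, 0, 10)
meets_iii = absent_in_one_common_in_other(6, 10, 0, 10)
print(f"p = {p_value:.4f}, criterion (iii) met: {meets_iii}")
```

With only 10 patients per group, a variant that is absent in one phenotype needs to be present in a majority of the other group to reach nominal significance, which is consistent with combining the test with the prevalence-based criteria.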
Two independent cohorts with benign and aggressive disease courses were included in the study in order to validate the selected variants from the exome sequencing approach.
The first validation cohort included 194 MS patients from 7 MS centers [Bilbao (n = 56); UCSF (n = 55); Madrid—Hospital Clínico (n = 32); Barcelona—Hospital Clinic (n = 23); Madrid—Ramón y Cajal (n = 16); Madrid—Puerta de Hierro (n = 9); Girona (n = 3)]. Of these, 107 MS patients had benign phenotypes and 87 aggressive disease courses.
The second validation cohort consisted of 257 MS patients from Canada: 224 patients with benign phenotypes and 33 with aggressive disease courses. MS patients were ascertained through the Canadian Collaborative Project on the Genetic Susceptibility to Multiple Sclerosis (CCPGSMS).
Clinical criteria to classify patients into benign and aggressive disease courses were the same as those applied to the discovery cohort, except for the second validation cohort in which treatment information on patients with benign disease course was not available. Similar to the discovery cohort, patients with benign phenotypes from the first validation cohort never received MS therapies. A summary of demographic and clinical characteristics of the first and second validation cohorts is shown in Additional file 1: Table S1.
The study was approved by the corresponding local ethics committees, and all participants provided informed consent.
TaqMan OpenArray genotyping
Genotyping of selected variants in the first validation cohort was performed using OpenArray technology (Thermo Fisher Scientific, Massachusetts, USA) following the manufacturer’s instructions. Briefly, DNA samples were loaded into custom-designed arrays using an OpenArray® AccuFill System (Thermo Fisher Scientific). A QuantStudio™ 12K Flex system (Thermo Fisher Scientific) was used for sample amplification and fluorescence data collection. HapMap samples with known genotypes were included as internal controls of the process. Genotypes were assigned using TaqMan Genotyper Software (Thermo Fisher Scientific). Genotyping was performed by the Human Genotyping laboratory of the Spanish National Cancer Research Centre (CNIO).
Sequenom MassARRAY genotyping
In the second validation cohort, selected variants were genotyped using a MassARRAY iPLEX platform (Sequenom, San Diego, CA, USA) as previously described.
CPXM2, IGSF9B, and NLRP9 expression analysis in peripheral blood cells
Gene expression levels for CPXM2, NLRP9, and IGSF9B were determined by real-time PCR in peripheral blood mononuclear cells (PBMC) available from a subgroup of untreated MS patients from the first validation cohort. To avoid a confounding effect of disease course on the expression levels of these genes, the analysis was restricted to patients with aggressive disease course (n = 8 for CPXM2; n = 9 for NLRP9; n = 7 for IGSF9B). Briefly, PBMC were isolated by Ficoll-Isopaque density gradient centrifugation (Gibco BRL, Life Technologies LTD, Paisley, UK) and stored in liquid nitrogen until use. Total RNA was extracted from PBMC using TRIzol® reagent (Invitrogen, Carlsbad, CA) and cDNA was synthesized using the High Capacity cDNA Archive kit (Applied Biosystems, Foster City, CA, USA). Messenger RNA expression levels for CPXM2, NLRP9, and IGSF9B were determined by real-time PCR using TaqMan® probes specific for each gene (Applied Biosystems, Foster City, CA, USA). The housekeeping gene glyceraldehyde-3-phosphate dehydrogenase (GAPDH) was used as an endogenous control (Applied Biosystems). Assays were run on the ABI PRISM® 7900HT system (Applied Biosystems) and data were analyzed using the 2^−ΔΔCT method. Results were expressed as fold change in gene expression in MS patients carrying the risk allele relative to non-carrier patients.
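The 2^−ΔΔCT calculation can be sketched as follows, assuming that GAPDH serves as the reference gene and that the mean ΔCT of non-carrier patients is used as the calibrator. The function names and CT values are hypothetical and only illustrate the arithmetic; the original analysis may have handled replicates and calibrators differently.

```python
def delta_ct(ct_target, ct_reference):
    """Normalize the target gene CT to the reference gene CT (here GAPDH)."""
    return ct_target - ct_reference


def fold_change(sample_dct, calibrator_dcts):
    """Relative expression by the 2^-ddCT method.

    sample_dct: delta-CT of one risk-allele carrier.
    calibrator_dcts: delta-CT values of the non-carrier group; using their
    mean as the calibrator is an assumption of this sketch.
    """
    calibrator_mean = sum(calibrator_dcts) / len(calibrator_dcts)
    ddct = sample_dct - calibrator_mean
    return 2 ** (-ddct)


# Hypothetical CT values: target gene and GAPDH in one carrier,
# and delta-CT values from two non-carriers.
carrier_dct = delta_ct(ct_target=26.0, ct_reference=18.0)  # 8.0
noncarrier_dcts = [9.0, 9.0]
print(fold_change(carrier_dct, noncarrier_dcts))
```

A lower ΔCT in carriers than in the calibrator group yields a fold change above 1, that is, higher expression in risk-allele carriers.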
Immunohistochemistry for CPXM2, IGSF9B, and NLRP9 in MS brain tissue
Paraffin-embedded brain samples of chronic active lesions from four MS patients were provided by the UK Multiple Sclerosis Tissue Bank and stained with hematoxylin and eosin (HE) and Klüver-Barrera (KB) to assess inflammation and demyelination. Four-micrometer-thick, paraffin-embedded serial sections were deparaffinized in xylene and rehydrated in alcohol. Endogenous peroxidase activity was blocked with hydrogen peroxide (2%), methanol (70%), and PBS for 20 min. Antigen retrieval was performed in TE buffer (1 M Trizma base and 1 mM EDTA, pH 9) in the microwave. Non-specific protein binding was blocked with 0.2% bovine serum albumin (BSA) in PBS. Sections were incubated overnight at 4 °C with the following primary antibodies: rabbit anti-CPXM2 (Biorbyt), rabbit anti-NLRP9 (Abcam), and rabbit anti-IGSF9B (Abcam). Samples were then incubated for 1 h at room temperature with goat anti-rabbit HRP secondary antibody (DakoCytomation), and staining was visualized with 3,3′-diaminobenzidine (Sigma, St Louis, MO, USA) as a chromogenic substrate.