Genome sequencing analysis identifies new loci associated with Lewy body dementia and provides insights into its genetic architecture

The genetic basis of Lewy body dementia (LBD) is not well understood. Here, we performed whole-genome sequencing in large cohorts of LBD cases and neurologically healthy controls to study the genetic architecture of this understudied form of dementia and to generate a resource for the scientific community. Genome-wide association analysis identified five independent risk loci, whereas genome-wide gene-aggregation tests implicated mutations in the gene GBA. Genetic risk scores demonstrate that LBD shares risk profiles and pathways with Alzheimer’s disease and Parkinson’s disease, providing a deeper molecular understanding of the complex genetic architecture of this age-related neurodegenerative condition.

also a hallmark feature of Parkinson's disease. The vast majority of LBD patients additionally exhibit Alzheimer's disease co-pathology 2 . These neuropathological observations have led to the, as yet unproven, hypothesis that LBD lies on a disease continuum between Parkinson's disease and Alzheimer's disease 3 . Though relatively common in the community, with an estimated 1.4 million prevalent cases in the United States 4 , the genetic contributions to this underserved condition are poorly understood.
The rapid advances in genome sequencing technologies offer unprecedented opportunities to identify and characterize disease-associated genetic variation. Here, we performed whole-genome sequencing in a cohort of 2,981 patients diagnosed with LBD and 4,391 neurologically healthy individuals. We analyzed these data using a genome-wide association study (GWAS) approach. This investigation identified five risk loci that were replicated in an independent case-control cohort 5,6 . We also performed gene aggregation tests, and we modeled the relative contributions of Alzheimer's disease and Parkinson's disease risk variants to this fatal neurodegenerative disease (see Fig. 1 for an analysis overview). Additionally, we created a resource for the scientific community to mine for new insights into the genetic etiology of LBD and to expedite the development of targeted therapeutics.

Genome-wide association analysis identifies new loci associated with LBD.
After quality control, whole-genome sequence data from 2,591 individuals diagnosed with LBD and 4,027 neurologically healthy individuals were available for study. Participants were recruited across 44 institutions/consortia and were diagnosed according to established consensus criteria. Using a GWAS approach, we identified five loci that surpassed the genome-wide significance threshold (Table 1 and Fig. 2a). Three of these signals were located at known LBD risk loci within the genes GBA, APOE, and SNCA [7][8][9][10] . The remaining GWAS signals in BIN1 and TMEM175 represented novel LBD risk loci. Notably, these loci have been implicated in other age-related neurodegenerative diseases, including Alzheimer's disease (BIN1) and Parkinson's disease (TMEM175) 11,12 . We examined the associations of BIN1 and TMEM175 risk alleles with CERAD and Braak semi-quantitative pathological measures of Alzheimer's disease co-pathology. We found that the BIN1 risk allele (rs6733839-T) was significantly associated with increased neurofibrillary tangle pathology (Fisher's exact test P-value based on Braak neurofibrillary tangle staging = 0.0002; Extended Data Fig. 1). In contrast, there was no significant association of the TMEM175 risk allele with Alzheimer's disease co-pathology. Conditional analyses detected a second signal at the APOE locus (see Extended Data Fig. 2 for regional association plots and Extended Data Fig. 3 for conditional association analyses). Subanalysis GWAS of pathologically defined LBD cases only versus control subjects identified the same risk loci (Fig. 2b). Finally, we replicated each of the observed risk loci in an independent sample of 970 European-ancestry LBD cases and 8,928 controls (Table 1) 5,6 .

Gene-level aggregation testing identifies GBA as a pleomorphic risk gene.
The significant loci from our GWAS explained only a small fraction (1%) of the conservatively estimated narrow-sense heritability of LBD of 10.81% (95% confidence interval [CI]: 8.28%-13.32%, P = 9.17 × 10 −4 ). To explore whether rare variants contribute to the remaining risk of LBD, we performed gene-level sequence kernel association-optimized (SKAT-O) tests of missense mutations with a minor allele frequency (MAF) threshold ≤ 1% and a minor allele count (MAC) of ≥ 3 across the genome 13 . This rare variant analysis identified GBA as associated with LBD (Fig. 2c). GBA, encoding the lysosomal enzyme glucocerebrosidase, is a known pleomorphic risk gene for LBD and Parkinson's disease 7,14,15 , and our rare and common variant analyses confirm a prominent role of this gene in the pathogenesis of Lewy body diseases.

Functional inferences from colocalization and gene expression analyses.
Most GWAS loci are thought to operate through the regulation of gene expression 16,17 . Thus, we performed a colocalization analysis to determine whether a shared causal variant drives association signals for LBD risk and gene expression. Expression quantitative trait loci (eQTL) were obtained from eQTLGen and PsychENCODE 18,19 , the largest available human blood and brain eQTL datasets. We found evidence of colocalization between the TMEM175 locus and an eQTL regulating TMEM175 expression in blood (posterior probability for H 4 (PPH4) = 0.99; Fig. 3a and Supplementary Table 1). There was also colocalization between the association signal at the SNCA locus and an eQTL regulating SNCA-AS1 expression in the brain (PPH4 = 0.96; Fig. 3b and Supplementary Table 1). Interestingly, the index variant at the SNCA locus was located within the SNCA-AS1 gene, which overlaps with the 5'-end of SNCA and encodes a long noncoding antisense RNA species known to regulate SNCA expression. Sensitivity analyses confirmed that these colocalizations were robust to changes in the prior probability of a variant associating with both traits (Extended Data Fig. 4).
We interrogated the effect of each SNP in the region surrounding SNCA-AS1 on LBD risk using our GWAS data and SNCA-AS1 expression using the PsychENCODE data (Extended Data Fig. 5a). All genome-wide significant risk SNPs in the locus had a negative beta coefficient, while the shared SNCA-AS1 eQTL had a positive beta coefficient. This negative correlation suggested that increased SNCA-AS1 expression is associated with reduced LBD risk (Spearman's rho = −0.42; P = 0.0012; Extended Data Fig. 5b).
Analysis of human bulk-tissue RNA-sequencing data from the Genotype-Tissue Expression (GTEx) consortium and single-nucleus RNA-sequencing data of the middle temporal gyrus from the Allen Institute for Brain Science 20,21 demonstrated that TMEM175 is ubiquitously expressed, whereas SNCA-AS1 is predominantly expressed in brain tissue (Extended Data Fig. 6a and Supplementary Table 2). At the cellular level, TMEM175 is highly expressed in oligodendrocyte progenitor cells, while SNCA-AS1 demonstrates neuronal specificity (Extended Data Fig. 6b and Supplementary Table 2). SNCA and SNCA-AS1 share a similar, though not identical, tissue expression profile (Extended Data Fig. 7).

LBD risk overlaps with risk profiles of Alzheimer's disease and Parkinson's disease.
We leveraged our whole-genome sequence data to explore the etiological relationship between Alzheimer's disease, Parkinson's disease, and LBD. To do this, we applied genetic risk scores derived from large-scale GWAS analyses of Alzheimer's disease and Parkinson's disease to individual-level genetic data from our LBD case-control cohort 22,23 . We tested the associations of the Alzheimer's disease and Parkinson's disease genetic risk scores with LBD disease status, and with age at death, age at onset, and the duration of illness observed among the LBD cases. Individuals diagnosed with LBD had a higher genetic risk for developing both Alzheimer's disease (odds ratio [OR] = 1.66 per standard deviation of Alzheimer's disease genetic risk, 95% CI = 1.58-1.74, P < 2 × 10 −16 , Fig. 4a) and Parkinson's disease (OR = 1.20, 95% CI = 1.14-1.26, P = 4.34 × 10 −12 , Fig. 4b). These risk scores remained significant after adjusting for genes that substantially contribute to Alzheimer's disease (model after adjustment for APOE: OR = 1.53, 95% CI = 1.37-1.72, P = 3.29 × 10 −14 ) and Parkinson's disease heritable risk (model after adjustment for GBA, SNCA, and LRRK2: OR = 1.26, 95% CI = 1.19-1.34, P = 5.91 × 10 −14 ). The Alzheimer's disease genetic risk score was also found to be significantly associated with an earlier age of death in LBD (β = −1.77 years per standard deviation increase in the genetic risk score from the population mean, standard error [SE] = 0.19, P < 2 × 10 −16 ) and shorter disease duration (β = −0.90 years, SE = 0.27, P = 0.0007). In contrast, the Parkinson's disease genetic risk score was associated with an earlier age at onset among patients diagnosed with LBD (β = −0.98, SE = 0.28, P = 0.00045).
We found no evidence of interaction between the genetic risk scores of Alzheimer's disease and Parkinson's disease in the LBD cohort (OR = 0.99, 95% CI = 0.95-1.03, P = 0.59), implying that Alzheimer's disease and Parkinson's disease risk variants are independently associated with LBD risk.

Enrichment analysis identifies pathways involved in LBD.
Pathway enrichment analysis of LBD, using a polygenic risk score based on the GWAS risk variants, found several significantly enriched gene ontology processes associated with LBD (Fig. 5). These related to the regulation of amyloid-beta formation (adjusted P = 0.04), regulation of endocytosis (adjusted P = 0.02), tau protein binding (adjusted P = 1.85 × 10 −5 ), and others. Among these, the regulation of amyloid precursor protein, amyloid-beta formation, and tau protein binding have been previously implicated in the pathogenesis of Alzheimer's disease, while regulation of endocytosis is particularly important in the pathogenesis of Parkinson's disease 24,25 . These observations support the notion of overlapping disease-associated pathways in these common age-related neurodegenerative diseases.

Association of polygenic risk with clinical dementia severity.
We performed an association analysis of LBD polygenic risk with dementia severity, as measured by the Clinical Dementia Rating scale 26 . We found that LBD patients in the highest polygenic risk score quintile had more severe impairment at baseline evaluation compared to LBD patients in the lowest quintile (χ 2 = 5.60, df = 1, P = 0.009; Extended Data Fig. 8).

Discussion
Our analyses highlight the contributions of common and rare variants to the complex genetic architecture of LBD, a common and fatal neurodegenerative disease. Specifically, our GWAS identified five independent genome-wide significant loci (GBA, BIN1, TMEM175, SNCA-AS1, APOE) that influence risk for developing LBD, whereas the genome-wide gene-based aggregation tests implicated mutations in GBA as being critical in the pathogenesis of the disease. We further detected strong cis-eQTL colocalization signals at the TMEM175 and SNCA-AS1 loci, indicating that the risk of disease at these genomic regions may be driven by expression changes of these particular genes. Finally, we provided definitive evidence that the risk of LBD is driven, at least in part, by genetic variants associated with the risk of developing both Alzheimer's disease and Parkinson's disease.
We replicated all five GWAS signals in an independent LBD case-control dataset derived from imputed genotyping array data. Among these, GBA (encoding the lysosomal enzyme glucocerebrosidase), APOE (encoding apolipoprotein E), and SNCA (encoding α-synuclein) are known LBD risk genes [7][8][9] . In addition to these previously described loci, we identified a novel locus on chromosome 2q14.3, located 28 kb downstream of the BIN1 gene, which is a known risk locus for Alzheimer's disease 11 . BIN1 encodes the bridging integrator 1 protein that is involved in endosomal trafficking. The depletion of BIN1 reduces the lysosomal degradation of β-site APP-cleaving enzyme 1 (BACE1), resulting in increased amyloid-β production 27 . Furthermore, the loss of BIN1 promotes the propagation of tau pathology by increasing aggregate internalization via endocytosis and endosomal trafficking 28 . The direction of effect observed in LBD is the same as in Alzheimer's disease (Supplementary Table 3). The observed pleiotropic effects between LBD and Alzheimer's disease prompt us to speculate that mitigating BIN1-mediated endosomal dysfunction could have therapeutic implications in both neurodegenerative diseases.
A second novel LBD signal was detected within the lysosomal TMEM175 gene on chromosome 4p16.3, a known Parkinson's disease risk locus 12 . Deficiency of TMEM175, encoding a transmembrane potassium channel, impairs lysosomal function, lysosome-mediated autophagosome clearance, and mitochondrial respiratory capacity. Loss-of-function further increases the deposition of phosphorylated α-synuclein 29 , which makes TMEM175 a plausible LBD risk gene. The direction of effect is the same in LBD as it is in Parkinson's disease (Supplementary Table 3), and identification of TMEM175 underscores the role of lysosomal dysfunction in the pathogenesis of Lewy body diseases.
Our data confirm the hypothesis that the LBD genetic architecture is complex and overlaps with the risk profiles of Alzheimer's disease and Parkinson's disease. First, several genome-wide significant risk loci in our GWAS analysis have been previously described either in the Alzheimer's disease literature (APOE, BIN1) or have been associated with risk of developing Parkinson's disease (GBA, TMEM175, SNCA) 11,12,[30][31][32] . Second, genome-wide gene-based aggregation tests of rare mutations similarly identified GBA, which has been previously implicated in Parkinson's disease 7 . Third, genetic risk scores derived from Alzheimer's disease and Parkinson's disease GWAS meta-analyses predicted risk for LBD independently, even after removal of the strongest signals (APOE, GBA, SNCA, and LRRK2). Interestingly, our data did not show a synergistic effect between the risk of Parkinson's disease and Alzheimer's disease in the pathogenesis of LBD, though analysis of larger cohorts will be required to confirm this observation.
Comparing the patterns of the risk loci in LBD with the patterns of risk in published Parkinson's disease and Alzheimer's disease GWAS meta-analyses provided additional insights into this complex relationship. The directions of effect at the index variants of the GBA and TMEM175 loci were the same in LBD as the directions observed in Parkinson's disease 23 . Likewise, the directions of effect for the BIN1 and APOE signals were the same as the directions detected in Alzheimer's disease (Supplementary Table 3) 33 . However, we observed a notably different profile at the SNCA locus in LBD compared to Parkinson's disease. Our GWAS and colocalization analyses implicated SNCA-AS1, a non-coding RNA that regulates SNCA expression, as the main signal at the SNCA locus. In contrast, the main signal in Parkinson's disease is detected at the 3'-end of SNCA 34 . This finding suggests that the regulation of SNCA expression may be different in LBD compared to Parkinson's disease and that only specific SNCA transcripts that are regulated by SNCA-AS1 drive risk for developing dementia. Further, SNCA-AS1 may prove to be a more amenable therapeutic target than SNCA itself due to its neuronal specificity.
As part of this study, we created a foundational resource that will facilitate the study of molecular mechanisms across a broad spectrum of neurodegenerative diseases. We anticipate that these data will be widely accessed for several reasons. First, the resource is the largest whole-genome sequence repository in LBD to date. Second, the nearly 2,000 neurologically healthy, aged individuals included within this resource can be used as control subjects for the study of other neurological and age-related diseases. Third, we prioritized the inclusion of pathologically confirmed LBD patients, representing more than two-thirds of the case cohort, to ensure high diagnostic accuracy among our case cohort participants. Finally, all genomes are of high quality and were generated using a uniform genome sequencing, alignment, and variant-calling pipeline. Whole genome sequencing data on this large case-control cohort has allowed us to undertake a comprehensive genomic evaluation of both common and rare variants, including immediate fine-mapping of association signals to pinpoint the functional variants at the TMEM175 and SNCA-AS1 loci. The availability of genome-sequence data will facilitate similar comprehensive evaluations of less commonly studied variant types, such as repeat expansions and structural variants.
Our study has limitations. We focused on individuals of European ancestry, as this is the population in which large cohorts of LBD patients were readily available. Recruiting patients and healthy controls from diverse populations will be crucial for future research to understand the genetic architecture of LBD. Another constraint is the use of short-read sequencing, rather than long-read sequencing applications, that limits the resolution of complex, repetitive, and GC-rich genomic regions 35 . Most study participants did not have in-depth phenotype information using standardized rating scales available. Further, despite our large sample size, we had limited power to detect common genetic variants of small effect size, and additional large-scale genomic studies will be required to unravel the missing heritability of LBD.
In conclusion, our study identified novel loci as relevant in the pathogenesis of LBD. Our findings confirmed that LBD genetically intersects with Alzheimer's disease and Parkinson's disease and highlighted the polygenic contributions of these other neurodegenerative diseases to its pathogenesis. Determining shared molecular genetic relationships among complex neurodegenerative diseases paves the way for precision medicine and has implications for prioritizing targets for therapeutic development. We have made the whole-genome sequence data available to the research community. These genomes constitute the largest sequencing effort in LBD to date and are designed to accelerate the pace of discovery in dementia.

Cohort description and study design.
A total of 5,154 participants of European ancestry (2,981 LBD cases, 2,173 neurologically healthy controls) were recruited across 17 European and 27 North American sites/consortia to create a genomic resource for LBD research (Supplementary Table 4). In addition to these resource genomes, we obtained convenience control genomes from (i) the Wellderly cohort (n = 1,202), a cohort of healthy, aged European-ancestry individuals recruited in the United States 36 , and (ii) European-ancestry control genomes generated by the National Institute on Aging and the Accelerating Medicine Partnership -Parkinson's Disease Initiative (www.amp-pd.org; n = 1,016). This brought the total number of control individuals available for this study to 4,391.
All control cohorts were selected based on a lack of evidence of cognitive decline in their clinical history and absence of neurological deficits on neurological examination. Pathologically confirmed control individuals (n = 605) had no evidence of significant neurodegenerative disease on histopathological examination. LBD patients were diagnosed with pathologically definite or clinically probable disease according to consensus criteria 2,37 . The case cohort included 1,789 (69.0%) autopsy-confirmed LBD cases and 802 (31.0%) clinically probable LBD patients. 63.4% of LBD cases were male, as is typical for the LBD patient population 38 . The demographic characteristics of the cohorts are summarized in Supplementary Table 5. The appropriate institutional review boards of participating institutions approved the study (03-AG-N329, NCT02014246), and informed consent was obtained from all subjects or their surrogate decision-makers, according to the Declaration of Helsinki.

Whole-genome sequencing.
Fluorometric quantitation of the genomic DNA samples was performed using the PicoGreen dsDNA assay (Thermo Fisher). PCR-free, paired-end libraries were constructed by automated liquid handlers using the Illumina TruSeq chemistry according to the manufacturer's protocol. DNA samples underwent sequencing on an Illumina HiSeq X Ten sequencer (v.2.5 chemistry, Illumina) using 150 bp, paired-end cycles.

Sequence alignment, variant calling.
Genome sequence data were processed using the pipeline standard developed by the Centers for Common Disease Genomics (CCDG; https://www.genome.gov/27563570/). This standard allows for whole-genome sequence data processed by different groups to generate 'functionally equivalent' results 39 . The GRCh38DH reference genome was used for alignment, as specified in the CCDG standard. For whole-genome sequence alignments and processing, the Broad Institute's implementation of the functional equivalence standardized pipeline was used. This pipeline, which incorporates the GATK (2016) Best Practices 40 , was implemented in the workflow description language for deployment and execution on the Google Cloud Platform. Single-nucleotide variants and indels were called from the processed whole-genome sequence data following the GATK Best Practices using another Broad Institute workflow for joint discovery and Variant Quality Score Recalibration. Both Broad workflows for WGS sample processing and joint discovery are publicly available (https://github.com/gatk-workflows/broad-prod-wgs-germline-snps-indels). All whole-genome sequence data were processed using the same pipeline.
After these quality control filters were applied, there were 6,651 samples available for analysis. Extended Data Figure 10 shows quality control metrics.

Statistical analysis for single-variant association.
We performed a GWAS in LBD (n = 2,591 cases and 4,027 controls) using logistic regression in PLINK (v.2.0) with a minor allele frequency threshold of >1% based on the allele frequency estimates in the LBD case cohort 45 . We used the step function in the R MASS package to determine the minimum number of principal components (generated from common single nucleotide variants) required to correct for population substructure 46 . The first two principal components in our study cohorts compared to the HapMap3 Genomic Resource Panel are shown in Extended Data Figure 9a. Based on this analysis, we incorporated sex, age, and five principal components (PC1, PC3, PC4, PC5, PC7) as covariates in our model. Quantile-quantile plots revealed minimal residual population substructure, as estimated by the sample size-adjusted genome-wide inflation factor λ 1000 of 1.004 (Extended Data Fig. 9b). The Bonferroni threshold for genome-wide significance was 5.0 × 10 -8 . A conditional analysis was performed for each GWAS locus by adding each respective index variant to the covariates (Extended Data Fig. 3).
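The sample size-adjusted inflation factor reported above is conventionally obtained by rescaling the raw genomic inflation factor λ to a hypothetical study of 1,000 cases and 1,000 controls. The following is a minimal Python sketch of that conventional calculation, not the code used in this study; the function names are illustrative assumptions, and only the Python standard library is used.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364231195724

def genomic_inflation(p_values):
    """Raw lambda: median observed 1-df chi-square over its null median.

    Assumes two-sided P-values in the interval (0, 1].
    """
    nd = NormalDist()
    # For a two-sided test, chi2 = z^2 with z = Phi^-1(p / 2).
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN

def lambda_1000(lam, n_cases, n_controls):
    """Rescale lambda to an equivalent study of 1,000 cases and 1,000 controls."""
    scale = (1.0 / n_cases + 1.0 / n_controls) / (1.0 / 1000 + 1.0 / 1000)
    return 1.0 + (lam - 1.0) * scale
```

In use, genomic_inflation would be applied to the genome-wide P-values and the result passed to lambda_1000 together with the case and control counts (here, 2,591 and 4,027). Because the rescaling shrinks the excess over 1 in proportion to sample size, large cohorts yield λ 1000 values closer to 1 than the raw λ.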
For the LBD GWAS replication analysis, we obtained genotyping array data from two independent, non-overlapping, European-ancestry LBD case-control cohorts, totaling 970 LBD cases and 8,928 controls, as described elsewhere 5,6 . The data were cleaned by applying the same sample- and variant-level quality control steps that were used in the discovery genomes. We imputed the data against the NHLBI TOPMed imputation reference panel under default settings with Eagle v.2.4 phasing [47][48][49] . Variants with an R 2 value < 0.3 were excluded. A meta-analysis of the two cohorts was performed with METAL under a fixed-effects model and variants that were significant in the discovery stage were extracted 50 .

Genotype-pathology association analysis.
We evaluated the association of the newly identified LBD risk alleles in BIN1 (rs6733839-T) and TMEM175 (rs6599388-T) with the pathological changes of Alzheimer's disease.
Neuritic plaque staging information, assessed by the CERAD method 51 , was available for 700 pathologically confirmed LBD cases, while neurofibrillary tangle pathology staging, as assessed by Braak method 52 , was available for 1,459 definite LBD cases. Association testing between the risk alleles and the semi-quantitative neuritic plaque and neurofibrillary tangle burden was performed using Fisher's exact tests.

Colocalization analyses.
Coloc (v.4.0.1) was used to evaluate the probability of LBD loci and expression quantitative trait loci (eQTL) sharing a single causal variant 53 . This tool incorporates a Bayesian statistical framework that computes posterior probabilities for five hypotheses: namely, there is no association with either trait (hypothesis 0, H 0 ); an associated LBD variant exists but no associated eQTL variant (H 1 ); there is an associated eQTL variant but no associated LBD variant (H 2 ); there is an association with an eQTL and LBD risk variant, but they are two independent variants (H 3 ); and there is a shared associated LBD variant and eQTL variant within the analyzed region (H 4 ). Cis-eQTL were derived from eQTLGen (n = 31,684 individuals; accessed 19 February 2020) and PsychENCODE (n = 1,387 individuals; accessed 20 February 2020) 18,19 . For each locus, we examined all genes within 1 Mb of a significant region of interest, as defined by our LBD GWAS (P < 5.0 × 10 −8 ). Coloc was run using the default p 1 = 10 −4 and p 2 = 10 −4 priors, while the p 12 prior was set to p 12 = 5 × 10 −6 54 . Loci with a posterior probability for H 4 (PPH4) ≥ 0.90 were considered colocalized. All colocalizations were subjected to sensitivity analyses to explore the robustness of our conclusions to changes in the p 12 prior (i.e., the probability that a given variant affects both traits).
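To make the five-hypothesis framework concrete, the sketch below shows how per-variant log approximate Bayes factors for two traits can be combined into posterior probabilities for H 0 to H 4 under the priors stated above. It is a simplified Python illustration of the general approach, written under the assumption that log Bayes factors have already been computed for the same ordered set of variants; it is not the Coloc R package, and the function names are hypothetical.

```python
import math

def logsum(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def logdiff(a, b):
    """log(exp(a) - exp(b)) for a >= b; -inf when the difference vanishes."""
    if a <= b:
        return -math.inf
    d = math.exp(b - a)
    if d >= 1.0:
        return -math.inf
    return a + math.log1p(-d)

def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=5e-6):
    """Posterior probabilities [PPH0, PPH1, PPH2, PPH3, PPH4].

    labf1, labf2: per-variant log Bayes factors for trait 1 (e.g. LBD risk)
    and trait 2 (e.g. an eQTL), over the same variants in the same order.
    """
    s1, s2 = logsum(labf1), logsum(labf2)
    s12 = logsum([a + b for a, b in zip(labf1, labf2)])
    log_h = [
        0.0,                                                  # H0: no association
        math.log(p1) + s1,                                    # H1: trait 1 only
        math.log(p2) + s2,                                    # H2: trait 2 only
        math.log(p1) + math.log(p2) + logdiff(s1 + s2, s12),  # H3: two distinct variants
        math.log(p12) + s12,                                  # H4: one shared variant
    ]
    denom = logsum(log_h)
    return [math.exp(h - denom) for h in log_h]
```

Lowering p12 reduces the weight given to H 4 relative to H 3 , which is why sensitivity analyses over this prior are informative about the robustness of a colocalization call.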

Cell-type and tissue specificity measures.
To determine specificity of a gene's expression to a tissue or cell-type, specificity values were generated from two independent gene expression datasets: (1) bulk-tissue RNA-sequencing of 53 human tissues from the Genotype-Tissue Expression consortium (GTEx; v.8) 21 ; and (2) human single-nucleus RNA-sequencing of the middle temporal gyrus from the Allen Institute for Brain Science (n = 7 cell types) 20 . Specificity values for GTEx were generated using modified code from a previous publication 55 . Expression of tissues was averaged by organ (except in the case of brain; n = 35 tissues in total). Specificity values for the Allen Institute for Brain Science-derived dataset were generated using gene-level exonic reads and the 'generate.celltype.data' function of the EWCE package 56 . The specificity values for both datasets and the code used to generate these values are available at https://github.com/RHReynolds/MarkerGenes.

Heritability analysis.
The narrow-sense heritability (h 2 ), a measure of the additive genetic variance, was calculated using GREML-LDMS to determine how much of the genetic liability for LBD is explained by common genetic variants 57 . This analysis included unrelated individuals (pi-hat < 0.125, n = 2,591 LBD cases, and n = 4,027 controls) and autosomal variants with a MAF >1%. The analysis was adjusted for sex, age, and five principal components (PC1, PC3, PC4, PC5, PC7), and a disease prevalence of 0.1% to account for ascertainment bias.

Gene-based rare variant association analysis.
We conducted a genome-wide, gene-based sequence kernel association test-optimized (SKAT-O) analysis of missense mutations to determine the difference in the aggregate burden of rare coding variants between LBD cases and controls 64 . This analysis was performed in RVTESTS (v.2.1.0) using default parameters after annotating variants in ANNOVAR (v.2018-04/16) 58,59 . The study cohort for this analysis consisted of 2,591 LBD cases and 4,027 control subjects. We used a MAF threshold of ≤ 1% and a minor allele count (MAC) of ≥ 3 as filters. The covariates used in this analysis included sex, age, and five principal components (PC1, PC3, PC4, PC5, PC7). The Bonferroni threshold for genome-wide significance was 2.86 × 10 −6 (0.05 / 17,483 autosomal genes tested).
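The variant filters and the significance threshold described above amount to simple arithmetic, sketched here in Python. The function names are illustrative assumptions, and computing the MAF over all alleles in the combined case-control cohort is an assumption of this sketch rather than a detail stated in the text.

```python
def passes_rare_variant_filter(minor_allele_count, n_alleles,
                               maf_max=0.01, mac_min=3):
    """True if a variant satisfies MAF <= maf_max and MAC >= mac_min.

    n_alleles is the number of called alleles (two per diploid sample).
    """
    maf = minor_allele_count / n_alleles
    return maf <= maf_max and minor_allele_count >= mac_min

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold controlling the family-wise error rate."""
    return alpha / n_tests

# 0.05 divided across the 17,483 autosomal genes tested.
gene_level_threshold = bonferroni_threshold(0.05, 17483)
```

With 17,483 genes, gene_level_threshold evaluates to approximately 2.86 × 10 −6 , matching the threshold quoted above.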

Predictions of LBD risk using Alzheimer's disease and Parkinson's disease risk scores.
Genetic risk scores were generated using PLINK (v.1.9) based on summary statistics from recent Alzheimer's disease and Parkinson's disease GWAS meta-analyses. Considering the LBD cohort as our target dataset, risk allele dosages were counted across Alzheimer's disease or Parkinson's disease loci per sample (i.e., giving a dose of two if homozygous for the risk allele, one if heterozygous, and zero if homozygous for the alternate allele). The SNPs were weighted by their log odds ratios, giving greater weight to alleles with higher risk estimates, and a composite genetic risk score was generated across all risk loci. Genetic risk scores were z-transformed prior to analysis, centered on controls, with a mean of zero and a standard deviation of one in the control subjects. Regression models were then applied to test for association with the risk of developing LBD (based on logistic regression) or the age at death, age at onset, and disease duration (linear regression), adjusting for sex, age (risk and disease duration only), and five principal components (PC1, PC3, PC4, PC5, PC7) to account for population stratification.
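The scoring scheme described above (allele dosages weighted by log odds ratios, then standardized against the control distribution) can be sketched as follows. This is a simplified Python illustration under stated assumptions and not the PLINK implementation; the function names are hypothetical, and the use of the sample standard deviation is a choice made for this sketch.

```python
import math
from statistics import mean, stdev

def risk_score(dosages, odds_ratios):
    """Sum of risk-allele dosages (0, 1, or 2), each weighted by log(OR)."""
    return sum(d * math.log(o) for d, o in zip(dosages, odds_ratios))

def z_transform_on_controls(scores, is_control):
    """Standardize all scores using the mean and SD of the control subjects."""
    ctrl = [s for s, c in zip(scores, is_control) if c]
    mu, sd = mean(ctrl), stdev(ctrl)
    return [(s - mu) / sd for s in scores]
```

After this transformation, a coefficient from the downstream regression is interpreted per one standard deviation of the score in controls, which is the scale on which the odds ratios in the Results are reported.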

Polygenic risk score generation for pathway enrichment and phenotype associations.
A genome-wide LBD polygenic risk score was generated using PRSice-2. The polygenic risk score was computed by summing the risk alleles associated with LBD that had been weighted by the effect size estimates generated by performing a GWAS in the pathologically confirmed LBD cases and controls. This workflow identified the optimum P-value threshold (1 × 10 −4 in our dataset) for variant selection, allowing for the inclusion of variants that failed to reach genome-wide significance but that contributed to disease risk, nonetheless. After excluding variants without an rs-identifier, the remaining 122 variants were ranked based on their GWAS P-values, with the APOE, GBA, SNCA, BIN1 and TMEM175 genes added to the top five positions. The list was then analyzed for pathway enrichment using the g:Profiler toolkit (v.0.1.8). We defined the genes involved in the pathways and gene sets using the following databases: (i) Gene Ontology, (ii) Kyoto Encyclopedia of Genes and Genomes, (iii) Reactome, and (iv) WikiPathways 60,61 . Significant pathways and gene lists with a single gene or containing more than 1,000 genes were discarded. Significance was defined as P < 0.05. The g:Profiler algorithm applies a Bonferroni correction to the P-value for each pathway to correct for multiple testing.
Next, we tested whether the same LBD polygenic risk scores were associated with cognitive impairment, as measured by the Clinical Dementia Rating scale. This analysis was performed in the 214 LBD cases provided by the National Alzheimer's Coordinating Center, as this was the only cohort for which the Clinical Dementia Rating scale had been collected at baseline evaluation. Genetic risk scores were z-transformed before separating all cases into quintiles based on their individual polygenic risk scores. A two-proportions z-test was performed to compare the proportion of severe LBD cases within the highest genetic risk score quintile group versus the lowest quintile.
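The comparison between the highest and lowest quintiles can be illustrated with a minimal Python sketch of a pooled two-proportions z-test. The function name is an assumption, no continuity correction is applied, and the counts in the usage line are invented purely for illustration; they are not the study data.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportions z-test.

    x1 of n1 subjects in group 1 and x2 of n2 subjects in group 2 have the
    outcome. Returns (z, chi_square, two_sided_p), where chi_square = z^2
    has 1 degree of freedom.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, z * z, p_two_sided

# Hypothetical counts only: severe cases in the top versus bottom quintile.
z, chi2, p = two_proportion_z_test(20, 43, 10, 43)
```

Because the squared z statistic follows a chi-square distribution with one degree of freedom, results of such a test can equivalently be reported as a χ 2 value with df = 1.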

Extended Data
Extended Data Fig. 1. BIN1 and TMEM175

Fisher's exact test P-value on Braak staging = 0.0002). Although the proportion of LBD cases that had high neuritic plaque burden was higher in homozygous risk allele carriers compared to homozygous major allele carriers, the difference between these groups was not statistically significant (P = 0.23). There was no association of TMEM175 risk allele dosage and Alzheimer's disease co-pathology, though a trend toward lower Alzheimer's disease co-pathology was observed among homozygous TMEM175 risk allele carriers.
a-g, Regional association plots, local linkage disequilibrium, and recombination rates at the significantly associated LBD GWAS risk signals. Regional associations are plotted as a function of their genomic position, denoting the index variant by a red diamond.
Bonferroni threshold for genome-wide significance. For a and b, the gene with the closest proximity to the top variant at each significant locus is listed. Green font was used to highlight known LBD risk loci, while black font indicates novel association signals.
Alzheimer's disease and Parkinson's disease genetic risk scores predict risk for LBD and highlight overlapping molecular risk profiles. a, Violin plots comparing z-transformed Alzheimer's disease genetic risk score distributions in LBD cases, controls, and 100 random Alzheimer's disease cases. b, Violin plots comparing z-transformed Parkinson's disease genetic risk score distributions for LBD cases, controls, and 100 random Parkinson's disease cases. The center line of each violin plot is the median, the box limits depict the interquartile range, and whiskers correspond to the 1.5x interquartile range. Abbreviations: GRS, genetic risk score; AD, Alzheimer's disease; PD, Parkinson's disease.
For each of the five loci, the variant with the lowest P-value is listed. The gene that is in closest proximity to the top variant at each locus is represented. The chromosomal position is shown according to hg38. Genome-wide significance was defined as P < 5 × 10 −8 . Abbreviations: A1, other allele; A2, effect allele; EAF, effect allele frequency; OR, odds ratio; Chr, chromosome; CI, confidence interval.
Nat Genet. Author manuscript; available in PMC 2021 August 15.