In Press
Kim J, Huang AY, Isacco L, Lai J, Miller MB, Lodato MA, Walsh CA**, Lee EA**. Prevalence and mechanisms of somatic deletions in single human neurons during normal aging and in DNA repair disorders. Nature Communications In Press;
Maury EA, Sherman MA, Genovese G, Gilgenast TG, Rajarajan P, Flaherty E, Akbarian S, Chess A, McCarroll SA, Loh P-R, Phillips-Cremins JE, Brennand KJ, Walters JTR, O’Donovan M, Sullivan P, and workgroup PGCSCNV, Sebat J, Lee EA, Walsh CA. Schizophrenia-associated somatic copy number variants from 12,834 cases reveal contribution to risk and recurrent, isoform-specific NRXN1 disruptions [Internet]. medRxiv Submitted;
Maury EA*, Jones A*, Seplyarskiy V*, Rosenbluh C, Bae T, Wang Y, Abyzov A, Khoshkoo S, Chahine Y, Park PJ, Akbarian S, Lee EA, Sunyaev SR, Walsh CA**, Chess A**. Enrichment of somatic mutations in schizophrenia brain targets prenatally active transcription factor binding sites [Internet]. bioRxiv Submitted;
Ortiz NJ*, Choi J*, Bodamer O, Lee EA, Heiman MG. A degron strategy for systematic partial depletion of proteins reveals a dose-dependent response to loss of the Kabuki Syndrome factors SET-16/KMT2D and UTX-1/KDM6A. Submitted;
Lai J, Kim J, Jeffries AM, Tolles A, Chittenden TW, Buckley PG, Yu TW, Lodato MA**, Lee EA**. Single-nucleus transcriptomic analyses reveal microglial activation underlying cerebellar degeneration in Ataxia Telangiectasia [Internet]. bioRxiv Submitted;
While ATM loss-of-function has long been identified as the genetic cause of Ataxia Telangiectasia (AT), how this genetic mutation leads to selective and progressive cerebellar degeneration of Purkinje and granule cells remains unknown. We performed single-nucleus RNA-sequencing of the human cerebellum and prefrontal cortex from individuals with AT and matched unaffected controls to identify AT-associated transcriptomic changes in a cell-type- and brain-region-specific manner. We provide the largest single-nucleus transcriptomic atlas of the adult human cerebellum to-date (126,356 nuclei), identify upregulation of apoptotic and ER stress pathways in Purkinje and granule neurons, and uncover strong downregulation of calcium ion homeostasis genes in Purkinje neurons. Our analysis reveals prominent inflammation of microglia in AT cerebellum with transcriptional signatures similar to aging and neurodegenerative microglia, and suggests that microglia activation precedes Purkinje and granule neuron death in disease progression. Our data implicates a novel role of microglial activation underlying cerebellar degeneration in AT.
Miller MB*, Huang AY*, Kim J, Zhou Z, Kirkham SL, Maury EA, Ziegenfuss JS, Reed HC, Neil JE, Rento L, Ryu SC, Ma CC, Luquette LJ, Ames HM, Oakley DH, Frosch MP, Hyman BT, Lodato MA**, Lee EA**, Walsh CA**. Somatic genomic changes in single Alzheimer’s disease neurons [Internet]. Nature 2022;604(7907):714-722.
Dementia in Alzheimer’s disease progresses alongside neurodegeneration, but the specific events that cause neuronal dysfunction and death remain poorly understood. During normal ageing, neurons progressively accumulate somatic mutations at rates similar to those of dividing cells which suggests that genetic factors, environmental exposures or disease states might influence this accumulation. Here we analysed single-cell whole-genome sequencing data from 319 neurons from the prefrontal cortex and hippocampus of individuals with Alzheimer’s disease and neurotypical control individuals. We found that somatic DNA alterations increase in individuals with Alzheimer’s disease, with distinct molecular patterns. Normal neurons accumulate mutations primarily in an age-related pattern (signature A), which closely resembles ‘clock-like’ mutational signatures that have been previously described in healthy and cancerous cells. In neurons affected by Alzheimer’s disease, additional DNA alterations are driven by distinct processes (signature C) that highlight C>A and other specific nucleotide changes. These changes potentially implicate nucleotide oxidation, which we show is increased in Alzheimer’s-disease-affected neurons in situ. Expressed genes exhibit signature-specific damage, and mutations show a transcriptional strand bias, which suggests that transcription-coupled nucleotide excision repair has a role in the generation of mutations. The alterations in Alzheimer’s disease affect coding exons and are predicted to create dysfunctional genetic knockout cells and proteostatic stress. Our results suggest that known pathogenic mechanisms in Alzheimer’s disease may lead to genomic damage to neurons that can progressively impair function. The aberrant accumulation of DNA alterations in neurodegeneration provides insight into the cascade of molecular and cellular events that occurs in the development of Alzheimer’s disease.
Choudhury S*, Huang AY*, Kim J, Zhou Z, Morillo K, Maury EA, Tsai JW, Miller MB, Lodato MA, Araten S, Hilal N, Lee EA**, Chen MH**, Walsh CA**. Somatic mutations in single human cardiomyocytes reveal age-associated DNA damage and widespread genotoxicity [Internet]. Nature Aging 2022;
The accumulation of somatic DNA mutations over time is a hallmark of aging in many dividing and nondividing cells but has not been studied in postmitotic human cardiomyocytes. Using single-cell whole-genome sequencing, we identified and characterized the landscape of somatic single-nucleotide variants (sSNVs) in 56 single cardiomyocytes from 12 individuals (aged from 0.4 to 82 years). Cardiomyocyte sSNVs accumulate with age at rates that are faster than in many dividing cell types and nondividing neurons. Cardiomyocyte sSNVs show distinctive mutational signatures that implicate failed nucleotide excision repair and base excision repair of oxidative DNA damage, and defective mismatch repair. Since age-accumulated sSNVs create many damaging mutations that disrupt gene functions, polyploidization in cardiomyocytes may provide a mechanism of genetic compensation to minimize the complete knockout of essential genes during aging. Age-related accumulation of cardiac mutations provides a paradigm to understand the influence of aging on cardiac dysfunction.
Bourseguin J, Cheng W, Talbot E, Hardy L, Lai J, Jeffries AM, Lodato MA, Lee EA, Khoronenkova SV. Persistent DNA damage associated with ATM kinase deficiency promotes microglial dysfunction [Internet]. Nucleic Acids Res 2022;50(5):2700-2718.
The autosomal recessive genome instability disorder Ataxia-telangiectasia, caused by mutations in ATM kinase, is characterized by the progressive loss of cerebellar neurons. We find that DNA damage associated with ATM loss results in dysfunctional behaviour of human microglia, immune cells of the central nervous system. Microglial dysfunction is mediated by the pro-inflammatory RELB/p52 non-canonical NF-κB transcriptional pathway and leads to excessive phagocytic clearance of neuronal material. Activation of the RELB/p52 pathway in ATM-deficient microglia is driven by persistent DNA damage and is dependent on the NIK kinase. Activation of non-canonical NF-κB signalling is also observed in cerebellar microglia of individuals with Ataxia-telangiectasia. These results provide insights into the underlying mechanisms of aberrant microglial behaviour in ATM deficiency, potentially contributing to neurodegeneration in Ataxia-telangiectasia.
Zhao B, Madden JA, Lin J, Berry GT, Wojcik MH, Zhao X, Brand H, Talkowski M, Lee EA**, Agrawal PB**. A neurodevelopmental disorder caused by a novel de novo SVA insertion in exon 13 of the SRCAP gene [Internet]. European Journal of Human Genetics 2022;
Huang AY, Lee EA. Identification of somatic mutations from bulk and single-cell sequencing data [Internet]. Frontiers in Aging (mini review) 2022;2:800380

Somatic mutations are DNA variants that occur after the fertilization of zygotes and accumulate during the developmental and aging processes in the human lifespan. Somatic mutations have long been known to cause cancer, and more recently have been implicated in a variety of non-cancer diseases. The patterns of somatic mutations, or mutational signatures, also shed light on the underlying mechanisms of the mutational process. Advances in next-generation sequencing over the decades have enabled genome-wide profiling of DNA variants in a high-throughput manner; however, unlike germline mutations, somatic mutations are carried only by a subset of the cell population. Thus, sensitive bioinformatic methods are required to distinguish mutant alleles from sequencing and base calling errors in bulk tissue samples. An alternative way to study somatic mutations, especially those present in an extremely small number of cells or even in a single cell, is to sequence single-cell genomes after whole-genome amplification (WGA); however, it is critical and technically challenging to exclude numerous technical artifacts arising during error-prone and uneven genome amplification in current WGA methods. To address these challenges, multiple bioinformatic tools have been developed. In this review, we summarize the latest progress in methods for identification of somatic mutations and the challenges that remain to be addressed in the future.

Ganz J*, Maury EA*, Becerra B, Bizzotto S, Doan RN, Kenny CJ, Shin T, Kim J, Zhou Z, Ligon KL, Lee EA**, Walsh CA**. Rates and patterns of clonal oncogenic mutations in the normal human brain [Internet]. Cancer Discovery 2021;
While oncogenic mutations have been found in non-diseased, proliferative non-neural tissues, their prevalence in the human brain is unknown. Targeted sequencing of genes implicated in brain tumors in 418 samples derived from 110 individuals of varying ages, without tumor diagnoses, detected oncogenic somatic single-nucleotide variants (sSNVs) in 5.4% of the brains, including IDH1 R132H. These mutations were largely present in subcortical white matter and enriched in glial cells, and surprisingly, were less common in older individuals. A depletion of high-allele frequency sSNVs representing macroscopic clones with age was replicated by analysis of bulk RNAseq data from 1,816 non-diseased brain samples ranging from fetal to old age. We also describe large clonal copy number variants, and that sSNVs show mutational signatures resembling those found in gliomas, suggesting that mutational processes of the normal brain drive early glial oncogenesis. This study helps understand the origin and early evolution of brain tumors.
Borges-Monroy R*, Chu C*, Dias C, Choi J, Lee S, Gao Y, Shin T, Park PJ, Walsh CA**, Lee EA**. Whole-genome analysis of de novo and polymorphic retrotransposon insertions in Autism Spectrum Disorder [Internet]. Mobile DNA 2021;12(28)
Retrotransposons have been implicated as causes of Mendelian disease, but their role in autism spectrum disorder (ASD) has not been systematically defined, because they are only called with adequate sensitivity from whole genome sequencing (WGS) data and a large enough cohort for this analysis has only recently become available.
We analyzed WGS data from a cohort of 2288 ASD families from the Simons Simplex Collection by establishing a scalable computational pipeline for retrotransposon insertion detection. We report 86,154 polymorphic retrotransposon insertions—including > 60% not previously reported—and 158 de novo retrotransposition events. The overall burden of de novo events was similar between ASD individuals and unaffected siblings, with 1 de novo insertion per 29, 117, and 206 births for Alu, L1, and SVA respectively, and 1 de novo insertion per 21 births total. However, ASD cases showed more de novo L1 insertions than expected in ASD genes. Additionally, we observed exonic insertions in loss-of-function intolerant genes, including a likely pathogenic exonic insertion in CSDE1, only in ASD individuals.
These findings suggest a modest, but important, impact of intronic and exonic retrotransposon insertions in ASD, show the importance of WGS for their analysis, and highlight the utility of specific bioinformatic tools for high-throughput detection of retrotransposon insertions.
Bim LV, Carneiro TNR, Buzatto VC, Colozza-Gama GA, Koyama FC, Thomaz DMD, Paniza ACDJ, Lee EA, Galante PAF, Cerutti JM. Molecular Signature Expands the Landscape of Driver Negative Thyroid Cancers [Internet]. Cancers 2021;13(20)
Thyroid cancer is the most common endocrine malignancy. However, the cytological diagnosis of follicular thyroid carcinoma (FTC), Hürthle cell carcinoma (HCC), and follicular variant of papillary thyroid carcinoma (FVPTC) and their benign counterparts is a challenge for preoperative diagnosis. Nearly 20–30% of biopsied thyroid nodules are classified as having indeterminate risk of malignancy and incur costs to the health care system. Based on that, 120 patients were screened for the main driver mutations previously described in thyroid cancer. Subsequently, 14 mutation-negative cases that are the main source of diagnostic errors (FTC, HCC, or FVPTC) underwent RNA-Sequencing analysis. Somatic variants in candidate driver genes (ECD, NUP98, LRP1B, NCOR1, ATM, SOS1, and SPOP) and fusions were described. NCOR1 and SPOP variants underwent validation. Moreover, expression profiling of driver-negative samples was compared to 16 BRAF V600E, RAS, or PAX8-PPARg positive samples. Negative samples were separated in two clusters, following the expression pattern of the RAS/PAX8-PPARg or BRAF V600E positive samples. Both negative groups showed distinct BRS, ERK, and TDS scores, tumor mutation burden, signaling pathways and immune cell profile. Altogether, here we report novel gene variants and describe cancer-related pathways that might impact preoperative diagnosis and provide insights into thyroid tumor biology.
Wang Y*, Zhao B*, Choi J, Lee EA. Genomic approaches to trace the history of human brain evolution with an emerging opportunity for transposon profiling of ancient humans [Internet]. Mobile DNA (review) 2021;12(22)

Transposable elements (TEs) significantly contribute to shaping the diversity of the human genome, and lines of evidence suggest TEs as one of driving forces of human brain evolution. Existing computational approaches, including cross-species comparative genomics and population genetic modeling, can be adapted for the study of the role of TEs in evolution. In particular, diverse ancient and archaic human genome sequences are increasingly available, allowing reconstruction of past human migration events and holding the promise of identifying and tracking TEs among other evolutionarily important genetic variants at an unprecedented spatiotemporal resolution. However, highly degraded short DNA templates and other unique challenges presented by ancient human DNA call for major changes in current experimental and computational procedures to enable the identification of evolutionarily important TEs. Ancient human genomes are valuable resources for investigating TEs in the evolutionary context, and efforts to explore ancient human genomes will potentially provide a novel perspective on the genetic mechanism of human brain evolution and inspire a variety of technological and methodological advances. In this review, we summarize computational and experimental approaches that can be adapted to identify and validate evolutionarily important TEs, especially for human brain evolution. We also highlight strategies that leverage ancient genomic data and discuss unique challenges in ancient transposon genomics.

Chu C, Borges-Monroy R, Viswanadham VV, Lee S, Li H, Lee EA**, Park PJ**. Comprehensive identification of transposable element insertions using multiple sequencing technologies [Internet]. Nature Communications 2021;12(1):3836.
Transposable elements (TEs) help shape the structure and function of the human genome. When inserted into some locations, TEs may disrupt gene regulation and cause diseases. Here, we present xTea (x-Transposable element analyzer), a tool for identifying TE insertions in whole-genome sequencing data. Whereas existing methods are mostly designed for short-read data, xTea can be applied to both short-read and long-read data. Our analysis shows that xTea outperforms other short read-based methods for both germline and somatic TE insertion discovery. With long-read data, we created a catalogue of polymorphic insertions with full assembly and annotation of insertional sequences for various types of retroelements, including pseudogenes and endogenous retroviruses. Notably, we find that individual genomes have an average of nine groups of full-length L1s in centromeres, suggesting that centromeres and other highly repetitive regions such as telomeres are a significant yet unexplored source of active L1s. xTea is available at
Park J-H*, Park I*, Youm EM*, Lee S, Park J-H, Lee J, Lee DY, Byun MS, Lee JH, Yi D, Chung SJ, Park KW, Choi N, Kim SY, Yoon W, An H, Kim KW, Choi SH, Jeong JH, Kim E-J, Kang H, Lee J, Kim Y, Lee EA, Seo SW, Na DL, Kim J-W. Novel Alzheimer's disease risk variants identified based on whole-genome sequencing of APOE ε4 carriers [Internet]. Translational Psychiatry 2021;11(1):296.
Alzheimer's disease (AD) is a progressive neurodegenerative disease associated with a complex genetic etiology. Besides the apolipoprotein E ε4 (APOE ε4) allele, a few dozen other genetic loci associated with AD have been identified through genome-wide association studies (GWAS) conducted mainly in individuals of European ancestry. Recently, several GWAS performed in other ethnic groups have shown the importance of replicating studies that identify previously established risk loci and searching for novel risk loci. APOE-stratified GWAS have yielded novel AD risk loci that might be masked by, or be dependent on, APOE alleles. We performed whole-genome sequencing (WGS) on DNA from blood samples of 331 AD patients and 169 elderly controls of Korean ethnicity who were APOE ε4 carriers. Based on WGS data, we designed a customized AD chip (cAD chip) for further analysis on an independent set of 543 AD patients and 894 elderly controls of the same ethnicity, regardless of their APOE ε4 allele status. Combined analysis of WGS and cAD chip data revealed that SNPs rs1890078 (P = 6.64E-07) and rs12594991 (P = 2.03E-07) in SORCS1 and CHD2 genes, respectively, are novel genetic variants among APOE ε4 carriers in the Korean population. In addition, nine possible novel variants that were rare in individuals of European ancestry but common in East Asia were identified. This study demonstrates that APOE-stratified analysis is important for understanding the genetic background of AD in different populations.
Carneiro TNR, Bim LV, Buzatto VC, Galdeno V, Asprino PF, Lee EA, Galante PAF, Cerutti JM. Evidence of Cooperation between Hippo Pathway and RAS Mutation in Thyroid Carcinomas [Internet]. Cancers 2021;13(10):2306.
Thyroid cancer incidences have been steadily increasing worldwide and are projected to become the fourth leading cancer diagnosis by 2030. Improved diagnosis and prognosis predictions for this type of cancer depend on understanding its genetic bases and disease biology. RAS mutations have been found in a wide range of thyroid tumors, from benign to aggressive thyroid carcinomas. Based on that and in vivo studies, it has been suggested that RAS cooperates with other driver mutations to induce tumorigenesis. This study aims to identify genetic alterations or pathways that cooperate with the RAS mutation in the pathogenesis of thyroid cancer. From a cohort of 120 thyroid carcinomas, 11 RAS-mutated samples were identified. The samples were subjected to RNA-Sequencing analyses. The mutation analysis in our eleven RAS-positive cases uncovered that four genes that belong to the Hippo pathway were mutated. The gene expression analysis revealed that this pathway was dysregulated in the RAS-positive samples. We additionally explored the mutational status and expression profiling of 60 RAS-positive papillary thyroid carcinomas (PTC) from The Cancer Genome Atlas (TCGA) cohort. Altogether, the mutational landscape and pathway enrichment analysis (gene set enrichment analysis (GSEA) and Kyoto Encyclopedia of Genes and Genomes (KEGG)) detected the Hippo pathway as dysregulated in RAS-positive thyroid carcinomas. Finally, we suggest a crosstalk between the Hippo and other signaling pathways, such as Wnt and BMP.
Kim J, Zhao B, Huang AY, Miller MB, Lodato MA, Walsh CA**, Lee EA**. APP gene copy number changes reflect exogenous contamination [Internet]. Nature 2020;584:E20–E28.
Various types of somatic mutations occur in cells of the human body and cause human diseases, including cancer and some neurological disorders [1]. Recently, Lee et al. [2] (hereafter ‘the Lee study’) reported somatic copy number gains of the APP gene, a known risk locus for Alzheimer’s disease (AD), in 69% and 25% of neurons of AD patients and controls, respectively, and argued that the mechanism of these copy number gains was somatic integration of APP mRNA into the genome, creating what they called genomic cDNA (gencDNA). Our reanalysis of the data from the Lee study and two additional whole-exome sequencing (WES) data sets by the authors of the Lee study [3] and Park et al. [4] revealed evidence that APP gencDNA originates mainly from exogenous contamination by APP recombinant vectors, nested PCR products, and human and mouse mRNA, respectively, rather than from true somatic integration of endogenous APP. We further present our own single-cell whole-genome sequencing (scWGS) data that show no evidence for somatic APP retrotransposition in neurons from individuals with AD or from healthy individuals of various ages.
Chu C*, Zhao B*, Park PJ, Lee EA. Identification and Genotyping of Transposable Element Insertions from Genome Sequencing Data [Internet]. Current Protocols in Human Genetics 2020;107(1):e102

Transposable element (TE) mobilization is a significant source of genomic variation and has been associated with various human diseases. The exponential growth of population-scale whole-genome sequencing and rapid innovations in long-read sequencing technologies provide unprecedented opportunities to study TE insertions and their functional impact in human health and disease. Identifying TE insertions, however, is challenging due to the repetitive nature of the TE sequences. Here, we review computational approaches to detecting and genotyping TE insertions using short- and long-read sequencing and discuss the strengths and weaknesses of different approaches.

Huang AY*, Li P*, Rodin RE, Kim SN, Dou Y, Kenny CJ, Akula SK, Hodge RD, Bakken TE, Miller JA, Lein ES, Park PJ, Lee EA, Walsh CA. Parallel RNA and DNA analysis after deep sequencing (PRDD-seq) reveals cell type-specific lineage patterns in human brain. Proc Natl Acad Sci U S A 2020;
Elucidating the lineage relationships among different cell types is key to understanding human brain development. Here we developed parallel RNA and DNA analysis after deep sequencing (PRDD-seq), which combines RNA analysis of neuronal cell types with analysis of nested spontaneous DNA somatic mutations as cell lineage markers, identified from joint analysis of single-cell and bulk DNA sequencing by single-cell MosaicHunter (scMH). PRDD-seq enables simultaneous reconstruction of neuronal cell type, cell lineage, and sequential neuronal formation ("birthdate") in postmortem human cerebral cortex. Analysis of two human brains showed remarkable quantitative details that relate mutation mosaic frequency to clonal patterns, confirming an early divergence of precursors for excitatory and inhibitory neurons, and an "inside-out" layer formation of excitatory neurons as seen in other species. In addition our analysis allows an estimate of excitatory neuron-restricted precursors (about 10) that generate the excitatory neurons within a cortical column. Inhibitory neurons showed complex, subtype-specific patterns of neurogenesis, including some patterns of development conserved relative to mouse, but also some aspects of primate cortical interneuron development not seen in mouse. PRDD-seq can be broadly applied to characterize cell identity and lineage from diverse archival samples with single-cell resolution and in potentially any developmental or disease condition.