The Lee laboratory studies transposable elements and other types of genomic variations in human disease using computational genomic and bioinformatic approaches. Specifically, we develop and apply computational methods for genomic studies using next-generation sequencing and perform integrative analyses of DNA- and RNA-sequencing data.

Our research currently focuses on:

The role of retrotransposons in human health and disease. Retrotransposon mobilization is a significant source of human genomic variability and is causally implicated in various Mendelian disorders and complex diseases. We have reported that somatic retrotransposition frequently occurs in human cancers, and the rates are associated with TP53 mutation status and cancer immunity (Lee, Science 2012; Jung, Genome Res. 2018). We aim to define the relevance and importance of retrotransposon insertions in human health and disease, including cancer, Mendelian disorders, and neurological disorders.

The role of somatic mutation in developmental and degenerative disorders. Our previous single-cell studies have revealed extensive somatic mosaicism in the human brain (Evrony & Cai, Cell 2012; Evrony & Lee, Neuron 2015; eLife 2016). Analysis of somatic mutation requires rigorous data analysis and validation, since multiple confounding factors, such as amplification artifacts and vector contamination, can lead to misinterpretation (Evrony & Lee, eLife 2016; Kim, Nature 2020). We aim to analyze single-cell and other types of sequencing data, to develop novel methods for detecting low-clonal or single-cell unique somatic mutations, and to understand how these mutations relate to aging and to developmental and degenerative disorders.


The effects of DNA variants on RNA splicing. One major pathogenic mechanism underlying human disease is disruption of RNA splicing caused by DNA mutations, including retrotransposon insertions. Our analysis of cancer DNA- and RNA-seq profiles found a large number of somatic mutations that disrupted RNA splicing, highlighting intron retention as a common yet underappreciated mechanism of tumor suppressor inactivation (Jung, Nat. Genet. 2015). We also reported splice-altering somatic L1 insertions in cancers (Jung, Genome Res. 2018). Our goal is to systematically characterize the effects of pathogenic DNA variants, especially non-coding variants, on RNA splicing.


We aim to address the following major biomedical questions:


What are the causal genomic variants for genetic disorders of unknown etiology?

Advances in human genetics and genomics have uncovered causal variants for many hereditary human diseases. However, no causal variant has yet been identified for a significant fraction of Mendelian diseases. We explore causal genetic variants for genetic disorders of unknown etiology by investigating their genomic and transcriptomic data, with a special focus on non-canonical types of variants, including 1) non-coding variants, especially those associated with repetitive sequences such as transposable elements and tandem repeats, 2) genomic variants causing splicing aberrations, and 3) mosaic variants with low variant allele frequency. These types of variants cannot be detected through typical variant analyses. We have developed effective computational methods and gained an in-depth understanding of existing tools in order to study these types of variants. For example, the Tea (Transposable Element Analyzer) methods we developed to study somatic retrotransposition in cancer and single-neuronal genomes (Lee et al., Science, 2012; Evrony and Lee et al., Neuron, 2015) have been evolving to accommodate advances in genomic and computing technologies and to study various genetic disorders. We have also developed a computational method to detect and predict splicing-disrupting somatic mutations, including synonymous ones (Jung et al., Nature Genetics, 2015).


How often do somatic mutations occur and what are their roles in developmental and degenerative disorders?

Somatic mutations have been studied most extensively in cancer, but they also cause neurodevelopmental disorders. Our previous studies of single-neuronal genomes in postmortem human brains have revealed abundant somatic mutations, such as mobilization of transposable elements and variation in short tandem repeats, suggesting their potential role in neurological disorders. Our studies have not only demonstrated the great promise of single-cell genomics for studying somatic mutation but also highlighted the importance of rigorous data analysis and computational expertise in addressing technical artifacts in the data. We continue to investigate somatic mutations in several neurological disorders and other conditions in close collaboration with clinical and experimental scientists.