Publications

2020

Kim J, Zhao B, Huang AY, Miller MB, Lodato MA, Walsh** CA, Lee** EA. APP gene copy number changes reflect exogenous contamination. Nature. 2020;584:E20–E28.

Publisher's Version

Various types of somatic mutations occur in cells of the human body and cause human diseases, including cancer and some neurological disorders¹. Recently, Lee et al.² (hereafter ‘the Lee study’) reported somatic copy number gains of the APP gene, a known risk locus for Alzheimer’s disease (AD), in 69% and 25% of neurons of AD patients and controls, respectively, and argued that the mechanism of these copy number gains was somatic integration of APP mRNA into the genome, creating what they called genomic cDNA (gencDNA). Our reanalysis of the data from the Lee study and two additional whole-exome sequencing (WES) data sets by the authors of the Lee study³ and Park et al.⁴ revealed evidence that APP gencDNA originates mainly from exogenous contamination by APP recombinant vectors, nested PCR products, and human and mouse mRNA, respectively, rather than from true somatic integration of endogenous APP. We further present our own single-cell whole-genome sequencing (scWGS) data that show no evidence for somatic APP retrotransposition in neurons from individuals with AD or from healthy individuals of various ages.

Chu* C, Zhao* B, Park P, Lee EA. Identification and Genotyping of Transposable Element Insertions from Genome Sequencing Data. Current Protocols in Human Genetics (review). 2020;107(1):e102.

Publisher's Version

Transposable element (TE) mobilization is a significant source of genomic variation and has been associated with various human diseases. The exponential growth of population-scale whole-genome sequencing and rapid innovations in long-read sequencing technologies provide unprecedented opportunities to study TE insertions and their functional impact in human health and disease. Identifying TE insertions, however, is challenging due to the repetitive nature of the TE sequences. Here, we review computational approaches to detecting and genotyping TE insertions using short- and long-read sequencing and discuss the strengths and weaknesses of different approaches.

Huang* AY, Li* P, Rodin R, Kim S, Dou Y, Kenny C, Akula S, Hodge R, Bakken T, Miller J, Lein E, Park P, Lee EA, Walsh C. Parallel RNA and DNA analysis after deep sequencing (PRDD-seq) reveals cell type-specific lineage patterns in human brain. Proc Natl Acad Sci U S A. 2020;117(25):13886–13895.

Publisher's Version

Elucidating the lineage relationships among different cell types is key to understanding human brain development. Here we developed parallel RNA and DNA analysis after deep sequencing (PRDD-seq), which combines RNA analysis of neuronal cell types with analysis of nested spontaneous DNA somatic mutations as cell lineage markers, identified from joint analysis of single-cell and bulk DNA sequencing by single-cell MosaicHunter (scMH). PRDD-seq enables simultaneous reconstruction of neuronal cell type, cell lineage, and sequential neuronal formation ("birthdate") in postmortem human cerebral cortex. Analysis of two human brains showed remarkable quantitative details that relate mutation mosaic frequency to clonal patterns, confirming an early divergence of precursors for excitatory and inhibitory neurons, and an "inside-out" layer formation of excitatory neurons as seen in other species. In addition our analysis allows an estimate of excitatory neuron-restricted precursors (about 10) that generate the excitatory neurons within a cortical column. Inhibitory neurons showed complex, subtype-specific patterns of neurogenesis, including some patterns of development conserved relative to mouse, but also some aspects of primate cortical interneuron development not seen in mouse. PRDD-seq can be broadly applied to characterize cell identity and lineage from diverse archival samples with single-cell resolution and in potentially any developmental or disease condition.

Jain D, Chu C, Alver BH, Lee S, Lee EA, Park P. HiTea: a computational pipeline to identify non-reference transposable element insertions in Hi-C data. Bioinformatics. 2020;btaa923.

Publisher's Version

Hi-C is a common technique for assessing three-dimensional chromatin conformation. Recent studies have shown that long-range interaction information in Hi-C data can be used to generate chromosome-length genome assemblies and identify large-scale structural variations. Here, we demonstrate the use of Hi-C data in detecting mobile transposable element (TE) insertions genome-wide. Our pipeline HiTea (Hi-C based Transposable element analyzer) capitalizes on clipped Hi-C reads and is aided by a high proportion of discordant read pairs in Hi-C data to detect insertions of three major families of active human TEs. Despite the uneven genome coverage in Hi-C data, HiTea is competitive with the existing callers based on whole genome sequencing (WGS) data and can supplement the WGS-based characterization of the TE insertion landscape. We employ the pipeline to identify TE insertions from human cell-line Hi-C samples. HiTea is available at https://github.com/parklab/HiTea and as a Docker image. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Rodriguez-Martin B, Alvarez E, Baez-Ortega A, . , Lee EA, . , PCAWG-Structural-Variation-Working-Group, Campbell P, Tubio J, PCAWG-Consortium. Pan-cancer analysis of whole genomes identifies driver rearrangements promoted by LINE-1 retrotransposition. Nat Genet. 2020;52(3):306–319.

Publisher's Version

About half of all cancers have somatic integrations of retrotransposons. Here, to characterize their role in oncogenesis, we analyzed the patterns and mechanisms of somatic retrotransposition in 2,954 cancer genomes from 38 histological cancer subtypes within the framework of the Pan-Cancer Analysis of Whole Genomes (PCAWG) project. We identified 19,166 somatically acquired retrotransposition events, which affected 35% of samples and spanned a range of event types. Long interspersed nuclear element (LINE-1; L1 hereafter) insertions emerged as the first most frequent type of somatic structural variation in esophageal adenocarcinoma, and the second most frequent in head-and-neck and colorectal cancers. Aberrant L1 integrations can delete megabase-scale regions of a chromosome, which sometimes leads to the removal of tumor-suppressor genes, and can induce complex translocations and large-scale duplications. Somatic retrotranspositions can also initiate breakage-fusion-bridge cycles, leading to high-level amplification of oncogenes. These observations illuminate a relevant role of L1 retrotransposition in remodeling the cancer genome, with potential implications for the development of human tumors.

2019

Zhang Y, Yang L, Kucherlapati M, Hadjipanayis A, Pantazi A, Bristow C, Lee EA, Mahadeshwar H, Tang J, Zhang J, Seth S, Lee S, Ren X, Song X, Sun H, Seidman J, Luquette L, Xi R, Chin L, Protopopov A, Park P, Kucherlapati R, Creighton C. Global impact of somatic structural variation on the DNA methylome of human cancers. Genome Biol. 2019;20(1):209.

Publisher's Version

BACKGROUND: Genomic rearrangements exert a heavy influence on the molecular landscape of cancer. New analytical approaches integrating somatic structural variants (SSVs) with altered gene features represent a framework by which we can assign global significance to a core set of genes, analogous to established methods that identify genes non-randomly targeted by somatic mutation or copy number alteration. While recent studies have defined broad patterns of association involving gene transcription and nearby SSV breakpoints, global alterations in DNA methylation in the context of SSVs remain largely unexplored. RESULTS: By data integration of whole genome sequencing, RNA sequencing, and DNA methylation arrays from more than 1400 human cancers, we identify hundreds of genes and associated CpG islands (CGIs) for which the nearby presence of a somatic structural variant (SSV) breakpoint is recurrently associated with altered expression or DNA methylation, respectively, independently of copy number alterations. CGIs with SSV-associated increased methylation are predominantly promoter-associated, while CGIs with SSV-associated decreased methylation are enriched for gene body CGIs. Rearrangement of genomic regions normally having higher or lower methylation is often involved in SSV-associated CGI methylation alterations. Across cancers, the overall structural variation burden is associated with a global decrease in methylation, increased expression in methyltransferase genes and DNA damage response genes, and decreased immune cell infiltration. CONCLUSION: Genomic rearrangement appears to have a major role in shaping the cancer DNA methylome, to be considered alongside commonly accepted mechanisms including histone modifications and disruption of DNA methyltransferases.

Kim J, Hu C, Moufawad El Achkar C, Black LE, Douville J, Larson A, Pendergast MK, Goldkind SF, Lee EA, Kuniholm A, Soucy A, Vaze J, Belur NR, Fredriksen K, Stojkovska I, Tsytsykova A, Armant M, DiDonato RL, Choi J, Cornelissen L, Pereira LM, Augustine EF, Genetti CA, Dies K, Barton B, Williams L, Goodlett BD, Riley BL, Pasternak A, Berry ER, Pflock KA, Chu S, Reed C, Tyndall K, Agrawal PB, Beggs AH, Grant PE, Urion DK, Snyder RO, Waisbren SE, Poduri A, Park, Patterson A, Biffi A, Mazzulli JR, Bodamer O, Berde CB, Yu TW. Patient-Customized Oligonucleotide Therapy for a Rare Genetic Disease. N Engl J Med. 2019;381:1644–1652.

Publisher's Version

Genome sequencing is often pivotal in the diagnosis of rare diseases, but many of these conditions lack specific treatments. We describe how molecular diagnosis of a rare, fatal neurodegenerative condition led to the rational design, testing, and manufacture of milasen, a splice-modulating antisense oligonucleotide drug tailored to a particular patient. Proof-of-concept experiments in cell lines from the patient served as the basis for launching an "N-of-1" study of milasen within 1 year after first contact with the patient. There were no serious adverse events, and treatment was associated with objective reduction in seizures (determined by electroencephalography and parental reporting). This study offers a possible template for the rapid development of patient-customized treatments. (Funded by Mila's Miracle Foundation and others.).

Yang L, Wang S, Lee JJ, Lee S, Lee E, Shinbrot E, Wheeler DA, Kucherlapati R, Park. An enhanced genetic model of colorectal cancer progression history. Genome Biol. 2019;20:168.

Publisher's Version

BACKGROUND: The classical genetic model of colorectal cancer presents APC mutations as the earliest genomic alterations, followed by KRAS and TP53 mutations. However, the timing and relative order of clonal expansion and other types of genomic alterations, such as genomic rearrangements, are still unclear. RESULTS: Here, we perform comprehensive bioinformatic analysis to dissect the relative timing of somatic genetic alterations in 63 colorectal cancers with whole-genome sequencing data. Utilizing allele fractions of somatic single nucleotide variants as molecular clocks while accounting for the presence of copy number changes and structural alterations, we identify key events in the evolution of colorectal tumors. We find that driver point mutations, gene fusions, and arm-level copy losses typically arise early in tumorigenesis; different mechanisms act on distinct genomic regions to drive DNA copy changes; and chromothripsis-clustered rearrangements previously thought to occur as a single catastrophic event-is frequent and may occur multiple times independently in the same tumor through different mechanisms. Furthermore, our computational approach reveals that, in contrast to recent studies, selection is often present on subclones and that multiple evolutionary models can operate in a single tumor at different stages. CONCLUSION: Combining these results, we present a refined tumor progression model which significantly expands our understanding of the tumorigenic process of human colorectal cancer.

2018

Kang Y, Nam SH, Park KS, Kim JW, Lee EA, Ko JM, Lee KA, Park I. DeviCNV: Detection and Visualization of Exon-Level Copy Number Variants in Targeted Next Generation Sequencing Data. BMC Bioinformatics. 2018;19(381).

Publisher's Version

Background

Targeted next-generation sequencing (NGS) is increasingly being adopted in clinical laboratories for genomic diagnostic tests.

Results

We developed a new computational method, DeviCNV, intended for the detection of exon-level copy number variants (CNVs) in targeted NGS data. DeviCNV builds linear regression models with bootstrapping for every probe to capture the relationship between read depth of an individual probe and the median of read depth values of all probes in the sample. From the regression models, it estimates the read depth ratio of the observed and predicted read depth with confidence interval for each probe which is applied to a circular binary segmentation (CBS) algorithm to obtain CNV candidates. Then, it assigns confidence scores to those candidates based on the reliability and strength of the CNV signals inferred from the read depth ratios of the probes within them. Finally, it also provides gene-centric plots with confidence levels of CNV candidates for visual inspection. We applied DeviCNV to targeted NGS data generated for newborn screening and demonstrated its ability to detect novel pathogenic CNVs from clinical samples.

Conclusions

We propose a new pragmatic method for detecting CNVs in targeted NGS data with an intuitive visualization and a systematic method to assign confidence scores for candidate CNVs. Since DeviCNV was developed for use in clinical diagnosis, sensitivity is increased by the detection of exon-level CNVs.

Zhang Y, Yang L, Kucherlapati M, Chen F, Hadjipanayis A, Pantazi A, Bristow CA, Lee EA, . , Creighton CJ. A Pan-Cancer Compendium of Genes Deregulated by Somatic Genomic Rearrangement across More Than 1,400 Cases. Cell Reports. 2018;10(24(2):515–527.

Publisher's Version

A systematic cataloguing of genes impacted by genomic rearrangement, using multiple patient cohorts and cancer types, can provide insight into cancer-relevant alterations outside of exome boundaries. By integrative analysis of whole genome sequencing (predominantly low pass) and gene expression data from 1448 cancers involving 18 histopathological types in The Cancer Genome Atlas, we identified hundreds of genes for which the nearby presence (within 100kb) of a somatic Structural Variant (SV) breakpoint was associated with altered expression. While genomic rearrangements were associated with widespread copy number alteration (CNA) patterns, approximately 1100 genes—including over-expressed cancer driver genes (e.g. TERT, ERBB2, CDK12, CDK4) and under-expressed tumor suppressors (e.g. TP53, RB1, PTEN, STK11)—showed SV-associated deregulation independent of CNA. SVs associated with disruption of topologically-associated domains, enhancer hijacking, or fusion transcripts were implicated in gene up-regulation patterns. For cancer-relevant pathways, SVs considerably extended upon how genes are impacted, beyond point mutation or CNA.