Publications

2020

Rodriguez-Martin B, Alvarez E, Baez-Ortega A, ..., Lee EA, ..., PCAWG-Structural-Variation-Working-Group, Campbell P, Tubio J, PCAWG-Consortium. Pan-cancer analysis of whole genomes identifies driver rearrangements promoted by LINE-1 retrotransposition. Nat Genet. 2020;52(3):306–319.

About half of all cancers have somatic integrations of retrotransposons. Here, to characterize their role in oncogenesis, we analyzed the patterns and mechanisms of somatic retrotransposition in 2,954 cancer genomes from 38 histological cancer subtypes within the framework of the Pan-Cancer Analysis of Whole Genomes (PCAWG) project. We identified 19,166 somatically acquired retrotransposition events, which affected 35% of samples and spanned a range of event types. Long interspersed nuclear element (LINE-1; L1 hereafter) insertions emerged as the most frequent type of somatic structural variation in esophageal adenocarcinoma, and the second most frequent in head-and-neck and colorectal cancers. Aberrant L1 integrations can delete megabase-scale regions of a chromosome, which sometimes leads to the removal of tumor-suppressor genes, and can induce complex translocations and large-scale duplications. Somatic retrotranspositions can also initiate breakage-fusion-bridge cycles, leading to high-level amplification of oncogenes. These observations illuminate a relevant role of L1 retrotransposition in remodeling the cancer genome, with potential implications for the development of human tumors.

2019

Zhang Y, Yang L, Kucherlapati M, Hadjipanayis A, Pantazi A, Bristow C, Lee EA, Mahadeshwar H, Tang J, Zhang J, Seth S, Lee S, Ren X, Song X, Sun H, Seidman J, Luquette L, Xi R, Chin L, Protopopov A, Park P, Kucherlapati R, Creighton C. Global impact of somatic structural variation on the DNA methylome of human cancers. Genome Biol. 2019;20(1):209.

BACKGROUND: Genomic rearrangements exert a heavy influence on the molecular landscape of cancer. New analytical approaches integrating somatic structural variants (SSVs) with altered gene features represent a framework by which we can assign global significance to a core set of genes, analogous to established methods that identify genes non-randomly targeted by somatic mutation or copy number alteration. While recent studies have defined broad patterns of association involving gene transcription and nearby SSV breakpoints, global alterations in DNA methylation in the context of SSVs remain largely unexplored. RESULTS: By data integration of whole genome sequencing, RNA sequencing, and DNA methylation arrays from more than 1400 human cancers, we identify hundreds of genes and associated CpG islands (CGIs) for which the nearby presence of a somatic structural variant (SSV) breakpoint is recurrently associated with altered expression or DNA methylation, respectively, independently of copy number alterations. CGIs with SSV-associated increased methylation are predominantly promoter-associated, while CGIs with SSV-associated decreased methylation are enriched for gene body CGIs. Rearrangement of genomic regions normally having higher or lower methylation is often involved in SSV-associated CGI methylation alterations. Across cancers, the overall structural variation burden is associated with a global decrease in methylation, increased expression in methyltransferase genes and DNA damage response genes, and decreased immune cell infiltration. CONCLUSION: Genomic rearrangement appears to have a major role in shaping the cancer DNA methylome, to be considered alongside commonly accepted mechanisms including histone modifications and disruption of DNA methyltransferases.

Kim J, Hu C, Moufawad El Achkar C, Black LE, Douville J, Larson A, Pendergast MK, Goldkind SF, Lee EA, Kuniholm A, Soucy A, Vaze J, Belur NR, Fredriksen K, Stojkovska I, Tsytsykova A, Armant M, DiDonato RL, Choi J, Cornelissen L, Pereira LM, Augustine EF, Genetti CA, Dies K, Barton B, Williams L, Goodlett BD, Riley BL, Pasternak A, Berry ER, Pflock KA, Chu S, Reed C, Tyndall K, Agrawal PB, Beggs AH, Grant PE, Urion DK, Snyder RO, Waisbren SE, Poduri A, Park, Patterson A, Biffi A, Mazzulli JR, Bodamer O, Berde CB, Yu TW. Patient-Customized Oligonucleotide Therapy for a Rare Genetic Disease. N Engl J Med. 2019;381:1644–1652.

Genome sequencing is often pivotal in the diagnosis of rare diseases, but many of these conditions lack specific treatments. We describe how molecular diagnosis of a rare, fatal neurodegenerative condition led to the rational design, testing, and manufacture of milasen, a splice-modulating antisense oligonucleotide drug tailored to a particular patient. Proof-of-concept experiments in cell lines from the patient served as the basis for launching an "N-of-1" study of milasen within 1 year after first contact with the patient. There were no serious adverse events, and treatment was associated with objective reduction in seizures (determined by electroencephalography and parental reporting). This study offers a possible template for the rapid development of patient-customized treatments. (Funded by Mila's Miracle Foundation and others.)

Yang L, Wang S, Lee JJ, Lee S, Lee E, Shinbrot E, Wheeler DA, Kucherlapati R, Park. An enhanced genetic model of colorectal cancer progression history. Genome Biol. 2019;20:168.

BACKGROUND: The classical genetic model of colorectal cancer presents APC mutations as the earliest genomic alterations, followed by KRAS and TP53 mutations. However, the timing and relative order of clonal expansion and other types of genomic alterations, such as genomic rearrangements, are still unclear. RESULTS: Here, we perform comprehensive bioinformatic analysis to dissect the relative timing of somatic genetic alterations in 63 colorectal cancers with whole-genome sequencing data. Utilizing allele fractions of somatic single nucleotide variants as molecular clocks while accounting for the presence of copy number changes and structural alterations, we identify key events in the evolution of colorectal tumors. We find that driver point mutations, gene fusions, and arm-level copy losses typically arise early in tumorigenesis; different mechanisms act on distinct genomic regions to drive DNA copy changes; and chromothripsis (clustered rearrangements previously thought to occur as a single catastrophic event) is frequent and may occur multiple times independently in the same tumor through different mechanisms. Furthermore, our computational approach reveals that, in contrast to recent studies, selection is often present on subclones and that multiple evolutionary models can operate in a single tumor at different stages. CONCLUSION: Combining these results, we present a refined tumor progression model which significantly expands our understanding of the tumorigenic process of human colorectal cancer.

2018

Kang Y, Nam SH, Park KS, Kim JW, Lee EA, Ko JM, Lee KA, Park I. DeviCNV: Detection and Visualization of Exon-Level Copy Number Variants in Targeted Next Generation Sequencing Data. BMC Bioinformatics. 2018;19:381.

Background

Targeted next-generation sequencing (NGS) is increasingly being adopted in clinical laboratories for genomic diagnostic tests.

Results

We developed a new computational method, DeviCNV, intended for the detection of exon-level copy number variants (CNVs) in targeted NGS data. DeviCNV builds linear regression models with bootstrapping for every probe to capture the relationship between read depth of an individual probe and the median of read depth values of all probes in the sample. From the regression models, it estimates, for each probe, the ratio of observed to predicted read depth with a confidence interval; these ratios are then passed to a circular binary segmentation (CBS) algorithm to obtain CNV candidates. Then, it assigns confidence scores to those candidates based on the reliability and strength of the CNV signals inferred from the read depth ratios of the probes within them. Finally, it also provides gene-centric plots with confidence levels of CNV candidates for visual inspection. We applied DeviCNV to targeted NGS data generated for newborn screening and demonstrated its ability to detect novel pathogenic CNVs from clinical samples.

Conclusions

We propose a new pragmatic method for detecting CNVs in targeted NGS data with an intuitive visualization and a systematic method to assign confidence scores for candidate CNVs. Since DeviCNV was developed for use in clinical diagnosis, sensitivity is increased by the detection of exon-level CNVs.
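
The per-probe regression step described in the DeviCNV abstract above can be illustrated with a minimal sketch. This is not the DeviCNV implementation: the function names, the use of reference samples as training points, the number of bootstrap replicates, and the percentile-based interval are all assumptions made for illustration, and the CBS segmentation and confidence scoring steps are omitted.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def depth_ratio_with_ci(ref_medians, ref_depths, sample_median, sample_depth,
                        n_boot=200, seed=0):
    """Return (median, lower, upper) of the bootstrapped observed/predicted ratio.

    ref_medians[i] is the sample-wide median depth of reference sample i and
    ref_depths[i] is this probe's depth in that sample (hypothetical inputs).
    """
    rng = random.Random(seed)
    n = len(ref_medians)
    ratios = []
    for _ in range(n_boot):
        # Resample the reference samples with replacement and refit the line.
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [ref_medians[i] for i in idx]
        ys = [ref_depths[i] for i in idx]
        if len(set(xs)) < 2:
            continue  # slope is undefined when all resampled x values coincide
        intercept, slope = fit_line(xs, ys)
        predicted = intercept + slope * sample_median
        if predicted > 0:
            ratios.append(sample_depth / predicted)
    if not ratios:
        raise ValueError("no usable bootstrap replicate")
    ratios.sort()
    last = len(ratios) - 1
    # Percentile interval (2.5% to 97.5%) over the bootstrap replicates.
    return ratios[last // 2], ratios[int(0.025 * last)], ratios[int(0.975 * last)]
```

In this sketch, a ratio near 0.5 for a run of adjacent probes would be consistent with a heterozygous deletion, and a ratio near 1.5 with a duplication; the width of the interval indicates how stable the probe's regression is.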

Zhang Y, Yang L, Kucherlapati M, Chen F, Hadjipanayis A, Pantazi A, Bristow CA, Lee EA, ..., Creighton CJ. A Pan-Cancer Compendium of Genes Deregulated by Somatic Genomic Rearrangement across More Than 1,400 Cases. Cell Reports. 2018;24(2):515–527.

A systematic cataloguing of genes impacted by genomic rearrangement, using multiple patient cohorts and cancer types, can provide insight into cancer-relevant alterations outside of exome boundaries. By integrative analysis of whole genome sequencing (predominantly low pass) and gene expression data from 1448 cancers involving 18 histopathological types in The Cancer Genome Atlas, we identified hundreds of genes for which the nearby presence (within 100 kb) of a somatic structural variant (SV) breakpoint was associated with altered expression. While genomic rearrangements were associated with widespread copy number alteration (CNA) patterns, approximately 1100 genes, including over-expressed cancer driver genes (e.g. TERT, ERBB2, CDK12, CDK4) and under-expressed tumor suppressors (e.g. TP53, RB1, PTEN, STK11), showed SV-associated deregulation independent of CNA. SVs associated with disruption of topologically-associated domains, enhancer hijacking, or fusion transcripts were implicated in gene up-regulation patterns. For cancer-relevant pathways, SVs considerably extended the ways in which genes are impacted, beyond point mutation or CNA.
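
The "within 100 kb" criterion in the abstract above amounts to a simple interval test, sketched below. The function name, the tuple layouts, and the gene names and coordinates used with it are hypothetical; the sketch only flags genes with a nearby breakpoint and does not attempt the expression association or the CNA correction described in the paper.

```python
def genes_near_breakpoints(genes, breakpoints, window=100_000):
    """Return names of genes with at least one SV breakpoint within `window` bp.

    genes:       iterable of (name, chrom, start, end)
    breakpoints: iterable of (chrom, position)
    """
    # Group breakpoint positions by chromosome so each gene only scans its own.
    by_chrom = {}
    for chrom, pos in breakpoints:
        by_chrom.setdefault(chrom, []).append(pos)

    flagged = []
    for name, chrom, start, end in genes:
        for pos in by_chrom.get(chrom, []):
            # A breakpoint counts if it falls in the gene body or its flanks.
            if start - window <= pos <= end + window:
                flagged.append(name)
                break
    return flagged
```

For cohort-scale data, sorting breakpoints per chromosome and using binary search would avoid the linear scan; the plain loop is kept here for readability.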

Jung H, Choi** J, Lee** EA. Immune signatures correlate with L1 retrotransposition in gastrointestinal cancers. Genome Research. 2018;28(8):1136–1146.

Long interspersed nuclear element-1 (LINE-1 or L1) retrotransposons are normally suppressed in somatic tissues mainly due to DNA methylation and antiviral defense. However, the mechanism to suppress L1s may be disrupted in cancers, thus allowing L1s to act as insertional mutagens and cause genomic rearrangement and instability. Whereas the frequency of somatic L1 insertions varies greatly among individual tumors, much remains to be learned about underlying genetic, cellular, or environmental factors. Here, we report multiple correlates of L1 activity in stomach, colorectal, and esophageal tumors through an integrative analysis of cancer whole genome and matched RNA sequencing profiles. Clinical indicators of tumor progression, such as tumor grade and patient age, showed positive association. A potential L1 expression suppressor, TP53, was mutated in tumors with frequent L1 insertions. We characterized the effects of somatic L1 insertions on mRNA splicing and expression, and demonstrated an increased risk of gene disruption in retrotransposition-prone cancers. In particular, we found that a cancer-specific L1 insertion in an exon of MOV10, a key L1 suppressor, caused exon skipping and decreased expression of the affected allele due to nonsense-mediated decay in a tumor with a high L1 insertion load. Importantly, tumors with high immune activity, for example, those associated with Epstein-Barr virus infection or microsatellite instability, tended to carry a low number of L1 insertions in genomes with high expression levels of L1 suppressors such as APOBEC3s and SAMHD1. Our results indicate that cancer immunity may contribute to genome stability by suppressing L1 retrotransposition in gastrointestinal cancers.

2017

Park J, Lee E, Park KJ, Park HD, Kim JW, Woo HI, Lee KH, Lee KT, Lee JK, Park JO, Park YS, Heo JS, Choi SH, Choi DW, Jang KT, Lee SY. Large-scale clinical validation of biomarkers for pancreatic cancer using a mass spectrometry-based proteomics approach. Oncotarget. 2017;8(26):42761–42771.

We performed an integrated analysis of proteomic and transcriptomic datasets to develop potential diagnostic markers for early pancreatic cancer. In the discovery phase, a multiple reaction monitoring assay of 90 proteins identified by either gene expression analysis or global serum proteome profiling was established and applied to 182 clinical specimens. Nine proteins (P < 0.05) were selected for the independent validation phase and quantified using stable isotope dilution-multiple reaction monitoring-mass spectrometry in 456 specimens. Of these proteins, four proteins (apolipoprotein A-IV, apolipoprotein CIII, insulin-like growth factor binding protein 2 and tissue inhibitor of metalloproteinase 1) were significantly altered in pancreatic cancer in both the discovery and validation phase (P < 0.01). Moreover, a panel including carbohydrate antigen 19-9, apolipoprotein A-IV and tissue inhibitor of metalloproteinase 1 showed better performance for distinguishing early pancreatic cancer from pancreatitis (area under the curve = 0.934, 86% sensitivity at fixed 90% specificity) than carbohydrate antigen 19-9 alone (71% sensitivity). Overall, we present a panel of robust biomarkers for early pancreatic cancer diagnosis through bioinformatics analysis that combined transcriptomic and proteomic data as well as rigorous validation on a large number of independent clinical samples.
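
"Sensitivity at fixed 90% specificity", as reported in the abstract above, can be computed as sketched below. This is a generic illustration, not the authors' analysis: the function name is hypothetical, a single score per specimen is assumed with higher scores indicating cancer, and how the paper combined the three markers into a panel score is not reproduced.

```python
import math


def sensitivity_at_specificity(case_scores, control_scores, specificity=0.90):
    """Return (sensitivity, threshold) with a specimen called positive
    when its score is strictly above the threshold."""
    controls = sorted(control_scores)
    # Smallest cutoff that leaves at least `specificity` of controls at or
    # below it; the small epsilon guards against floating-point round-up.
    k = max(1, math.ceil(specificity * len(controls) - 1e-9))
    threshold = controls[k - 1]
    true_positives = sum(score > threshold for score in case_scores)
    return true_positives / len(case_scores), threshold
```

Fixing specificity first and then reading off sensitivity makes two markers or panels directly comparable at the same false-positive rate, which is how the 86% and 71% figures in the abstract are contrasted.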

Lee* S, Lee* S, Park WY, Lee** EA, Park** PJ. NGSCheckMate: software for validating sample identity in next-generation sequencing studies within and across data types. Nucleic Acids Research. 2017;45(11):e103.

In many next-generation sequencing (NGS) studies, multiple samples or data types are profiled for each individual. An important quality control (QC) step in these studies is to ensure that datasets from the same subject are properly paired. Given the heterogeneity of data types, file types and sequencing depths in a multi-dimensional study, a robust program that provides a standardized metric for genotype comparisons would be useful. Here, we describe NGSCheckMate, a user-friendly software package for verifying sample identities from FASTQ, BAM or VCF files. This tool uses a model-based method to compare allele read fractions at known single-nucleotide polymorphisms, considering depth-dependent behavior of similarity metrics for identical and unrelated samples. Our evaluation shows that NGSCheckMate is effective for a variety of data types, including exome sequencing, whole-genome sequencing, RNA-seq, ChIP-seq, targeted sequencing and single-cell whole-genome sequencing, with a minimal requirement for sequencing depth (>0.5X). An alignment-free module can be run directly on FASTQ files for a quick initial check. We recommend using this software as a QC step in NGS studies. AVAILABILITY: https://github.com/parklab/NGSCheckMate.
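
The core idea of comparing allele read fractions at known SNPs can be sketched as a correlation between two samples. This is only an illustration and not NGSCheckMate's code or interface: the function names and the input layout (a mapping from SNP ID to reference and alternate read counts) are assumptions, and the depth-dependent model that the tool uses to decide whether a given correlation indicates a match is not reproduced here.

```python
from math import sqrt


def allele_fractions(counts):
    """Map SNP id -> alternate-allele read fraction, skipping uncovered SNPs.

    counts: dict of snp_id -> (ref_reads, alt_reads)
    """
    return {snp: alt / (ref + alt)
            for snp, (ref, alt) in counts.items() if ref + alt > 0}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant allele fractions")
    return sxy / sqrt(sxx * syy)


def vaf_correlation(sample_a, sample_b):
    """Correlate allele fractions over the SNPs covered in both samples."""
    fa = allele_fractions(sample_a)
    fb = allele_fractions(sample_b)
    shared = sorted(fa.keys() & fb.keys())
    if len(shared) < 2:
        raise ValueError("need at least two shared SNPs")
    return pearson([fa[s] for s in shared], [fb[s] for s in shared])
```

Datasets from the same individual should give a correlation close to 1 because their genotypes agree at most SNPs, while unrelated individuals give a much lower value. At low sequencing depth the allele fractions become noisy, which is why a fixed cutoff is unreliable and a depth-aware model is needed in practice.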

Zhang Y, Kwok-Shing Ng P, Kucherlapati M, Chen F, Liu Y, Tsang YH, Velasco G, Jeong KJ, Akbani R, Hadjipanayis A, Pantazi A, Bristow C, Lee E, Mahadeshwar H, Tang J, Zhang J, Yang L, Seth S, Lee S, Ren X, Song X, Sun H, Seidman J, Luquette L, Xi R, Chin L, Protopopov A, Westbrook T, Shelley CS, Choueiri T, Ittmann M, Van Waes C, Weinstein J, Liang H, Henske E, Godwin A, Park P, Kucherlapati R, Scott K, Mills G, Kwiatkowski D, Creighton C. A Pan-Cancer Proteogenomic Atlas of PI3K/AKT/mTOR Pathway Alterations. Cancer Cell. 2017;31(6):820–832.e3.

Molecular alterations involving the PI3K/AKT/mTOR pathway (including mutation, copy number, protein, or RNA) were examined across 11,219 human cancers representing 32 major types. Within specific mutated genes, frequency, mutation hotspot residues, in silico predictions, and functional assays were all informative in distinguishing the subset of genetic variants more likely to have functional relevance. Multiple oncogenic pathways including PI3K/AKT/mTOR converged on similar sets of downstream transcriptional targets. In addition to mutation, structural variations and partial copy losses involving PTEN and STK11 showed evidence for having functional relevance. A substantial fraction of cancers showed high mTOR pathway activity without an associated canonical genetic or genomic alteration, including cancers harboring IDH1 or VHL mutations, suggesting multiple mechanisms for pathway activation.