Perm-seq: Mapping Protein-DNA Interactions in Segmental Duplication and Highly Repetitive Regions of Genomes with Prior-Enhanced Read Mapping
Oct 20, 2015
X. Zeng
Dr. Bo Li
R. Welch
C. Rojo
Y. Zheng
C. N. Dewey
S. Keles
Abstract
Segmental duplications and other highly repetitive regions of genomes contribute significantly to cells’ regulatory programs. Advances in next-generation sequencing have enabled genome-wide profiling of protein-DNA interactions by chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq). However, interactions in highly repetitive regions of genomes have proven difficult to map, because short reads of 50–100 base pairs (bp) from these regions map to multiple locations in reference genomes. Standard analytical methods discard such multi-mapping reads, and the few methods that can accommodate them are prone to high false positive and false negative rates. We developed Perm-seq, a prior-enhanced read allocation method for ChIP-seq experiments, which can allocate multi-mapping reads in highly repetitive regions of genomes with high accuracy. We comprehensively evaluated Perm-seq and found that our prior-enhanced approach significantly improves multi-read allocation accuracy over approaches that do not utilize additional data types. The statistical formalism underlying our approach allows multi-read allocation to be supervised by a variety of data sources, including histone ChIP-seq. We applied Perm-seq to 64 ENCODE ChIP-seq datasets from GM12878 and K562 cells and identified many novel protein-DNA interactions in segmental duplication regions. Our analysis reveals that, although protein-DNA interaction sites in repetitive regions are evolutionarily less conserved, they share the overall sequence characteristics of protein-DNA interaction sites in non-repetitive regions.
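To make the idea of prior-enhanced multi-read allocation more concrete, here is a minimal, generic sketch of expectation-maximization (EM) allocation in which each candidate genomic window also receives a prior pseudo-count, hypothetically derived from an auxiliary data source such as histone ChIP-seq signal. This is an illustrative assumption, not Perm-seq's actual statistical model: the function `allocate_multireads`, the window identifiers, and the pseudo-count form of the prior are all hypothetical. It also assumes every alignment of a read is equally good, ignoring mismatches and window length. Each iteration sums the expected read counts per window, adds the prior pseudo-counts (a maximum a posteriori EM update under a Dirichlet prior whose parameters are the pseudo-counts plus one), and then reallocates every read across its candidate windows in proportion to the resulting weights.

```python
from collections import defaultdict


def allocate_multireads(read_candidates, prior, n_iter=50):
    """Prior-weighted EM allocation of reads to candidate windows (illustrative sketch).

    read_candidates: read id -> non-empty list of candidate window ids
                     (a uniquely mapping read has exactly one).
    prior: window id -> non-negative pseudo-count, hypothetically derived from
           auxiliary data such as histone ChIP-seq signal; missing windows get 0.
    Returns: read id -> {window id: allocation probability}.
    """
    windows = {w for cands in read_candidates.values() for w in cands}
    # Start from a uniform split of each read over its candidate windows.
    alloc = {r: {w: 1.0 / len(c) for w in c} for r, c in read_candidates.items()}
    for _ in range(n_iter):
        # M-step: expected read count per window, plus the prior pseudo-count.
        counts = defaultdict(float)
        for probs in alloc.values():
            for w, p in probs.items():
                counts[w] += p
        weight = {w: counts[w] + prior.get(w, 0.0) for w in windows}
        # E-step: reallocate each read in proportion to its windows' weights.
        alloc = {
            r: {w: weight[w] / sum(weight[v] for v in cands) for w in cands}
            for r, cands in read_candidates.items()
        }
    return alloc


if __name__ == "__main__":
    # Toy data: three unique reads at A, one at B, one multi-read m1 at A or B.
    reads = {"u1": ["A"], "u2": ["A"], "u3": ["A"], "u4": ["B"], "m1": ["A", "B"]}
    flat = allocate_multireads(reads, prior={})
    informed = allocate_multireads(reads, prior={"B": 10.0})  # hypothetical prior
    print(round(flat["m1"]["A"], 3))      # → 0.75
    print(round(informed["m1"]["A"], 3))  # → 0.214
```

In this toy example, the unique reads alone pull the multi-read `m1` toward window A (its allocation converges to 3/4). A hypothetical prior pseudo-count of 10 on window B shifts it to 3/14 ≈ 0.21, which illustrates how auxiliary data can override an allocation based only on read counts.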
Publication
PLoS Computational Biology
Authors
Principal Scientist II
Dr. Bo Li is a Principal Scientist at Genentech, Inc. His research focuses on large-scale single-cell genomics data analysis.
Before joining Genentech, he was an Assistant Professor of Medicine at Harvard Medical School and the director of Bioinformatics and Computational Biology at the Center for Immunology and Inflammatory Diseases, Massachusetts General Hospital.
He received his Ph.D. in computer science from UW-Madison and completed postdoctoral training with Dr. Lior Pachter at UC Berkeley and with Dr. Aviv Regev at the Broad Institute.
He is best known for developing RSEM, a widely used RNA-seq transcript quantification package. RSEM has been cited 22,602 times (Google Scholar) and has been adopted by several large consortia, such as TCGA, ENCODE, GTEx, and TOPMed.