Research Areas

Integrative Genomics Viewer (IGV) Bisulfite Methylation Viewer

(A) Two views of the IGF2/H19 imprinting control region (ICR), illustrating allele-specific methylation of CTCF binding sites. The top view shows a 13-kb region of ChIP-seq histone marks from the ENCODE normal human mammary epithelial cell (HMEC) line. The second view shows whole-genome bisulfite-sequencing read alignments from normal colonic mucosa, zoomed in to 75 bp. CpG dinucleotides are shown as blue (unmethylated) and red (methylated) squares. A heterozygous C/T single-nucleotide polymorphism (SNP) is also apparent, and the T allele is overwhelmingly associated with reads that have methylated CpGs (from the paternal chromosome).

(B) The enhancer region surrounding exons 2 and 3 of the B3GNTL1 gene is apparent from the ENCODE tracks showing characteristic enhancer histone marks in a normal HMEC line. The bisulfite sequencing view of the read alignments shows that this enhancer is methylated (red – lighter) in normal colon mucosa but almost completely unmethylated (blue – darker) in the matched colon tumor sample. The cancer-specific demethylation of this enhancer is consistent with the upregulation of the B3GNTL1 transcript in the tumor.

Bis-SNP: Combined DNA Methylation and SNP Calling for Bisulfite-Sequencing Data

Bisulfite treatment of DNA followed by high-throughput sequencing (bisulfite-seq) is an important method for studying DNA methylation and epigenetic gene regulation, yet current software tools do not adequately address SNPs. Identifying SNPs is important for accurate quantification of methylation levels and for identification of allele-specific epigenetic events such as imprinting. The Berman Laboratory has developed a model-based bisulfite SNP caller, Bis-SNP, that results in substantially better SNP calls than existing methods, thereby improving methylation estimates. At an average 30× genomic coverage, Bis-SNP correctly identified 96 percent of SNPs using the default high-stringency settings.
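The effect of an unrecognized SNP on a methylation estimate can be illustrated with a short Python sketch. It assumes the common convention that the methylation level at a cytosine is the fraction of reads showing C (protected from conversion) among reads showing C or T. The function name and the read counts are hypothetical and are not taken from Bis-SNP.

```python
def methylation_level(c_reads, t_reads):
    """Fraction of reads showing C (unconverted) at a cytosine position.

    Returns None when no informative reads cover the position.
    """
    total = c_reads + t_reads
    if total == 0:
        return None
    return c_reads / total


# Hypothetical fully methylated CpG, homozygous C/C, 30 reads.
homozygous = methylation_level(30, 0)

# Same site in a heterozygous C/T sample: 15 reads come from the T allele.
# Counting them as "unmethylated C" halves the apparent methylation level.
naive = methylation_level(15, 15)

# After the SNP is called, reads assigned to the T allele are excluded.
corrected = methylation_level(15, 0)

print(homozygous, naive, corrected)
```

In this toy case the naive estimate reports a half-methylated site, while the SNP-aware estimate recovers full methylation on the C allele.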

Bis-SNP Workflow

(a) Bis-SNP accepts BAM files produced by a genome-mapping tool (BSMAP, MAQ, Novoalign, Bismark, and so on). The local realignment and base quality recalibration steps produce a new BAM file with recalibrated base quality scores. Finally, Bis-SNP performs SNP calling and outputs both methylation levels and SNP calls.

(b) The SNP calling step is performed on each genomic position independently. Differences between the reference genome and the sample genome can produce one of 10 possible allele pairs, or genotypes (G; only four are shown here). Frequencies of all possible substitutions in the population are taken from the dbSNP database and represented as π(G). A probabilistic model that incorporates prior probabilities for methylation level and bisulfite conversion efficiency is used to calculate the probability of observing the actual bisulfite read data (D) assuming each of the 10 genotypes, Pr(D|G). Finally, Bayesian inference uses the population frequencies of each SNP to calculate the posterior probability Pr(G|D).
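The Bayesian step described in (b) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Bis-SNP's actual model: it considers forward-strand reads only, uses one fixed methylation level and conversion efficiency in place of prior distributions over them, applies a single sequencing error rate, and uses a uniform prior where Bis-SNP would use dbSNP-derived frequencies. All function and parameter names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import exp, log

BASES = "ACGT"
# The 10 unordered diploid genotypes: AA, AC, AG, AT, CC, CG, CT, GG, GT, TT.
GENOTYPES = ["".join(g) for g in combinations_with_replacement(BASES, 2)]


def emission(allele, meth, conv):
    """Bases a forward-strand read can show for an allele, before sequencing error.

    An unmethylated C is read as T when bisulfite conversion succeeds.
    """
    if allele == "C":
        p_t = (1.0 - meth) * conv
        return {"C": 1.0 - p_t, "T": p_t}
    return {allele: 1.0}


def read_likelihood(obs, genotype, meth, conv, err):
    """Probability of one observed base given a genotype (each allele sampled 50/50)."""
    total = 0.0
    for allele in genotype:
        for emitted, p in emission(allele, meth, conv).items():
            p_obs = 1.0 - err if obs == emitted else err / 3.0
            total += 0.5 * p * p_obs
    return total


def genotype_posterior(reads, prior, meth=0.5, conv=0.99, err=0.01):
    """Posterior over genotypes: prior times the product of per-read likelihoods."""
    log_post = {}
    for g in GENOTYPES:
        log_lik = sum(log(read_likelihood(b, g, meth, conv, err)) for b in reads)
        log_post[g] = log(prior[g]) + log_lik
    # Normalize in log space to avoid underflow at high coverage.
    peak = max(log_post.values())
    unnorm = {g: exp(v - peak) for g, v in log_post.items()}
    z = sum(unnorm.values())
    return {g: v / z for g, v in unnorm.items()}


# Hypothetical pileup: five A reads and five G reads at one position.
uniform_prior = {g: 1.0 / len(GENOTYPES) for g in GENOTYPES}
posterior = genotype_posterior("AAAAAGGGGG", uniform_prior)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 4))
```

Under this forward-strand-only simplification, C/T mixtures are hard to distinguish from a partially methylated C/C site, which is one reason a full model also needs reads from the opposite strand.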

FunciSNP: An R/Bioconductor Tool Integrating Functional Noncoding Data Sets with Genetic Association Studies to Identify Candidate Regulatory SNPs

(A) Distribution of R² values of all YAFSNPs. Each marked bin is labeled with the number of YAFSNPs it contains. The counts across all bins sum to the total number of correlated SNPs.

(B) Distribution of R² values of all YAFSNPs, grouped by tagSNP and by genomic location.

(C) Histogram of R² values for all extracted 1kgSNPs that overlap Pol II. R² values are determined by each SNP's association with the tagSNP.

(D) Scatter plot of R² versus distance to the tagSNP for all extracted 1kgSNPs that overlap Pol II.

(E) Stacked bar chart summarizing all correlated SNPs for each of the identified genomic features: exon, intron, 5′ UTR, 3′ UTR, promoter, lincRNA, or gene desert. R² cutoff at 0.5. This plot is most informative if used with an rsq value.

(F) Heatmap of the number of 1kgSNPs by relationship between tagSNP and biofeature. The total number of YAFSNPs is listed within each quadrant to represent the number of potential candidate functional SNPs that overlap a biofeature (y-axis) and are in linkage disequilibrium with the original tagSNP (x-axis).
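The tally behind a heatmap like the one in (F) can be sketched in Python: keep SNPs whose R² with their tagSNP meets a cutoff, then count them per tagSNP and biofeature pair. This is an illustrative sketch only and does not use the FunciSNP R API; the record fields, SNP identifiers, biofeature names, and R² values are hypothetical.

```python
from collections import Counter


def count_correlated(records, r2_cutoff=0.5):
    """Count SNPs per (tagSNP, biofeature) pair with R² at or above the cutoff."""
    counts = Counter()
    for rec in records:
        if rec["r2"] >= r2_cutoff:
            counts[(rec["tag_snp"], rec["biofeature"])] += 1
    return counts


# Hypothetical records: one row per SNP that overlaps a biofeature.
records = [
    {"snp": "snp1", "tag_snp": "tagA", "biofeature": "PolII", "r2": 0.92},
    {"snp": "snp2", "tag_snp": "tagA", "biofeature": "PolII", "r2": 0.31},
    {"snp": "snp3", "tag_snp": "tagA", "biofeature": "H3K4me1", "r2": 0.75},
    {"snp": "snp4", "tag_snp": "tagB", "biofeature": "PolII", "r2": 0.50},
]

counts = count_correlated(records, r2_cutoff=0.5)
for (tag_snp, biofeature), n in sorted(counts.items()):
    print(tag_snp, biofeature, n)
```

Each count corresponds to one cell of the heatmap; raising the cutoff narrows the list to SNPs in stronger linkage disequilibrium with the tagSNP.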

Contact the Berman Lab

8723 Alden Dr.
Steven Spielberg Building, Room 119
Los Angeles, CA 90048