Advanced deepTools Platform
Comprehensive skill for deepTools analysis, including BAM-to-bigWig conversion. Includes structured workflows, validation checks, and reusable patterns for scientific computing.
A scientific computing skill for analyzing high-throughput sequencing data with deepTools, a suite of Python tools for processing and visualizing aligned next-generation sequencing data, particularly ChIP-seq, ATAC-seq, and RNA-seq signal tracks. Advanced deepTools Platform helps you generate coverage tracks, create heatmaps, and perform quality control on aligned sequencing data.
When to Use This Skill
Choose Advanced deepTools Platform when:
- Converting BAM alignments to bigWig coverage tracks
- Creating heatmaps and profile plots around genomic features
- Performing quality control on ChIP-seq or ATAC-seq data
- Comparing signal intensity across multiple samples or conditions
Consider alternatives when:
- You need peak calling (use MACS2 or MACS3)
- You need differential binding analysis (use DiffBind)
- You need genome assembly (use SPAdes or similar)
- You need variant calling (use GATK or bcftools)
Quick Start
```bash
claude "Generate a coverage bigWig and create a heatmap around TSS regions"
```

```bash
# Convert BAM to normalized bigWig
bamCoverage \
  --bam sample.bam \
  --outFileName sample.bw \
  --binSize 10 \
  --normalizeUsing RPKM \
  --numberOfProcessors 8 \
  --extendReads 200

# Compute matrix around TSS
computeMatrix reference-point \
  --referencePoint TSS \
  --scoreFileName sample.bw \
  --regionsFileName genes.bed \
  --outFileName matrix.gz \
  --upstream 3000 \
  --downstream 3000 \
  --binSize 50

# Plot heatmap
plotHeatmap \
  --matrixFile matrix.gz \
  --outFileName heatmap.png \
  --colorMap RdYlBu_r \
  --sortUsing mean \
  --missingDataColor white
```
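To run the Quick Start conversion on several samples, a small wrapper saves retyping the flags. The sketch below is a hypothetical helper, `run_bam_coverage`, that derives each output name from its BAM file (`sample1.bam` becomes `sample1.bw`) and by default only prints the commands so you can check them first. The `RUN` variable is our own convention for this sketch, not a deepTools option.

```bash
#!/bin/sh
# Hypothetical helper: apply the Quick Start bamCoverage settings to each BAM
# given as an argument. With RUN unset the commands are only echoed (dry run);
# with RUN=env they are executed.
run_bam_coverage() {
  for bam in "$@"; do
    # ${RUN-echo} expands to "echo" only when RUN is unset
    ${RUN-echo} bamCoverage \
      --bam "$bam" \
      --outFileName "${bam%.bam}.bw" \
      --binSize 10 \
      --normalizeUsing RPKM \
      --numberOfProcessors 8 \
      --extendReads 200
  done
}
```

Preview with `run_bam_coverage sample1.bam sample2.bam`, then run for real with `RUN=env run_bam_coverage sample1.bam sample2.bam`.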
Core Concepts
deepTools Commands
| Command | Purpose | Input → Output |
|---|---|---|
| bamCoverage | BAM to bigWig signal track | BAM → bigWig |
| bamCompare | Compare two BAM files (log2 ratio by default) | 2 BAMs → bigWig |
| computeMatrix | Extract signal around features | bigWig + BED → matrix |
| plotHeatmap | Heatmap from signal matrix | matrix → PNG/PDF |
| plotProfile | Average profile plot | matrix → PNG/PDF |
| multiBamSummary | Read counts per bin or region across samples | BAMs → npz matrix |
| plotCorrelation | Sample correlation heatmap | npz → PNG/PDF |
| plotFingerprint | ChIP enrichment QC | BAMs → PNG/PDF |
Normalization Methods
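```bash
# RPKM: Reads Per Kilobase per Million mapped reads
bamCoverage -b sample.bam -o sample.rpkm.bw --normalizeUsing RPKM
# CPM: Counts Per Million mapped reads
bamCoverage -b sample.bam -o sample.cpm.bw --normalizeUsing CPM
# BPM: Bins Per Million mapped reads (like TPM, for coverage)
bamCoverage -b sample.bam -o sample.bpm.bw --normalizeUsing BPM
# RPGC: Reads Per Genomic Content (1x normalization);
# 2913022398 is the GRCh38 effective genome size (non-N bases)
bamCoverage -b sample.bam -o sample.rpgc.bw --normalizeUsing RPGC --effectiveGenomeSize 2913022398
```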
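The deepTools documentation describes each method as a divisor applied to the read count in a bin: RPKM divides by mapped reads (in millions) and by bin length (in kb), CPM by mapped reads (in millions), BPM by the total count over all bins (in millions), and RPGC by the sequencing depth, i.e. mapped reads × fragment length / effective genome size. The awk sketch below applies those formulas to a toy table of bin counts so you can see how the values relate. The `normalize_bins` helper and its tab-separated input are hypothetical; deepTools works directly from the BAM file and handles read filtering and extension that this sketch ignores.

```bash
#!/bin/sh
# Sketch of the per-bin arithmetic behind --normalizeUsing.
# stdin (hypothetical format): one "<bin_id>\t<read_count>" line per bin.
# Args: bin size (bp), total mapped reads, fragment length (bp),
# effective genome size (bp). deepTools derives these itself.
normalize_bins() {
  awk -F'\t' -v bin="$1" -v mapped="$2" -v frag="$3" -v egs="$4" '
    { id[NR] = $1; c[NR] = $2; total += $2 }
    END {
      if (total == 0) exit 1                       # nothing to normalize
      depth = mapped * frag / egs                  # reads per genome position at 1x
      printf "bin\tRPKM\tCPM\tBPM\tRPGC\n"
      for (i = 1; i <= NR; i++) {
        rpkm = c[i] / (mapped / 1e6) / (bin / 1e3) # per million mapped reads, per kb of bin
        cpm  = c[i] / (mapped / 1e6)               # per million mapped reads
        bpm  = c[i] / (total / 1e6)                # per million reads summed over all bins
        rpgc = c[i] / depth                        # relative to 1x genome coverage
        printf "%s\t%.4f\t%.4f\t%.4f\t%.4f\n", id[i], rpkm, cpm, bpm, rpgc
      }
    }'
}
```

For example, `printf 'b1\t10\nb2\t30\n' | normalize_bins 500 2000000 200 400000` uses toy numbers chosen for easy arithmetic. With 1 kb bins, RPKM and CPM coincide, since the per-kilobase factor is then 1.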
Multi-Sample Comparison
```bash
# Compare ChIP vs input (log2 ratio).
# bamCompare accepts --normalizeUsing only together with
# --scaleFactorsMethod None (the default, readCount, scales by library size).
bamCompare \
  --bamfile1 chip.bam \
  --bamfile2 input.bam \
  --outFileName chip_vs_input_log2.bw \
  --operation log2 \
  --scaleFactorsMethod None \
  --normalizeUsing RPKM

# Sample correlation matrix
multiBamSummary bins \
  --bamfiles sample1.bam sample2.bam sample3.bam \
  --outFileName multisample.npz \
  --binSize 1000

plotCorrelation \
  --corData multisample.npz \
  --plotFile correlation.png \
  --corMethod pearson \
  --whatToPlot heatmap
```
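The log2 operation is undefined wherever the input (or the ChIP) has no reads, which is why bamCompare adds a pseudocount (`--pseudocount`, default 1) before taking the ratio. The awk sketch below shows that per-bin arithmetic on a toy table; the `log2_ratio` helper and its three-column input are hypothetical, and the counts are assumed to be already scaled for sequencing depth.

```bash
#!/bin/sh
# Sketch of a per-bin log2 ratio with a pseudocount.
# stdin (hypothetical format): "<bin_id>\t<chip_count>\t<input_count>",
# counts assumed already scaled for sequencing depth.
# $1: pseudocount (default 1)
log2_ratio() {
  awk -F'\t' -v OFS='\t' -v pc="${1:-1}" '{
    # the pseudocount keeps empty bins finite: log2((0 + 1) / (0 + 1)) = 0
    print $1, sprintf("%.4f", log(($2 + pc) / ($3 + pc)) / log(2))
  }'
}
```

With the default pseudocount, a bin with 7 ChIP and 3 input reads gives log2(8/4) = 1, and a bin with no reads in either file gives 0 instead of an error.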
Configuration
| bamCoverage parameter | Description | Default |
|---|---|---|
| binSize | Resolution in base pairs | 50 |
| normalizeUsing | Normalization method (RPKM, CPM, BPM, RPGC) | None |
| numberOfProcessors | CPU threads to use | 1 |
| extendReads | Extend reads to fragment length | False |
| effectiveGenomeSize | Mappable genome size in bp (organism- and read-length-specific) | None; required for RPGC |
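For single-end data, `extendReads` turns each read into an estimated fragment. As the deepTools documentation describes it, each read is extended to the given fragment length in its own direction, and reads that are already longer are left as they are. The awk sketch below applies that geometry to BED6 alignments so you can inspect it; `extend_reads` is a hypothetical helper that works on BED rather than BAM and is not how deepTools is implemented.

```bash
#!/bin/sh
# Sketch of single-end read extension on BED6 alignments (stdin):
# each read is extended from its 5 prime end, in the read direction, to $1 bp
# (default 200). Reads already at least that long are left unchanged.
# Clipping at the chromosome end is not handled here.
extend_reads() {
  awk -F'\t' -v OFS='\t' -v len="${1:-200}" '{
    start = $2; end = $3
    if (end - start < len) {
      if ($6 == "-") start = end - len   # minus strand: 5 prime end is the alignment end
      else           end = start + len   # plus strand: 5 prime end is the alignment start
    }
    if (start < 0) start = 0             # do not run past the chromosome start
    print $1, start, end, $4, $5, $6
  }'
}
```

A 50 bp plus-strand read at 100-150 becomes the fragment 100-300, while a minus-strand read at 500-550 becomes 350-550.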
Best Practices
- Always normalize coverage tracks for comparison. Raw coverage scales with sequencing depth, so use RPKM, CPM, BPM, or RPGC normalization when comparing samples. For ChIP-seq, use `bamCompare` with the input control for the most meaningful signal.
- Extend reads to fragment length. For single-end sequencing, use `--extendReads 200` (or your estimated fragment length) to reconstruct the full fragment footprint; for paired-end data, `--extendReads` without a value uses the fragment defined by each read pair. This produces smoother, more biologically meaningful coverage tracks.
- Use appropriate bin sizes for your analysis. Smaller bins (10 bp) give higher resolution but produce larger files and slower processing. For genome-wide heatmaps, 50 bp bins are usually sufficient; for narrow peak analysis, use 10 bp bins.
- Run `plotFingerprint` for ChIP-seq QC. Before downstream analysis, verify ChIP enrichment quality. Input usually gives a near-diagonal curve, while a good ChIP sample bends away from it because a small fraction of bins holds a large share of the reads. A ChIP curve that stays close to the diagonal indicates poor enrichment and unreliable results.
- Sort heatmaps meaningfully. `plotHeatmap` orders regions by descending mean signal by default (`--sortUsing mean`); use `--sortUsing max` when peak height separates your regions better. Sorting by signal reveals patterns in the data (e.g., gene clusters with different signal levels) that an unsorted order obscures.
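plotFingerprint counts the reads in bins sampled along the genome, sorts the bins by count, and plots the cumulative share of reads against bin rank. The sketch below reproduces that computation on a toy list of per-bin counts; the `fingerprint` helper and its one-count-per-line input are hypothetical, since plotFingerprint derives the counts from the BAM files itself.

```bash
#!/bin/sh
# Sketch of the curve plotFingerprint draws. stdin: one read count per genomic
# bin (hypothetical input). Bins are ranked from lowest to highest count and
# each output line is "<fraction of bins>\t<cumulative fraction of reads>".
fingerprint() {
  sort -n | awk '
    { c[NR] = $1; total += $1 }
    END {
      if (total == 0) exit 1             # no reads: the curve is undefined
      for (i = 1; i <= NR; i++) {
        cum += c[i]
        printf "%.2f\t%.2f\n", i / NR, cum / total
      }
    }'
}
```

Uniform counts such as `5 5 5 5` follow the diagonal (0.25, 0.50, 0.75, 1.00), as an input sample would; counts such as `0 10 0 0` keep the cumulative fraction at zero until the last bin, the shape of a strongly enriched ChIP.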
Common Issues
bigWig file is much larger than expected. Increase --binSize (a larger value gives fewer bins and a smaller file). With bamCoverage, --skipNonCoveredRegions omits regions without coverage; with bamCompare, --skipZeroOverZero skips bins that have no reads in either file.
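The number of bins grows inversely with bin size, which drives run time and, roughly, output size. The arithmetic for a human-sized genome makes the trade-off concrete; `bins_needed` is a hypothetical helper, and the real bigWig size also depends on how many bins are empty or skipped.

```bash
#!/bin/sh
# Hypothetical helper: number of bins needed to tile a genome,
# ceil(genome_size / bin_size), using integer shell arithmetic.
bins_needed() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# A ~2.9 Gb human genome at three bin sizes
for bs in 10 50 1000; do
  printf '%s bp bins: %s\n' "$bs" "$(bins_needed 2913022398 "$bs")"
done
# 10 bp bins: 291302240
# 50 bp bins: 58260448
# 1000 bp bins: 2913023
```

Going from 10 bp to 50 bp bins cuts the bin count fivefold.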
Heatmap shows uniform color with no pattern. A few outlier values may be stretching the color scale. Set --zMin and --zMax manually to clip extreme values. If the structure is still hidden, try ranking regions by a different statistic, for example --sortUsing max instead of the default mean.
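One way to choose --zMin and --zMax is to read percentiles off the matrix itself. The sketch below is a hypothetical helper, `matrix_percentile`, that assumes the layout we understand computeMatrix to write (after decompression: a header line starting with `@`, then six region columns followed by the binned values, with `nan` for missing data); check it against your own file before relying on it.

```bash
#!/bin/sh
# Hypothetical helper: nearest-rank percentile ($1, an integer 1-100) of the
# signal values in a computeMatrix-style table on stdin. Assumes a header line
# starting with "@", then six region columns followed by binned values;
# "nan" entries are skipped.
matrix_percentile() {
  grep -v '^@' \
    | awk -F'\t' '{ for (i = 7; i <= NF; i++) if (tolower($i) != "nan") print $i }' \
    | sort -g \
    | awk -v p="$1" '
        { v[NR] = $1 }
        END {
          if (NR == 0) exit 1            # no numeric values found
          k = int((p * NR + 99) / 100)   # rank = ceil(p / 100 * N)
          if (k < 1) k = 1
          print v[k]
        }'
}
```

For example, `gzip -dc matrix.gz | matrix_percentile 99` gives a candidate --zMax, and a low percentile such as 1 gives a candidate --zMin.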
bamCoverage is extremely slow. Use --numberOfProcessors 8 (or more) for parallelization. Ensure the BAM file is sorted and indexed (samtools sort + samtools index). Remove duplicate reads before processing to reduce computation.