Advanced deepTools Platform

Comprehensive skill for deepTools analysis and BAM-to-bigWig conversion. Includes structured workflows, validation checks, and reusable patterns for scientific computing.

Skill · Cliptics · scientific · v1.0.0 · MIT

A scientific computing skill for analyzing high-throughput sequencing data using deepTools — the suite of Python tools for processing and visualizing next-generation sequencing data, particularly ChIP-seq, ATAC-seq, and RNA-seq signal tracks. Advanced deepTools Platform helps you generate coverage tracks, create heatmaps, and perform quality control on aligned sequencing data.

When to Use This Skill

Choose Advanced deepTools Platform when:

  • Converting BAM alignments to bigWig coverage tracks
  • Creating heatmaps and profile plots around genomic features
  • Performing quality control on ChIP-seq or ATAC-seq data
  • Comparing signal intensity across multiple samples or conditions

Consider alternatives when:

  • You need peak calling (use MACS2 or MACS3)
  • You need differential binding analysis (use DiffBind)
  • You need genome assembly (use SPAdes or similar)
  • You need variant calling (use GATK or bcftools)

Quick Start

claude "Generate a coverage bigWig and create a heatmap around TSS regions"
# Convert BAM to normalized bigWig
bamCoverage \
  --bam sample.bam \
  --outFileName sample.bw \
  --binSize 10 \
  --normalizeUsing RPKM \
  --numberOfProcessors 8 \
  --extendReads 200

# Compute matrix around TSS
computeMatrix reference-point \
  --referencePoint TSS \
  --scoreFileName sample.bw \
  --regionsFileName genes.bed \
  --outFileName matrix.gz \
  --upstream 3000 \
  --downstream 3000 \
  --binSize 50

# Plot heatmap
plotHeatmap \
  --matrixFile matrix.gz \
  --outFileName heatmap.png \
  --colorMap RdYlBu_r \
  --sortUsing mean \
  --missingDataColor white

Core Concepts

deepTools Commands

| Command | Purpose | Input → Output |
| --- | --- | --- |
| bamCoverage | BAM to bigWig signal track | BAM → bigWig |
| bamCompare | Log2 ratio of two BAM files | 2 BAMs → bigWig |
| computeMatrix | Extract signal around features | bigWig + BED → matrix |
| plotHeatmap | Heatmap from signal matrix | matrix → PNG/PDF |
| plotProfile | Average profile plot | matrix → PNG/PDF |
| multiBamSummary | Per-bin read counts across samples (input for plotCorrelation) | BAMs → npz matrix |
| plotCorrelation | Sample correlation heatmap | npz → PNG/PDF |
| plotFingerprint | ChIP enrichment QC | BAMs → PNG/PDF |

Normalization Methods

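# RPKM: Reads Per Kilobase per Million mapped reads
bamCoverage --bam sample.bam --outFileName sample.rpkm.bw --normalizeUsing RPKM

# CPM: Counts Per Million mapped reads
bamCoverage --bam sample.bam --outFileName sample.cpm.bw --normalizeUsing CPM

# BPM: Bins Per Million mapped reads (like TPM for coverage)
bamCoverage --bam sample.bam --outFileName sample.bpm.bw --normalizeUsing BPM

# RPGC: Reads Per Genomic Content (1x normalization)
# 2913022398 is the effective genome size listed for GRCh38; use the value for your organism
bamCoverage --bam sample.bam --outFileName sample.rpgc.bw --normalizeUsing RPGC --effectiveGenomeSize 2913022398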
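To make the difference between the methods concrete, the per-bin arithmetic for CPM and RPKM can be sketched as below. This is only an illustration of the definitions (reads in the bin divided by millions of mapped reads, and for RPKM also by bin length in kilobases); the function names `cpm` and `rpkm` and the numbers are hypothetical, not part of deepTools.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating per-bin normalization arithmetic.
# Arguments: reads_in_bin total_mapped_reads [bin_size_bp]

# CPM = reads in bin / (mapped reads in millions)
cpm() {
  awk -v r="$1" -v n="$2" 'BEGIN { printf "%.4f\n", r / (n / 1e6) }'
}

# RPKM = reads in bin / (mapped reads in millions * bin length in kb)
rpkm() {
  awk -v r="$1" -v n="$2" -v b="$3" \
    'BEGIN { printf "%.4f\n", r / ((n / 1e6) * (b / 1000)) }'
}

# Made-up example: 25 reads in a 50 bp bin, 20 million mapped reads.
cpm 25 20000000       # prints 1.2500
rpkm 25 20000000 50   # prints 25.0000
```

Because RPKM divides by bin length, the same bin gets a different value when --binSize changes, whereas CPM depends only on library size.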

Multi-Sample Comparison

# Compare ChIP vs Input (log2 ratio)
# bamCompare scales by read count by default; --normalizeUsing is only
# accepted together with --scaleFactorsMethod None
bamCompare \
  --bamfile1 chip.bam \
  --bamfile2 input.bam \
  --outFileName chip_vs_input_log2.bw \
  --operation log2 \
  --scaleFactorsMethod None \
  --normalizeUsing RPKM

# Sample correlation matrix
multiBamSummary bins \
  --bamfiles sample1.bam sample2.bam sample3.bam \
  --outFileName multisample.npz \
  --binSize 1000

plotCorrelation \
  --corData multisample.npz \
  --plotFile correlation.png \
  --corMethod pearson \
  --whatToPlot heatmap

Configuration

| Parameter | Description | Default |
| --- | --- | --- |
| binSize | Resolution in base pairs | 50 |
| normalizeUsing | Normalization method | None |
| numberOfProcessors | CPU threads to use | 1 |
| extendReads | Extend reads to fragment length | False |
| effectiveGenomeSize | Mappable genome size (for RPGC) | Organism-specific |
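One way to keep these settings consistent across samples is a small wrapper that assembles the bamCoverage command line from the defaults in the table. The sketch below only prints the command rather than running it; the function name `bamcoverage_cmd` and the `BIN_SIZE` and `THREADS` environment variables are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a bamCoverage command line without running it.
# Usage: bamcoverage_cmd BAM OUT [NORMALIZATION] [EFFECTIVE_GENOME_SIZE]
# BIN_SIZE and THREADS may be set in the environment; defaults follow the table.
bamcoverage_cmd() {
  local bam="$1" out="$2" norm="${3:-None}" genome_size="${4:-}"
  local bin_size="${BIN_SIZE:-50}" threads="${THREADS:-1}"
  local cmd="bamCoverage --bam $bam --outFileName $out"
  cmd="$cmd --binSize $bin_size --normalizeUsing $norm --numberOfProcessors $threads"

  # RPGC has no usable default genome size, so refuse to guess one.
  if [ "$norm" = "RPGC" ]; then
    if [ -z "$genome_size" ]; then
      echo "effective genome size is required for RPGC" >&2
      return 1
    fi
    cmd="$cmd --effectiveGenomeSize $genome_size"
  fi
  printf '%s\n' "$cmd"
}

bamcoverage_cmd sample.bam sample.bw
bamcoverage_cmd sample.bam sample.rpgc.bw RPGC 2913022398
```

Printing the command first makes it easy to review the exact flags before committing to a long run.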

Best Practices

  1. Always normalize coverage tracks for comparison. Raw coverage depends on sequencing depth. Use RPKM, CPM, or RPGC normalization when comparing samples. For ChIP-seq, use bamCompare with input control for the most meaningful signal.

  2. Extend reads to fragment length. For single-end sequencing, use --extendReads 200 (or your estimated fragment length) to reconstruct the full fragment footprint. This produces smoother, more biologically meaningful coverage tracks.

  3. Use appropriate bin sizes for your analysis. Smaller bins (10bp) provide higher resolution but larger files and slower processing. For genome-wide heatmaps, 50bp bins are usually sufficient. For narrow peak analysis, use 10bp bins.

  4. Run plotFingerprint for ChIP-seq QC. Before downstream analysis, verify ChIP enrichment quality. A well-enriched ChIP sample produces a curve that stays low and then rises steeply at the end, clearly separated from the input. A ChIP curve that follows the input close to the diagonal indicates weak enrichment and unreliable results.

  5. Sort heatmaps meaningfully. Use --sortUsing mean or --sortUsing max to order regions by signal intensity. This reveals patterns in the data (e.g., gene clusters with different signal levels) that random ordering obscures.
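The plotFingerprint check from practice 4 can be sketched as a small wrapper. This is a minimal sketch, not a prescribed workflow: the `run` and `fingerprint_qc` helpers, the `DRY_RUN` variable, and the output file names are hypothetical. By default the command is only printed; setting DRY_RUN=0 executes it.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run wrapper: prints the command unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# fingerprint_qc CHIP_BAM INPUT_BAM PREFIX
# Plots the ChIP and input fingerprints together and keeps the raw counts.
fingerprint_qc() {
  run plotFingerprint \
    --bamfiles "$1" "$2" \
    --labels ChIP Input \
    --plotFile "$3.fingerprint.png" \
    --outRawCounts "$3.fingerprint.tab" \
    --numberOfProcessors 4
}

fingerprint_qc chip.bam input.bam sample
```

Plotting the input alongside the ChIP sample gives the reference curve against which enrichment is judged.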

Common Issues

bigWig file is much larger than expected. Increase the bin size (a larger --binSize value gives a smaller file). With bamCoverage, add --skipNonCoveredRegions to omit bins with no coverage; with bamCompare, --skipZeroOverZero skips bins where both samples have zero reads.

Heatmap shows uniform color with no pattern. The color scale may be dominated by outliers. Set --zMin and --zMax manually to clip extreme values. Regions are already sorted by mean signal by default, so if no structure appears after clipping, try clustering with --kmeans to separate groups of regions with distinct signal patterns.
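One way to pick a --zMax value is to take a high percentile of the matrix values instead of the maximum. The sketch below assumes the computeMatrix output layout of a gzipped, tab-separated file with a header line starting with `@`, six BED-style columns, and then the signal values; the function name `matrix_percentile` is hypothetical, and the nearest-rank percentile is just one reasonable choice.

```shell
#!/usr/bin/env bash
# matrix_percentile FILE PCT
# Prints the PCT-th percentile (nearest-rank) of all non-NaN values in a
# computeMatrix output file. Assumes values start at column 7.
matrix_percentile() {
  gzip -dc "$1" \
    | grep -v '^@' \
    | cut -f7- \
    | tr '\t' '\n' \
    | grep -v -i 'nan' \
    | sort -g \
    | awk -v p="$2" '
        { v[NR] = $1 }
        END {
          if (NR == 0) exit 1
          x = (p / 100) * NR
          k = int(x)
          if (k < x) k++          # round up to the nearest rank
          if (k < 1) k = 1
          if (k > NR) k = NR
          print v[k]
        }'
}
```

The result can then be passed on the command line, for example `plotHeatmap --matrixFile matrix.gz --outFileName heatmap.png --zMin 0 --zMax "$(matrix_percentile matrix.gz 98)"`.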

bamCoverage is extremely slow. Use --numberOfProcessors 8 (or more) for parallelization. Ensure the BAM file is sorted and indexed (samtools sort + samtools index). Remove duplicate reads before processing to reduce computation.
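A quick pre-flight check for the index can catch the most common setup problem before a long run starts. The sketch below only looks for an index file next to the BAM (`sample.bam.bai`, `sample.bai`, or `sample.bam.csi`); it does not verify sort order, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# bam_is_indexed FILE: succeeds if an index file sits next to the BAM.
bam_is_indexed() {
  [ -f "$1.bai" ] || [ -f "${1%.bam}.bai" ] || [ -f "$1.csi" ]
}

# ensure_indexed FILE: reports the status and suggests the fix if missing.
ensure_indexed() {
  if bam_is_indexed "$1"; then
    echo "indexed: $1"
  else
    echo "missing index: $1 (run: samtools index $1)" >&2
    return 1
  fi
}
```

Calling `ensure_indexed sample.bam || exit 1` at the top of a pipeline script stops early with a clear message.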
