Master LatchBio Suite

Build and deploy bioinformatics workflows on the Latch platform using the Latch SDK. This skill covers workflow definition with Flyte tasks, container configuration, data management through Latch Data, and deployment of reproducible computational biology pipelines accessible through an auto-generated web interface.

When to Use This Skill

Choose Master LatchBio Suite when you need to:

  • Deploy bioinformatics pipelines with auto-generated web UIs for non-programmers
  • Run containerized workflows on managed cloud compute without infrastructure setup
  • Share reproducible analysis workflows across a research team or organization
  • Process large genomics datasets (FASTQ, BAM, VCF) with scalable cloud resources

Consider alternatives when:

  • You need a general-purpose workflow orchestrator (use Nextflow or Snakemake)
  • You need on-premises pipeline execution (use CWL or WDL runners)
  • You need real-time data streaming rather than batch processing (use Apache Kafka)

Quick Start

```shell
# Install Latch SDK
pip install latch

# Initialize a new workflow
latch init my-alignment-workflow
cd my-alignment-workflow
```

```python
# wf/__init__.py
from enum import Enum

from latch import small_task, workflow
from latch.types import LatchFile


class Aligner(Enum):
    BWA = "bwa"
    BOWTIE2 = "bowtie2"


@small_task
def align_reads(
    reads: LatchFile,
    reference: LatchFile,
    aligner: Aligner = Aligner.BWA,
    threads: int = 4,
) -> LatchFile:
    """Align sequencing reads to a reference genome."""
    import subprocess

    output = "/root/aligned.bam"
    if aligner == Aligner.BWA:
        with open(output, "w") as out:
            subprocess.run(
                ["bwa", "mem", "-t", str(threads), reference.local_path, reads.local_path],
                stdout=out,
                check=True,
            )
    else:
        raise NotImplementedError(f"{aligner.value} support is not yet implemented")
    return LatchFile(output, "latch:///results/aligned.bam")


@workflow
def alignment_workflow(
    reads: LatchFile,
    reference: LatchFile,
    aligner: Aligner = Aligner.BWA,
) -> LatchFile:
    """Align reads to reference genome.

    This workflow takes FASTQ reads and a reference genome, performs
    alignment, and returns a sorted BAM file.
    """
    return align_reads(reads=reads, reference=reference, aligner=aligner)
```

```shell
# Register and deploy
latch register --remote my-alignment-workflow
```

Core Concepts

SDK Components

| Component | Purpose | Example |
| --- | --- | --- |
| `@small_task` | Task on 2 CPU / 4 GB RAM | Quality control, file parsing |
| `@medium_task` | Task on 8 CPU / 32 GB RAM | Read alignment, variant calling |
| `@large_task` | Task on 31 CPU / 120 GB RAM | Genome assembly, large-scale analysis |
| `@workflow` | Orchestrates tasks into a DAG | Full analysis pipeline |
| `LatchFile` | Single-file reference | FASTQ, BAM, VCF |
| `LatchDir` | Directory reference | Multi-file outputs |

Multi-Step Pipeline

```python
from latch import medium_task, small_task, workflow
from latch.types import LatchDir, LatchFile


@small_task
def quality_control(reads: LatchFile) -> LatchDir:
    """Run FastQC on input reads."""
    import os
    import subprocess

    os.makedirs("/root/qc", exist_ok=True)
    subprocess.run(["fastqc", reads.local_path, "-o", "/root/qc/"], check=True)
    # FastQC writes multiple report files, so return the directory as a LatchDir
    return LatchDir("/root/qc", "latch:///results/qc")


@medium_task
def align_and_sort(reads: LatchFile, reference: LatchFile) -> LatchFile:
    """Align reads and sort the output BAM."""
    import subprocess

    subprocess.run(
        f"bwa mem -t 8 {reference.local_path} {reads.local_path} "
        f"| samtools sort -@ 4 -o /root/sorted.bam",
        shell=True,
        check=True,
    )
    subprocess.run(["samtools", "index", "/root/sorted.bam"], check=True)
    return LatchFile("/root/sorted.bam", "latch:///results/sorted.bam")


@small_task
def call_variants(bam: LatchFile, reference: LatchFile) -> LatchFile:
    """Call variants using bcftools."""
    import subprocess

    subprocess.run(
        f"bcftools mpileup -f {reference.local_path} {bam.local_path} "
        f"| bcftools call -mv -o /root/variants.vcf",
        shell=True,
        check=True,
    )
    return LatchFile("/root/variants.vcf", "latch:///results/variants.vcf")


@workflow
def variant_calling_pipeline(
    reads: LatchFile,
    reference: LatchFile,
) -> LatchFile:
    """Complete variant calling pipeline: QC → Align → Call."""
    quality_control(reads=reads)
    bam = align_and_sort(reads=reads, reference=reference)
    return call_variants(bam=bam, reference=reference)
```

Configuration

| Parameter | Description | Default |
| --- | --- | --- |
| `task_size` | Compute resources (small/medium/large/gpu) | `small` |
| `dockerfile` | Custom Dockerfile for dependencies | Auto-generated |
| `latch_data_path` | Output path in Latch Data | `"latch:///"` |
| `timeout` | Maximum task execution time (seconds) | `7200` (2 hours) |
| `retries` | Number of retry attempts on failure | `0` |
| `cache_version` | Version string for task caching | `"v1"` |

Best Practices

  1. Choose the right task size — Start with @small_task and only upgrade to @medium_task or @large_task when your process actually needs more resources. Oversized tasks waste compute credits and queue behind other large jobs.

  2. Pin all software versions in Dockerfile — Specify exact versions for every tool (samtools==1.17, bwa==0.7.17) in your Dockerfile. Unpinned versions cause silent result differences when tools auto-update between deployments.

  3. Use type annotations for the web UI — Latch auto-generates the web interface from your function signatures. Use Enum for dropdown menus, int with defaults for number inputs, and docstrings for parameter descriptions.

  4. Store intermediate files in /root — Write temporary and intermediate files to /root/ within tasks, not to /tmp/. Latch tasks use /root as the writable workspace, and /tmp may have size limits on some instance types.

  5. Test locally before registering — Run latch local-execute to test your workflow with small data before deploying to the cloud. This catches import errors, missing dependencies, and logic bugs without consuming cloud compute time.
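Practice 3 above can be made concrete with plain Python introspection. Latch reads your function signature to build the web form, so the types, defaults, and Enum choices must all be visible there. The helper below is a hypothetical simplification of that kind of introspection (the real generator lives inside Latch and differs in detail):

```python
import inspect
from enum import Enum


class Aligner(Enum):
    BWA = "bwa"
    BOWTIE2 = "bowtie2"


def align_reads(reads: str, aligner: Aligner = Aligner.BWA, threads: int = 4) -> str:
    """Align sequencing reads to a reference genome."""
    return reads


def describe_params(fn):
    """Sketch of signature-driven UI generation (illustrative, not the Latch internals)."""
    fields = []
    for name, param in inspect.signature(fn).parameters.items():
        ann = param.annotation
        field = {"name": name, "type": ann.__name__}
        if param.default is not inspect.Parameter.empty:
            field["default"] = param.default  # becomes the pre-filled form value
        if isinstance(ann, type) and issubclass(ann, Enum):
            field["choices"] = [m.value for m in ann]  # rendered as a dropdown
        fields.append(field)
    return fields


fields = describe_params(align_reads)
```

Anything missing from the signature (an untyped parameter, a default buried inside the function body) is invisible to this kind of generator, which is why the annotations matter.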

Common Issues

Registration fails with Docker build errors — The most common cause is missing system dependencies in the Dockerfile. If your Python package requires C libraries (e.g., htslib for pysam), add `RUN apt-get update && apt-get install -y libhts-dev` to your Dockerfile before the `pip install` step.

LatchFile not found at runtime — Latch files are downloaded lazily when you access .local_path. If you construct the file path as a string instead of using the .local_path property, the file won't be downloaded. Always access data through the LatchFile object, never by constructing paths manually.
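The lazy behavior can be illustrated with a minimal stand-in class (not the real LatchFile, whose implementation differs): the download is triggered by the first read of `.local_path`, so a path string you assemble by hand points at a file that was never fetched.

```python
import os
import tempfile
from pathlib import Path


class LazyFile:
    """Minimal stand-in for LatchFile's lazy download (illustrative, not the real class)."""

    def __init__(self, remote_path: str):
        self.remote_path = remote_path
        self._local = None  # nothing is fetched at construction time

    @property
    def local_path(self) -> str:
        # The "download" runs on first access to .local_path.
        if self._local is None:
            fd, self._local = tempfile.mkstemp(suffix=".fastq")
            os.close(fd)
            Path(self._local).write_text(f"downloaded from {self.remote_path}\n")
        return self._local


f = LazyFile("latch:///data/reads.fastq")
assert f._local is None        # a hand-built path string would find no file here
path = f.local_path            # the fetch happens on this access
assert Path(path).exists()     # only now is the file on local disk
```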

Workflow runs succeed but outputs are empty — This happens when the LatchFile return path doesn't match where the tool actually wrote its output. Print the working directory and list files in your task to debug: os.listdir("/root/"). Ensure the local path in LatchFile(local_path, remote_path) matches the actual output location.
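A small guard like the following (a generic helper, not part of the Latch SDK) can be dropped into a task to fail fast when the declared output path and the tool's actual output location disagree:

```python
import os


def assert_output_exists(path: str, workdir: str = "/root") -> str:
    """Fail fast if a tool did not write its output where the task expects it."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        # Listing the workspace shows where the tool actually wrote its files.
        listing = os.listdir(workdir) if os.path.isdir(workdir) else []
        raise FileNotFoundError(
            f"expected output {path!r} is missing or empty; {workdir} contains: {listing}"
        )
    return path


# Usage inside a task, before constructing the LatchFile return value:
# return LatchFile(assert_output_exists("/root/sorted.bam"), "latch:///results/sorted.bam")
```

Failing inside the task surfaces the directory listing in the task logs, which is far easier to debug than an empty file silently uploaded to Latch Data.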
