Master LatchBio Suite

Build and deploy bioinformatics workflows on the Latch platform using the Latch SDK. This skill covers workflow definition with Flyte tasks, container configuration, data management through Latch Data, and deployment of reproducible computational biology pipelines accessible through an auto-generated web interface.

When to Use This Skill

Choose Master LatchBio Suite when you need to:

  • Deploy bioinformatics pipelines with auto-generated web UIs for non-programmers
  • Run containerized workflows on managed cloud compute without infrastructure setup
  • Share reproducible analysis workflows across a research team or organization
  • Process large genomics datasets (FASTQ, BAM, VCF) with scalable cloud resources

Consider alternatives when:

  • You need a general-purpose workflow orchestrator (use Nextflow or Snakemake)
  • You need on-premises pipeline execution (use CWL or WDL runners)
  • You need real-time data streaming rather than batch processing (use Apache Kafka)

Quick Start

```shell
# Install Latch SDK
pip install latch

# Initialize a new workflow
latch init my-alignment-workflow
cd my-alignment-workflow
```

```python
# wf/__init__.py
from enum import Enum

from latch import small_task, workflow
from latch.types import LatchFile


class Aligner(Enum):
    BWA = "bwa"
    BOWTIE2 = "bowtie2"


@small_task
def align_reads(
    reads: LatchFile,
    reference: LatchFile,
    aligner: Aligner = Aligner.BWA,
    threads: int = 4,
) -> LatchFile:
    """Align sequencing reads to a reference genome."""
    import subprocess

    output = "/root/aligned.bam"
    if aligner == Aligner.BWA:
        with open(output, "w") as out:
            subprocess.run(
                ["bwa", "mem", "-t", str(threads), reference.local_path, reads.local_path],
                stdout=out,
                check=True,
            )
    else:
        raise NotImplementedError(f"{aligner.value} support is not yet implemented")
    return LatchFile(output, "latch:///results/aligned.bam")


@workflow
def alignment_workflow(
    reads: LatchFile,
    reference: LatchFile,
    aligner: Aligner = Aligner.BWA,
) -> LatchFile:
    """Align reads to reference genome.

    This workflow takes FASTQ reads and a reference genome, performs
    alignment, and returns a sorted BAM file.
    """
    return align_reads(reads=reads, reference=reference, aligner=aligner)
```

```shell
# Register and deploy
latch register --remote my-alignment-workflow
```

Core Concepts

SDK Components

| Component | Purpose | Example |
| --- | --- | --- |
| `@small_task` | Task on 2 CPU / 4 GB RAM | Quality control, file parsing |
| `@medium_task` | Task on 8 CPU / 32 GB RAM | Read alignment, variant calling |
| `@large_task` | Task on 31 CPU / 120 GB RAM | Genome assembly, large-scale analysis |
| `@workflow` | Orchestrates tasks into a DAG | Full analysis pipeline |
| `LatchFile` | Single-file reference | FASTQ, BAM, VCF |
| `LatchDir` | Directory reference | Multi-file outputs |

Multi-Step Pipeline

```python
from latch import medium_task, small_task, workflow
from latch.types import LatchDir, LatchFile


@small_task
def quality_control(reads: LatchFile) -> LatchDir:
    """Run FastQC on input reads."""
    import os
    import subprocess

    os.makedirs("/root/qc", exist_ok=True)
    subprocess.run(["fastqc", reads.local_path, "-o", "/root/qc/"], check=True)
    # FastQC writes multiple report files, so return the directory as a LatchDir
    return LatchDir("/root/qc", "latch:///results/qc")


@medium_task
def align_and_sort(reads: LatchFile, reference: LatchFile) -> LatchFile:
    """Align reads and sort the output BAM."""
    import subprocess

    subprocess.run(
        f"bwa mem -t 8 {reference.local_path} {reads.local_path} "
        f"| samtools sort -@ 4 -o /root/sorted.bam",
        shell=True,
        check=True,
    )
    subprocess.run(["samtools", "index", "/root/sorted.bam"], check=True)
    return LatchFile("/root/sorted.bam", "latch:///results/sorted.bam")


@small_task
def call_variants(bam: LatchFile, reference: LatchFile) -> LatchFile:
    """Call variants using bcftools."""
    import subprocess

    subprocess.run(
        f"bcftools mpileup -f {reference.local_path} {bam.local_path} "
        f"| bcftools call -mv -o /root/variants.vcf",
        shell=True,
        check=True,
    )
    return LatchFile("/root/variants.vcf", "latch:///results/variants.vcf")


@workflow
def variant_calling_pipeline(
    reads: LatchFile,
    reference: LatchFile,
) -> LatchFile:
    """Complete variant calling pipeline: QC → Align → Call."""
    quality_control(reads=reads)
    bam = align_and_sort(reads=reads, reference=reference)
    return call_variants(bam=bam, reference=reference)
```

Configuration

| Parameter | Description | Default |
| --- | --- | --- |
| `task_size` | Compute resources (small/medium/large/gpu) | `small` |
| `dockerfile` | Custom Dockerfile for dependencies | Auto-generated |
| `latch_data_path` | Output path in Latch Data | `"latch:///"` |
| `timeout` | Maximum task execution time (seconds) | `7200` (2 hours) |
| `retries` | Number of retry attempts on failure | `0` |
| `cache_version` | Version string for task caching | `"v1"` |

Best Practices

  1. Choose the right task size — Start with @small_task and only upgrade to @medium_task or @large_task when your process actually needs more resources. Oversized tasks waste compute credits and queue behind other large jobs.

  2. Pin all software versions in Dockerfile — Specify exact versions for every tool (samtools==1.17, bwa==0.7.17) in your Dockerfile. Unpinned versions cause silent result differences when tools auto-update between deployments.

  3. Use type annotations for the web UI — Latch auto-generates the web interface from your function signatures. Use Enum for dropdown menus, int with defaults for number inputs, and docstrings for parameter descriptions.

  4. Store intermediate files in /root — Write temporary and intermediate files to /root/ within tasks, not to /tmp/. Latch tasks use /root as the writable workspace, and /tmp may have size limits on some instance types.

  5. Test locally before registering — Run latch local-execute to test your workflow with small data before deploying to the cloud. This catches import errors, missing dependencies, and logic bugs without consuming cloud compute time.
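Practice 3 above can be made concrete with plain Python introspection. Latch reads your function signature to build the web form, so the types, defaults, and Enum choices must all be visible there. The helper below is a hypothetical simplification of that kind of introspection (the real generator lives inside Latch and differs in detail):

```python
import inspect
from enum import Enum


class Aligner(Enum):
    BWA = "bwa"
    BOWTIE2 = "bowtie2"


def align_reads(reads: str, aligner: Aligner = Aligner.BWA, threads: int = 4) -> str:
    """Align sequencing reads to a reference genome."""
    return reads


def describe_params(fn):
    """Sketch of signature-driven UI generation (illustrative, not the Latch internals)."""
    fields = []
    for name, param in inspect.signature(fn).parameters.items():
        ann = param.annotation
        field = {"name": name, "type": ann.__name__}
        if param.default is not inspect.Parameter.empty:
            field["default"] = param.default  # becomes the pre-filled form value
        if isinstance(ann, type) and issubclass(ann, Enum):
            field["choices"] = [m.value for m in ann]  # rendered as a dropdown
        fields.append(field)
    return fields


fields = describe_params(align_reads)
```

Anything missing from the signature (an untyped parameter, a default buried inside the function body) is invisible to this kind of generator, which is why the annotations matter.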

Common Issues

Registration fails with Docker build errors — The most common cause is missing system dependencies in the Dockerfile. If your Python package requires C libraries (e.g., htslib for pysam), add `RUN apt-get update && apt-get install -y libhts-dev` to your Dockerfile before the `pip install` step.

LatchFile not found at runtime — Latch files are downloaded lazily when you access .local_path. If you construct the file path as a string instead of using the .local_path property, the file won't be downloaded. Always access data through the LatchFile object, never by constructing paths manually.
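The lazy behavior can be illustrated with a minimal stand-in class (not the real LatchFile, whose implementation differs): the download is triggered by the first read of `.local_path`, so a path string you assemble by hand points at a file that was never fetched.

```python
import os
import tempfile
from pathlib import Path


class LazyFile:
    """Minimal stand-in for LatchFile's lazy download (illustrative, not the real class)."""

    def __init__(self, remote_path: str):
        self.remote_path = remote_path
        self._local = None  # nothing is fetched at construction time

    @property
    def local_path(self) -> str:
        # The "download" runs on first access to .local_path.
        if self._local is None:
            fd, self._local = tempfile.mkstemp(suffix=".fastq")
            os.close(fd)
            Path(self._local).write_text(f"downloaded from {self.remote_path}\n")
        return self._local


f = LazyFile("latch:///data/reads.fastq")
assert f._local is None        # a hand-built path string would find no file here
path = f.local_path            # the fetch happens on this access
assert Path(path).exists()     # only now is the file on local disk
```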

Workflow runs succeed but outputs are empty — This happens when the LatchFile return path doesn't match where the tool actually wrote its output. Print the working directory and list files in your task to debug: os.listdir("/root/"). Ensure the local path in LatchFile(local_path, remote_path) matches the actual output location.
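A small guard like the following (a generic helper, not part of the Latch SDK) can be dropped into a task to fail fast when the declared output path and the tool's actual output location disagree:

```python
import os


def assert_output_exists(path: str, workdir: str = "/root") -> str:
    """Fail fast if a tool did not write its output where the task expects it."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        # Listing the workspace shows where the tool actually wrote its files.
        listing = os.listdir(workdir) if os.path.isdir(workdir) else []
        raise FileNotFoundError(
            f"expected output {path!r} is missing or empty; {workdir} contains: {listing}"
        )
    return path


# Usage inside a task, before constructing the LatchFile return value:
# return LatchFile(assert_output_exists("/root/sorted.bam"), "latch:///results/sorted.bam")
```

Failing inside the task surfaces the directory listing in the task logs, which is far easier to debug than an empty file silently uploaded to Latch Data.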
