# Ultimate LaminDB Framework
Build production-grade biological data management pipelines using LaminDB, a data framework purpose-built for biology. This skill covers data ingestion, artifact tracking, lineage management, and integration with popular bioinformatics tools for managing datasets across experiments and analyses.
## When to Use This Skill
Choose Ultimate LaminDB Framework when you need to:
- Track biological data artifacts with full provenance and lineage
- Manage AnnData, FASTA, BAM, or other bioinformatics file types with metadata
- Build reproducible analysis pipelines with automatic data versioning
- Query and retrieve datasets across multiple experiments using biological ontologies
Consider alternatives when:
- You need a general-purpose data warehouse (use Snowflake or BigQuery)
- You need only file storage without biological metadata (use S3 or GCS directly)
- You need interactive data exploration without pipeline management (use CellxGene)
## Quick Start
```bash
# Install LaminDB with biological extras
pip install lamindb[bionty]
```
```python
import lamindb as ln
import bionty as bt

# Initialize a new LaminDB instance
ln.setup.init(storage="./my_research_data", schema="bionty")

# Register a dataset
import anndata as ad
adata = ad.read_h5ad("my_scrnaseq.h5ad")

# Create an artifact with biological metadata
artifact = ln.Artifact.from_anndata(
    adata,
    description="Single-cell RNA-seq of human PBMC",
    key="datasets/pbmc_10k.h5ad",
)

# Annotate with ontology terms
cell_types = bt.CellType.from_values(adata.obs["cell_type"].unique())
artifact.cell_types.set(cell_types)
artifact.save()

print(f"Saved artifact: {artifact.uid}")
```
## Core Concepts
### Data Model
| Component | Purpose | Example |
|---|---|---|
| Artifact | Any data object (file, array, DataFrame) | H5AD file, CSV, FASTA |
| Collection | Grouped artifacts with shared context | All samples from one experiment |
| Transform | Code that creates or modifies artifacts | Jupyter notebook, Python script |
| Run | Single execution of a transform | Analysis run on 2024-01-15 |
| Feature | Measured variable or annotation column | Gene name, cell type label |
| ULabel | Universal label for categorization | Tissue type, disease state |
### Lineage Tracking
```python
import lamindb as ln

# Track a transform (analysis script)
ln.track("my_analysis_v2")

# Load input artifacts
raw = ln.Artifact.filter(description__contains="raw counts").one()
adata = raw.load()

# Perform analysis
import scanpy as sc
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata)
sc.tl.pca(adata)
sc.tl.umap(adata)

# Save output — lineage automatically links input → transform → output
processed = ln.Artifact.from_anndata(
    adata,
    description="Normalized and processed PBMC data",
)
processed.save()

# Query lineage
print(processed.run)                        # Which run created this?
print(processed.run.input_artifacts.all())  # What went in?
print(processed.transform)                  # What code was used?
```
### Querying with Biological Ontologies
```python
import lamindb as ln
import bionty as bt

# Find all artifacts annotated with specific cell types
t_cells = bt.CellType.filter(name__contains="T cell").all()
artifacts = ln.Artifact.filter(cell_types__in=t_cells).all()

# Search by tissue
brain = bt.Tissue.filter(name="brain").one()
brain_data = ln.Artifact.filter(tissues=brain).all()

# Combine multiple filters
results = ln.Artifact.filter(
    cell_types__name__contains="neuron",
    organisms__name="human",
    created_at__gte="2024-01-01",
).all()

for r in results:
    print(f"{r.key}: {r.description} ({r.size} bytes)")
```
## Configuration
| Parameter | Description | Default |
|---|---|---|
| `storage` | Root storage path or cloud URI | Required |
| `schema` | Schema modules to load | `"bionty"` |
| `name` | Instance name | Directory name |
| `db` | Database backend (SQLite or Postgres) | `"sqlite"` |
| `cache_dir` | Local cache for cloud artifacts | `~/.cache/lamindb` |
| `auto_connect` | Connect to instance on import | `true` |
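The parameters above map onto `ln.setup.init`. A hedged sketch of a production setup (the bucket name, instance name, and Postgres DSN are placeholders):

```python
import lamindb as ln

# Initialize an instance with cloud storage and a Postgres backend
ln.setup.init(
    storage="s3://my-lab-bucket",    # root storage URI (placeholder bucket)
    schema="bionty",                 # load the bionty ontology module
    name="production-instance",      # instance name (placeholder)
    db="postgresql://user:pass@host:5432/lamin",  # placeholder DSN
)
```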
## Best Practices
- **Use descriptive artifact keys** — Organize artifacts with meaningful path-like keys (`projects/pbmc/raw/sample_001.h5ad`) rather than generic names. This makes browsing the data store intuitive and enables prefix-based queries.
- **Annotate with ontology terms early** — Attach cell type, tissue, organism, and disease annotations when you first register an artifact. Retroactively annotating hundreds of datasets is tedious and error-prone.
- **Track every analysis step** — Call `ln.track()` at the start of every notebook or script. This automatically records lineage so you can trace any result back to its raw data and code, which is critical for reproducibility.
- **Version artifacts instead of overwriting** — When re-processing data, save as a new artifact version rather than overwriting. LaminDB's versioning lets you compare results across processing iterations and roll back if needed.
- **Use Collections for experiment groups** — Group related artifacts into Collections (e.g., all samples from one sequencing run) to simplify batch queries and downstream analysis pipelines.
## Common Issues
**Schema validation errors on save** — When saving artifacts with biological annotations, all ontology terms must be registered in the current instance. Run `bt.CellType.from_values(terms)` to auto-register missing terms before calling `artifact.save()`, or use `ln.save(terms)` to bulk-register.
**Storage permission errors with cloud backends** — When using S3 or GCS storage, ensure your credentials have both read and write access to the bucket. LaminDB writes metadata to the database and files to storage simultaneously — partial permissions cause silent failures where metadata exists but files are missing.
**Slow queries on large instances** — SQLite backends slow down with more than 100,000 artifacts. Switch to PostgreSQL for production instances using `ln.setup.init(storage="s3://bucket", db="postgresql://user:pass@host/db")` and add indexes on frequently queried fields.