Ensembl Database Kit
Battle-tested skill for querying the Ensembl genome database. Includes structured workflows, validation checks, and reusable patterns for scientific computing.
A scientific computing skill for querying Ensembl — the comprehensive genome database for vertebrates and other eukaryotes maintained by EMBL-EBI. Ensembl Database Kit helps you retrieve gene annotations, transcript variants, regulatory regions, and comparative genomics data through Ensembl's REST API and BioMart interface.
When to Use This Skill
Choose Ensembl Database Kit when:
- Looking up gene coordinates, exon structures, and transcript variants
- Retrieving ortholog/paralog information across species
- Querying regulatory features (promoters, enhancers, TFBS)
- Bulk downloading gene annotations via BioMart
Consider alternatives when:
- You need raw sequencing data (use ENA or NCBI SRA)
- You need clinical variant interpretation (use ClinVar)
- You need protein function annotations (use UniProt)
- You need non-vertebrate genomes (use Ensembl Genomes or NCBI)
Quick Start
claude "Look up the BRCA1 gene and all its transcript variants in Ensembl"
    import requests

    # Ensembl REST API base URL
    server = "https://rest.ensembl.org"

    # Look up a gene by symbol; expand=1 also returns its transcripts
    response = requests.get(
        f"{server}/lookup/symbol/homo_sapiens/BRCA1",
        headers={"Content-Type": "application/json"},
        params={"expand": 1},
    )
    response.raise_for_status()  # fail early on 4xx/5xx instead of parsing an error body
    gene = response.json()

    print(f"Gene: {gene['display_name']}")
    print(f"Ensembl ID: {gene['id']}")
    print(f"Location: {gene['seq_region_name']}:{gene['start']}-{gene['end']}")
    print(f"Strand: {'+' if gene['strand'] == 1 else '-'}")
    print(f"Biotype: {gene['biotype']}")
    print(f"Transcripts: {len(gene.get('Transcript', []))}")

    for tx in gene.get("Transcript", []):
        # .get() avoids a KeyError if 'length' is missing from a transcript record
        print(f"  {tx['id']} | {tx['biotype']} | {tx.get('length', 'N/A')} bp")
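Once the gene record is parsed, a common next step is to pick a single representative transcript. The sketch below is one possible approach, not part of the kit itself: it assumes each transcript dictionary carries an `is_canonical` flag (1 for the canonical transcript) and falls back to the widest genomic span when no flag is set. The function name `pick_transcript` and the sample record are hypothetical.

```python
def pick_transcript(gene):
    """Return the canonical transcript, else the one with the widest span, else None."""
    transcripts = gene.get("Transcript", [])
    if not transcripts:
        return None
    for tx in transcripts:
        # Assumption: the flag is 1 on the canonical transcript and 0 otherwise
        if tx.get("is_canonical") == 1:
            return tx
    # Fallback: widest genomic span (not the same thing as spliced length)
    return max(transcripts, key=lambda tx: tx["end"] - tx["start"])


# Hand-written stand-in for a parsed response; the IDs are placeholders
example_gene = {
    "Transcript": [
        {"id": "TX_A", "start": 100, "end": 900, "is_canonical": 0},
        {"id": "TX_B", "start": 150, "end": 700, "is_canonical": 1},
    ]
}
print(pick_transcript(example_gene)["id"])  # TX_B
```

With the Quick Start code, `pick_transcript(gene)` would be called on the parsed `gene` dictionary.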
Core Concepts
Ensembl REST API Endpoints
| Endpoint | Purpose | Example or result |
|---|---|---|
| /lookup/id/{id} | Look up by Ensembl ID | ENSG00000012048 |
| /lookup/symbol/{species}/{symbol} | Look up by gene symbol | BRCA1 |
| /sequence/id/{id} | Get sequence | DNA, cDNA, protein |
| /overlap/region/{species}/{region} | Features in region | Genes, transcripts, variants |
| /homology/id/{species}/{id} | Orthologs/paralogs | Cross-species comparisons |
| /variation/{species}/{variant} | Variant info | rsID lookup |
| /regulatory/species/{species}/id/{id} | Regulatory features | Promoters, enhancers |
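As an illustration of the region-based endpoints, here is a minimal sketch of building and fetching an /overlap/region query. It uses only the standard library so it runs without extra dependencies; `requests`, as in the Quick Start, would work just as well. The helper names `overlap_url` and `fetch_json` are hypothetical, and the region is an arbitrary window on chromosome 17 chosen for illustration.

```python
import json
import urllib.parse
import urllib.request

SERVER = "https://rest.ensembl.org"


def overlap_url(species, region, features, server=SERVER):
    """Build an /overlap/region URL; `features` is a list such as ["gene"]."""
    path = (
        f"/overlap/region/{urllib.parse.quote(species)}/"
        f"{urllib.parse.quote(region, safe=':')}"
    )
    # doseq=True repeats the parameter: feature=gene&feature=transcript
    query = urllib.parse.urlencode({"feature": features}, doseq=True)
    return f"{server}{path}?{query}"


def fetch_json(url, timeout=30):
    """GET a URL and parse the JSON body (performs a network call)."""
    request = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


url = overlap_url("homo_sapiens", "17:43000000-43200000", ["gene", "transcript"])
print(url)
```

Calling `fetch_json(url)` should return a list of feature dictionaries; very large regions may be rejected by the server, so it is safer to query in windows.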
BioMart Queries
    from pybiomart import Server

    # Named biomart (not server) so it does not shadow the REST base URL used elsewhere
    biomart = Server(host="http://www.ensembl.org")
    dataset = biomart.marts["ENSEMBL_MART_ENSEMBL"].datasets["hsapiens_gene_ensembl"]

    # Get gene annotations; the result is a pandas DataFrame
    results = dataset.query(
        attributes=[
            "ensembl_gene_id",
            "external_gene_name",
            "chromosome_name",
            "start_position",
            "end_position",
            "strand",
            "gene_biotype",
        ],
        filters={"chromosome_name": ["1", "2", "3"]},
    )
    print(f"Genes on chr1-3: {len(results)}")
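For environments where pybiomart is not available, the same query can be sent to BioMart's martservice directly as an XML document. The sketch below only builds the XML and the request URL. The helper names are hypothetical, and the attribute values on the `Query` element (`virtualSchemaName`, `datasetConfigVersion`, and so on) follow the form BioMart itself exports, treated here as an assumption to verify against your server.

```python
import urllib.parse
import xml.etree.ElementTree as ET


def biomart_query_xml(dataset, attributes, filters=None):
    """Serialise a BioMart query (one dataset) to an XML string."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in (filters or {}).items():
        # Multi-valued filters are passed as one comma-separated string
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        ET.SubElement(ds, "Filter", {"name": name, "value": str(value)})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    prolog = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    return prolog + ET.tostring(query, encoding="unicode")


def biomart_url(xml, host="http://www.ensembl.org"):
    """URL that returns the query result as tab-separated text."""
    return f"{host}/biomart/martservice?" + urllib.parse.urlencode({"query": xml})


xml = biomart_query_xml(
    "hsapiens_gene_ensembl",
    ["ensembl_gene_id", "external_gene_name"],
    {"chromosome_name": ["1", "2", "3"]},
)
print(biomart_url(xml))
```

The printed URL can be fetched with any HTTP client; the response is plain TSV with a header row.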
Comparative Genomics
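    import requests

    server = "https://rest.ensembl.org"

    # Get mouse orthologs of human BRCA1.
    # Current REST releases take the species in the path: /homology/id/{species}/{id}
    response = requests.get(
        f"{server}/homology/id/homo_sapiens/ENSG00000012048",
        headers={"Content-Type": "application/json"},
        params={
            "type": "orthologues",
            "target_taxon": "10090",  # NCBI taxonomy ID for mouse
        },
    )
    response.raise_for_status()
    homologies = response.json()["data"][0]["homologies"]

    for h in homologies:
        target = h["target"]
        print(f"Ortholog: {target['species']} - {target.get('id', 'N/A')}")
        # perc_id is the percent identity; dn_ds is the dN/dS ratio, a different measure
        print(f"  Percent identity: {target.get('perc_id', 'N/A')}")
        print(f"  dN/dS: {h.get('dn_ds', 'N/A')}")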
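For downstream analysis it is often easier to flatten the nested homology response into rows. The following is a minimal sketch, assuming the response layout used above (`data` → `homologies` → `target`) and that `target` carries `species`, `id`, and `perc_id`. The function name and the sample payload, including its placeholder IDs and numbers, are made up for illustration.

```python
def summarise_homologies(payload):
    """Flatten a parsed homology response into a list of plain dictionaries."""
    rows = []
    for entry in payload.get("data", []):
        for h in entry.get("homologies", []):
            target = h.get("target", {})
            rows.append({
                "species": target.get("species"),
                "gene_id": target.get("id"),
                "type": h.get("type"),            # e.g. a one-to-one ortholog label
                "perc_id": target.get("perc_id"),  # percent identity, if present
            })
    return rows


# Hand-written stand-in for response.json(); values are placeholders
sample = {
    "data": [{
        "id": "GENE_PLACEHOLDER",
        "homologies": [{
            "type": "ortholog_one2one",
            "target": {"species": "mus_musculus", "id": "MOUSE_GENE_PLACEHOLDER", "perc_id": 58.0},
        }],
    }]
}
for row in summarise_homologies(sample):
    print(row)
```

Using `.get()` throughout means an empty or partial response yields an empty list or `None` fields rather than an exception.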
Configuration
| Parameter | Description | Default |
|---|---|---|
| server | Ensembl REST API base URL | https://rest.ensembl.org |
| species | Default organism | homo_sapiens |
| assembly | Genome assembly version | GRCh38 |
| content_type | Response format | application/json |
| biomart_host | BioMart server | www.ensembl.org |
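One way to carry these settings through a script is a small immutable configuration object. This is a sketch under assumptions: the class name `EnsemblConfig` and its methods are hypothetical, the defaults mirror the table above, and the GRCh37 host is the archive server mentioned under Best Practices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnsemblConfig:
    server: str = "https://rest.ensembl.org"
    species: str = "homo_sapiens"
    assembly: str = "GRCh38"
    content_type: str = "application/json"
    biomart_host: str = "www.ensembl.org"

    def rest_base(self):
        """REST base URL, switching to the GRCh37 archive host when requested."""
        if self.assembly == "GRCh37":
            return "https://grch37.rest.ensembl.org"
        return self.server

    def headers(self):
        """Headers that select the response format."""
        return {"Content-Type": self.content_type}

    def lookup_symbol_url(self, symbol):
        """URL for looking up a gene symbol in the configured species."""
        return f"{self.rest_base()}/lookup/symbol/{self.species}/{symbol}"


default = EnsemblConfig()
legacy = EnsemblConfig(assembly="GRCh37")
print(default.lookup_symbol_url("BRCA1"))
print(legacy.lookup_symbol_url("BRCA1"))
```

Freezing the dataclass keeps the assembly and server from drifting mid-analysis, which helps when recording settings for reproducibility.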
Best Practices
- Use Ensembl stable IDs for persistent references. Ensembl gene IDs (ENSG...) are versioned and stable across releases. Use these in publications and databases rather than gene symbols, which can be ambiguous or change over time.
- Check the Ensembl release version. Ensembl publishes several releases a year, and gene coordinates, annotations, and transcript models can change between them. Note the release number when recording results for reproducibility.
- Use BioMart for bulk queries. For genome-wide data (all genes, all transcripts), use BioMart instead of individual REST API calls. BioMart is optimized for bulk retrieval and returns tabular data suitable for analysis.
- Rate limit REST API requests. Ensembl allows 55,000 requests per hour, which averages about 15 requests per second. For batch lookups, add small delays or use a POST endpoint (for example POST /lookup/id) to send multiple IDs in a single request.
- Use the GRCh37 archive for legacy coordinates. Some datasets use GRCh37 (hg19) coordinates. Access the GRCh37 version at grch37.rest.ensembl.org rather than converting coordinates, which can introduce errors.
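The batching advice above can be sketched as follows: split the ID list into chunks and send each chunk to POST /lookup/id as a JSON body of the form `{"ids": [...]}`. The helper names are hypothetical, and the cap of 1000 IDs per request is an assumption based on the documented limit at the time of writing; check the endpoint documentation for the current value.

```python
import json
import urllib.request

MAX_IDS_PER_POST = 1000  # assumption: verify against the current endpoint docs


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_lookup_request(ids, server="https://rest.ensembl.org"):
    """Prepare (but do not send) a POST /lookup/id request for a batch of IDs."""
    body = json.dumps({"ids": list(ids)}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/lookup/id",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def lookup_many(ids, server="https://rest.ensembl.org"):
    """Look up many stable IDs in batches (performs network calls)."""
    results = {}
    for batch in chunked(list(ids), MAX_IDS_PER_POST):
        request = build_lookup_request(batch, server)
        with urllib.request.urlopen(request, timeout=60) as response:
            results.update(json.load(response))  # response is keyed by stable ID
    return results
```

For example, `lookup_many(["ENSG00000012048"])` should return a dictionary keyed by the requested ID. A few thousand IDs then cost a handful of requests rather than thousands.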
Common Issues
Gene symbol not found. Gene symbols are species-specific and case-sensitive. Use BRCA1 for human, Brca1 for mouse. If the symbol isn't recognized, search by Ensembl ID or use the /xrefs/symbol/{species}/{symbol} endpoint, which also matches synonyms, to find the corresponding Ensembl ID.
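A minimal sketch of the xrefs fallback is shown below. It assumes the /xrefs/symbol/{species}/{symbol} endpoint returns a list of objects with `type` and `id` fields; the helper names and the sample response (with placeholder IDs) are hypothetical.

```python
import urllib.parse


def xrefs_symbol_url(species, symbol, server="https://rest.ensembl.org"):
    """URL that resolves an external symbol or synonym to Ensembl objects."""
    return (
        f"{server}/xrefs/symbol/{urllib.parse.quote(species)}/"
        f"{urllib.parse.quote(symbol)}"
    )


def gene_ids(xrefs):
    """Keep only gene-level hits from a parsed xrefs response."""
    return [x["id"] for x in xrefs if x.get("type") == "gene" and "id" in x]


# Hand-written stand-in for the parsed JSON response; IDs are placeholders
sample_xrefs = [
    {"type": "gene", "id": "GENE_PLACEHOLDER"},
    {"type": "transcript", "id": "TRANSCRIPT_PLACEHOLDER"},
]
print(xrefs_symbol_url("homo_sapiens", "BRCA1"))
print(gene_ids(sample_xrefs))  # ['GENE_PLACEHOLDER']
```

Any gene ID returned this way can then be passed to /lookup/id/{id} to recover the current display name.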
REST API returns 429 Too Many Requests. You've exceeded the rate limit. Add time.sleep(0.1) between requests, or use POST endpoints to batch multiple queries into single requests. For large-scale analyses, use BioMart.
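One way to handle a 429 is to wait for the interval the server suggests and then retry. The sketch below assumes the response carries a `Retry-After` header giving a number of seconds, and falls back to a fixed delay otherwise. The function names and default values are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request


def retry_delay(headers, default=1.0):
    """Seconds to wait, taken from Retry-After when it parses as a number."""
    value = headers.get("Retry-After")
    try:
        return max(float(value), 0.0)
    except (TypeError, ValueError):
        return default


def get_json_with_retry(url, max_attempts=5):
    """GET a URL, sleeping and retrying on HTTP 429 (performs network calls)."""
    request = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    for attempt in range(max_attempts):
        try:
            with urllib.request.urlopen(request, timeout=30) as response:
                return json.load(response)
        except urllib.error.HTTPError as error:
            # Re-raise anything that is not a rate-limit error, or the final failure
            if error.code != 429 or attempt == max_attempts - 1:
                raise
            time.sleep(retry_delay(error.headers))
```

Capping the number of attempts keeps a misbehaving loop from hammering the server indefinitely.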
Transcript coordinates differ between databases. Ensembl and NCBI RefSeq may annotate different transcripts for the same gene. Discrepancies in exon boundaries are common. Specify which transcript annotation source you're using and stick to one system within an analysis.