Ultimate gget Framework
A scientific computing skill for querying genomic databases with gget, a Python package and command-line tool that provides a simple interface to Ensembl, UniProt, NCBI, PDB, and other biological databases without complex API setup.
When to Use This Skill
Choose Ultimate gget Framework when:
- Quickly looking up gene/protein information without setting up database APIs
- Fetching sequences, annotations, or structures by gene name or ID
- Running BLAST searches programmatically
- Performing enrichment analysis on gene lists
Consider alternatives when:
- You need complex, filtered database queries (use specific database APIs)
- You need bulk data downloads (use BioMart or FTP)
- You need real-time database monitoring (use database-specific tools)
- You need single-cell data (use CellxGene Census)
Quick Start
claude "Look up the TP53 gene and get its protein structure with gget"

    import gget

    # Search for a gene
    results = gget.search(["TP53"], species="homo_sapiens")
    print(results[["ensembl_id", "gene_name", "ensembl_description", "biotype"]])

    # Get detailed gene info (column names as in recent gget releases; check info.columns)
    info = gget.info(["ENSG00000141510"])
    print(f"Symbol: {info['primary_gene_name'].values[0]}")
    print(f"Location: chr{info['seq_region_name'].values[0]}:{info['start'].values[0]}-{info['end'].values[0]}")

    # Get the protein sequence; gget.seq() returns a FASTA-style list: [header, sequence, ...]
    fasta = gget.seq("ENSG00000141510", translate=True)
    protein = fasta[1]
    print(f"Protein sequence length: {len(protein)} aa")

    # Predict the structure with AlphaFold2. gget.alphafold() runs locally on an
    # amino acid sequence, needs a one-time gget.setup("alphafold"), and writes
    # the predicted PDB file to the output folder.
    gget.alphafold(protein, out="tp53_alphafold")

    # Run BLAST on the first 100 residues (result columns as in recent gget releases)
    blast_results = gget.blast(protein[:100])
    print(blast_results[["Scientific Name", "Per. Ident", "E value"]].head())

Core Concepts
gget Functions
| Function | Database | Purpose |
|---|---|---|
| `gget.search()` | Ensembl | Find genes by keyword |
| `gget.info()` | Ensembl (plus UniProt and NCBI annotations) | Detailed gene/transcript info |
| `gget.seq()` | Ensembl, UniProt | Get nucleotide or amino acid sequences |
| `gget.blast()` | NCBI BLAST | Sequence similarity search |
| `gget.alphafold()` | AlphaFold2 (runs locally) | Predict a 3D structure from an amino acid sequence |
| `gget.enrichr()` | Enrichr | Gene set enrichment analysis |
| `gget.archs4()` | ARCHS4 | Gene expression correlations |
| `gget.pdb()` | RCSB PDB | Fetch protein structures and metadata by PDB ID |
| `gget.muscle()` | MUSCLE (runs locally) | Multiple sequence alignment |
Gene Set Enrichment

    import gget

    # Enrichment analysis with Enrichr
    gene_list = ["TP53", "BRCA1", "MDM2", "CDKN2A", "RB1", "ATM", "CHEK2", "PTEN", "APC", "VHL"]
    enrichment = gget.enrichr(genes=gene_list, database="KEGG_2021_Human")

    # Column names as in recent gget releases; check enrichment.columns if they differ
    print("Top enriched pathways:")
    print(enrichment[["path_name", "adj_p_val", "overlapping_genes"]].head(10))

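A single Enrichr library gives only one view of a gene list, so it can help to run the same list against several libraries and rank the merged results. The sketch below is one way to do that, not gget's own API: `enrich_across_libraries` and `gget_enrich` are hypothetical helper names, the library names are examples to confirm against Enrichr's current list, and the `adj_p_val` field and the list-of-dicts shape of `json=True` output are assumptions about recent gget releases.

```python
# Example Enrichr library names; confirm them against Enrichr's current library list.
LIBRARIES = ["KEGG_2021_Human", "GO_Biological_Process_2021", "Reactome_2022"]


def enrich_across_libraries(genes, libraries, enrich, p_key="adj_p_val"):
    """Run `enrich(genes, library)` for each library and merge the records.

    `enrich` must return a list of dicts, one per enriched term.
    `p_key` is the assumed name of the adjusted p-value field.
    """
    combined = []
    for library in libraries:
        for record in enrich(genes, library):
            row = dict(record)  # copy so the caller's data is left untouched
            row["library"] = library
            combined.append(row)
    # Most significant terms first; records without p_key sort last.
    combined.sort(key=lambda r: r.get(p_key, float("inf")))
    return combined


def gget_enrich(genes, library):
    """Hypothetical adapter around gget.enrichr(); gget is imported lazily."""
    import gget

    # Assumption: json=True returns a list of per-term dicts.
    return gget.enrichr(genes, database=library, json=True)
```

With gget installed, `enrich_across_libraries(gene_list, LIBRARIES, gget_enrich)[:10]` would give the ten most significant terms across all three libraries. Passing the query function as an argument keeps the merging logic testable without network access.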
Cross-Database Lookups
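
    import gget

    # Gene → Protein → Structure pipeline
    hits = gget.search(["insulin"], species="homo_sapiens")
    ensembl_id = hits["ensembl_id"].values[0]

    # Get gene info; pdb=True asks gget.info() to include linked PDB IDs
    info = gget.info([ensembl_id], pdb=True)

    # Get the protein sequence (FASTA-style list: header, then sequence)
    protein_fasta = gget.seq(ensembl_id, translate=True)

    # Find PDB structures. gget.pdb() expects a PDB ID, not an Ensembl ID.
    # Assumption: the "pdb_id" column holds a list of IDs for each gene.
    pdb_ids = info["pdb_id"].values[0] if "pdb_id" in info.columns else None
    if isinstance(pdb_ids, (list, tuple)) and pdb_ids:
        print(f"PDB structures: {len(pdb_ids)}")
        structure = gget.pdb(pdb_ids[0])  # fetch the first linked structure
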
Configuration
| Parameter | Description | Default |
|---|---|---|
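| `species` | Target organism | None in `gget.search()` (required, e.g. `homo_sapiens`) |
| `release` | Ensembl release to query | Latest |
| `translate` | Return amino acid instead of nucleotide sequences in `gget.seq()` | False |
| `database` | Enrichr library for `gget.enrichr()` | None (required, e.g. `KEGG_2021_Human`) |
| `json` | Return JSON instead of a DataFrame | False |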
Best Practices
- Use Ensembl IDs for precision. Gene symbols can be ambiguous across species. When you find a gene with `gget.search()`, use the returned Ensembl ID for subsequent queries to avoid mismatches.
- Combine gget functions for research workflows. Chain `search → info → seq → blast` or `search → enrichr` to build end-to-end analysis pipelines. Each function's output feeds naturally into the next.
- Cache results for reproducibility. gget queries live databases that update regularly. Save important results to local files with timestamps so you can reproduce your analysis even if the database content changes.
- Use `gget.enrichr()` with multiple databases. Don't rely on a single enrichment database. Run enrichment against KEGG, GO, Reactome, and disease databases to get a comprehensive functional picture.
- Check gget version compatibility. gget's API may change between versions. Pin the version in your requirements and check the changelog when upgrading to ensure backward compatibility.
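The caching and version advice above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not part of gget: `save_snapshot`, `load_snapshot`, and `package_version` are hypothetical helper names, the `gget_cache` folder is an arbitrary choice, and the records are assumed to be JSON-serializable (for example, output requested with `json=True`).

```python
import json
from datetime import datetime, timezone
from importlib import metadata
from pathlib import Path


def package_version(name="gget"):
    """Return the installed version of `name`, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def save_snapshot(records, label, out_dir="gget_cache"):
    """Write JSON-serializable `records` to a timestamped file and return its path."""
    now = datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{label}_{stamp}.json"
    payload = {
        "label": label,
        "retrieved_at": now.isoformat(),    # when the live database was queried
        "gget_version": package_version(),  # which gget release produced the data
        "records": records,
    }
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path


def load_snapshot(path):
    """Read a snapshot back and return its payload dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

A call such as `save_snapshot(gget.info(["ENSG00000141510"], json=True), "tp53_info")` would then store the result together with its retrieval time and the gget version, so a later analysis can be rerun from the file instead of the live database.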
Common Issues
gget.search() returns no results. The search term may not match Ensembl's naming. Try alternative names, gene symbols, or descriptions. Also verify the species parameter matches Ensembl's naming convention (e.g., homo_sapiens not human).
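Trying alternative names can be automated with a small loop. The sketch below assumes that an empty search yields a result of length zero (or `None`); `search_with_fallbacks` and `gget_search` are hypothetical helper names rather than gget functions.

```python
def search_with_fallbacks(terms, search, species="homo_sapiens"):
    """Try each term in order; return (term, hits) for the first non-empty result.

    `search` is any callable taking (searchwords, species).
    Returns (None, None) if no term matches.
    """
    for term in terms:
        hits = search([term], species)
        if hits is not None and len(hits) > 0:
            return term, hits
    return None, None


def gget_search(searchwords, species):
    """Hypothetical adapter around gget.search(); gget is imported lazily."""
    import gget

    return gget.search(searchwords, species=species)
```

For example, `search_with_fallbacks(["TP53", "p53", "tumor protein p53"], gget_search)` would return the first alias that produces hits, along with the matching rows.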
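gget.alphafold() fails for a valid protein. gget.alphafold() does not look up precomputed predictions; it runs AlphaFold2 locally on an amino acid sequence. Make sure the one-time gget.setup("alphafold") step has completed, and pass the sequence itself rather than a gene or protein ID. For experimental structures, use gget.pdb() with a PDB ID instead.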
gget.blast() times out on long sequences. NCBI BLAST has query length limits and may time out for very long sequences or during high-traffic periods. Split long sequences or reduce the database scope. Add retry logic for intermittent failures.
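Splitting and retrying can both be handled by small generic helpers. The sketch below assumes that a failed query surfaces as a Python exception, which may not hold for every gget release; `split_sequence` and `with_retries` are hypothetical names, and the chunk size, overlap, and delays are placeholders to tune.

```python
import time


def split_sequence(sequence, chunk_size=1000, overlap=0):
    """Split `sequence` into chunks of at most `chunk_size` characters.

    Consecutive chunks share `overlap` characters, so a match that spans a
    chunk boundary is less likely to be missed.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if not sequence:
        return []
    step = chunk_size - overlap
    # Stop early enough that the last chunk is not contained in the previous one.
    last_start = max(len(sequence) - overlap, 1)
    return [sequence[i:i + chunk_size] for i in range(0, last_start, step)]


def with_retries(func, *args, attempts=3, base_delay=5.0, sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying with exponential backoff on exceptions."""
    for attempt in range(1, attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the last error
            sleep(base_delay * 2 ** (attempt - 1))
```

With gget installed, `[with_retries(gget.blast, chunk) for chunk in split_sequence(protein, 500, 50)]` would query each chunk separately, waiting 5 s and then 10 s between failed attempts.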