Ultimate gget Framework

A scientific computing skill for querying genomic databases using gget, a Python package and command-line tool that provides a simple interface to Ensembl, UniProt, NCBI, PDB, and other biological databases without complex API setup.

When to Use This Skill

Choose Ultimate gget Framework when:

You want to look up gene/protein information quickly without setting up database APIs
You need to fetch sequences, annotations, or structures by gene name or ID
You want to run BLAST searches programmatically
You need to perform enrichment analysis on gene lists

Consider alternatives when:

You need complex, filtered database queries (use specific database APIs)
You need bulk data downloads (use BioMart or FTP)
You need real-time database monitoring (use database-specific tools)
You need single-cell data (use CellxGene Census)

Quick Start


claude "Look up the TP53 gene and get its protein structure with gget"


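import gget

# Search Ensembl for a gene by keyword
results = gget.search(["TP53"], species="homo_sapiens")
# Column names are assumed from recent gget releases; check results.columns if they differ
print(results[["ensembl_id", "gene_name", "ensembl_description", "biotype"]])

# Get detailed gene info (one row per queried Ensembl ID)
info = gget.info(["ENSG00000141510"])
row = info.iloc[0]
# Column names are assumed from recent gget releases; check info.columns if they differ
print(f"Symbol: {row['primary_gene_name']}")
print(f"Location: chr{row['seq_region_name']}:{row['start']}-{row['end']}")

# Get protein sequence; gget.seq() returns a FASTA-style list [header, sequence, ...]
fasta = gget.seq("ENSG00000141510", translate=True)
protein = fasta[1]
print(f"Protein sequence length: {len(protein)} aa")

# Predict structure with gget.alphafold() (a simplified AlphaFold2, not ESMFold).
# It takes an amino acid sequence rather than an ID and needs a one-time
# gget.setup("alphafold"); the predicted PDB file is written to an output folder.
gget.alphafold(protein)

# Run BLAST on the first 100 residues
blast_results = gget.blast(protein[:100])
# Column names are assumed from recent gget releases; check blast_results.columns
print(blast_results[["Scientific Name", "Per. Ident", "E value"]].head())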

Core Concepts

gget Functions

Function	Database	Purpose
`gget.search()`	Ensembl	Find genes by keyword
`gget.info()`	Ensembl	Detailed gene/transcript info
`gget.seq()`	Ensembl, UniProt	Get nucleotide or amino acid sequences
`gget.blast()`	NCBI BLAST	Sequence similarity search
`gget.alphafold()`	AlphaFold2 (simplified, run locally)	Predict a 3D structure from an amino acid sequence
`gget.enrichr()`	Enrichr	Gene set enrichment analysis
`gget.archs4()`	ARCHS4	Gene expression correlations
`gget.pdb()`	RCSB PDB	Fetch structures and metadata by PDB ID
`gget.muscle()`	MUSCLE	Multiple sequence alignment

Gene Set Enrichment


import gget

# Enrichment analysis with Enrichr
gene_list = ["TP53", "BRCA1", "MDM2", "CDKN2A", "RB1",
             "ATM", "CHEK2", "PTEN", "APC", "VHL"]

enrichment = gget.enrichr(
    genes=gene_list,
    database="KEGG_2021_Human"
)
print("Top enriched pathways:")
# Column names are assumed from recent gget releases; check enrichment.columns
print(enrichment[["path_name", "adj_p_val", "overlapping_genes"]].head(10))
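Enrichment results depend on the library queried, so it can help to run the same gene list against several libraries. The following is a minimal sketch rather than part of gget: `enrich_across` is a hypothetical helper that receives the enrichment function as an argument, and the library names in the usage comment are examples of Enrichr libraries that may need updating.

```python
from typing import Any, Callable, Dict, Iterable, List


def enrich_across(
    genes: List[str],
    databases: Iterable[str],
    enrich_fn: Callable[..., Any],
) -> Dict[str, Any]:
    """Run enrich_fn once per database and collect results keyed by database name."""
    results: Dict[str, Any] = {}
    for db in databases:
        # enrich_fn is expected to accept (genes, database=...) like gget.enrichr
        results[db] = enrich_fn(genes, database=db)
    return results


# Intended use with gget (requires network access):
#   import gget
#   tables = enrich_across(
#       ["TP53", "BRCA1", "MDM2"],
#       ["KEGG_2021_Human", "GO_Biological_Process_2021", "Reactome_2022"],
#       gget.enrichr,
#   )
```

Passing the function in keeps the helper independent of gget itself, so it can be exercised offline with a stub.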

Cross-Database Lookups


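import gget

# Gene → Protein → Structure pipeline
hits = gget.search(["insulin"], species="homo_sapiens")
# The first hit is not guaranteed to be the INS gene; inspect hits before relying on it
ensembl_id = hits["ensembl_id"].values[0]

# Get gene info; pdb=True (recent gget releases) adds a pdb_id column
info = gget.info([ensembl_id], pdb=True)

# Get protein sequence; gget.seq() returns a FASTA-style list [header, sequence, ...]
protein_seq = gget.seq(ensembl_id, translate=True)[1]

# Find PDB structures. gget.pdb() expects a PDB ID, not an Ensembl ID.
# The pdb_id cell is assumed to hold a list of IDs and may be empty or NaN.
pdb_ids = info["pdb_id"].values[0]
if isinstance(pdb_ids, (list, tuple)) and pdb_ids:
    print(f"PDB structures: {len(pdb_ids)}")
    structure = gget.pdb(pdb_ids[0])  # fetch the first linked entry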

Configuration

Parameter	Description	Default
`species`	Target organism	`homo_sapiens`
`release`	Ensembl release to query	Latest
`translate`	Return protein instead of DNA	`False`
`database`	Enrichr library to use (required by `gget.enrichr()`)	None; for example `KEGG_2021_Human`
`json`	Return JSON instead of DataFrame	`False`

Best Practices

Use Ensembl IDs for precision. Gene symbols can be ambiguous across species. When you find a gene with gget.search(), use the returned Ensembl ID for subsequent queries to avoid mismatches.
Combine gget functions for research workflows. Chain search → info → seq → blast or search → enrichr to build end-to-end analysis pipelines. Each function's output feeds naturally into the next.
Cache results for reproducibility. gget queries live databases that update regularly. Save important results to local files with timestamps so you can reproduce your analysis even if the database content changes.
Use gget.enrichr() with multiple databases. Don't rely on a single enrichment database. Run enrichment against KEGG, GO, Reactome, and disease databases to get a comprehensive functional picture.
Check gget version compatibility. gget's API may change between versions. Pin the version in your requirements and check the changelog when upgrading to ensure backward compatibility.
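The caching and version-pinning advice above can be combined in one small step. This is a minimal sketch under stated assumptions: `save_snapshot` is a hypothetical helper (not part of gget), and the records are assumed to be JSON-serialisable, for example obtained with `json=True` or converted with `DataFrame.to_dict(orient="records")`.

```python
import json
from datetime import datetime, timezone
from importlib import metadata
from pathlib import Path
from typing import Any, Optional


def save_snapshot(
    records: Any,
    name: str,
    out_dir: str,
    now: Optional[datetime] = None,
) -> Path:
    """Write records plus retrieval metadata to a timestamped JSON file."""
    # Normalise to UTC so the "Z" suffix in the file name is accurate
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    try:
        gget_version = metadata.version("gget")
    except metadata.PackageNotFoundError:
        gget_version = None  # gget is not installed in this environment
    payload = {
        "retrieved_at": now.isoformat(),
        "gget_version": gget_version,
        "records": records,
    }
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{name}_{now.strftime('%Y%m%dT%H%M%SZ')}.json"
    path.write_text(json.dumps(payload, indent=2))
    return path


# Intended use with gget (requires network access):
#   import gget
#   save_snapshot(gget.info(["ENSG00000141510"], json=True), "tp53_info", "cache")
```

Recording the installed gget version alongside the timestamp makes it easier to tell whether a later difference comes from the database or from the package.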

Common Issues

gget.search() returns no results. The search term may not match Ensembl's naming. Try alternative names, gene symbols, or descriptions. Also verify the species parameter matches Ensembl's naming convention (e.g., homo_sapiens not human).
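Trying alternative names can be automated. The sketch below assumes a hypothetical helper, `first_hit`, which is not part of gget: it walks a list of candidate terms and returns the first one that yields a non-empty result. The search call is passed in as a function, and the terms in the usage comment are only examples.

```python
from typing import Any, Callable, Iterable, Optional, Tuple


def first_hit(
    terms: Iterable[str],
    search_fn: Callable[[str], Any],
) -> Optional[Tuple[str, Any]]:
    """Return (term, result) for the first term whose search result is non-empty."""
    for term in terms:
        result = search_fn(term)
        # len() works for lists and for pandas DataFrames (row count)
        if result is not None and len(result) > 0:
            return term, result
    return None  # no term produced any hits


# Intended use with gget (requires network access):
#   import gget
#   hit = first_hit(
#       ["p53", "TP53", "tumor protein p53"],
#       lambda term: gget.search([term], species="homo_sapiens"),
#   )
```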

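gget.alphafold() fails for a valid protein. gget.alphafold() predicts a structure from an amino acid sequence using a simplified AlphaFold2; it does not look up entries by Ensembl or UniProt ID. Pass the sequence itself (for example, from gget.seq() with translate=True) and run gget.setup("alphafold") once beforehand to install its extra dependencies. Use gget.pdb() as an alternative for experimental structures.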

gget.blast() times out on long sequences. NCBI BLAST has query length limits and may time out for very long sequences or during high-traffic periods. Split long sequences or reduce the database scope. Add retry logic for intermittent failures.
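Both suggestions, splitting long queries and retrying intermittent failures, can be sketched with two small helpers. These are hypothetical and not part of gget; the chunk size, overlap, and delays are illustrative values, not NCBI limits. Hits from overlapping chunks would still need to be merged by the caller.

```python
import time
from typing import Any, Callable, List


def split_sequence(seq: str, size: int, overlap: int = 0) -> List[str]:
    """Split seq into chunks of at most `size` characters sharing `overlap` characters."""
    if size <= 0 or overlap < 0 or overlap >= size:
        raise ValueError("require size > overlap >= 0")
    chunks: List[str] = []
    step = size - overlap
    for start in range(0, len(seq), step):
        chunks.append(seq[start:start + size])
        if start + size >= len(seq):
            break  # this chunk already reaches the end of the sequence
    return chunks


def with_retries(
    fn: Callable[[], Any],
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fn, retrying on any exception with exponentially growing delays."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))


# Intended use with gget (requires network access):
#   import gget
#   for chunk in split_sequence(long_protein, size=500, overlap=50):
#       hits = with_retries(lambda: gget.blast(chunk))
```

The `sleep` argument is injectable so the backoff can be checked without actually waiting.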
