Ultimate gget Framework

Battle-tested skill for rapid bioinformatics queries with the gget Python toolkit. Includes structured workflows, validation checks, and reusable patterns for scientific computing.

Skill · Cliptics · scientific · v1.0.0 · MIT

A scientific computing skill for querying genomic databases with gget, a Python package and command-line tool that provides a simple interface to Ensembl, UniProt, NCBI, PDB, and other biological databases without complex API setup.

When to Use This Skill

Choose Ultimate gget Framework when:

  • Quickly looking up gene/protein information without setting up database APIs
  • Fetching sequences, annotations, or structures by gene name or ID
  • Running BLAST searches programmatically
  • Performing enrichment analysis on gene lists

Consider alternatives when:

  • You need complex, filtered database queries (use specific database APIs)
  • You need bulk data downloads (use BioMart or FTP)
  • You need real-time database monitoring (use database-specific tools)
  • You need single-cell data (use CellxGene Census)

Quick Start

    claude "Look up the TP53 gene and get its protein structure with gget"

    import gget

    # Search for a gene
    results = gget.search(["TP53"], species="homo_sapiens")
    print(results[["ensembl_id", "gene_name", "ensembl_description", "biotype"]])

    # Get detailed gene info (column names may differ between gget versions)
    info = gget.info(["ENSG00000141510"])
    print(f"Symbol: {info['primary_gene_name'].values[0]}")
    print(f"Location: chr{info['seq_region_name'].values[0]}:{info['start'].values[0]}-{info['end'].values[0]}")

    # Get protein sequence; gget.seq() returns a FASTA-style list [header, sequence, ...]
    fasta = gget.seq("ENSG00000141510", translate=True)
    protein = fasta[1]
    print(f"Protein sequence length: {len(protein)} aa")

    # Predict structure with AlphaFold2 from the amino acid sequence
    # (requires a one-time gget.setup("alphafold"); the prediction is written as a PDB file)
    gget.alphafold(protein)

    # Run BLAST on the first 100 residues
    blast_results = gget.blast(protein[:100])
    print(blast_results.head())  # inspect blast_results.columns for exact column names

Core Concepts

gget Functions

| Function | Database | Purpose |
| --- | --- | --- |
| gget.search() | Ensembl | Find genes by keyword |
| gget.info() | Ensembl | Detailed gene/transcript info |
| gget.seq() | Ensembl | Get DNA/protein sequences |
| gget.blast() | NCBI BLAST | Sequence similarity search |
| gget.alphafold() | AlphaFold2 | Predict a structure from an amino acid sequence |
| gget.enrichr() | Enrichr | Gene set enrichment analysis |
| gget.archs4() | ARCHS4 | Gene expression correlations |
| gget.pdb() | RCSB PDB | Query protein structures |
| gget.muscle() | MUSCLE | Multiple sequence alignment |

Gene Set Enrichment

    import gget

    # Enrichment analysis with Enrichr
    gene_list = ["TP53", "BRCA1", "MDM2", "CDKN2A", "RB1", "ATM", "CHEK2", "PTEN", "APC", "VHL"]
    enrichment = gget.enrichr(
        genes=gene_list,
        database="KEGG_2021_Human"
    )

    print("Top enriched pathways:")
    # Column names as returned by gget.enrichr(); check enrichment.columns if they differ
    print(enrichment[["path_name", "adj_p_val", "overlapping_genes"]].head(10))
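
A single library gives only a partial functional picture, so the same gene list is often run against several. The sketch below loops over a few Enrichr libraries and collects one result per library. The helper name `enrich_across` and the library names in `DEFAULT_LIBRARIES` are illustrative assumptions, not part of gget; check the Enrichr website for the library names currently available. The enrichment function is passed in as an argument so the helper can be exercised without network access.

```python
from typing import Any, Callable, Dict, Iterable, List

# Assumed library names; verify them against the current Enrichr library list.
DEFAULT_LIBRARIES = [
    "KEGG_2021_Human",
    "GO_Biological_Process_2021",
    "Reactome_2022",
]


def enrich_across(
    genes: List[str],
    enrich_fn: Callable[..., Any],
    libraries: Iterable[str] = DEFAULT_LIBRARIES,
) -> Dict[str, Any]:
    """Call enrich_fn once per library and collect results by library name.

    A library whose call raises is recorded as None, so one failure does
    not discard the results from the others.
    """
    results: Dict[str, Any] = {}
    for library in libraries:
        try:
            results[library] = enrich_fn(genes, database=library)
        except Exception as exc:  # e.g. network error or unknown library name
            print(f"{library}: skipped ({exc})")
            results[library] = None
    return results


# With gget installed, the call would look like:
#   import gget
#   tables = enrich_across(["TP53", "BRCA1", "MDM2"], gget.enrichr)
```

Keeping the results keyed by library name makes it straightforward to compare the top terms from each source side by side.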

Cross-Database Lookups

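    import gget

    # Gene → Protein → Structure pipeline
    hits = gget.search(["insulin"], species="homo_sapiens")
    ensembl_id = hits["ensembl_id"].values[0]

    # Get gene info, including linked PDB IDs (pdb=True)
    info = gget.info([ensembl_id], pdb=True)

    # Get protein sequence (FASTA-style list: header, then sequence)
    protein_seq = gget.seq(ensembl_id, translate=True)[1]

    # Find PDB structures; gget.pdb() expects a PDB ID, not an Ensembl ID
    pdb_ids = info["pdb_id"].values[0]
    if isinstance(pdb_ids, list) and pdb_ids:
        print(f"PDB structures: {len(pdb_ids)}")
        structure = gget.pdb(pdb_ids[0])  # fetch the first linked structure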
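
Chaining a sequence lookup into later steps means pulling the sequences out of whatever the lookup returns. The sketch below assumes `gget.seq()` returns a flat FASTA-style list of header and sequence strings, which is worth confirming against the installed version. The helper `fasta_list_to_dict` is a hypothetical name introduced here, not a gget function.

```python
from typing import Dict, List, Optional


def fasta_list_to_dict(lines: List[str]) -> Dict[str, str]:
    """Map each FASTA header (without the leading '>') to its sequence.

    Sequence lines that follow a header are concatenated, so wrapped
    sequences are handled as well as single-line ones.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank entries
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


# With gget installed, the call would look like:
#   import gget
#   records = fasta_list_to_dict(gget.seq("ENSG00000141510", translate=True))
#   for header, sequence in records.items():
#       print(header, len(sequence))
```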

Configuration

| Parameter | Description | Default |
| --- | --- | --- |
| species | Target organism | homo_sapiens |
| release | Ensembl version to query | Latest |
| translate | Return protein instead of DNA | False |
| database | Enrichr library to use | Required (e.g. KEGG_2021_Human) |
| json | Return JSON instead of DataFrame | False |

Best Practices

  1. Use Ensembl IDs for precision. Gene symbols can be ambiguous across species. When you find a gene with gget.search(), use the returned Ensembl ID for subsequent queries to avoid mismatches.

  2. Combine gget functions for research workflows. Chain search → info → seq → blast or search → enrichr to build end-to-end analysis pipelines. Each function's output feeds naturally into the next.

  3. Cache results for reproducibility. gget queries live databases that update regularly. Save important results to local files with timestamps so you can reproduce your analysis even if the database content changes.

  4. Use gget.enrichr() with multiple databases. Don't rely on a single enrichment database. Run enrichment against KEGG, GO, Reactome, and disease databases to get a comprehensive functional picture.

  5. Check gget version compatibility. gget's API may change between versions. Pin the version in your requirements and check the changelog when upgrading to ensure backward compatibility.
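
Practices 3 and 5 can be combined by saving each result together with the retrieval time and the installed gget version. This is a minimal sketch, assuming results are JSON-serialisable (for example, obtained with `json=True`). The function names, file naming scheme, and default `gget_cache` directory are illustrative choices, not gget features.

```python
import json
from datetime import datetime, timezone
from importlib import metadata
from pathlib import Path
from typing import Any


def installed_version(package: str) -> str:
    """Return the installed version of a package, or a marker if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


def save_snapshot(result: Any, label: str, out_dir: str = "gget_cache") -> Path:
    """Write a JSON-serialisable result with its query time and gget version."""
    now = datetime.now(timezone.utc)
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{label}_{now.strftime('%Y%m%dT%H%M%SZ')}.json"
    payload = {
        "label": label,
        "retrieved_utc": now.isoformat(),
        "gget_version": installed_version("gget"),
        "result": result,
    }
    path.write_text(json.dumps(payload, indent=2))
    return path


def load_snapshot(path) -> Any:
    """Read back only the stored result from a snapshot file."""
    return json.loads(Path(path).read_text())["result"]


# With gget installed, the call would look like:
#   import gget
#   save_snapshot(gget.info(["ENSG00000141510"], json=True), "tp53_info")
```

Storing the version next to the data makes it possible to tell later whether a changed result came from a database update or from a gget upgrade.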

Common Issues

gget.search() returns no results. The search term may not match Ensembl's naming. Try alternative names, gene symbols, or descriptions. Also verify the species parameter matches Ensembl's naming convention (e.g., homo_sapiens not human).
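
Trying alternative names can be automated with a small fallback loop. The sketch below tries each candidate term in order and returns the first one that yields results. `first_hit` is a hypothetical helper, and the search function is passed in so the logic can be tested offline. It assumes the search function accepts a list of search words plus a `species` keyword and returns something with a length, such as a DataFrame.

```python
from typing import Any, Callable, Iterable, Optional, Tuple


def first_hit(
    search_fn: Callable[..., Any],
    terms: Iterable[str],
    species: str = "homo_sapiens",
) -> Tuple[Optional[str], Any]:
    """Return (term, results) for the first term with at least one hit.

    Returns (None, None) when no candidate term produces results.
    """
    for term in terms:
        results = search_fn([term], species=species)
        if results is not None and len(results) > 0:
            return term, results
    return None, None


# With gget installed, the call would look like:
#   import gget
#   term, hits = first_hit(gget.search, ["p53", "TP53", "tumor protein p53"])
```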

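gget.alphafold() fails for a valid protein. gget.alphafold() predicts a structure from an amino acid sequence rather than looking one up, so it needs an amino acid sequence as input (not an Ensembl or UniProt ID) and its extra dependencies installed once with gget.setup("alphafold"). Use gget.pdb() as an alternative for experimental structures.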

gget.blast() times out on long sequences. NCBI BLAST has query length limits and may time out for very long sequences or during high-traffic periods. Split long sequences or reduce the database scope. Add retry logic for intermittent failures.
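
The splitting and retry advice can be sketched as two small helpers. Both names (`split_sequence`, `with_retries`) and the default window size, overlap, and delays are assumptions chosen for illustration, not NCBI limits. The retry helper also assumes that a failed call raises an exception; if the installed gget version returns an empty result instead, the check would need adapting.

```python
import time
from typing import Any, Callable, List


def split_sequence(sequence: str, size: int = 1000, overlap: int = 100) -> List[str]:
    """Split a sequence into windows of `size` that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and 0 <= overlap < size")
    step = size - overlap
    chunks: List[str] = []
    start = 0
    while start < len(sequence):
        chunks.append(sequence[start:start + size])
        if start + size >= len(sequence):
            break  # the last window already reaches the end
        start += step
    return chunks


def with_retries(
    fn: Callable[..., Any],
    *args: Any,
    attempts: int = 3,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    **kwargs: Any,
) -> Any:
    """Call fn, retrying on any exception with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))


# With gget installed, the call would look like:
#   import gget
#   for chunk in split_sequence(long_sequence):
#       hits = with_retries(gget.blast, chunk, attempts=3)
```

Overlapping windows reduce the chance that a match spanning a chunk boundary is missed, at the cost of some duplicate hits that need merging afterwards.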
