Pro Gene Workspace
A skill for querying NCBI Gene and related gene utilities. Includes structured workflows, validation checks, and reusable patterns for scientific computing.
A scientific computing skill for gene-centric bioinformatics analysis — retrieving gene information, annotations, expression data, and functional characterizations from major genomics databases. Pro Gene Workspace provides a unified workflow for investigating gene function across NCBI Gene, Ensembl, UniProt, and Gene Ontology.
When to Use This Skill
Choose Pro Gene Workspace when:
- Looking up comprehensive gene information across databases
- Retrieving gene annotations, GO terms, and pathway memberships
- Analyzing gene expression patterns across tissues and conditions
- Building gene-centric reports for research or clinical interpretation
Consider alternatives when:
- You need variant-level data (use ClinVar, gnomAD)
- You need protein structure data (use PDB, AlphaFold)
- You need single-cell expression data (use CellxGene)
- You need genome-wide analysis (use Ensembl BioMart for bulk queries)
Quick Start
```bash
claude "Get comprehensive information about the TP53 gene"
```

```python
from Bio import Entrez
import requests

# NCBI asks for a contact address; replace this placeholder with your own
Entrez.email = "you@example.com"

# NCBI Gene search
handle = Entrez.esearch(db="gene", term="TP53[gene] AND Homo sapiens[orgn]")
results = Entrez.read(handle)
handle.close()
gene_id = results["IdList"][0]

# Get gene summary
handle = Entrez.efetch(db="gene", id=gene_id, rettype="gene_table", retmode="text")
gene_info = handle.read()
handle.close()
print(gene_info[:500])

# UniProt annotations
uniprot_url = "https://rest.uniprot.org/uniprotkb/search"
response = requests.get(
    uniprot_url,
    params={
        "query": "gene:TP53 AND organism_id:9606 AND reviewed:true",
        "format": "json",
        "fields": "accession,gene_names,protein_name,go_p,go_f,go_c,length",
    },
    timeout=30,
)
response.raise_for_status()
protein = response.json()["results"][0]
print(f"\nProtein: {protein['proteinDescription']['recommendedName']['fullName']['value']}")
print(f"UniProt: {protein['primaryAccession']}")
print(f"Length: {protein['sequence']['length']} aa")
```
Core Concepts
Gene Information Sources
| Database | Focus | Key Data |
|---|---|---|
| NCBI Gene | Gene-centric aggregation | Summary, references, homologs |
| Ensembl | Genomic annotations | Coordinates, transcripts, regulation |
| UniProt | Protein annotations | Function, GO terms, domains |
| Gene Ontology | Functional classification | Molecular function, process, component |
| GTEx | Expression across tissues | TPM values, eQTLs |
| OMIM | Disease associations | Phenotype-gene relationships |
Multi-Database Gene Report
```python
from Bio import Entrez
import requests


def gene_report(gene_symbol, species="Homo sapiens", taxon_id=9606):
    """Compile gene information from multiple databases.

    taxon_id must match species (9606 is human); UniProt is queried by taxon ID.
    """
    report = {"symbol": gene_symbol}

    # NCBI Gene (Entrez.email should be set before calling)
    handle = Entrez.esearch(db="gene", term=f"{gene_symbol}[gene] AND {species}[orgn]")
    results = Entrez.read(handle)
    handle.close()
    if results["IdList"]:
        report["ncbi_gene_id"] = results["IdList"][0]

    # Ensembl expects the species in lowercase with underscores, e.g. homo_sapiens
    ensembl_species = species.lower().replace(" ", "_")
    ens_resp = requests.get(
        f"https://rest.ensembl.org/lookup/symbol/{ensembl_species}/{gene_symbol}",
        headers={"Content-Type": "application/json"},
        timeout=30,
    )
    if ens_resp.ok:
        ens = ens_resp.json()
        report["ensembl_id"] = ens["id"]
        report["location"] = f"chr{ens['seq_region_name']}:{ens['start']}-{ens['end']}"
        report["biotype"] = ens["biotype"]

    # UniProt (reviewed Swiss-Prot entries only)
    up_resp = requests.get(
        "https://rest.uniprot.org/uniprotkb/search",
        params={
            "query": f"gene:{gene_symbol} AND organism_id:{taxon_id} AND reviewed:true",
            "format": "json",
            "fields": "accession,protein_name,go_p,length",
        },
        timeout=30,
    )
    if up_resp.ok:
        hits = up_resp.json()["results"]
        if hits:
            report["uniprot_id"] = hits[0]["primaryAccession"]
            report["protein_length"] = hits[0]["sequence"]["length"]

    return report


tp53 = gene_report("TP53")
```
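A report dictionary of this shape is easier to read once rendered as text. The following is a minimal sketch: `format_report` and `FIELDS` are hypothetical names, not part of the skill, and the helper assumes the keys set by `gene_report`, any of which may be absent when a lookup fails.

```python
# Hypothetical helper: render a gene_report-style dict as aligned text.
# Keys mirror those set by gene_report; missing keys fall back to a marker.
FIELDS = [
    ("symbol", "Symbol"),
    ("ncbi_gene_id", "NCBI Gene ID"),
    ("ensembl_id", "Ensembl ID"),
    ("location", "Location"),
    ("biotype", "Biotype"),
    ("uniprot_id", "UniProt ID"),
    ("protein_length", "Protein length (aa)"),
]


def format_report(report, missing="n/a"):
    """Return one 'Label : value' line per known field."""
    width = max(len(label) for _, label in FIELDS)
    lines = []
    for key, label in FIELDS:
        value = report.get(key, missing)
        lines.append(f"{label:<{width}} : {value}")
    return "\n".join(lines)


# Partial report, as might result when only some lookups succeed
example = {"symbol": "TP53", "biotype": "protein_coding"}
print(format_report(example))
```

Using `dict.get` with a fallback keeps the output stable when one of the databases returns nothing for a symbol.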
Gene Ontology Analysis
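```python
# GO term enrichment using goatools
from goatools.go_enrichment import GOEnrichmentStudy
from goatools.obo_parser import GODag

# Load GO DAG
obo_dag = GODag("go-basic.obo")

# Gene-to-GO mapping: {gene symbol: set of GO IDs}.
# "gene2go.tsv" is a placeholder for a two-column, tab-separated file
# (gene symbol, GO ID) that you supply.
gene2go = {}
with open("gene2go.tsv") as fh:
    for line in fh:
        if not line.strip():
            continue
        gene, go_id = line.rstrip("\n").split("\t")[:2]
        gene2go.setdefault(gene, set()).add(go_id)

# Run enrichment analysis
gene_list = ["TP53", "BRCA1", "MDM2", "CDKN2A", "RB1"]  # Study genes
# Background gene set. Placeholder: every annotated gene in the mapping.
# Ideally replace this with the genes expressed in your system.
background = set(gene2go)

goe = GOEnrichmentStudy(
    background,
    gene2go,  # Gene-to-GO mapping
    obo_dag,
    methods=["fdr_bh"],
)
results = goe.run_study(gene_list)

# Keep terms passing the FDR threshold and print the ten most significant
significant = [r for r in results if r.p_fdr_bh < 0.05]
for r in sorted(significant, key=lambda x: x.p_fdr_bh)[:10]:
    print(f"{r.GO}: {r.name} (FDR={r.p_fdr_bh:.4f})")
```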
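For intuition about what an enrichment p-value measures: GO enrichment tools commonly apply a one-sided Fisher's exact test per term, which is equivalent to a hypergeometric tail probability. The sketch below computes that tail in plain Python. `hypergeom_enrichment_p` is a hypothetical name and the counts are made up for illustration; this is not how any particular library implements it.

```python
from math import comb


def hypergeom_enrichment_p(study_hits, study_size, pop_hits, pop_size):
    """P(X >= study_hits) when study_size genes are drawn without replacement
    from pop_size background genes, pop_hits of which carry the GO term."""
    max_hits = min(study_size, pop_hits)
    total = comb(pop_size, study_size)
    # Sum the probability of every outcome at least as extreme as observed
    tail = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, study_size - k)
        for k in range(study_hits, max_hits + 1)
    )
    return tail / total


# Made-up counts: 4 of 5 study genes carry a term found in 200 of 20,000
# background genes, versus 50 of 5,000 in a smaller background.
p_large_background = hypergeom_enrichment_p(4, 5, 200, 20000)
p_small_background = hypergeom_enrichment_p(4, 5, 50, 5000)
print(p_large_background, p_small_background)
```

Because `pop_size` and `pop_hits` enter the calculation directly, the same study list can yield different p-values under different backgrounds.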
Configuration
| Parameter | Description | Default |
|---|---|---|
| `species` | Target organism | `Homo sapiens` |
| `databases` | Sources to query | `[ncbi, ensembl, uniprot]` |
| `include_go` | Retrieve GO annotations | `true` |
| `include_expression` | Retrieve GTEx expression | `false` |
| `enrichment_method` | GO enrichment p-value correction | `fdr_bh` |
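The source does not show how these parameters are passed to the skill, so the following is only one possible representation: a sketch using a Python dataclass whose field names and defaults are copied from the table. The class name `GeneWorkspaceConfig` is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GeneWorkspaceConfig:
    """Hypothetical container mirroring the parameter table's defaults."""
    species: str = "Homo sapiens"
    # default_factory gives each instance its own list
    databases: List[str] = field(
        default_factory=lambda: ["ncbi", "ensembl", "uniprot"]
    )
    include_go: bool = True
    include_expression: bool = False
    enrichment_method: str = "fdr_bh"


# Override only what differs from the defaults
config = GeneWorkspaceConfig(include_expression=True)
print(config)
```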
Best Practices
- Cross-reference across databases. No single database has complete gene information. Combine NCBI Gene (summary, references), Ensembl (coordinates, transcripts), and UniProt (protein function, domains) for a comprehensive picture.
- Use approved gene symbols. HGNC (HUGO Gene Nomenclature Committee) maintains the official gene naming standard. Use approved symbols to avoid confusion from aliases: "p53" might not resolve correctly, but "TP53" will.
- Check gene aliases for database mismatches. The same gene may have different names or IDs across databases. Use NCBI Gene's alias list or UniProt's gene name mappings to resolve discrepancies.
- Include tissue-specific context. A gene's function can vary by tissue. Include GTEx expression data to understand where the gene is active, which is critical for interpreting disease associations and drug target potential.
- Use GO enrichment with appropriate backgrounds. GO enrichment requires a background gene set. Use all expressed genes (not all genes in the genome) as background to avoid inflating significance. The choice of background dramatically affects results.
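The background recommendation can be sketched as a simple filter. In this example the expression values are invented, the function names are hypothetical, and the 1 TPM cutoff is an assumption to adjust for your data, not a value the skill prescribes.

```python
def expressed_background(tpm_by_gene, min_tpm=1.0):
    """Return genes whose TPM reaches min_tpm in at least one tissue or sample."""
    return {
        gene
        for gene, tpms in tpm_by_gene.items()
        if max(tpms, default=0.0) >= min_tpm
    }


def restrict_study(study_genes, background):
    """Keep only study genes that are present in the background set."""
    return [gene for gene in study_genes if gene in background]


# Invented TPM values per gene across two tissues
tpm_by_gene = {
    "TP53": [25.0, 40.2],
    "MDM2": [8.1, 3.3],
    "GENE_A": [0.0, 0.2],  # placeholder for a gene not expressed here
}

background = expressed_background(tpm_by_gene)
study = restrict_study(["TP53", "GENE_A"], background)
print(sorted(background), study)
```

Restricting the study list to the background keeps both sets consistent before they are handed to an enrichment tool.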
Common Issues
Gene symbol maps to multiple Ensembl IDs. Some gene symbols refer to readthrough transcripts or pseudogenes that have separate Ensembl IDs. Filter by biotype: protein_coding to focus on the primary gene, and verify the chromosomal location matches expectations.
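A minimal sketch of that filter, assuming each candidate is a dictionary with the `id`, `biotype`, and `seq_region_name` keys that the Ensembl lookup response uses. The function name and the record IDs below are placeholders, not real Ensembl identifiers.

```python
def pick_primary_gene(records, expected_chromosome=None):
    """Keep protein-coding records, optionally on the expected chromosome."""
    candidates = [r for r in records if r.get("biotype") == "protein_coding"]
    if expected_chromosome is not None:
        candidates = [
            r for r in candidates
            if str(r.get("seq_region_name")) == str(expected_chromosome)
        ]
    return candidates


# Placeholder records standing in for several hits on one symbol
records = [
    {"id": "ENSG_EXAMPLE_1", "biotype": "protein_coding", "seq_region_name": "17"},
    {"id": "ENSG_EXAMPLE_2", "biotype": "processed_pseudogene", "seq_region_name": "1"},
    {"id": "ENSG_EXAMPLE_3", "biotype": "protein_coding", "seq_region_name": "X"},
]

primary = pick_primary_gene(records, expected_chromosome=17)
print([r["id"] for r in primary])
```

Returning a list rather than a single record makes it obvious when the filter leaves zero or several candidates and manual review is needed.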
GO enrichment returns no significant terms. Common causes: gene list too small (<10 genes), inappropriate background set, or genes don't share functional themes. Try relaxing the FDR threshold or using a different enrichment method.
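To see why small inputs often yield nothing after correction, it can help to look at how Benjamini-Hochberg adjusted values (the `fdr_bh` method) are derived from raw p-values. The sketch below is a plain-Python illustration with made-up p-values. `benjamini_hochberg` is a hypothetical name, and library implementations may differ in details.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling by m / rank and
    # enforcing that adjusted values never increase as rank decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for four GO terms
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
print(adjusted)
```

Each raw p-value is multiplied by the number of tests divided by its rank, so the smallest p-values are penalized most when many terms are tested.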
Expression data varies between databases. GTEx, Human Protein Atlas, and NCBI GEO may show different expression patterns due to different sample preparations, normalization methods, and tissue definitions. Note the data source and version when reporting expression data.