Bioservices Studio
A scientific computing skill for accessing biological databases through BioServices — the Python package providing programmatic access to approximately 40 bioinformatics web services including UniProt, KEGG, ChEBI, BioModels, and many more through a unified interface.
When to Use This Skill
Choose BioServices Studio when:
- Querying multiple biological databases through a single Python interface
- Retrieving protein, pathway, chemical, or genomic data programmatically
- Cross-referencing identifiers between databases (UniProt, KEGG, ChEBI)
- Building data integration pipelines across bioinformatics databases
Consider alternatives when:
- You only need one database (use its specific API/library directly)
- You need NCBI specifically (use Biopython Entrez — more features)
- You need high-throughput sequence analysis (use bioinformatics pipelines)
- You need custom database queries (use direct REST API calls)
Quick Start
```bash
claude "Query UniProt for human kinase proteins and cross-reference with KEGG pathways"
```

```python
from bioservices import KEGG, UniProt

# UniProt: search for reviewed human kinases, returned as TSV text
u = UniProt()
results = u.search(
    "kinase AND organism_id:9606 AND reviewed:true",
    frmt="tsv",
    columns="accession,gene_names,protein_name,length,go_p",
)
print(results[:500])  # preview the first rows

# KEGG: fetch the MAPK signaling pathway entry and parse it once
k = KEGG()
pathway = k.parse(k.get("hsa04010"))
print(pathway["NAME"])
print(pathway["DESCRIPTION"][:200])

# Cross-reference: map the TP53 accession (P04637) to KEGG
mapping = u.mapping("UniProtKB_AC-ID", "KEGG", query="P04637")
print(f"TP53 KEGG ID: {mapping}")
```
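The `frmt="tsv"` search returns a single text block: a header row followed by one tab-separated row per entry. A minimal sketch for turning that text into a list of dictionaries with the standard `csv` module. `parse_uniprot_tsv` is a hypothetical helper, and the column headers in the hand-written sample (`Entry`, `Gene Names`, `Length`) are assumptions about what UniProt returns for the requested fields, not captured output.

```python
import csv
import io


def parse_uniprot_tsv(text: str) -> list[dict[str, str]]:
    """Parse TSV text (header row + data rows) into one dict per row.

    Hypothetical helper: it only assumes tab-separated columns with a
    header line, which is what frmt="tsv" requests.
    """
    if not text or not text.strip():
        return []
    # QUOTE_NONE keeps any quote characters inside protein names literal
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(row) for row in reader]


# Hand-written sample in the assumed layout (not fetched from UniProt)
sample = (
    "Entry\tGene Names\tLength\n"
    "P00533\tEGFR ERBB ERBB1 HER1\t1210\n"
    "P06493\tCDK1 CDC2\t297\n"
)
rows = parse_uniprot_tsv(sample)
for row in rows:
    # The first name in "Gene Names" is the primary gene symbol
    print(row["Entry"], row["Gene Names"].split()[0], int(row["Length"]))
```

Passing `results` from the search above in place of `sample` gives the same structure, keyed by whatever headers UniProt actually sends back.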
Core Concepts
Supported Services
| Service | Database | Data Type |
|---|---|---|
| UniProt | UniProtKB | Protein sequences and annotations |
| KEGG | KEGG | Pathways, genes, compounds |
| ChEBI | ChEBI | Chemical entities of biological interest |
| BioModels | BioModels | Mathematical models of biological systems |
| Ensembl | Ensembl | Genomic data and annotations |
| PDB | Protein Data Bank | 3D protein structures |
| ArrayExpress | ArrayExpress | Gene expression experiments |
| BioGRID | BioGRID | Protein-protein interactions |
| Reactome | Reactome | Biological pathways |
| WikiPathways | WikiPathways | Community-curated pathways |
Identifier Mapping
```python
from bioservices import UniProt

u = UniProt()

# Map between identifier systems with UniProt's ID-mapping service
# UniProt → PDB (query given as an entry name)
pdb_ids = u.mapping("UniProtKB_AC-ID", "PDB", query="P53_HUMAN")

# UniProt → Ensembl Gene
ensembl = u.mapping("UniProtKB_AC-ID", "Ensembl", query="P04637")

# UniProt → RefSeq
refseq = u.mapping("UniProtKB_AC-ID", "RefSeq_Protein", query="P04637")

# Gene name → UniProt (a search, returned as a plain list of accessions)
uniprot_id = u.search("gene:BRCA1 AND organism_id:9606 AND reviewed:true", frmt="list")
```
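The mapping calls return the ID-mapping response as a Python dict. A minimal sketch for flattening it into `{source_id: [target_ids]}`, assuming the response follows UniProt's ID-mapping JSON shape: a `results` list of `{"from": ..., "to": ...}` pairs plus an optional `failedIds` list, with `to` being a full entry (read here via its `primaryAccession` field) when the target is UniProtKB. `group_mapping` is a hypothetical helper and the sample response is hand-written.

```python
from collections import defaultdict
from typing import Any


def group_mapping(response: dict[str, Any]) -> tuple[dict[str, list[str]], list[str]]:
    """Group ID-mapping pairs by source ID and collect unmapped IDs.

    Assumed shape: {"results": [{"from": ..., "to": ...}], "failedIds": [...]}.
    When "to" is a dict (a UniProtKB entry), its "primaryAccession" is used.
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for pair in response.get("results", []):
        target = pair["to"]
        if isinstance(target, dict):
            target = target.get("primaryAccession", "")
        # Skip empty targets and duplicates, preserving first-seen order
        if target and target not in grouped[pair["from"]]:
            grouped[pair["from"]].append(target)
    return dict(grouped), list(response.get("failedIds", []))


# Hand-written response in the assumed shape (not fetched from UniProt)
response = {
    "results": [
        {"from": "P04637", "to": "hsa:7157"},
        {"from": "P38398", "to": "hsa:672"},
    ],
    "failedIds": ["P99999X"],
}
mapped, failed = group_mapping(response)
print(mapped)
print("Unmapped:", failed)
```

Reporting `failedIds` separately makes it easy to spot identifiers that need a different source format rather than silently dropping them.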
Pathway Analysis
```python
from bioservices import KEGG, Reactome

# KEGG: fetch the MAPK pathway in KGML (XML) format
k = KEGG()
kgml = k.get("hsa04010", "kgml")

# List all human pathways (one line per pathway)
pathways = k.list("pathway", "hsa")
print(f"Human pathways: {len(pathways.strip().splitlines())}")

# Reactome pathway analysis for TP53, BRCA1 and ATM
r = Reactome()
# Assumed method name: verify it against the Reactome class in your installed bioservices version
result = r.pathway_analysis(["P04637", "P38398", "Q13315"], species="Homo sapiens")
```
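The KGML returned by `k.get("hsa04010", "kgml")` is XML text, not a ready-made gene list. A minimal sketch that pulls gene identifiers out of it with the standard library's `xml.etree.ElementTree`, assuming the usual KGML layout: a `<pathway>` root whose `<entry>` children carry a `type` attribute (`"gene"` for genes) and a space-separated `name` attribute such as `"hsa:5594 hsa:5595"`. `kgml_gene_ids` is a hypothetical helper, and the embedded snippet is a trimmed, hand-written example.

```python
import xml.etree.ElementTree as ET


def kgml_gene_ids(kgml_text: str) -> list[str]:
    """Return the unique gene IDs listed in a KGML document, in order.

    Only <entry type="gene"> elements are used; compounds, maps and other
    entry types are skipped. One entry can name several genes.
    """
    root = ET.fromstring(kgml_text)
    seen: dict[str, None] = {}  # dict preserves insertion order: an ordered set
    for entry in root.iter("entry"):
        if entry.get("type") == "gene":
            for gene_id in entry.get("name", "").split():
                seen.setdefault(gene_id, None)
    return list(seen)


# Trimmed, hand-written KGML snippet (real files contain many more entries)
sample_kgml = """<?xml version="1.0"?>
<pathway name="path:hsa04010" org="hsa" number="04010" title="MAPK signaling pathway">
  <entry id="1" name="hsa:5594 hsa:5595" type="gene"/>
  <entry id="2" name="cpd:C00076" type="compound"/>
  <entry id="3" name="path:hsa04110" type="map"/>
  <entry id="4" name="hsa:5595" type="gene"/>
</pathway>"""
print(kgml_gene_ids(sample_kgml))
```

With a live session, `kgml_gene_ids(kgml)` would apply the same extraction to the downloaded pathway.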
Configuration
| Parameter | Description | Default |
|---|---|---|
| `cache` | Enable response caching (backed by `requests_cache`) | `False` |
| `verbose` | Print request/response details | `False` |
| `timeout` | Request timeout in seconds | 30 |
| `max_retries` | Retry count for failed requests | 3 |
| `output_format` | Default response format | Service-specific |
Best Practices
- Enable caching for repeated queries. BioServices can cache responses locally when a service is created with `cache=True` (caching is off by default and relies on the `requests_cache` package). For large-scale analyses that query the same identifiers many times, this sharply reduces API calls. Leave caching off when you need up-to-date data.
- Use batch queries instead of loops. Most services accept several identifiers in one call. Instead of looping through 100 UniProt IDs one at a time, pass them as a comma-separated string or a list, turning 100 separate requests into a single batched query.
- Check service availability before long pipelines. Biological databases have maintenance windows, so run a quick test query before starting a pipeline that depends on a specific service. BioServices wraps HTTP errors, but a service that is down will still make the pipeline fail.
- Use structured output formats. Request TSV or JSON rather than raw text for easier parsing. Most BioServices methods accept a format argument; use `frmt="json"` or `frmt="tsv"` for programmatic processing where the method supports it.
- Cross-reference identifiers through UniProt. UniProt's ID-mapping service is a reliable hub for converting between identifier systems (Ensembl, RefSeq, PDB, KEGG). Use it for identifier translation rather than maintaining your own mapping tables.
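The batch-query advice above still benefits from a cap on request size, since very large requests can run into service-side or URL-length limits. A minimal sketch of splitting a long ID list into fixed-size, comma-joined batches. `batched_ids`, `run_batched` and the default size of 100 are assumptions to tune per service, and `fetch` is a stand-in for whichever BioServices call you batch.

```python
from collections.abc import Callable, Iterable, Iterator
from typing import TypeVar

T = TypeVar("T")


def batched_ids(ids: Iterable[str], size: int = 100) -> Iterator[str]:
    """Yield comma-joined batches of at most `size` unique identifiers.

    Blank entries are skipped and duplicates dropped (first occurrence wins),
    so each ID is sent only once.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    unique = list(dict.fromkeys(i.strip() for i in ids if i.strip()))
    for start in range(0, len(unique), size):
        yield ",".join(unique[start:start + size])


def run_batched(ids: Iterable[str], fetch: Callable[[str], T], size: int = 100) -> list[T]:
    """Call `fetch` once per batch instead of once per identifier.

    Hypothetical real use: run_batched(accessions,
        lambda q: u.mapping("UniProtKB_AC-ID", "KEGG", query=q))
    """
    return [fetch(batch) for batch in batched_ids(ids, size)]


# 250 placeholder IDs become 3 calls; this stand-in fetch just counts IDs per batch
ids = [f"ID{n}" for n in range(250)]
calls = run_batched(ids, fetch=lambda batch: len(batch.split(",")), size=100)
print(calls)
```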
Common Issues
Service returns a timeout error. Increase the timeout on the service instance, for example `u = UniProt()` followed by `u.TIMEOUT = 60`. Some queries against large databases take longer than the default timeout. For very large result sets, paginate or narrow the search query.
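When a service is slow or briefly unavailable, retrying with an increasing delay is a common complement to a longer timeout. A minimal, library-agnostic sketch: `with_retries` is a hypothetical helper, and the attempt count, base delay and the exception types treated as retryable are assumptions. The builtin `TimeoutError`/`ConnectionError` defaults will not catch every client error, so pass the types your stack actually raises (for example `requests.exceptions.RequestException`).

```python
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 2.0,
    retry_on: tuple[type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on `retry_on` errors with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last error once all attempts are used.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Stand-in that times out twice, then succeeds; a real call might be
# with_retries(lambda: u.search(query, frmt="tsv"))
state = {"calls": 0}


def flaky() -> str:
    state["calls"] += 1
    if state["calls"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


delays: list[float] = []
print(with_retries(flaky, attempts=3, base_delay=0.5, sleep=delays.append))
print(delays)
```

Injecting `sleep` keeps the helper testable without real waits; in production the default `time.sleep` is used.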
Mapping returns no results for valid identifiers. Identifier formats matter: a UniProt accession (P04637) is not the same as an entry name (P53_HUMAN), so make sure the source database you specify matches the identifier type you pass. Also check that the identifier actually has a cross-reference in the target database.
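Before calling a mapping, it can help to check which kind of identifier you actually have. A minimal sketch using the accession pattern published in UniProt's documentation (with an optional isoform suffix such as `-2`) and a looser heuristic for entry names of the form `MNEMONIC_SPECIES`, such as `P53_HUMAN`. `identifier_kind` is a hypothetical helper; anything matching neither pattern, a gene symbol like `BRCA1` for instance, is reported as unknown.

```python
import re

# UniProt accession format (6 or 10 characters), plus an optional isoform suffix
ACCESSION_RE = re.compile(
    r"(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})(?:-[0-9]+)?"
)
# Heuristic only: up to 10 alphanumerics, "_", then up to 5 for the species code
ENTRY_NAME_RE = re.compile(r"[A-Z0-9]{1,10}_[A-Z0-9]{1,5}")


def identifier_kind(identifier: str) -> str:
    """Classify an ID as 'accession', 'entry_name' or 'unknown'."""
    candidate = identifier.strip().upper()
    if ACCESSION_RE.fullmatch(candidate):
        return "accession"
    if ENTRY_NAME_RE.fullmatch(candidate):
        return "entry_name"
    return "unknown"


for query in ["P04637", "P04637-2", "P53_HUMAN", "A0A023GPI8", "BRCA1"]:
    print(query, identifier_kind(query))
```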
Different services return conflicting data. Biological databases update on different schedules. UniProt may have annotations that KEGG hasn't incorporated yet, and vice versa. When data conflicts arise, check the source database's release date and prefer the most recently updated source.