BioServices Studio

A scientific computing skill for accessing biological databases through BioServices, a Python package that gives programmatic access to approximately 40 bioinformatics web services (UniProt, KEGG, ChEBI, BioModels, and many more) behind a unified interface.

When to Use This Skill

Choose BioServices Studio when:

Querying multiple biological databases through a single Python interface
Retrieving protein, pathway, chemical, or genomic data programmatically
Cross-referencing identifiers between databases (UniProt, KEGG, ChEBI)
Building data integration pipelines across bioinformatics databases

Consider alternatives when:

You only need one database (use its specific API/library directly)
You need NCBI specifically (use Biopython's Bio.Entrez module, which covers more of NCBI's E-utilities)
You need high-throughput sequence analysis (use bioinformatics pipelines)
You need custom database queries (use direct REST API calls)

Quick Start


claude "Query UniProt for human kinase proteins and cross-reference with KEGG pathways"


from bioservices import UniProt, KEGG

# UniProt: Search for human kinases
u = UniProt()
results = u.search(
    "kinase AND organism_id:9606 AND reviewed:true",
    frmt="tsv",
    columns="accession,gene_names,protein_name,length,go_p"
)
print(results[:500])  # Preview the first 500 characters of the TSV text

# KEGG: Get pathway information
k = KEGG()
pathway = k.get("hsa04010")  # MAPK signaling pathway (flat-file text)
parsed = k.parse(pathway)    # Parse once into a dict keyed by KEGG field name
print(parsed["NAME"])
print(parsed["DESCRIPTION"][:200])

# Cross-reference: UniProt to KEGG (P04637 is human TP53)
mapping = u.mapping("UniProtKB_AC-ID", "KEGG", query="P04637")
# The return value is the raw mapping result, not a bare identifier
print(f"TP53 KEGG mapping: {mapping}")
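The search call above returns the TSV response as one string, so it usually needs to be split into rows before further processing. A minimal sketch using only the standard library is shown here. The function name tsv_to_records and the sample text are illustrative assumptions, not part of BioServices; the real column headers depend on the columns you request.

```python
import csv
import io


def tsv_to_records(tsv_text):
    """Parse tab-separated text with a header row into a list of dicts."""
    reader = csv.DictReader(
        io.StringIO(tsv_text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters in annotations literally
    )
    return [dict(row) for row in reader]


# Hypothetical sample shaped like a TSV search response (not real output).
sample = "Entry\tGene Names\tLength\nP04637\tTP53 P53\t393\n"
records = tsv_to_records(sample)
print(records[0]["Entry"])
```

In a real pipeline you would pass the string returned by u.search(..., frmt="tsv") instead of the sample text.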

Core Concepts

Supported Services

Service	Database	Data Type
`UniProt`	UniProt KB	Protein sequences and annotations
`KEGG`	KEGG	Pathways, genes, compounds
`ChEBI`	ChEBI	Chemical entities of biological interest
`BioModels`	BioModels	Mathematical models of biological systems
`Ensembl`	Ensembl	Genomic data and annotations
`PDB`	Protein Data Bank	3D protein structures
`ArrayExpress`	ArrayExpress	Gene expression experiments
`BioGRID`	BioGRID	Protein-protein interactions
`Reactome`	Reactome	Biological pathways
`WikiPathways`	WikiPathways	Community curated pathways

Identifier Mapping


from bioservices import UniProt

u = UniProt()

# Map between identifier systems
# UniProt → PDB
pdb_ids = u.mapping("UniProtKB_AC-ID", "PDB", query="P53_HUMAN")

# UniProt → Ensembl Gene
ensembl = u.mapping("UniProtKB_AC-ID", "Ensembl", query="P04637")

# UniProt → RefSeq
refseq = u.mapping("UniProtKB_AC-ID", "RefSeq_Protein", query="P04637")

# Gene name → UniProt
uniprot_id = u.search("gene:BRCA1 AND organism_id:9606 AND reviewed:true",
                       frmt="list")
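Mapping calls can return several targets for one source identifier, plus a list of identifiers that could not be mapped. The sketch below groups such a result by source identifier. It assumes the result is a dict with a "results" list of {"from", "to"} pairs and a "failedIds" list; that shape is an assumption to check against the output of your installed BioServices version, and group_mapping is a hypothetical helper name.

```python
from collections import defaultdict


def group_mapping(result):
    """Group mapping pairs by source ID and return (mapped, failed).

    Assumes `result` looks like:
        {"results": [{"from": ..., "to": ...}, ...], "failedIds": [...]}
    """
    grouped = defaultdict(list)
    for pair in result.get("results", []):
        grouped[pair["from"]].append(pair["to"])
    failed = list(result.get("failedIds", []))
    return dict(grouped), failed


# Hypothetical result in the assumed shape (not captured from a live call).
sample_result = {
    "results": [
        {"from": "P04637", "to": "hsa:7157"},
    ],
    "failedIds": ["NOT_AN_ID"],
}
mapped, failed = group_mapping(sample_result)
print(mapped)
print(failed)
```

Keeping the failed identifiers separate makes it easier to spot format problems early instead of silently losing rows.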

Pathway Analysis


from bioservices import KEGG, Reactome

# KEGG pathway in KGML (XML) format
k = KEGG()
kgml = k.get("hsa04010", "kgml")

# List all human pathways (one tab-separated line per pathway)
pathways = k.list("pathway", "hsa")
print(f"Human pathways: {len(pathways.strip().splitlines())}")

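# Reactome pathway analysis
# NOTE: pathway_analysis is reproduced from the source as-is. Reactome
# method names differ between bioservices releases, so confirm it exists
# (for example with dir(r)) before relying on it.
r = Reactome()
result = r.pathway_analysis(["P04637", "P38398", "Q13315"],
                            species="Homo sapiens")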
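The pathway listing comes back as plain text, so counting lines only gets you so far. A small sketch for turning it into a lookup table is shown here. It assumes each line holds an identifier, a tab, and a description; parse_kegg_list and the sample text are illustrative and not part of BioServices.

```python
def parse_kegg_list(text):
    """Turn 'ID<TAB>description' lines into a dict of ID -> description."""
    entries = {}
    for line in text.strip().splitlines():
        entry_id, _, description = line.partition("\t")
        if entry_id:
            entries[entry_id] = description
    return entries


# Hypothetical sample shaped like a pathway listing (not a live response).
sample_listing = (
    "hsa04010\tMAPK signaling pathway - Homo sapiens (human)\n"
    "hsa04115\tp53 signaling pathway - Homo sapiens (human)\n"
)
pathway_names = parse_kegg_list(sample_listing)
print(len(pathway_names))
print(pathway_names["hsa04010"])
```

With a live service you would pass the string returned by k.list("pathway", "hsa") in place of the sample.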

Configuration

Parameter	Description	Default
`cache`	Enable response caching (constructor argument; off unless you pass cache=True in recent releases)	`False`
`verbose`	Print request/response details	`False`
`timeout`	Request timeout in seconds	`30`
`max_retries`	Retry count for failed requests	`3`
`output_format`	Default response format	Service-specific

Best Practices

Enable caching for repeated queries. Pass cache=True when you construct a service; in recent BioServices releases caching is off unless you ask for it. For large-scale analyses where you query the same identifiers multiple times, this dramatically reduces API calls. Leave caching off when you need real-time data.
Use batch queries instead of loops. Most services support querying multiple identifiers at once. Instead of looping through 100 UniProt IDs one at a time, pass them as a comma-separated string or list to reduce API calls from 100 to 1.
Check service availability before long pipelines. Biological databases have maintenance windows. Use a quick test query before starting a pipeline that depends on a specific service. BioServices wraps errors, but a down service will cause pipeline failures.
Use structured output formats. Request data in TSV or JSON format rather than raw text for easier parsing. BioServices supports format specification on most service methods — use frmt="json" or frmt="tsv" for programmatic processing.
Cross-reference identifiers through UniProt. UniProt's mapping service is the most reliable way to convert between identifier systems (Ensembl, RefSeq, PDB, KEGG). Use it as a hub for identifier translation rather than maintaining your own mapping tables.
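Two of the practices above, batching identifiers and testing a service before a long run, can be sketched without touching the network. The helpers below are hypothetical (chunked, to_query, and service_is_up are not BioServices functions). The availability check assumes a failed call either raises or returns something other than non-empty text, a dict, or a list, such as an integer status code; verify that assumption against the service you use.

```python
def chunked(ids, size):
    """Yield successive slices of `ids` with at most `size` items each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def to_query(ids):
    """Join identifiers into the comma-separated form used for batch calls."""
    return ",".join(ids)


def service_is_up(probe):
    """Run a cheap zero-argument probe and report whether it looks healthy.

    Assumption: a healthy response is non-empty text, a dict, or a list.
    """
    try:
        result = probe()
    except Exception:
        return False
    return isinstance(result, (str, dict, list)) and bool(result)


accessions = ["P04637", "P38398", "Q13315"]
for batch in chunked(accessions, 2):
    print(to_query(batch))

# With a live service the probe could be: lambda: KEGG().get("hsa04010")
print(service_is_up(lambda: "ENTRY       hsa04010"))
```

Choosing the batch size is a trade-off: larger batches mean fewer requests, while smaller ones keep each response quick and make a failed request cheaper to repeat.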

Common Issues

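Service returns timeout error. Some queries against large databases take longer than the default timeout, so raise it. The form given here is service = UniProt(timeout=60); if your installed release rejects the timeout keyword, set the timeout on the service object after construction instead (for example service.services.TIMEOUT = 60, an attribute name to verify against your version). For very large result sets, paginate or narrow the search query.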

Mapping returns no results for valid identifiers. Identifier formats matter. UniProt accessions (P04637) differ from entry names (P53_HUMAN), and identifiers from other databases need the matching source database name as the first argument to mapping. Also check that the identifier exists in the target database.
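A quick format check before calling the mapping service can catch mixed-up identifiers early. The sketch below classifies a string as a UniProt accession or an entry name. The accession pattern follows the format UniProt documents for accession numbers; the entry-name pattern is a simplified assumption, and classify_uniprot_id is a hypothetical helper, not a BioServices function.

```python
import re

# Accession format as documented by UniProt (6 or 10 characters).
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
# Simplified entry-name shape: NAME_SPECIES (an assumption, not the full rule).
ENTRY_NAME_RE = re.compile(r"[A-Z0-9]{1,10}_[A-Z0-9]{1,5}")


def classify_uniprot_id(identifier):
    """Return 'accession', 'entry_name', or 'unknown' for a candidate ID."""
    candidate = identifier.strip().upper()
    if ACCESSION_RE.fullmatch(candidate):
        return "accession"
    if ENTRY_NAME_RE.fullmatch(candidate):
        return "entry_name"
    return "unknown"


for candidate in ["P04637", "P53_HUMAN", "hsa:7157"]:
    print(candidate, classify_uniprot_id(candidate))
```

Anything reported as unknown is a candidate for a different source database name, or for a typo worth fixing before the request is sent.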

Different services return conflicting data. Biological databases update on different schedules. UniProt may have annotations that KEGG hasn't incorporated yet, and vice versa. When data conflicts arise, check the source database's release date and prefer the most recently updated source.
