BioServices Studio

A skill for Python-based bioinformatics tooling. Includes structured workflows, validation checks, and reusable patterns for scientific computing.

Skill · Cliptics · scientific · v1.0.0 · MIT

A scientific computing skill for accessing biological databases through BioServices — the Python package providing programmatic access to approximately 40 bioinformatics web services including UniProt, KEGG, ChEBI, BioModels, and many more through a unified interface.

When to Use This Skill

Choose BioServices Studio when:

  • Querying multiple biological databases through a single Python interface
  • Retrieving protein, pathway, chemical, or genomic data programmatically
  • Cross-referencing identifiers between databases (UniProt, KEGG, ChEBI)
  • Building data integration pipelines across bioinformatics databases

Consider alternatives when:

  • You only need one database (use its specific API/library directly)
  • You need NCBI specifically (use Biopython Entrez — more features)
  • You need high-throughput sequence analysis (use bioinformatics pipelines)
  • You need custom database queries (use direct REST API calls)

Quick Start

claude "Query UniProt for human kinase proteins and cross-reference with KEGG pathways"
    from bioservices import UniProt, KEGG

    # UniProt: search for reviewed human kinases
    u = UniProt()
    results = u.search(
        "kinase AND organism_id:9606 AND reviewed:true",
        frmt="tsv",
        columns="accession,gene_names,protein_name,length,go_p",
    )
    print(results[:500])  # Preview the first 500 characters

    # KEGG: get pathway information
    k = KEGG()
    pathway = k.get("hsa04010")  # MAPK signaling pathway
    parsed = k.parse(pathway)    # Parse once and reuse the result
    print(parsed["NAME"])
    print(parsed["DESCRIPTION"][:200])

    # Cross-reference: UniProt to KEGG
    mapping = u.mapping("UniProtKB_AC-ID", "KEGG", query="P04637")
    print(f"TP53 KEGG ID: {mapping}")
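The TSV text returned by a search is easier to work with once it is parsed into records. The following is a minimal sketch using only the standard library. The `parse_tsv` helper and the sample text are illustrative and not part of BioServices, and the real header names depend on the columns you request.

```python
import csv
import io


def parse_tsv(text):
    """Parse tab-separated text (header row first) into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


# Hand-written sample shaped like a TSV response; not live query output.
sample = (
    "Entry\tGene Names\tLength\n"
    "P04637\tTP53 P53\t393\n"
    "P38398\tBRCA1 RNF53\t1863\n"
)

rows = parse_tsv(sample)
for row in rows:
    # The first token of "Gene Names" is treated as the primary gene symbol.
    print(row["Entry"], row["Gene Names"].split()[0], int(row["Length"]))
```

In a real session, `parse_tsv(results)` would take the string returned by `u.search(..., frmt="tsv")` in place of `sample`.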

Core Concepts

Supported Services

| Service | Database | Data Type |
|---|---|---|
| UniProt | UniProtKB | Protein sequences and annotations |
| KEGG | KEGG | Pathways, genes, compounds |
| ChEBI | ChEBI | Chemical entities of biological interest |
| BioModels | BioModels | Mathematical models of biological systems |
| Ensembl | Ensembl | Genomic data and annotations |
| PDB | Protein Data Bank | 3D protein structures |
| ArrayExpress | ArrayExpress | Gene expression experiments |
| BioGRID | BioGRID | Protein-protein interactions |
| Reactome | Reactome | Biological pathways |
| WikiPathways | WikiPathways | Community curated pathways |

Identifier Mapping

    from bioservices import UniProt

    u = UniProt()

    # Map between identifier systems

    # UniProt → PDB (query given as an entry name)
    pdb_ids = u.mapping("UniProtKB_AC-ID", "PDB", query="P53_HUMAN")

    # UniProt → Ensembl Gene
    ensembl = u.mapping("UniProtKB_AC-ID", "Ensembl", query="P04637")

    # UniProt → RefSeq
    refseq = u.mapping("UniProtKB_AC-ID", "RefSeq_Protein", query="P04637")

    # Gene name → UniProt
    uniprot_id = u.search(
        "gene:BRCA1 AND organism_id:9606 AND reviewed:true",
        frmt="list",
    )
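Recent BioServices releases appear to return the mapping result as a dictionary rather than a flat list. The sketch below assumes a shape with a `results` list of `from`/`to` pairs and a `failedIds` list. That shape is an assumption to check against your installed version, and `group_mapping` is a hypothetical helper, not a BioServices function.

```python
from collections import defaultdict


def group_mapping(response):
    """Group target identifiers by source identifier.

    Assumes `response` looks like
    {"results": [{"from": ..., "to": ...}, ...], "failedIds": [...]}.
    """
    grouped = defaultdict(list)
    for pair in response.get("results", []):
        target = pair["to"]
        # Some targets may come back as nested records (assumption);
        # keep only an identifier string in that case.
        if isinstance(target, dict):
            target = target.get("primaryAccession", str(target))
        grouped[pair["from"]].append(target)
    return dict(grouped), list(response.get("failedIds", []))


# Hand-written example in the assumed shape; not live output.
example = {
    "results": [
        {"from": "P04637", "to": "hsa:7157"},
        {"from": "P38398", "to": "hsa:672"},
    ],
    "failedIds": ["NOT_AN_ID"],
}

mapped, failed = group_mapping(example)
print(mapped)
print(failed)
```

Returning the failed identifiers separately makes it harder to silently lose inputs that did not map.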

Pathway Analysis

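    from bioservices import KEGG, Reactome

    k = KEGG()

    # KEGG: fetch the MAPK signaling pathway in KGML (XML) format
    kgml = k.get("hsa04010", "kgml")

    # List all human pathways (one tab-separated line per pathway);
    # strip first so a trailing newline is not counted as an entry
    pathways = k.list("pathway", "hsa")
    print(f"Human pathways: {len(pathways.strip().splitlines())}")

    # Reactome pathway analysis
    # NOTE: confirm that pathway_analysis exists in your installed
    # BioServices version; the Reactome wrapper has changed between releases.
    r = Reactome()
    result = r.pathway_analysis(["P04637", "P38398", "Q13315"], species="Homo sapiens")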
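The pathway listing comes back as plain text, so a small parser helps before doing anything beyond counting lines. This sketch assumes one `id<TAB>description` entry per line, with an optional `path:` prefix on the identifier. `parse_kegg_list` is a hypothetical helper and the sample text is hand-written.

```python
def parse_kegg_list(text):
    """Turn 'id<TAB>description' lines into a dict keyed by identifier."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines such as one left by a trailing newline
        identifier, _, description = line.partition("\t")
        identifier = identifier.strip()
        # Some responses prefix pathway IDs with 'path:'; drop it if present.
        if identifier.startswith("path:"):
            identifier = identifier[len("path:"):]
        entries[identifier] = description.strip()
    return entries


# Hand-written sample in the assumed format; not live output.
sample_listing = (
    "hsa00010\tGlycolysis / Gluconeogenesis - Homo sapiens (human)\n"
    "path:hsa04010\tMAPK signaling pathway - Homo sapiens (human)\n"
)

pathway_names = parse_kegg_list(sample_listing)
print(len(pathway_names))
print(pathway_names["hsa04010"])
```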

Configuration

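| Parameter | Description | Default |
|---|---|---|
| cache | Enable response caching | True |
| verbose | Print request/response details | False |
| timeout | Request timeout in seconds | 30 |
| max_retries | Retry count for failed requests | 3 |
| output_format | Default response format | Service-specific |

Defaults and the way each option is set can differ between BioServices releases. In particular, verify whether caching is on by default in your installed version, and whether the timeout and retry count are constructor arguments or settings on the service instance.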

Best Practices

  1. Enable caching for repeated queries. BioServices can cache responses when a service is created with cache=True; confirm whether caching is on by default in your installed release. For large-scale analyses that query the same identifiers repeatedly, caching sharply reduces the number of API calls. Disable it when you need real-time data.

  2. Use batch queries instead of loops. Most services support querying multiple identifiers at once. Instead of looping through 100 UniProt IDs one at a time, pass them as a comma-separated string or list to reduce API calls from 100 to 1.

  3. Check service availability before long pipelines. Biological databases have maintenance windows. Use a quick test query before starting a pipeline that depends on a specific service. BioServices wraps errors, but a down service will cause pipeline failures.

  4. Use structured output formats. Request data in TSV or JSON format rather than raw text for easier parsing. BioServices supports format specification on most service methods — use frmt="json" or frmt="tsv" for programmatic processing.

  5. Cross-reference identifiers through UniProt. UniProt's mapping service is the most reliable way to convert between identifier systems (Ensembl, RefSeq, PDB, KEGG). Use it as a hub for identifier translation rather than maintaining your own mapping tables.
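Practices 2 and 3 can be combined into a small wrapper: run a cheap probe first, then send identifiers in batches rather than one at a time. This is a sketch under stated assumptions. `chunked`, `run_in_batches`, and `service_is_up` are hypothetical helpers, the batch size of 100 is arbitrary, and `fake_query` stands in for a real BioServices call so the example runs offline.

```python
def chunked(identifiers, size):
    """Yield successive lists of at most `size` identifiers."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(identifiers), size):
        yield identifiers[start:start + size]


def run_in_batches(identifiers, query_fn, size=100):
    """Call `query_fn` once per batch with a comma-separated ID string."""
    return [query_fn(",".join(batch)) for batch in chunked(identifiers, size)]


def service_is_up(probe_fn):
    """Return True if a cheap probe call completes without raising."""
    try:
        probe_fn()
    except Exception:
        return False
    return True


# Stand-in for a real call such as:
#   lambda q: u.mapping("UniProtKB_AC-ID", "KEGG", query=q)
calls = []


def fake_query(query):
    calls.append(query)
    return query.count(",") + 1  # number of IDs in this batch


ids = [f"ID{n}" for n in range(250)]
sizes = []
if service_is_up(lambda: fake_query("ID0")):
    calls.clear()  # discard the probe call before the real run
    sizes = run_in_batches(ids, fake_query, size=100)
print(sizes)
```

Passing the query function in as an argument keeps the batching logic independent of any one service, so the same wrapper can be reused for UniProt, KEGG, or a test double.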

Common Issues

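Service returns timeout error. Increase the timeout. Depending on the BioServices version, this is set on the instance rather than in the constructor, for example u = UniProt() followed by u.settings.TIMEOUT = 60; check the documentation for your installed release if service = UniProt(timeout=60) raises a TypeError. Some queries against large databases take longer than the default timeout. For very large result sets, paginate or narrow the search query.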

Mapping returns no results for valid identifiers. Identifier formats matter. UniProt accessions (P04637) differ from entry names (P53_HUMAN) — specify the correct source database format. Also check that the identifier exists in the target database.
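A quick format check before calling the mapping service can catch mixed-up identifier types early. The sketch below uses the accession pattern published in the UniProt help pages. The entry-name pattern is a simplification, and `classify_uniprot_id` is a hypothetical helper rather than a BioServices function.

```python
import re

# Accession pattern as published in the UniProt help pages.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
# Entry names look like MNEMONIC_SPECIES; this pattern is a simplification.
ENTRY_NAME_RE = re.compile(r"[A-Z0-9]{1,10}_[A-Z0-9]{1,5}")


def classify_uniprot_id(identifier):
    """Return 'accession', 'entry_name', or 'unknown' for a UniProt-style ID."""
    text = identifier.strip().upper()
    if ACCESSION_RE.fullmatch(text):
        return "accession"
    if ENTRY_NAME_RE.fullmatch(text):
        return "entry_name"
    return "unknown"


for candidate in ["P04637", "P53_HUMAN", "hsa:7157"]:
    print(candidate, classify_uniprot_id(candidate))
```

A format match only shows that the string is well formed; it does not confirm that the identifier exists in UniProt or in the target database.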

Different services return conflicting data. Biological databases update on different schedules. UniProt may have annotations that KEGG hasn't incorporated yet, and vice versa. When data conflicts arise, check the source database's release date and prefer the most recently updated source.
