Advanced ChEMBL Database
A scientific computing skill for querying ChEMBL — the European Bioinformatics Institute's manually curated database of bioactive molecules with drug-like properties. Advanced ChEMBL Database helps you search compounds by target, retrieve bioactivity data, and build structure-activity relationship (SAR) analyses for drug discovery research.
When to Use This Skill
Choose Advanced ChEMBL Database when:
- Searching for bioactive compounds against a specific protein target
- Retrieving IC50, EC50, Ki, or Kd bioactivity measurements
- Building structure-activity relationship datasets for lead optimization
- Finding approved drugs and their mechanism of action
Consider alternatives when:
- You need enzyme kinetics data (use BRENDA)
- You need drug-drug interactions (use DrugBank)
- You need chemical structures only (use PubChem)
- You need clinical trial data (use ClinicalTrials.gov)
Quick Start
claude "Find all compounds active against EGFR with IC50 below 100 nM"
    from chembl_webresource_client.new_client import new_client

    # Search for target
    target = new_client.target
    egfr_results = target.search("EGFR human")
    egfr_id = egfr_results[0]["target_chembl_id"]
    print(f"Target: {egfr_id}")

    # Get bioactivity data
    activity = new_client.activity
    acts = activity.filter(
        target_chembl_id=egfr_id,
        standard_type="IC50",
        standard_relation="=",
        standard_units="nM"
    ).filter(standard_value__lte=100)

    print(f"Compounds with IC50 ≤ 100 nM: {len(acts)}")
    for a in acts[:5]:
        print(f"  {a['molecule_chembl_id']}: IC50 = {a['standard_value']} nM")
Core Concepts
ChEMBL Data Model
| Entity | Description | Example ID |
|---|---|---|
| Target | Biological target (protein, organism) | CHEMBL203 (EGFR) |
| Molecule | Chemical compound | CHEMBL553 (Erlotinib) |
| Assay | Experimental measurement setup | CHEMBL674840 |
| Activity | Measured bioactivity value | IC50, EC50, Ki (measurement types) |
| Document | Source publication | CHEMBL1127557 |
Querying Bioactivity Data
    from chembl_webresource_client.new_client import new_client
    import pandas as pd

    molecule = new_client.molecule
    activity = new_client.activity

    # Get all activities for a known drug
    erlotinib = molecule.search("erlotinib")[0]
    drug_activities = activity.filter(
        molecule_chembl_id=erlotinib["molecule_chembl_id"]
    )

    # Materialize the lazy query set, then convert to a DataFrame for analysis
    df = pd.DataFrame(list(drug_activities))
    df = df[df["standard_type"].isin(["IC50", "EC50", "Ki"])].copy()
    df["standard_value"] = pd.to_numeric(df["standard_value"], errors="coerce")
    print(df.groupby("standard_type")["standard_value"].describe())
Structure-Activity Relationships
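    import numpy as np
    import pandas as pd
    from chembl_webresource_client.new_client import new_client

    molecule = new_client.molecule
    activity = new_client.activity

    # Build SAR dataset for a target
    target_id = "CHEMBL203"  # EGFR
    activities = activity.filter(
        target_chembl_id=target_id,
        standard_type="IC50",
        standard_units="nM"
    )

    sar_data = []
    for act in activities:
        # Skip records without a positive value (log10 is undefined otherwise)
        if not act["standard_value"] or float(act["standard_value"]) <= 0:
            continue
        ic50_nm = float(act["standard_value"])
        mol = molecule.get(act["molecule_chembl_id"])
        if mol and mol.get("molecule_structures"):
            sar_data.append({
                "chembl_id": act["molecule_chembl_id"],
                "smiles": mol["molecule_structures"]["canonical_smiles"],
                "ic50_nM": ic50_nm,
                "pIC50": -np.log10(ic50_nm * 1e-9)  # nM -> M, then -log10
            })

    sar_df = pd.DataFrame(sar_data)
    print(f"SAR dataset: {len(sar_df)} compounds")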
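The pIC50 column in the SAR dataset is the negative base-10 logarithm of the IC50 expressed in mol/L. For a value given in nM this reduces to 9 − log10(IC50). A minimal standalone sketch of that conversion follows; the helper name `pic50_from_nm` is illustrative and not part of the ChEMBL client.

```python
import math


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 in nM to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive to take a logarithm")
    # -log10(x * 1e-9) simplifies to 9 - log10(x)
    return 9.0 - math.log10(ic50_nm)


if __name__ == "__main__":
    for value in (1.0, 100.0, 10000.0):
        print(f"{value} nM -> pIC50 {pic50_from_nm(value):.2f}")
```

A tenfold drop in IC50 raises pIC50 by one unit, which is why SAR tables usually report the logarithmic value.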
Configuration
| Parameter | Description | Default |
|---|---|---|
| api_base_url | ChEMBL API endpoint | https://www.ebi.ac.uk/chembl/api/data |
| result_limit | Max results per query | 1000 |
| standard_type_filter | Activity types to include | ["IC50", "EC50", "Ki"] |
| species_filter | Target organism | Homo sapiens |
| include_structures | Fetch SMILES with results | true |
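One way to carry these settings in code is a small dataclass whose defaults mirror the table. This is only a sketch: the class name `ChemblConfig` and the helper `activity_url` are hypothetical, and the query parameter names (`standard_type__in`, `target_organism`, `limit`) are assumptions based on the filter syntax of the public ChEMBL REST API, so check them against the API documentation before relying on them.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class ChemblConfig:
    """Hypothetical settings holder; defaults copied from the table above."""
    api_base_url: str = "https://www.ebi.ac.uk/chembl/api/data"
    result_limit: int = 1000
    standard_type_filter: tuple = ("IC50", "EC50", "Ki")
    species_filter: str = "Homo sapiens"
    include_structures: bool = True


def activity_url(cfg: ChemblConfig, target_chembl_id: str) -> str:
    """Build an activity query URL from the config (parameter names assumed)."""
    params = {
        "target_chembl_id": target_chembl_id,
        "standard_type__in": ",".join(cfg.standard_type_filter),
        "target_organism": cfg.species_filter,
        "limit": cfg.result_limit,
    }
    return f"{cfg.api_base_url}/activity.json?{urlencode(params)}"


if __name__ == "__main__":
    print(activity_url(ChemblConfig(), "CHEMBL203"))
```

The sketch does not use `include_structures`; a caller could consult it to decide whether to fetch SMILES in a second step.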
Best Practices
- Filter by standard_type and standard_units. ChEMBL contains diverse assay types with different units. Always filter to specific measurement types (IC50, Ki) and units (nM) to ensure comparable values. Mixing IC50 and percentage inhibition data produces meaningless analyses.
- Use pChEMBL values for cross-assay comparison. ChEMBL provides pChEMBL values (the negative log10 of a molar IC50, XC50, EC50, AC50, Ki, Kd, or potency value) that put different activity types on one scale. Use these for ranking compounds when mixing IC50, EC50, and Ki data.
- Check the confidence score for target assignments. ChEMBL assigns a confidence score (0-9) to each assay-to-target assignment. For SAR analysis, filter to confidence ≥ 7 to ensure the activity data is reliably linked to your target of interest.
- Paginate through large result sets. The ChEMBL API returns paginated results. Use the Python client's lazy loading to iterate through all results, but be aware that very large queries (>100K activities) may take several minutes.
- Cache compound structures locally. If your analysis repeatedly accesses the same molecules, cache their SMILES and properties locally. Network latency for individual molecule lookups adds up quickly in large SAR analyses.
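Several of these practices can be sketched on plain dictionaries shaped like ChEMBL activity records. The example below keeps only comparable measurements, ranks them by `pchembl_value`, and wraps a lookup function in an in-memory cache. The records and molecule names are made up for illustration, and the helper names (`comparable`, `rank_by_pchembl`, `make_cached_lookup`) are hypothetical.

```python
from functools import lru_cache

# Illustrative records only; the field names follow ChEMBL activity rows.
SAMPLE = [
    {"molecule_chembl_id": "MOL_A", "standard_type": "IC50",
     "standard_units": "nM", "pchembl_value": "8.10"},
    {"molecule_chembl_id": "MOL_B", "standard_type": "Ki",
     "standard_units": "nM", "pchembl_value": "9.00"},
    {"molecule_chembl_id": "MOL_C", "standard_type": "Inhibition",
     "standard_units": "%", "pchembl_value": None},
    {"molecule_chembl_id": "MOL_D", "standard_type": "IC50",
     "standard_units": "uM", "pchembl_value": None},
]


def comparable(records, types=("IC50", "EC50", "Ki"), units="nM"):
    """Keep records with an accepted measurement type and a single unit."""
    return [r for r in records
            if r.get("standard_type") in types
            and r.get("standard_units") == units]


def rank_by_pchembl(records):
    """Sort records with a pChEMBL value from most to least potent."""
    scored = [r for r in records if r.get("pchembl_value") is not None]
    return sorted(scored, key=lambda r: float(r["pchembl_value"]), reverse=True)


def make_cached_lookup(fetch, maxsize=4096):
    """Wrap a one-argument lookup so repeated IDs are served from memory."""
    return lru_cache(maxsize=maxsize)(fetch)


if __name__ == "__main__":
    for rec in rank_by_pchembl(comparable(SAMPLE)):
        print(rec["molecule_chembl_id"], rec["pchembl_value"])
```

In a real analysis, `fetch` could be a function that calls the client's molecule lookup, so each ChEMBL ID triggers at most one network request per session. The cache lives in memory only; a persistent store would be needed to reuse it across runs.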
Common Issues
Query returns no results for a known target. The target name might not match ChEMBL's naming convention. Search by gene name, UniProt accession, or target type rather than common name. Use target.search() with broad terms to find the correct CHEMBL target ID.
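A broad search often returns several hits (other species, protein families, complexes), so it helps to pick the entry explicitly rather than taking the first one. The sketch below filters search hits by the `organism` and `target_type` fields. The helper name `pick_target` is hypothetical, and apart from CHEMBL203 the sample hits use placeholder IDs.

```python
def pick_target(results, organism="Homo sapiens", target_type="SINGLE PROTEIN"):
    """Return the first target ID matching organism and type, or None."""
    for hit in results:
        if hit.get("organism") == organism and hit.get("target_type") == target_type:
            return hit["target_chembl_id"]
    return None


# Illustrative search hits; only the CHEMBL203 entry reflects a real ID.
HITS = [
    {"target_chembl_id": "PLACEHOLDER_1", "organism": "Mus musculus",
     "target_type": "SINGLE PROTEIN"},
    {"target_chembl_id": "PLACEHOLDER_2", "organism": "Homo sapiens",
     "target_type": "PROTEIN FAMILY"},
    {"target_chembl_id": "CHEMBL203", "organism": "Homo sapiens",
     "target_type": "SINGLE PROTEIN"},
]

if __name__ == "__main__":
    print(pick_target(HITS))
```

With the Python client, the results of `target.search("EGFR")` could be passed in place of `HITS`, assuming each result exposes the same fields.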
Activity values vary wildly for the same compound-target pair. Different assays measure different things — cell-based vs. biochemical, different cell lines, different incubation times. Group activities by assay description and compare within assay groups rather than across all measurements.
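Grouping within assays can be sketched with the standard library alone. The example below computes a median per assay for one compound-target pair. It assumes each record carries `assay_chembl_id` and `standard_value`; the function name and the sample values are illustrative, and passing `key="assay_description"` would group by description instead.

```python
from collections import defaultdict
from statistics import median


def median_by_assay(records, key="assay_chembl_id"):
    """Group numeric standard_value entries by assay and take the median."""
    groups = defaultdict(list)
    for rec in records:
        value = rec.get("standard_value")
        if value is None:
            continue  # skip records without a reported value
        groups[rec[key]].append(float(value))
    return {assay: median(values) for assay, values in groups.items()}


# Made-up measurements for a single compound-target pair.
RECORDS = [
    {"assay_chembl_id": "ASSAY_BIOCHEM", "standard_value": "10"},
    {"assay_chembl_id": "ASSAY_BIOCHEM", "standard_value": "20"},
    {"assay_chembl_id": "ASSAY_BIOCHEM", "standard_value": "30"},
    {"assay_chembl_id": "ASSAY_CELL", "standard_value": "1000"},
    {"assay_chembl_id": "ASSAY_CELL", "standard_value": None},
]

if __name__ == "__main__":
    print(median_by_assay(RECORDS))
```

Comparing the per-assay medians makes it clear when a large spread comes from different assay formats rather than from noise within one assay.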
Structure search returns too many or too few hits. Substructure searches can be broad (matching many irrelevant compounds) while exact searches may miss close analogs. Use similarity search with a Tanimoto threshold (0.7-0.8) for balanced SAR analysis.
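The Tanimoto coefficient behind such a threshold is the size of the intersection of two fingerprints divided by the size of their union. The sketch below represents fingerprints as sets of on-bit indices; the bit sets are toy values, and real fingerprints would come from a cheminformatics toolkit. The function names are illustrative.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def similar(query: set, library: dict, threshold: float = 0.7) -> list:
    """Names of library entries whose similarity to query meets the threshold."""
    return [name for name, bits in library.items()
            if tanimoto(query, bits) >= threshold]


# Toy fingerprints as sets of on-bit indices.
QUERY = {1, 2, 3, 4}
LIBRARY = {
    "close_analog": {1, 2, 3, 4, 5},   # 4 shared / 5 total = 0.8
    "distant": {2, 3, 9, 10},          # 2 shared / 6 total
}

if __name__ == "__main__":
    print(similar(QUERY, LIBRARY, threshold=0.7))
```

When querying ChEMBL itself, note that the web services appear to take the similarity cutoff as a percentage (for example 70 rather than 0.7), so convert the threshold accordingly.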