Advanced ChEMBL Database

Production-ready skill for querying bioactive molecules in the ChEMBL database. Includes structured workflows, validation checks, and reusable patterns for scientific computing.

Skill | Cliptics | scientific | v1.0.0 | MIT

A scientific computing skill for querying ChEMBL, the European Bioinformatics Institute's manually curated database of bioactive molecules with drug-like properties. Advanced ChEMBL Database helps you search compounds by target, retrieve bioactivity data, and build structure-activity relationship (SAR) analyses for drug discovery research.

When to Use This Skill

Choose Advanced ChEMBL Database when:

  • Searching for bioactive compounds against a specific protein target
  • Retrieving IC50, EC50, Ki, or Kd bioactivity measurements
  • Building structure-activity relationship datasets for lead optimization
  • Finding approved drugs and their mechanism of action

Consider alternatives when:

  • You need enzyme kinetics data (use BRENDA)
  • You need drug-drug interactions (use DrugBank)
  • You need chemical structures only (use PubChem)
  • You need clinical trial data (use ClinicalTrials.gov)

Quick Start

    claude "Find all compounds active against EGFR with IC50 below 100 nM"

    from chembl_webresource_client.new_client import new_client

    # Search for target
    target = new_client.target
    egfr_results = target.search("EGFR human")
    egfr_id = egfr_results[0]["target_chembl_id"]
    print(f"Target: {egfr_id}")

    # Get bioactivity data
    activity = new_client.activity
    acts = activity.filter(
        target_chembl_id=egfr_id,
        standard_type="IC50",
        standard_relation="=",
        standard_units="nM"
    ).filter(standard_value__lte=100)

    print(f"Compounds with IC50 ≤ 100 nM: {len(acts)}")
    for a in acts[:5]:
        print(f"  {a['molecule_chembl_id']}: IC50 = {a['standard_value']} nM")

Core Concepts

ChEMBL Data Model

Entity    Description                            Example ID
Target    Biological target (protein, organism)  CHEMBL203 (EGFR)
Molecule  Chemical compound                      CHEMBL553 (Erlotinib)
Assay     Experimental measurement setup         CHEMBL674840
Activity  Measured bioactivity value             IC50, EC50, Ki
Document  Source publication                     CHEMBL1127557
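To make the relationships in the table concrete: an activity record links the other entities together by carrying their IDs. The snippet below is a minimal sketch, assuming activity records expose the fields `molecule_chembl_id`, `target_chembl_id`, `assay_chembl_id`, and `document_chembl_id`. The helper `linked_ids` is hypothetical and the example record is a hand-written placeholder, not real ChEMBL data.

```python
from typing import Dict, Mapping, Optional

# Field on an activity record -> entity it points to (assumed field names).
LINK_FIELDS = {
    "molecule_chembl_id": "Molecule",
    "target_chembl_id": "Target",
    "assay_chembl_id": "Assay",
    "document_chembl_id": "Document",
}


def linked_ids(activity_record: Mapping[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Return the ChEMBL ID of each entity that an activity record links to."""
    return {
        entity: activity_record.get(field)
        for field, entity in LINK_FIELDS.items()
    }


# Placeholder record for illustration only; these IDs are not real.
example_record = {
    "molecule_chembl_id": "CHEMBL_MOLECULE",
    "target_chembl_id": "CHEMBL_TARGET",
    "assay_chembl_id": "CHEMBL_ASSAY",
    "document_chembl_id": "CHEMBL_DOCUMENT",
    "standard_type": "IC50",
}

for entity, chembl_id in linked_ids(example_record).items():
    print(f"{entity}: {chembl_id}")
```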

Querying Bioactivity Data

    from chembl_webresource_client.new_client import new_client
    import pandas as pd

    molecule = new_client.molecule
    activity = new_client.activity

    # Get all activities for a known drug
    erlotinib = molecule.search("erlotinib")[0]
    drug_activities = activity.filter(
        molecule_chembl_id=erlotinib["molecule_chembl_id"]
    )

    # Convert to DataFrame for analysis
    # list() fetches every page of the lazy query set before building the frame
    df = pd.DataFrame(list(drug_activities))
    df = df[df["standard_type"].isin(["IC50", "EC50", "Ki"])]
    # Values arrive as strings; coerce unparseable entries to NaN
    df["standard_value"] = pd.to_numeric(df["standard_value"], errors="coerce")
    print(df.groupby("standard_type")["standard_value"].describe())

Structure-Activity Relationships

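    import numpy as np
    import pandas as pd
    from chembl_webresource_client.new_client import new_client

    molecule = new_client.molecule
    activity = new_client.activity

    # Build SAR dataset for a target
    target_id = "CHEMBL203"  # EGFR
    activities = activity.filter(
        target_chembl_id=target_id,
        standard_type="IC50",
        standard_units="nM"
    )

    sar_data = []
    for act in activities:
        # Skip records without a value; log10 also needs a positive number
        if act["standard_value"] is None:
            continue
        ic50_nm = float(act["standard_value"])
        if ic50_nm <= 0:
            continue
        mol = molecule.get(act["molecule_chembl_id"])
        if mol and mol.get("molecule_structures"):
            sar_data.append({
                "chembl_id": act["molecule_chembl_id"],
                "smiles": mol["molecule_structures"]["canonical_smiles"],
                "ic50_nM": ic50_nm,
                "pIC50": -np.log10(ic50_nm * 1e-9)  # nM to M, then -log10
            })

    sar_df = pd.DataFrame(sar_data)
    print(f"SAR dataset: {len(sar_df)} compounds")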
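The pIC50 column above is the negative base-10 logarithm of the IC50 expressed in mol/L, so an IC50 of 100 nM (1e-7 M) corresponds to a pIC50 of 7. The conversion can be isolated as a small function; `pic50_from_nm` is a hypothetical helper name used here for illustration.

```python
import math


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 in nM to pIC50 (-log10 of the molar concentration)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    # -log10(x * 1e-9) is the same as 9 - log10(x)
    return 9.0 - math.log10(ic50_nm)


for value_nm in (1.0, 100.0, 10000.0):
    print(f"{value_nm:g} nM -> pIC50 {pic50_from_nm(value_nm):.2f}")
```

Higher pIC50 means higher potency, and each unit is a tenfold change in IC50, which is why SAR plots usually use the log scale rather than raw nM values.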

Configuration

Parameter             Description                Default
api_base_url          ChEMBL API endpoint        https://www.ebi.ac.uk/chembl/api/data
result_limit          Max results per query      1000
standard_type_filter  Activity types to include  ["IC50", "EC50", "Ki"]
species_filter        Target organism            Homo sapiens
include_structures    Fetch SMILES with results  true
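This page does not show how these parameters are consumed, so the following is only a sketch of one way to carry them in code. `ChemblQueryConfig`, `to_activity_filters`, and `limited` are hypothetical names, and the mapping to the filter keys `standard_type__in` and `target_organism` is an assumption about how the settings could translate into activity query arguments.

```python
from dataclasses import dataclass, field
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


@dataclass(frozen=True)
class ChemblQueryConfig:
    """Holds the parameters from the table above with the same defaults."""
    api_base_url: str = "https://www.ebi.ac.uk/chembl/api/data"
    result_limit: int = 1000
    standard_type_filter: List[str] = field(
        default_factory=lambda: ["IC50", "EC50", "Ki"]
    )
    species_filter: str = "Homo sapiens"
    include_structures: bool = True

    def to_activity_filters(self) -> Dict[str, Any]:
        """Translate the settings into keyword arguments for a filter call (assumed keys)."""
        return {
            "standard_type__in": list(self.standard_type_filter),
            "target_organism": self.species_filter,
        }


def limited(results: Iterable[Any], config: ChemblQueryConfig) -> Iterator[Any]:
    """Yield at most result_limit items from any iterable of results."""
    return islice(results, config.result_limit)


config = ChemblQueryConfig()
print(config.to_activity_filters())
```

With the Python client, the filters could then be passed as `activity.filter(target_chembl_id="CHEMBL203", **config.to_activity_filters())` and the result wrapped in `limited(...)` to cap how many records are fetched.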

Best Practices

  1. Filter by standard_type and standard_units. ChEMBL contains diverse assay types with different units. Always filter to specific measurement types (IC50, Ki) and units (nM) to ensure comparable values. Mixing IC50 and percentage inhibition data produces meaningless analyses.

  2. Use pChEMBL values for cross-assay comparison. ChEMBL provides normalized pChEMBL values (-log10 of molar activity) that are comparable across different activity types. Use these for ranking compounds when mixing IC50, EC50, and Ki data.

  3. Check the confidence score for target assignments. ChEMBL assigns a confidence score (0-9) to each assay-to-target assignment. For SAR analysis, filter to confidence ≥ 7 to ensure the activity data is reliably linked to your target of interest.

  4. Paginate through large result sets. The ChEMBL API returns paginated results. Use the Python client's lazy loading to iterate through all results, but be aware that very large queries (>100K activities) may take several minutes.

  5. Cache compound structures locally. If your analysis repeatedly accesses the same molecules, cache their SMILES and properties locally. Network latency for individual molecule lookups adds up quickly in large SAR analyses.
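Practices 2, 3, and 5 can be sketched as small helpers that work on plain record dictionaries. This is an illustrative sketch, not part of the skill: the function names are hypothetical, and it assumes assay records carry an integer `confidence_score` and activity records carry a `pchembl_value` that may be a string or missing.

```python
from typing import Callable, Dict, Iterable, List, Mapping, Optional


def confident_assays(assays: Iterable[Mapping], min_score: int = 7) -> List[Mapping]:
    """Keep assay records whose confidence_score is at least min_score."""
    return [
        a for a in assays
        if a.get("confidence_score") is not None
        and a["confidence_score"] >= min_score
    ]


def rank_by_pchembl(activities: Iterable[Mapping]) -> List[Mapping]:
    """Drop records without a pChEMBL value and sort the rest, most potent first."""
    with_value = [a for a in activities if a.get("pchembl_value") is not None]
    return sorted(with_value, key=lambda a: float(a["pchembl_value"]), reverse=True)


def make_cached_lookup(
    fetch: Callable[[str], Optional[str]]
) -> Callable[[str], Optional[str]]:
    """Wrap a per-molecule fetch function so each ID is fetched only once."""
    cache: Dict[str, Optional[str]] = {}

    def lookup(chembl_id: str) -> Optional[str]:
        if chembl_id not in cache:
            cache[chembl_id] = fetch(chembl_id)
        return cache[chembl_id]

    return lookup
```

In a SAR loop, `fetch` would be a function that calls `molecule.get(...)` and returns the canonical SMILES, so molecules that appear in many activity records trigger a single network request.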

Common Issues

Query returns no results for a known target. The target name might not match ChEMBL's naming convention. Search by gene name, UniProt accession, or target type rather than common name. Use target.search() with broad terms to find the correct CHEMBL target ID.
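A sketch of the two alternative lookups, assuming the Python client accepts the filter keys `target_components__accession` (UniProt accession) and `target_synonym__icontains` (gene name or synonym). The helper names are hypothetical, and the client import is placed inside the functions so the pure filtering helper can be used without a network connection.

```python
from typing import Iterable, List, Mapping


def find_targets_by_accession(accession: str) -> List[Mapping]:
    """Look up targets by UniProt accession (requires network access)."""
    from chembl_webresource_client.new_client import new_client
    return list(new_client.target.filter(target_components__accession=accession))


def find_targets_by_gene(gene_name: str) -> List[Mapping]:
    """Look up targets whose synonyms contain a gene name (requires network access)."""
    from chembl_webresource_client.new_client import new_client
    return list(new_client.target.filter(target_synonym__icontains=gene_name))


def human_single_proteins(targets: Iterable[Mapping]) -> List[Mapping]:
    """Narrow broad search results to human single-protein targets."""
    return [
        t for t in targets
        if t.get("organism") == "Homo sapiens"
        and t.get("target_type") == "SINGLE PROTEIN"
    ]


# Example usage (commented out because it needs network access):
# hits = human_single_proteins(find_targets_by_accession("P00533"))  # P00533 = human EGFR
# for t in hits:
#     print(t["target_chembl_id"], t["pref_name"])
```

Broad searches often return protein families, complexes, and orthologs from other species alongside the entry you want, which is why the results are filtered on `organism` and `target_type` before picking an ID.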

Activity values vary wildly for the same compound-target pair. Different assays measure different things: cell-based vs. biochemical formats, different cell lines, different incubation times. Group activities by assay description and compare within assay groups rather than across all measurements.
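Grouping can be done with the standard library once the activity records are in memory. The sketch below assumes each record has `assay_chembl_id` and `standard_value` fields; `summarize_by_assay` is a hypothetical helper, and the sample records are made-up values for illustration only.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Mapping


def summarize_by_assay(activities: Iterable[Mapping]) -> Dict[str, Dict[str, float]]:
    """Group activity values by assay and report count, median, min, and max per assay."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for act in activities:
        value = act.get("standard_value")
        if value is None:
            continue  # no numeric measurement to compare
        groups[act["assay_chembl_id"]].append(float(value))
    return {
        assay_id: {
            "n": len(values),
            "median": median(values),
            "min": min(values),
            "max": max(values),
        }
        for assay_id, values in groups.items()
    }


# Made-up records: one compound measured in two different assays.
records = [
    {"assay_chembl_id": "ASSAY_A", "standard_value": "10"},
    {"assay_chembl_id": "ASSAY_A", "standard_value": "30"},
    {"assay_chembl_id": "ASSAY_A", "standard_value": "20"},
    {"assay_chembl_id": "ASSAY_B", "standard_value": "1000"},
    {"assay_chembl_id": "ASSAY_B", "standard_value": None},
]

for assay_id, stats in summarize_by_assay(records).items():
    print(assay_id, stats)
```

A large spread within one assay points to a data quality problem, while a large gap between assays usually reflects different experimental conditions.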

Structure search returns too many or too few hits. Substructure searches can be broad (matching many irrelevant compounds) while exact searches may miss close analogs. Use similarity search with a Tanimoto threshold (0.7-0.8) for balanced SAR analysis.
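A sketch of a similarity query with the Python client, assuming its `similarity` resource takes the threshold as an integer percentage (so a Tanimoto threshold of 0.7 is passed as 70) and that the service accepts values from 40 to 100. Both helper names are hypothetical.

```python
from typing import List, Mapping


def tanimoto_to_percent(threshold: float) -> int:
    """Convert a Tanimoto threshold such as 0.7 into an integer percentage."""
    percent = round(threshold * 100)
    # Assumed accepted range for the similarity service: 40 to 100.
    if not 40 <= percent <= 100:
        raise ValueError("threshold must be between 0.40 and 1.00")
    return percent


def similar_molecules(smiles: str, threshold: float = 0.7) -> List[Mapping]:
    """Find molecules similar to a query SMILES (requires network access)."""
    # Imported here so tanimoto_to_percent works without the client installed.
    from chembl_webresource_client.new_client import new_client
    return list(
        new_client.similarity.filter(
            smiles=smiles,
            similarity=tanimoto_to_percent(threshold),
        )
    )


# Example usage (commented out because it needs network access):
# hits = similar_molecules("CC(=O)Oc1ccccc1C(=O)O", threshold=0.8)  # aspirin
# print(len(hits))
```

Start near 0.8 for close analogs and lower the threshold toward 0.7 if the series is too small for a useful SAR table.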
