Advanced ChEMBL Database

A scientific computing skill for querying ChEMBL — the European Bioinformatics Institute's manually curated database of bioactive molecules with drug-like properties. Advanced ChEMBL Database helps you search compounds by target, retrieve bioactivity data, and build structure-activity relationship (SAR) analyses for drug discovery research.

When to Use This Skill

Choose Advanced ChEMBL Database when:

Searching for bioactive compounds against a specific protein target
Retrieving IC50, EC50, Ki, or Kd bioactivity measurements
Building structure-activity relationship datasets for lead optimization
Finding approved drugs and their mechanism of action

Consider alternatives when:

You need enzyme kinetics data (use BRENDA)
You need drug-drug interactions (use DrugBank)
You need chemical structures only (use PubChem)
You need clinical trial data (use ClinicalTrials.gov)

Quick Start


claude "Find all compounds active against EGFR with IC50 below 100 nM"


from chembl_webresource_client.new_client import new_client

# Search for target
target = new_client.target
egfr_results = target.search("EGFR human")
egfr_id = egfr_results[0]["target_chembl_id"]
print(f"Target: {egfr_id}")

# Get bioactivity data
activity = new_client.activity
acts = activity.filter(
    target_chembl_id=egfr_id,
    standard_type="IC50",
    standard_relation="=",
    standard_units="nM"
).filter(standard_value__lte=100)

print(f"Activity records with IC50 ≤ 100 nM: {len(acts)}")
for a in acts[:5]:
    print(f"  {a['molecule_chembl_id']}: IC50 = {a['standard_value']} nM")

Core Concepts

ChEMBL Data Model

Entity	Description	Example ID
Target	Biological target (protein, organism)	CHEMBL203 (EGFR)
Molecule	Chemical compound	CHEMBL553 (Erlotinib)
Assay	Experimental measurement setup	CHEMBL674840
Activity	Measured bioactivity value	IC50, EC50, Ki
Document	Source publication	CHEMBL1127557

Querying Bioactivity Data


from chembl_webresource_client.new_client import new_client
import pandas as pd

molecule = new_client.molecule
activity = new_client.activity

# Get all activities for a known drug
erlotinib = molecule.search("erlotinib")[0]
drug_activities = activity.filter(
    molecule_chembl_id=erlotinib["molecule_chembl_id"]
)

# Convert to DataFrame for analysis
df = pd.DataFrame(drug_activities)
df = df[df["standard_type"].isin(["IC50", "EC50", "Ki"])]
df["standard_value"] = pd.to_numeric(df["standard_value"], errors="coerce")

print(df.groupby("standard_type")["standard_value"].describe())

Structure-Activity Relationships


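import numpy as np
import pandas as pd
from chembl_webresource_client.new_client import new_client

molecule = new_client.molecule
activity = new_client.activity

# Build SAR dataset for a target
target_id = "CHEMBL203"  # EGFR

activities = activity.filter(
    target_chembl_id=target_id,
    standard_type="IC50",
    standard_units="nM"
)

sar_data = []
for act in activities:
    # Skip records with a missing or non-positive value; log10 is undefined for them
    if not act["standard_value"] or float(act["standard_value"]) <= 0:
        continue
    ic50_nM = float(act["standard_value"])
    # One lookup per activity record; slow for large targets
    mol = molecule.get(act["molecule_chembl_id"])
    if mol and mol.get("molecule_structures"):
        sar_data.append({
            "chembl_id": act["molecule_chembl_id"],
            "smiles": mol["molecule_structures"]["canonical_smiles"],
            "ic50_nM": ic50_nM,
            "pIC50": -np.log10(ic50_nM * 1e-9)  # nM to M, then -log10
        })

sar_df = pd.DataFrame(sar_data)
print(f"SAR dataset: {len(sar_df)} activity records")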
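
The pIC50 conversion used in the loop above can be isolated into a small helper, which makes the unit handling explicit. This is a minimal sketch; the function name `pic50_from_nM` is illustrative and not part of the ChEMBL client.

```python
import math


def pic50_from_nM(ic50_nM):
    """Convert an IC50 in nanomolar to pIC50 (-log10 of the molar value)."""
    if ic50_nM is None or ic50_nM <= 0:
        raise ValueError("IC50 must be a positive number")
    # -log10(x * 1e-9) is the same as 9 - log10(x)
    return 9.0 - math.log10(ic50_nM)


for value in (1.0, 100.0, 10000.0):
    print(f"{value:>8} nM -> pIC50 {pic50_from_nM(value):.2f}")
```

A 100 nM compound maps to a pIC50 of 7, and each tenfold gain in potency adds one unit.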

Configuration

Parameter	Description	Default
`api_base_url`	ChEMBL API endpoint	`https://www.ebi.ac.uk/chembl/api/data`
`result_limit`	Max results per query	`1000`
`standard_type_filter`	Activity types to include	`["IC50", "EC50", "Ki"]`
`species_filter`	Target organism	`Homo sapiens`
`include_structures`	Fetch SMILES with results	`true`
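
One way to picture how these parameters could drive a query is sketched below. The `ChemblQueryConfig` class and the `activity_filter_kwargs` helper are hypothetical, and the mapping from parameter names to filter fields is an assumption rather than the skill's documented behavior.

```python
from dataclasses import dataclass, field


@dataclass
class ChemblQueryConfig:
    """Hypothetical container mirroring the parameters in the table above."""
    api_base_url: str = "https://www.ebi.ac.uk/chembl/api/data"
    result_limit: int = 1000
    standard_type_filter: list = field(
        default_factory=lambda: ["IC50", "EC50", "Ki"]
    )
    species_filter: str = "Homo sapiens"
    include_structures: bool = True


def activity_filter_kwargs(config, target_chembl_id):
    """Translate the config into keyword arguments for an activity filter.

    Assumed mapping: activity types go to a Django-style `__in` lookup and
    the species goes to the `target_organism` field.
    """
    return {
        "target_chembl_id": target_chembl_id,
        "standard_type__in": list(config.standard_type_filter),
        "target_organism": config.species_filter,
    }


config = ChemblQueryConfig(standard_type_filter=["IC50"])
kwargs = activity_filter_kwargs(config, "CHEMBL203")
print(kwargs)
```

With the Python client, these arguments would plausibly be passed as `new_client.activity.filter(**kwargs)`, with the result sliced to `config.result_limit`.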

Best Practices

Filter by standard_type and standard_units. ChEMBL contains diverse assay types with different units. Always filter to specific measurement types (IC50, Ki) and units (nM) to ensure comparable values. Mixing IC50 and percentage inhibition data produces meaningless analyses.
Use pChEMBL values for cross-assay comparison. ChEMBL provides normalized pChEMBL values (-log10 of molar activity) that are comparable across different activity types. Use these for ranking compounds when mixing IC50, EC50, and Ki data.
Check the confidence score for target assignments. ChEMBL assigns a confidence score (0-9) to each assay-to-target assignment. For SAR analysis, keep only assays with a confidence score of 7 or higher so that the activity data is reliably linked to your target of interest.
Paginate through large result sets. The ChEMBL API returns paginated results. Use the Python client's lazy loading to iterate through all results, but be aware that very large queries (>100K activities) may take several minutes.
Cache compound structures locally. If your analysis repeatedly accesses the same molecules, cache their SMILES and properties locally. Network latency for individual molecule lookups adds up quickly in large SAR analyses.
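
The caching advice above could be implemented along the following lines. This is a minimal sketch: `MoleculeCache` is a hypothetical helper, and the stand-in `fake_fetch` function and its toy records exist only so the example runs without network access.

```python
import json
from pathlib import Path


class MoleculeCache:
    """Dictionary-backed cache of molecule records, optionally saved as JSON."""

    def __init__(self, fetch, path=None):
        self._fetch = fetch  # callable: molecule ID -> record dict (or None)
        self._path = Path(path) if path else None
        self._data = {}
        if self._path and self._path.exists():
            self._data = json.loads(self._path.read_text())

    def get(self, chembl_id):
        # Only call the fetcher the first time an ID is requested
        if chembl_id not in self._data:
            self._data[chembl_id] = self._fetch(chembl_id)
        return self._data[chembl_id]

    def save(self):
        # Persist the cache so later sessions can skip the lookups
        if self._path:
            self._path.write_text(json.dumps(self._data))


# Stand-in fetcher with toy data; records every call it receives
calls = []


def fake_fetch(chembl_id):
    calls.append(chembl_id)
    return {"molecule_chembl_id": chembl_id, "smiles": "C"}


cache = MoleculeCache(fake_fetch)
for cid in ["MOL_A", "MOL_B", "MOL_A"]:
    cache.get(cid)
print(f"lookups: 3, fetches: {len(calls)}")
```

In a real analysis, the fetcher would be something like `new_client.molecule.get`, as used in the SAR example earlier, so that repeated compounds cost one network request each.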

Common Issues

Query returns no results for a known target. The target name might not match ChEMBL's naming convention. Search by gene name, UniProt accession, or target type rather than common name. Use target.search() with broad terms to find the correct CHEMBL target ID.
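
A broad search often returns protein families, complexes, and orthologs alongside the target you want, so it helps to pick the record explicitly instead of taking the first hit. The sketch below assumes each result carries `target_type` and `organism` fields; the helper name and the sample records are illustrative.

```python
def pick_single_protein_target(results, organism="Homo sapiens"):
    """Return the first single-protein target for the organism, or None."""
    for rec in results:
        if (
            rec.get("target_type") == "SINGLE PROTEIN"
            and rec.get("organism") == organism
        ):
            return rec
    return None


# Toy records shaped like target search results (IDs are placeholders)
sample = [
    {"target_chembl_id": "T_FAMILY", "pref_name": "Example kinase family",
     "target_type": "PROTEIN FAMILY", "organism": "Homo sapiens"},
    {"target_chembl_id": "T_MOUSE", "pref_name": "Example kinase",
     "target_type": "SINGLE PROTEIN", "organism": "Mus musculus"},
    {"target_chembl_id": "T_HUMAN", "pref_name": "Example kinase",
     "target_type": "SINGLE PROTEIN", "organism": "Homo sapiens"},
]

hit = pick_single_protein_target(sample)
print(hit["target_chembl_id"])
```

The same helper could be applied to the output of `target.search("EGFR")`. If you already know the UniProt accession, the client also appears to support filtering on it directly, for example `target.filter(target_components__accession="P00533")`; treat that lookup name as an assumption to verify against the client documentation.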

Activity values vary wildly for the same compound-target pair. Different assays measure different things — cell-based vs. biochemical, different cell lines, different incubation times. Group activities by assay description and compare within assay groups rather than across all measurements.
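
Grouping by assay before comparing values can be sketched as follows. The example assumes each activity record exposes `assay_chembl_id`, `assay_type`, and `standard_value`; the function name and sample records are illustrative.

```python
from collections import defaultdict
from statistics import median


def summarize_by_assay(records):
    """Group activity values by (assay ID, assay type) and summarize each group."""
    groups = defaultdict(list)
    for rec in records:
        value = rec.get("standard_value")
        if value is None:
            continue  # skip records without a reported value
        key = (rec["assay_chembl_id"], rec.get("assay_type"))
        groups[key].append(float(value))
    return {
        key: {
            "n": len(vals),
            "median_nM": median(vals),
            "min_nM": min(vals),
            "max_nM": max(vals),
        }
        for key, vals in groups.items()
    }


# Toy records: "B" marks a binding assay, "F" a functional assay
sample = [
    {"assay_chembl_id": "ASSAY_1", "assay_type": "B", "standard_value": "2.0"},
    {"assay_chembl_id": "ASSAY_1", "assay_type": "B", "standard_value": "4.0"},
    {"assay_chembl_id": "ASSAY_2", "assay_type": "F", "standard_value": "150.0"},
    {"assay_chembl_id": "ASSAY_2", "assay_type": "F", "standard_value": None},
]

summary = summarize_by_assay(sample)
for key, stats in summary.items():
    print(key, stats)
```

Comparing the per-assay medians keeps a biochemical measurement from being averaged with a cell-based one for the same compound.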

Structure search returns too many or too few hits. Substructure searches can be broad (matching many irrelevant compounds) while exact searches may miss close analogs. Use similarity search with a Tanimoto threshold (0.7-0.8) for balanced SAR analysis.
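
To make the threshold concrete, the Tanimoto coefficient of two fingerprints is the number of shared features divided by the number of features present in either. The sketch below uses toy fingerprints written as sets of feature indices; in practice the fingerprints would come from a cheminformatics toolkit, and all names here are illustrative.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of feature indices."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def similar_to(query_fp, library, threshold=0.7):
    """Return (name, score) pairs at or above the threshold, best first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy fingerprints (not derived from real molecules)
query = {1, 2, 3, 4, 5, 6, 7, 8}
library = {
    "analog_close": {1, 2, 3, 4, 5, 6, 7, 9},  # 7 shared of 9 total
    "analog_far": {1, 2, 3, 10, 11, 12},       # 3 shared of 11 total
    "identical": {1, 2, 3, 4, 5, 6, 7, 8},     # all 8 shared
}

hits = similar_to(query, library, threshold=0.7)
for name, score in hits:
    print(f"{name}: {score:.2f}")
```

The Python client exposes a server-side similarity search as well, where the threshold is given as a percentage, for example `new_client.similarity.filter(smiles=..., similarity=70)`.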
