Ultimate Medchem

Apply medicinal chemistry filters and structural alerts to prioritize compound libraries for drug discovery using the medchem Python library. This skill covers molecular filtering rules, PAINS detection, lead-likeness scoring, and automated triage of chemical libraries at scale.

When to Use This Skill

Choose Ultimate Medchem when you need to:

Filter compound libraries against established medicinal chemistry rules (Lipinski, Veber, PAINS)
Identify structural alerts and reactive functional groups in hit compounds
Score and rank molecules by drug-likeness for lead optimization
Build automated compound triage pipelines for high-throughput screening results

Consider alternatives when:

You need molecular docking or binding affinity prediction (use AutoDock or DiffDock)
You need ADMET property prediction with ML models (use DeepChem or ADMETlab)
You need de novo molecule generation (use generative chemistry tools)

Quick Start


# Install medchem and RDKit (RDKit is published on PyPI as "rdkit")
pip install medchem rdkit


from rdkit import Chem
from medchem.rules.basic_rules import rule_of_five

# Load molecules from SMILES
smiles_list = [
    "CC(=O)Oc1ccccc1C(=O)O",               # Aspirin
    "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",         # Caffeine
    "CC12CCC3C(C1CCC2O)CCC4=CC(=O)CCC34C",  # Testosterone
]
mols = [Chem.MolFromSmiles(s) for s in smiles_list]

# Apply Lipinski's Rule of Five (rule_of_five returns True for compliant molecules)
for smi, mol in zip(smiles_list, mols):
    passes = rule_of_five(mol)
    print(f"{smi[:30]:30s} Lipinski: {'PASS' if passes else 'FAIL'}")

Core Concepts

Available Filter Sets

Filter	Rules Applied	Use Case
Lipinski Ro5	MW ≤ 500 Da, LogP ≤ 5, HBD ≤ 5, HBA ≤ 10	Oral bioavailability
Veber	TPSA ≤ 140 Å², rotatable bonds ≤ 10	Intestinal absorption
PAINS	Pan-assay interference substructures	False-positive removal
Brenk	Reactive/toxic substructures	Safety filtering
Lead-likeness	MW 200-350 Da, LogP -1 to 3, rotatable bonds < 7	Starting points for lead optimization
Ghose	MW 160-480 Da, LogP -0.4 to 5.6, total atoms 20-70, molar refractivity 40-130	Drug-like range
REOS	Property ranges plus reactive groups and warheads	HTS triage
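The property-based rows of this table can be encoded as data, so that each rule reports which criteria a compound violates rather than a bare pass/fail. The sketch below is a minimal illustration that works on a plain dict of precomputed descriptors (for example from RDKit's `Descriptors` module). The descriptor key names, the `RULES` and `violations` names, and treating every bound as inclusive are assumptions; substructure-based rows (PAINS, Brenk, REOS) are out of scope here.

```python
from typing import Dict, List, Optional, Tuple

# (lower, upper) bounds per descriptor; None means unbounded.
# Values transcribed from the filter table; inclusive bounds are an assumption.
Bounds = Tuple[Optional[float], Optional[float]]

RULES: Dict[str, Dict[str, Bounds]] = {
    "lipinski": {"mw": (None, 500), "logp": (None, 5), "hbd": (None, 5), "hba": (None, 10)},
    "veber": {"psa": (None, 140), "rotatable_bonds": (None, 10)},
    # "rotatable bonds < 7" expressed as an inclusive integer bound of 6
    "leadlike": {"mw": (200, 350), "logp": (-1, 3), "rotatable_bonds": (None, 6)},
}


def violations(descriptors: Dict[str, float], rule: str) -> List[str]:
    """Return the descriptor keys that fall outside the bounds of `rule`."""
    failed = []
    for key, (low, high) in RULES[rule].items():
        value = descriptors[key]
        if (low is not None and value < low) or (high is not None and value > high):
            failed.append(key)
    return failed


# Hypothetical descriptor values, for illustration only
example = {"mw": 520.0, "logp": 4.2, "hbd": 2, "hba": 7, "psa": 95.0, "rotatable_bonds": 8}
for rule in RULES:
    print(rule, violations(example, rule))
```

Returning the list of failed keys keeps the "why" next to the verdict, and `len(violations(...))` gives the violation count needed for an allowance such as one Ro5 violation.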

Comprehensive Compound Triage


from rdkit import Chem
from rdkit.Chem import Descriptors, rdMolDescriptors
from rdkit.Chem.FilterCatalog import FilterCatalog, FilterCatalogParams
from medchem.rules.basic_rules import rule_of_five, rule_of_veber
import pandas as pd

def build_catalog(catalog_id):
    """Wrap one of RDKit's bundled alert sets (e.g. PAINS, BRENK) in a FilterCatalog."""
    params = FilterCatalogParams()
    params.AddCatalog(catalog_id)
    return FilterCatalog(params)

# Build the alert catalogs once; rebuilding them per molecule is slow
PAINS_CATALOG = build_catalog(FilterCatalogParams.FilterCatalogs.PAINS)
BRENK_CATALOG = build_catalog(FilterCatalogParams.FilterCatalogs.BRENK)

def triage_compounds(smiles_list):
    """Apply comprehensive medchem filters to a compound list."""
    results = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            # Keep unparseable inputs in the table so they are counted, not dropped
            results.append({"smiles": smi, "verdict": "INVALID"})
            continue

        record = {
            "smiles": smi,
            "mw": Descriptors.MolWt(mol),
            "logp": Descriptors.MolLogP(mol),
            "hbd": rdMolDescriptors.CalcNumHBD(mol),
            "hba": rdMolDescriptors.CalcNumHBA(mol),
            "psa": Descriptors.TPSA(mol),
            "rotatable_bonds": rdMolDescriptors.CalcNumRotatableBonds(mol),
            "lipinski_pass": rule_of_five(mol),
            "veber_pass": rule_of_veber(mol),
            "pains_free": not PAINS_CATALOG.HasMatch(mol),
            "brenk_free": not BRENK_CATALOG.HasMatch(mol),
        }

        # Overall verdict: every individual check must pass
        checks = ("lipinski_pass", "veber_pass", "pains_free", "brenk_free")
        record["verdict"] = "PASS" if all(record[c] for c in checks) else "FAIL"
        results.append(record)

    return pd.DataFrame(results)

# Triage a compound library
df = triage_compounds(smiles_list)
print(f"Pass rate: {(df['verdict'] == 'PASS').sum()}/{len(df)}")
print(df[["smiles", "mw", "logp", "verdict"]])
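The overall pass rate hides which filter is doing the rejecting. A minimal stdlib sketch that tallies failures per check from the triage output, assuming the DataFrame is converted with `df.to_dict("records")` and uses the column names of `triage_compounds`. Rows whose verdict is neither PASS nor FAIL (unparseable SMILES) are counted separately. `failure_breakdown` is an illustrative name.

```python
from collections import Counter
from typing import Dict, Iterable

# Column names produced by triage_compounds
CHECKS = ("lipinski_pass", "veber_pass", "pains_free", "brenk_free")


def failure_breakdown(records: Iterable[Dict]) -> Counter:
    """Count how many parsed compounds fail each individual check."""
    counts = Counter()
    for rec in records:
        # Unparseable rows carry no filter results (NaN after a DataFrame round-trip)
        if rec.get("verdict") not in ("PASS", "FAIL"):
            counts["invalid"] += 1
            continue
        for check in CHECKS:
            if not rec[check]:
                counts[check] += 1
    return counts
```

Usage: `print(failure_breakdown(df.to_dict("records")))`. A compound that fails several checks is counted once under each, so the counts can sum to more than the number of FAIL rows.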

Custom Filter Rules


from rdkit import Chem
from rdkit.Chem import Descriptors, rdMolDescriptors

def custom_cns_filter(mol):
    """Filter for CNS drug-likeness using thresholds stricter than Ro5."""
    if mol is None:
        return False

    mw = Descriptors.MolWt(mol)
    logp = Descriptors.MolLogP(mol)
    hbd = rdMolDescriptors.CalcNumHBD(mol)
    psa = Descriptors.TPSA(mol)

    # CNS-oriented thresholds: tighter than Ro5 on MW, LogP and HBD, plus a TPSA cap
    return (
        mw <= 400
        and 1.0 <= logp <= 3.0
        and hbd <= 2
        and psa <= 90
    )

# Apply custom filter
for smi in smiles_list:
    mol = Chem.MolFromSmiles(smi)
    passes = custom_cns_filter(mol)
    print(f"{smi[:30]:30s} CNS filter: {'PASS' if passes else 'FAIL'}")

Configuration

Parameter	Description	Default
`filter_set`	Which rule set to apply	`"lipinski"`
`alert_collection`	Structural alert database	`"pains"`
`strict_mode`	Fail on any single violation	`true`
`max_violations`	Allowed rule violations	`0`
`output_format`	Results format	`"dataframe"`
`include_descriptors`	Calculate molecular descriptors	`true`
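The table does not say how these options are passed in or how `strict_mode` and `max_violations` interact. One way to hold them is sketched below as a hypothetical `TriageConfig` dataclass whose defaults are copied from the table. The class name, its methods, and the rule that `strict_mode` overrides `max_violations` are assumptions for illustration, not part of medchem's API.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TriageConfig:
    """Hypothetical container mirroring the configuration table (defaults copied from it)."""

    filter_set: str = "lipinski"
    alert_collection: str = "pains"
    strict_mode: bool = True
    max_violations: int = 0
    output_format: str = "dataframe"
    include_descriptors: bool = True

    def __post_init__(self) -> None:
        if self.max_violations < 0:
            raise ValueError("max_violations must be >= 0")

    def allowed_violations(self) -> int:
        # Assumption: strict mode means "fail on any single violation",
        # so it overrides max_violations.
        return 0 if self.strict_mode else self.max_violations

    def passes(self, n_violations: int) -> bool:
        return n_violations <= self.allowed_violations()

    def to_json(self) -> str:
        # Serialize the exact settings so they can be reported with the results.
        return json.dumps(asdict(self), sort_keys=True)


relaxed = TriageConfig(strict_mode=False, max_violations=1)
print(relaxed.passes(1), relaxed.to_json())
```

Keeping the settings in one serializable object makes it easy to attach them to every triage report.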

Best Practices

Apply filters in the correct order — Run PAINS and structural alert filters first to remove likely assay artifacts and reactive compounds, then apply drug-likeness rules. Removing them early keeps compounds you will reject anyway out of descriptor distributions, rankings, and manual review.
Allow one Lipinski violation for known drug space — Lipinski's original analysis treats poor absorption as likely only when a compound breaks two or more of the four criteria, so a zero-violation cutoff is stricter than the rule itself and eliminates many approved drugs. Allow one violation (max_violations=1), especially when evaluating natural-product-derived or macrocyclic compounds, which often exceed at least one limit while remaining orally bioavailable.
Validate SMILES before filtering — Invalid SMILES strings cause Chem.MolFromSmiles() to return None, which crashes downstream filters. Always check for None molecules and log invalid entries separately rather than silently skipping them.
Combine filters with property distributions — Don't just count pass/fail. Plot distributions of MW, LogP, and PSA for your library to identify systematic biases. A library that technically passes Lipinski but clusters at MW 490 is riskier than one centered at 350.
Document which filters you applied — When reporting triage results, specify the exact filter sets, thresholds, and structural alert collections used. Different PAINS implementations flag different numbers of compounds, and reviewers need to assess your methodology.
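The ordering, violation allowance, and distribution advice in this list can be combined in one small pipeline. This is a minimal stdlib sketch, assuming compounds arrive as dicts of precomputed descriptors (`mw`, `logp`, `hbd`, `hba`) and that the structural-alert check is supplied as a callable (for example a wrapper around an RDKit `FilterCatalog`). `staged_triage` and `ro5_violations` are illustrative names, not medchem functions.

```python
import statistics
from typing import Callable, Dict, List

# Lipinski Ro5 upper limits (inclusive) on precomputed descriptors
RO5_LIMITS = {"mw": 500, "logp": 5, "hbd": 5, "hba": 10}


def ro5_violations(desc: Dict[str, float]) -> int:
    """Number of Ro5 criteria the compound exceeds."""
    return sum(desc[key] > limit for key, limit in RO5_LIMITS.items())


def staged_triage(
    compounds: List[Dict],
    has_alert: Callable[[Dict], bool],
    max_violations: int = 1,
) -> Dict:
    """Stage 1 drops structural-alert hits; stage 2 applies Ro5 with an allowance."""
    after_alerts = [c for c in compounds if not has_alert(c)]
    kept = [c for c in after_alerts if ro5_violations(c) <= max_violations]

    mws = [c["mw"] for c in kept]
    summary = {
        "input": len(compounds),
        "after_alerts": len(after_alerts),
        "kept": len(kept),
        # Quartiles show where survivors cluster, e.g. just under the MW 500 ceiling
        "mw_quartiles": statistics.quantiles(mws, n=4) if len(mws) >= 2 else None,
    }
    return {"kept": kept, "summary": summary}
```

Returning the stage counts alongside the survivors gives the numbers a triage report needs, and the quartiles flag a library that passes but sits near a threshold.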

Common Issues

RDKit molecule parsing returns None silently — SMILES with invalid valences, kekulization errors, or non-standard atoms make Chem.MolFromSmiles() return None instead of raising an exception; the reason appears only in the RDKit log. Always count and report the number of unparseable molecules. To capture the specific error, parse with Chem.MolFromSmiles(smi, sanitize=False) and then call Chem.SanitizeMol() inside a try/except.
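A sketch of this two-step parse that keeps every failure with its index and reason, so invalid entries can be counted and logged instead of skipped. It uses only `Chem.MolFromSmiles(..., sanitize=False)` and `Chem.SanitizeMol`; the `parse_smiles` name and the failure-record layout are illustrative. It catches `Exception` broadly so that any sanitization error is recorded rather than aborting the run.

```python
from rdkit import Chem


def parse_smiles(smiles_list):
    """Split SMILES into parsed molecules and a list of failures with reasons."""
    parsed, failures = [], []
    for idx, smi in enumerate(smiles_list):
        # Without sanitization, only SMILES syntax problems make this return None
        mol = Chem.MolFromSmiles(smi, sanitize=False)
        if mol is None:
            failures.append({"index": idx, "smiles": smi, "error": "SMILES syntax error"})
            continue
        try:
            # Valence, kekulization and aromaticity problems surface here as exceptions
            Chem.SanitizeMol(mol)
        except Exception as exc:
            failures.append({"index": idx, "smiles": smi, "error": f"{type(exc).__name__}: {exc}"})
            continue
        parsed.append((idx, smi, mol))
    return parsed, failures


parsed, failures = parse_smiles(["CCO", "C(C)(C)(C)(C)C", "CC(C"])
print(f"{len(parsed)} parsed, {len(failures)} failed")
for f in failures:
    print(f)
```

Separating the syntax check from sanitization lets the report distinguish malformed strings from chemically invalid structures.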

PAINS filter flagging too many compounds — The original PAINS filter set includes some patterns that appear in legitimate drug scaffolds. For initial screening, use the PAINS-A subset (the most frequently observed interference families) rather than the full PAINS-A/B/C set. Always inspect flagged substructures visually before discarding entire compound series.
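RDKit ships the PAINS families as separate `PAINS_A`, `PAINS_B` and `PAINS_C` filter catalogs, which makes a narrower first pass straightforward. The sketch below builds a PAINS-A catalog and a full A/B/C catalog and lists the matched pattern names, so flagged compounds can be reviewed rather than discarded blindly. `make_catalog` and `pains_matches` are illustrative helper names.

```python
from rdkit import Chem
from rdkit.Chem.FilterCatalog import FilterCatalog, FilterCatalogParams


def make_catalog(*catalog_ids):
    """Combine one or more of RDKit's bundled filter sets into a single catalog."""
    params = FilterCatalogParams()
    for catalog_id in catalog_ids:
        params.AddCatalog(catalog_id)
    return FilterCatalog(params)


FC = FilterCatalogParams.FilterCatalogs
PAINS_A = make_catalog(FC.PAINS_A)
PAINS_ALL = make_catalog(FC.PAINS_A, FC.PAINS_B, FC.PAINS_C)


def pains_matches(smiles, catalog):
    """Return the names of all PAINS patterns that match, for visual review."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Unparseable SMILES: {smiles}")
    return [entry.GetDescription() for entry in catalog.GetMatches(mol)]


# A hydrazone-phenol compound used in RDKit's own FilterCatalog example
smi = "O=C(Cn1cnc2c1c(=O)n(C)c(=O)n2C)N/N=C/c1c(O)ccc2c1cccc2"
print("PAINS-A:", pains_matches(smi, PAINS_A))
print("PAINS A/B/C:", pains_matches(smi, PAINS_ALL))
```

Running both catalogs on the same library shows how many extra compounds the B and C families would flag before you decide to apply them.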

Conflicting filter results between tools — Different tools (RDKit, DataWarrior, medchem) implement slightly different threshold values or descriptor calculations. For example, LogP values vary between Wildman-Crippen (RDKit default) and XLogP3. Standardize on one descriptor calculation method and document it clearly.
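One lightweight way to follow this advice is to store the method and toolkit version next to every computed value. The sketch below uses RDKit's `Crippen.MolLogP` (the same Wildman-Crippen estimate that `Descriptors.MolLogP` returns) and `rdkit.__version__`; the `logp_record` name and its field names are illustrative.

```python
import rdkit
from rdkit import Chem
from rdkit.Chem import Crippen


def logp_record(smiles):
    """Compute Wildman-Crippen LogP and record which method and toolkit produced it."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Unparseable SMILES: {smiles}")
    return {
        "smiles": smiles,
        "logp": Crippen.MolLogP(mol),
        # Stored with every value so estimates from other methods (e.g. XLogP3)
        # are not mixed into the same column unnoticed
        "logp_method": "Wildman-Crippen",
        "toolkit": f"RDKit {rdkit.__version__}",
    }


print(logp_record("CC(=O)Oc1ccccc1C(=O)O"))  # Aspirin
```

If values from another tool are merged later, a differing `logp_method` makes the mismatch visible before any threshold is applied.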
