Ultimate Medchem
Streamline your workflow for applying medicinal chemistry filters. Includes structured workflows, validation checks, and reusable patterns for scientific work.
Apply medicinal chemistry filters and structural alerts to prioritize compound libraries for drug discovery using the medchem Python library. This skill covers molecular filtering rules, PAINS detection, lead-likeness scoring, and automated triage of chemical libraries at scale.
When to Use This Skill
Choose Ultimate Medchem when you need to:
- Filter compound libraries against established medicinal chemistry rules (Lipinski, Veber, PAINS)
- Identify structural alerts and reactive functional groups in hit compounds
- Score and rank molecules by drug-likeness for lead optimization
- Build automated compound triage pipelines for high-throughput screening results
Consider alternatives when:
- You need molecular docking or binding affinity prediction (use AutoDock or DiffDock)
- You need ADMET property prediction with ML models (use DeepChem or ADMETlab)
- You need de novo molecule generation (use generative chemistry tools)
Quick Start
    # Install medchem with dependencies (RDKit is published on PyPI as "rdkit")
    pip install medchem rdkit
    from medchem.rules import basic_rules
    from rdkit import Chem

    # Load molecules from SMILES
    smiles_list = [
        "CC(=O)Oc1ccccc1C(=O)O",                # Aspirin
        "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",         # Caffeine
        "CC12CCC3C(C1CCC2O)CCC4=CC(=O)CCC34C",  # Testosterone
    ]
    mols = [Chem.MolFromSmiles(s) for s in smiles_list]

    # Apply Lipinski's Rule of Five.
    # medchem exposes its rules as functions in medchem.rules.basic_rules;
    # confirm the module layout against your installed medchem version.
    for smi, mol in zip(smiles_list, mols):
        passes = basic_rules.rule_of_five(mol)
        print(f"{smi[:30]:30s} Lipinski: {'PASS' if passes else 'FAIL'}")
Core Concepts
Available Filter Sets
| Filter | Rules Applied | Use Case |
|---|---|---|
| Lipinski Ro5 | MW ≤ 500, LogP ≤ 5, HBD ≤ 5, HBA ≤ 10 | Oral bioavailability |
| Veber | PSA ≤ 140 Å², RotBonds ≤ 10 | Intestinal absorption |
| PAINS | Pan-assay interference patterns | False positive removal |
| Brenk | Reactive/toxic substructures | Safety filtering |
| Lead-likeness | MW 200-350, LogP -1 to 3, RotBonds<7 | Lead optimization starts |
| Ghose | MW 160-480, LogP -0.4 to 5.6, atoms 20-70 | Drug-like range |
| REOS | Reactive groups, warheads | HTS triage |
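To make the table concrete, here is a minimal sketch of the Veber and lead-likeness rows written directly against RDKit descriptors. The function names `veber_pass` and `lead_like_pass` are hypothetical, and treating the range bounds as inclusive is an assumption; published definitions of lead-likeness vary.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors, rdMolDescriptors


def veber_pass(mol):
    """Veber row: polar surface area <= 140 and rotatable bonds <= 10."""
    return (
        Descriptors.TPSA(mol) <= 140
        and rdMolDescriptors.CalcNumRotatableBonds(mol) <= 10
    )


def lead_like_pass(mol):
    """Lead-likeness row: MW 200-350, LogP -1 to 3, rotatable bonds < 7.

    Inclusive bounds on MW and LogP are an assumption of this sketch.
    """
    mw = Descriptors.MolWt(mol)
    logp = Descriptors.MolLogP(mol)  # Wildman-Crippen estimate
    rot = rdMolDescriptors.CalcNumRotatableBonds(mol)
    return 200 <= mw <= 350 and -1 <= logp <= 3 and rot < 7


aspirin = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")
print("Veber:", veber_pass(aspirin), "Lead-like:", lead_like_pass(aspirin))
```

Aspirin passes Veber but falls below the lead-likeness molecular weight window, which shows why the rule sets are not interchangeable.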
Comprehensive Compound Triage
    import pandas as pd
    from medchem.rules import basic_rules
    from rdkit import Chem
    from rdkit.Chem import Descriptors, rdMolDescriptors
    from rdkit.Chem.FilterCatalog import FilterCatalog, FilterCatalogParams


    def _alert_catalog(catalog_id):
        """Build one of RDKit's built-in structural alert catalogs."""
        params = FilterCatalogParams()
        params.AddCatalog(catalog_id)
        return FilterCatalog(params)


    # Structural alerts are matched here with RDKit's PAINS and Brenk catalogs;
    # medchem's own alert classes are not used in this block.
    PAINS = _alert_catalog(FilterCatalogParams.FilterCatalogs.PAINS)
    BRENK = _alert_catalog(FilterCatalogParams.FilterCatalogs.BRENK)


    def triage_compounds(smiles_list):
        """Apply comprehensive medchem filters to a compound list."""
        results = []
        for smi in smiles_list:
            mol = Chem.MolFromSmiles(smi)
            if mol is None:
                # Keep unparseable entries in the report instead of dropping them
                results.append({"smiles": smi, "verdict": "INVALID"})
                continue

            record = {
                "smiles": smi,
                "mw": Descriptors.MolWt(mol),
                "logp": Descriptors.MolLogP(mol),
                "hbd": rdMolDescriptors.CalcNumHBD(mol),
                "hba": rdMolDescriptors.CalcNumHBA(mol),
                "psa": Descriptors.TPSA(mol),
                "rotatable_bonds": rdMolDescriptors.CalcNumRotatableBonds(mol),
                "lipinski_pass": basic_rules.rule_of_five(mol),
                "veber_pass": basic_rules.rule_of_veber(mol),
                "pains_free": not PAINS.HasMatch(mol),
                "brenk_free": not BRENK.HasMatch(mol),
            }

            # Overall verdict: every individual check must pass
            checks = ("lipinski_pass", "veber_pass", "pains_free", "brenk_free")
            record["verdict"] = "PASS" if all(record[c] for c in checks) else "FAIL"
            results.append(record)

        return pd.DataFrame(results)


    # Triage a compound library (smiles_list as defined in Quick Start)
    df = triage_compounds(smiles_list)
    print(f"Pass rate: {(df['verdict'] == 'PASS').sum()}/{len(df)}")
    print(df[["smiles", "mw", "logp", "verdict"]])
Custom Filter Rules
    from rdkit import Chem
    from rdkit.Chem import Descriptors, rdMolDescriptors


    def custom_cns_filter(mol):
        """Filter for CNS drug-likeness (Lipinski + CNS-specific criteria)."""
        if mol is None:
            return False

        mw = Descriptors.MolWt(mol)
        logp = Descriptors.MolLogP(mol)
        hbd = rdMolDescriptors.CalcNumHBD(mol)
        psa = Descriptors.TPSA(mol)

        # CNS-specific thresholds (stricter than general Ro5)
        return mw <= 400 and 1.0 <= logp <= 3.0 and hbd <= 2 and psa <= 90


    # Apply custom filter (smiles_list as defined in Quick Start)
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        passes = custom_cns_filter(mol)
        print(f"{smi[:30]:30s} CNS filter: {'PASS' if passes else 'FAIL'}")
Configuration
| Parameter | Description | Default |
|---|---|---|
| filter_set | Which rule set to apply | "lipinski" |
| alert_collection | Structural alert database | "pains" |
| strict_mode | Fail on any single violation | true |
| max_violations | Allowed rule violations | 0 |
| output_format | Results format | "dataframe" |
| include_descriptors | Calculate molecular descriptors | true |
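One way to carry these settings through a pipeline is a small frozen dataclass. The sketch below is illustrative only: `TriageConfig` and its methods are hypothetical names, not part of medchem, and the rule that `strict_mode` overrides `max_violations` is an assumption about how the two parameters interact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriageConfig:
    """Hypothetical container for the parameters in the table above."""

    filter_set: str = "lipinski"
    alert_collection: str = "pains"
    strict_mode: bool = True
    max_violations: int = 0
    output_format: str = "dataframe"
    include_descriptors: bool = True

    def allowed_violations(self) -> int:
        # Assumption: strict mode means any single violation fails,
        # regardless of the max_violations setting.
        return 0 if self.strict_mode else self.max_violations

    def accepts(self, n_violations: int) -> bool:
        """True when a molecule with n_violations should be kept."""
        return n_violations <= self.allowed_violations()


default = TriageConfig()
relaxed = TriageConfig(strict_mode=False, max_violations=1)
print(default.accepts(1), relaxed.accepts(1))
```

Freezing the dataclass keeps a triage run from silently changing its thresholds midway.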
Best Practices
- Apply filters in the correct order. Run PAINS and structural alert filters first to remove assay artifacts, then apply drug-likeness rules. Calculating expensive descriptors on PAINS compounds wastes compute time on molecules you'll reject anyway.
- Allow one Lipinski violation for known drug space. Strict Ro5 with zero violations eliminates many approved drugs. Allow one violation (max_violations=1) when evaluating natural-product-derived or macrocyclic compounds, which often break one rule while remaining bioavailable.
- Validate SMILES before filtering. Invalid SMILES strings cause Chem.MolFromSmiles() to return None, which crashes downstream filters. Always check for None molecules and log invalid entries separately rather than silently skipping them.
- Combine filters with property distributions. Don't just count pass/fail. Plot distributions of MW, LogP, and PSA for your library to identify systematic biases. A library that technically passes Lipinski but clusters at MW 490 is riskier than one centered at 350.
- Document which filters you applied. When reporting triage results, specify the exact filter sets, thresholds, and structural alert collections used. Different PAINS implementations flag different numbers of compounds, and reviewers need to assess your methodology.
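The "allow one Lipinski violation" practice can be sketched by counting violations instead of returning a single pass/fail flag. This is a minimal illustration built on RDKit descriptors; `lipinski_violations` and `passes_ro5` are hypothetical helper names, and the `max_violations` argument simply mirrors the configuration parameter of the same name.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors, rdMolDescriptors


def lipinski_violations(mol):
    """Count how many of the four Rule of Five limits a molecule exceeds."""
    exceeded = [
        Descriptors.MolWt(mol) > 500,
        Descriptors.MolLogP(mol) > 5,
        rdMolDescriptors.CalcNumHBD(mol) > 5,
        rdMolDescriptors.CalcNumHBA(mol) > 10,
    ]
    return sum(exceeded)


def passes_ro5(mol, max_violations=1):
    """Accept molecules with at most max_violations Ro5 violations."""
    return lipinski_violations(mol) <= max_violations


aspirin = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")
eicosane = Chem.MolFromSmiles("CCCCCCCCCCCCCCCCCCCC")  # lipophilic: LogP above 5

for name, mol in [("aspirin", aspirin), ("eicosane", eicosane)]:
    print(name, lipinski_violations(mol), passes_ro5(mol))
```

Keeping the raw violation count in the results table also lets you tighten or relax the cutoff later without recomputing descriptors.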
Common Issues
RDKit molecule parsing returns None silently: SMILES with invalid valences, kekulization errors, or non-standard atoms make Chem.MolFromSmiles() return None without raising an exception. Always count and report the number of unparseable molecules. To capture the specific error, call Chem.MolFromSmiles(smi, sanitize=False) and then Chem.SanitizeMol() inside a try/except; a None from the first call then points to a SMILES syntax error rather than a chemistry problem.
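The two-step parse described above might look like the following sketch. `parse_smiles` and `split_valid_invalid` are hypothetical helper names; the code catches a broad `Exception` because the exact sanitization exception classes differ between RDKit releases.

```python
from rdkit import Chem, RDLogger

# Silence RDKit's console messages; errors are collected explicitly below
RDLogger.DisableLog("rdApp.*")


def parse_smiles(smi):
    """Return (mol, error); exactly one of the two is None."""
    mol = Chem.MolFromSmiles(smi, sanitize=False)
    if mol is None:
        return None, "syntax error: SMILES could not be parsed"
    try:
        Chem.SanitizeMol(mol)  # valence, aromaticity and kekulization checks
    except Exception as exc:
        return None, f"{type(exc).__name__}: {exc}"
    return mol, None


def split_valid_invalid(smiles_list):
    """Separate parseable molecules from failures, keeping the reason."""
    valid, invalid = [], []
    for smi in smiles_list:
        mol, error = parse_smiles(smi)
        if error is None:
            valid.append((smi, mol))
        else:
            invalid.append((smi, error))
    return valid, invalid


valid, invalid = split_valid_invalid(
    ["CC(=O)O", "C(C)(C)(C)(C)C", "c1cccc1", "C(("]
)
print(f"{len(valid)} valid, {len(invalid)} invalid")
for smi, error in invalid:
    print(f"  {smi}: {error}")
```

Logging the reason next to each rejected SMILES makes it easy to tell vendor file corruption apart from genuinely unusual chemistry.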
PAINS filter flagging too many compounds: The original PAINS filter set includes some patterns that appear in legitimate drug scaffolds. For initial screening, use only the PAINS-A subset (a small family covering the most frequently flagged patterns) rather than the full PAINS-A/B/C set. Always inspect flagged substructures visually before discarding entire compound series.
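As a sketch of restricting the screen to PAINS-A, the example below uses RDKit's built-in `FilterCatalog` rather than medchem's own alert API, which is not shown here. The helper name `pains_a_matches` is hypothetical.

```python
from rdkit import Chem
from rdkit.Chem.FilterCatalog import FilterCatalog, FilterCatalogParams


def build_catalog(catalog_id):
    """Create an RDKit filter catalog from one built-in collection."""
    params = FilterCatalogParams()
    params.AddCatalog(catalog_id)
    return FilterCatalog(params)


pains_a = build_catalog(FilterCatalogParams.FilterCatalogs.PAINS_A)
pains_full = build_catalog(FilterCatalogParams.FilterCatalogs.PAINS)


def pains_a_matches(mol):
    """Return the names of the PAINS-A patterns that a molecule matches."""
    return [entry.GetDescription() for entry in pains_a.GetMatches(mol)]


print("PAINS-A patterns:", pains_a.GetNumEntries())
print("Full PAINS patterns:", pains_full.GetNumEntries())

aspirin = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")
print("Aspirin PAINS-A alerts:", pains_a_matches(aspirin))
```

Returning the matched pattern names, instead of a boolean, gives you the list of substructures to inspect before dropping a series.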
Conflicting filter results between tools: Different tools (RDKit, DataWarrior, medchem) implement slightly different threshold values or descriptor calculations. For example, LogP values differ between the Wildman-Crippen method (RDKit's default, Descriptors.MolLogP) and XLogP3. Standardize on one descriptor calculation method and document it clearly.
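A lightweight way to document the chosen method is to write a provenance record alongside the triage results. The sketch below uses only the standard library; `triage_provenance` and the field names are hypothetical, and the values passed in are placeholders for whatever your run actually used.

```python
import json
from importlib import metadata


def package_version(name):
    """Look up an installed package version without importing the package."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def triage_provenance(filter_sets, alert_collections, thresholds, logp_method):
    """Collect the settings a reviewer needs to reproduce a triage run."""
    return {
        "filter_sets": list(filter_sets),
        "alert_collections": list(alert_collections),
        "thresholds": dict(thresholds),
        "logp_method": logp_method,
        "versions": {pkg: package_version(pkg) for pkg in ("rdkit", "medchem")},
    }


record = triage_provenance(
    filter_sets=["lipinski", "veber"],
    alert_collections=["PAINS_A"],
    thresholds={"max_violations": 1},
    logp_method="Wildman-Crippen (RDKit MolLogP)",
)
print(json.dumps(record, indent=2))
```

Storing this record next to the output table means a later comparison against another tool starts from known descriptor methods and package versions.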