Ultimate Medchem

Streamline your workflow for applying medicinal chemistry filters. Includes structured workflows, validation checks, and reusable patterns for scientific work.

Skill · Cliptics · scientific · v1.0.0 · MIT

Apply medicinal chemistry filters and structural alerts to prioritize compound libraries for drug discovery using the medchem Python library. This skill covers molecular filtering rules, PAINS detection, lead-likeness scoring, and automated triage of chemical libraries at scale.

When to Use This Skill

Choose Ultimate Medchem when you need to:

  • Filter compound libraries against established medicinal chemistry rules (Lipinski, Veber, PAINS)
  • Identify structural alerts and reactive functional groups in hit compounds
  • Score and rank molecules by drug-likeness for lead optimization
  • Build automated compound triage pipelines for high-throughput screening results

Consider alternatives when:

  • You need molecular docking or binding affinity prediction (use AutoDock or DiffDock)
  • You need ADMET property prediction with ML models (use DeepChem or ADMETlab)
  • You need de novo molecule generation (use generative chemistry tools)

Quick Start

    # Install medchem with dependencies (the PyPI package is now "rdkit"; "rdkit-pypi" is the legacy name)
    pip install medchem rdkit
    import medchem
    from medchem.filter import RuleFilters
    from rdkit import Chem

    # Load molecules from SMILES
    smiles_list = [
        "CC(=O)Oc1ccccc1C(=O)O",                # Aspirin
        "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",         # Caffeine
        "CC12CCC3C(C1CCC2O)CCC4=CC(=O)CCC34C",  # Testosterone
    ]
    mols = [Chem.MolFromSmiles(s) for s in smiles_list]

    # Apply Lipinski Rule of Five
    rf = RuleFilters()
    for smi, mol in zip(smiles_list, mols):
        passes = rf.filter_lipinski(mol)
        print(f"{smi[:30]:30s} Lipinski: {'PASS' if passes else 'FAIL'}")

Core Concepts

Available Filter Sets

Filter          Rules Applied                                  Use Case
Lipinski Ro5    MW ≤ 500, LogP ≤ 5, HBD ≤ 5, HBA ≤ 10          Oral bioavailability
Veber           PSA ≤ 140 Å², RotBonds ≤ 10                    Intestinal absorption
PAINS           Pan-assay interference patterns                False positive removal
Brenk           Reactive/toxic substructures                   Safety filtering
Lead-likeness   MW 200-350, LogP -1 to 3, RotBonds < 7         Lead optimization starts
Ghose           MW 160-480, LogP -0.4 to 5.6, atoms 20-70      Drug-like range
REOS            Reactive groups, warheads                      HTS triage
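
The property-based rows of the table above can be written down as data and checked against precomputed descriptors. The sketch below is only an illustration and is not the medchem API: `RULES`, `violations`, and the descriptor keys are hypothetical names, the bounds are assumed to be inclusive, and the descriptor values are assumed to come from a toolkit such as RDKit.

```python
# Hypothetical encoding of three rule sets as (low, high) bounds.
# None means "no bound on that side"; bounds are treated as inclusive.
RULES = {
    "lipinski": {"mw": (None, 500), "logp": (None, 5), "hbd": (None, 5), "hba": (None, 10)},
    "veber": {"psa": (None, 140), "rotatable_bonds": (None, 10)},
    "ghose": {"mw": (160, 480), "logp": (-0.4, 5.6), "atoms": (20, 70)},
}


def violations(rule_name, descriptors):
    """Return the descriptor names that fall outside the named rule set."""
    failed = []
    for name, (low, high) in RULES[rule_name].items():
        value = descriptors[name]
        if (low is not None and value < low) or (high is not None and value > high):
            failed.append(name)
    return failed


# Approximate values for aspirin, typed in by hand for illustration only.
aspirin = {"mw": 180.2, "logp": 1.3, "hbd": 1, "hba": 3, "psa": 63.6,
           "rotatable_bonds": 2, "atoms": 21}
print(violations("lipinski", aspirin))
print(violations("veber", aspirin))
```

Returning the list of violated properties, rather than a single boolean, keeps the reason for each rejection visible when a library is reviewed later.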

Comprehensive Compound Triage

    import medchem
    from medchem.filter import RuleFilters, StructuralAlerts
    from rdkit import Chem
    from rdkit.Chem import Descriptors, rdMolDescriptors
    import pandas as pd

    def triage_compounds(smiles_list):
        """Apply comprehensive medchem filters to a compound list."""
        rf = RuleFilters()
        sa = StructuralAlerts()
        results = []
        for smi in smiles_list:
            mol = Chem.MolFromSmiles(smi)
            if mol is None:
                # Record unparseable entries under the same "verdict" key
                # so the column exists for every row.
                results.append({"smiles": smi, "verdict": "INVALID"})
                continue
            record = {
                "smiles": smi,
                "mw": Descriptors.MolWt(mol),
                "logp": Descriptors.MolLogP(mol),
                "hbd": rdMolDescriptors.CalcNumHBD(mol),
                "hba": rdMolDescriptors.CalcNumHBA(mol),
                "psa": Descriptors.TPSA(mol),
                "rotatable_bonds": rdMolDescriptors.CalcNumRotatableBonds(mol),
                "lipinski_pass": rf.filter_lipinski(mol),
                "veber_pass": rf.filter_veber(mol),
                "pains_free": not sa.has_pains(mol),
                "brenk_free": not sa.has_brenk_alerts(mol),
            }
            # Overall verdict: every filter must pass
            record["verdict"] = (
                "PASS" if all([
                    record["lipinski_pass"],
                    record["veber_pass"],
                    record["pains_free"],
                    record["brenk_free"],
                ]) else "FAIL"
            )
            results.append(record)
        return pd.DataFrame(results)

    # Triage a compound library
    df = triage_compounds(smiles_list)
    print(f"Pass rate: {(df['verdict'] == 'PASS').sum()}/{len(df)}")
    print(df[["smiles", "mw", "logp", "verdict"]])

Custom Filter Rules

    from rdkit import Chem
    from rdkit.Chem import Descriptors, rdMolDescriptors

    def custom_cns_filter(mol):
        """Filter for CNS drug-likeness (Lipinski + CNS-specific criteria)."""
        if mol is None:
            return False
        mw = Descriptors.MolWt(mol)
        logp = Descriptors.MolLogP(mol)
        hbd = rdMolDescriptors.CalcNumHBD(mol)
        psa = Descriptors.TPSA(mol)
        # CNS-specific thresholds (stricter than general Ro5)
        return (
            mw <= 400
            and 1.0 <= logp <= 3.0
            and hbd <= 2
            and psa <= 90
        )

    # Apply custom filter (smiles_list as defined in Quick Start)
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        passes = custom_cns_filter(mol)
        print(f"{smi[:30]:30s} CNS filter: {'PASS' if passes else 'FAIL'}")

Configuration

Parameter             Description                        Default
filter_set            Which rule set to apply            "lipinski"
alert_collection      Structural alert database          "pains"
strict_mode           Fail on any single violation       true
max_violations        Allowed rule violations            0
output_format         Results format                     "dataframe"
include_descriptors   Calculate molecular descriptors    true
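
One way to carry these parameters through a pipeline is a small immutable settings object. The sketch below is hypothetical: `TriageConfig` and `is_accepted` are illustrative names rather than part of medchem, and it assumes that `strict_mode` rejects on any violation while `max_violations` only applies when strict mode is off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriageConfig:
    """Hypothetical container mirroring the parameter table above."""
    filter_set: str = "lipinski"
    alert_collection: str = "pains"
    strict_mode: bool = True
    max_violations: int = 0
    output_format: str = "dataframe"
    include_descriptors: bool = True


def is_accepted(n_violations, config):
    """Decide pass/fail from a violation count.

    Assumption: strict_mode rejects on any violation; otherwise
    max_violations sets the tolerance.
    """
    if config.strict_mode:
        return n_violations == 0
    return n_violations <= config.max_violations


default = TriageConfig()
relaxed = TriageConfig(strict_mode=False, max_violations=1)
print(is_accepted(1, default), is_accepted(1, relaxed))
```

Freezing the dataclass means the settings used for a triage run cannot change partway through, which makes it easier to report exactly what was applied.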

Best Practices

  1. Apply filters in the correct order — Run PAINS and structural alert filters first to remove assay artifacts, then apply drug-likeness rules. Calculating expensive descriptors on PAINS compounds wastes compute time on molecules you'll reject anyway.

  2. Allow one Lipinski violation for known drug space — Strict Ro5 with zero violations eliminates many approved drugs. Allow one violation (max_violations=1) when evaluating natural-product-derived or macrocyclic compounds, which often break one rule while remaining bioavailable.

  3. Validate SMILES before filtering — Invalid SMILES strings cause Chem.MolFromSmiles() to return None, which crashes downstream filters. Always check for None molecules and log invalid entries separately rather than silently skipping them.

  4. Combine filters with property distributions — Don't just count pass/fail. Plot distributions of MW, LogP, and PSA for your library to identify systematic biases. A library that technically passes Lipinski but clusters at MW 490 is riskier than one centered at 350.

  5. Document which filters you applied — When reporting triage results, specify the exact filter sets, thresholds, and structural alert collections used. Different PAINS implementations flag different numbers of compounds, and reviewers need to assess your methodology.
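
The point about property distributions (practice 4) can be sketched with the standard library alone. The `summarize` helper and its `margin` parameter are hypothetical, and the molecular weights below are made-up numbers chosen to contrast a library clustered just under the cutoff with one centered well below it.

```python
import statistics


def summarize(values, cutoff, margin=0.05):
    """Summarize a descriptor distribution relative to an upper cutoff.

    near_cutoff is the fraction of passing values that lie within
    `margin` (as a fraction of the cutoff) below the cutoff.
    """
    passing = [v for v in values if v <= cutoff]
    near = [v for v in passing if v >= cutoff * (1 - margin)]
    return {
        "n": len(values),
        "median": statistics.median(values),
        "pass_rate": len(passing) / len(values),
        "near_cutoff": len(near) / len(passing) if passing else 0.0,
    }


# Made-up molecular weights for two libraries that both satisfy MW <= 500.
clustered = [478, 485, 490, 492, 495, 499]
centered = [310, 335, 350, 362, 380, 410]
print(summarize(clustered, cutoff=500))
print(summarize(centered, cutoff=500))
```

Both libraries have the same pass rate, so only the median and the near-cutoff fraction reveal that the first one sits at the edge of the rule.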

Common Issues

RDKit molecule parsing returns None silently — SMILES with invalid valences, kekulization errors, or non-standard atoms produce None without raising exceptions. Always count and report the number of unparseable molecules. Use Chem.MolFromSmiles(smi, sanitize=False) followed by Chem.SanitizeMol() in a try/except to capture specific parsing errors.
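
A minimal sketch of the two-step parse described above, using only RDKit. The `parse_smiles` helper and the example strings are illustrative; the broad `except Exception` is a simplification that catches whatever `Chem.SanitizeMol` raises (for example valence or kekulization errors).

```python
from rdkit import Chem


def parse_smiles(smi):
    """Return (mol, error). Exactly one of the two is None."""
    # Step 1: parse without sanitization; None here means a syntax problem.
    mol = Chem.MolFromSmiles(smi, sanitize=False)
    if mol is None:
        return None, "SMILES syntax error"
    # Step 2: sanitize explicitly so the specific failure can be captured.
    try:
        Chem.SanitizeMol(mol)
    except Exception as exc:
        return None, f"{type(exc).__name__}: {exc}"
    return mol, None


examples = ["CCO", "C(C)(C)(C)(C)(C)C", "c1cccc1", "C1CC"]
failures = {}
for smi in examples:
    mol, error = parse_smiles(smi)
    if error is not None:
        failures[smi] = error

print(f"{len(failures)} of {len(examples)} SMILES could not be parsed")
for smi, error in failures.items():
    print(f"  {smi}: {error}")
```

Keeping the failures in a dictionary makes it straightforward to report the count and write the rejected entries to a separate log.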

PAINS filter flagging too many compounds — The original PAINS filter set includes some patterns that appear in legitimate drug scaffolds. Use the refined PAINS-A set (most problematic patterns) rather than the full PAINS-A/B/C set for initial screening. Always inspect flagged substructures visually before discarding entire compound series.
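
As an illustration of restricting the screen to PAINS-A, the sketch below uses RDKit's built-in `FilterCatalog` rather than medchem, since RDKit ships the PAINS families as separate catalogs. The `pains_a_hits` helper is a hypothetical name, and the returned descriptions are whatever RDKit stores for each pattern.

```python
from rdkit import Chem
from rdkit.Chem.FilterCatalog import FilterCatalog, FilterCatalogParams

# Build a catalog restricted to the PAINS family A patterns.
params = FilterCatalogParams()
params.AddCatalog(FilterCatalogParams.FilterCatalogs.PAINS_A)
pains_a = FilterCatalog(params)


def pains_a_hits(smiles):
    """Return descriptions of the PAINS-A patterns matched by a SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return [entry.GetDescription() for entry in pains_a.GetMatches(mol)]


# Aspirin and p-benzoquinone as a clean and a quinone-containing example.
for smi in ["CC(=O)Oc1ccccc1C(=O)O", "O=C1C=CC(=O)C=C1"]:
    print(smi, pains_a_hits(smi))
```

Returning the pattern descriptions instead of a boolean supports the advice to inspect flagged substructures before discarding a series.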

Conflicting filter results between tools — Different tools (RDKit, DataWarrior, medchem) implement slightly different threshold values or descriptor calculations. For example, LogP values vary between Wildman-Crippen (RDKit default) and XLogP3. Standardize on one descriptor calculation method and document it clearly.
