RDKit Elite
All-in-one skill for fine-grained cheminformatics work with the RDKit toolkit. Includes structured workflows, validation checks, and reusable patterns for scientific work.
Perform cheminformatics analysis using RDKit, the industry-standard open-source toolkit for molecular informatics. This skill covers molecule handling, fingerprint generation, substructure search, descriptor calculation, chemical reactions, and building molecular analysis pipelines.
When to Use This Skill
Choose RDKit Elite when you need to:
- Parse, validate, and manipulate chemical structures (SMILES, SDF, MOL)
- Calculate molecular descriptors and fingerprints for ML models
- Perform substructure and similarity searching across compound libraries
- Enumerate reaction products and perform R-group decomposition
Consider alternatives when:
- You need a unified molecular featurization API (use Molfeat)
- You need drug-likeness filtering with pre-built rules (use medchem)
- You need 3D molecular docking (use AutoDock Vina)
Quick Start
pip install rdkit
    from rdkit import Chem
    from rdkit.Chem import Descriptors, rdMolDescriptors

    # Parse and analyze a molecule
    mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # Aspirin

    print(f"Formula: {rdMolDescriptors.CalcMolFormula(mol)}")
    print(f"MW: {Descriptors.MolWt(mol):.2f}")
    print(f"LogP: {Descriptors.MolLogP(mol):.2f}")
    print(f"HBD: {Descriptors.NumHDonors(mol)}")
    print(f"HBA: {Descriptors.NumHAcceptors(mol)}")
    print(f"TPSA: {Descriptors.TPSA(mol):.2f}")
    print(f"Rotatable bonds: {Descriptors.NumRotatableBonds(mol)}")
Core Concepts
Molecule Operations
| Operation | Function | Description |
|---|---|---|
| Parse SMILES | Chem.MolFromSmiles(smi) | Create mol from SMILES |
| Parse file | Chem.MolFromMolFile(path) | Read a single MOL file; use Chem.SDMolSupplier(path) for multi-record SDF files |
| Canonical SMILES | Chem.MolToSmiles(mol) | Canonical string representation |
| Add Hs | Chem.AddHs(mol) | Add explicit hydrogens |
| 3D coordinates | AllChem.EmbedMolecule(mol) | Generate a 3D conformer (add explicit hydrogens first) |
| Sanitize | Chem.SanitizeMol(mol) | Validate and clean structure |
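The operations in the table can be chained as in the following minimal sketch. The input SMILES strings and the fixed `randomSeed` value are illustrative choices, not part of the skill itself.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Parse a SMILES string; MolFromSmiles sanitizes by default.
mol = Chem.MolFromSmiles("OC(=O)c1ccccc1OC(C)=O")  # aspirin, written in a non-canonical order
if mol is None:
    raise ValueError("SMILES could not be parsed")

# Canonical SMILES does not depend on how the input was written.
canonical = Chem.MolToSmiles(mol)
reference = Chem.MolToSmiles(Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O"))

# Add explicit hydrogens, then generate one 3D conformer.
mol_h = Chem.AddHs(mol)
conf_id = AllChem.EmbedMolecule(mol_h, randomSeed=42)  # -1 would signal failure

print(canonical == reference)
print(mol.GetNumAtoms(), mol_h.GetNumAtoms())
print(conf_id)
```

Fixing the random seed is optional; it only makes the embedded coordinates reproducible between runs.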
Similarity Search and Fingerprints
    from rdkit import Chem, DataStructs
    from rdkit.Chem import AllChem

    def similarity_search(query_smi, database_smiles, top_n=10):
        """Find most similar molecules using Morgan fingerprints."""
        query_mol = Chem.MolFromSmiles(query_smi)
        if query_mol is None:
            raise ValueError(f"Invalid query SMILES: {query_smi}")
        query_fp = AllChem.GetMorganFingerprintAsBitVect(
            query_mol, radius=2, nBits=2048
        )

        results = []
        for smi in database_smiles:
            mol = Chem.MolFromSmiles(smi)
            if mol is None:
                continue  # skip entries that fail to parse
            fp = AllChem.GetMorganFingerprintAsBitVect(
                mol, radius=2, nBits=2048
            )
            similarity = DataStructs.TanimotoSimilarity(query_fp, fp)
            results.append((smi, similarity))

        # Highest similarity first
        results.sort(key=lambda x: -x[1])
        return results[:top_n]

    # Search
    query = "c1ccc(NC(=O)c2ccccc2)cc1"  # Benzanilide
    database = [
        "c1ccc(NC(=O)c2ccc(Cl)cc2)cc1",
        "CC(=O)Oc1ccccc1C(=O)O",
        "c1ccc(NC(=O)c2ccc(F)cc2)cc1",
        "c1ccccc1",
        "c1ccc(NC(=O)c2ccncc2)cc1"
    ]

    results = similarity_search(query, database)
    for smi, sim in results:
        print(f"  {sim:.3f} | {smi}")
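Recent RDKit releases also ship a generator-based fingerprint API, and some versions may emit a deprecation warning for `GetMorganFingerprintAsBitVect`. The sketch below is one possible variant of the search above, assuming an RDKit version that provides `rdFingerprintGenerator.GetMorganGenerator` with the `radius` and `fpSize` keywords. The function name `bulk_similarity` is hypothetical.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

def bulk_similarity(query_smi, database_smiles, radius=2, fp_size=2048):
    """Rank database SMILES by Tanimoto similarity to the query."""
    gen = rdFingerprintGenerator.GetMorganGenerator(radius=radius, fpSize=fp_size)

    query_mol = Chem.MolFromSmiles(query_smi)
    if query_mol is None:
        raise ValueError(f"Invalid query SMILES: {query_smi!r}")
    query_fp = gen.GetFingerprint(query_mol)

    kept, fps = [], []
    for smi in database_smiles:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue  # skip entries that fail to parse
        kept.append(smi)
        fps.append(gen.GetFingerprint(mol))

    # One call scores the query against every database fingerprint.
    scores = DataStructs.BulkTanimotoSimilarity(query_fp, fps)
    return sorted(zip(kept, scores), key=lambda pair: -pair[1])

ranked = bulk_similarity(
    "c1ccc(NC(=O)c2ccccc2)cc1",
    ["c1ccccc1", "c1ccc(NC(=O)c2ccccc2)cc1", "CCO"],
)
for smi, score in ranked:
    print(f"{score:.3f} | {smi}")
```

Computing the database fingerprints once and scoring them in bulk avoids a Python-level similarity call per molecule, which tends to matter for larger libraries.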
Substructure Analysis
    from rdkit import Chem
    from rdkit.Chem import rdFMCS

    def find_common_substructure(smiles_list):
        """Find maximum common substructure across molecules."""
        mols = [Chem.MolFromSmiles(smi) for smi in smiles_list]
        mols = [m for m in mols if m is not None]

        mcs = rdFMCS.FindMCS(
            mols,
            threshold=0.8,
            ringMatchesRingOnly=True,
            timeout=60
        )

        print(f"MCS SMARTS: {mcs.smartsString}")
        print(f"MCS atoms: {mcs.numAtoms}")
        print(f"MCS bonds: {mcs.numBonds}")

        # Check which molecules contain the MCS
        pattern = Chem.MolFromSmarts(mcs.smartsString)
        for smi in smiles_list:
            mol = Chem.MolFromSmiles(smi)
            if mol and mol.HasSubstructMatch(pattern):
                print(f"  ✓ {smi}")

        return mcs

    smiles = [
        "c1ccc(NC(=O)c2ccccc2)cc1",
        "c1ccc(NC(=O)c2ccc(Cl)cc2)cc1",
        "c1ccc(NC(=O)c2ccc(F)cc2)cc1"
    ]
    find_common_substructure(smiles)
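When the pattern of interest is already known, a direct SMARTS query is simpler than computing an MCS. The following sketch filters a list of SMILES by a pattern and reports the first matching atom indices. The helper name `filter_by_smarts` and the amide SMARTS are illustrative assumptions.

```python
from rdkit import Chem

def filter_by_smarts(smiles_list, smarts):
    """Return (smiles, first_match_atom_indices) for molecules containing the pattern."""
    pattern = Chem.MolFromSmarts(smarts)
    if pattern is None:
        raise ValueError(f"Invalid SMARTS: {smarts!r}")

    hits = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue  # skip entries that fail to parse
        matches = mol.GetSubstructMatches(pattern)
        if matches:
            hits.append((smi, matches[0]))
    return hits

# Illustrative pattern: a carbonyl carbon bonded to a trivalent nitrogen (amide-like)
amide = "[CX3](=O)[NX3]"
hits = filter_by_smarts(
    ["c1ccc(NC(=O)c2ccccc2)cc1", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"],
    amide,
)
for smi, atoms in hits:
    print(smi, atoms)
```

`GetSubstructMatches` returns every match as a tuple of atom indices, which is useful for highlighting or for counting occurrences of a functional group.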
Configuration
| Parameter | Description | Default |
|---|---|---|
| sanitize | Validate molecular structure on creation | true |
| radius | Morgan fingerprint radius | 2 |
| nBits | Fingerprint bit vector length | 2048 |
| useChirality | Include stereochemistry in fingerprints | false |
| force_field | 3D optimization force field | "MMFF94" |
| max_attempts | 3D embedding retries | 10 |
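How these settings reach RDKit is not specified here, so the following is only a sketch under assumptions: a hypothetical `CONFIG` dictionary mirrors the table, and a hypothetical helper `morgan_fp` passes the fingerprint-related keys to the same Morgan function used in the similarity example. The `force_field` and `max_attempts` keys are carried in the dictionary but not exercised.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Hypothetical config mirroring the table; this is not an RDKit object.
CONFIG = {
    "sanitize": True,
    "radius": 2,
    "nBits": 2048,
    "useChirality": False,
    "force_field": "MMFF94",
    "max_attempts": 10,
}

def morgan_fp(smi, cfg=CONFIG):
    """Build a Morgan bit vector from a SMILES string using the config values."""
    # Fingerprinting expects a sanitized molecule, so keep cfg["sanitize"] True here.
    mol = Chem.MolFromSmiles(smi, sanitize=cfg["sanitize"])
    if mol is None:
        return None
    return AllChem.GetMorganFingerprintAsBitVect(
        mol,
        radius=cfg["radius"],
        nBits=cfg["nBits"],
        useChirality=cfg["useChirality"],
    )

# Enantiomers of alanine: identical without chirality, distinct with it.
l_ala, d_ala = "C[C@H](N)C(=O)O", "C[C@@H](N)C(=O)O"
achiral_same = morgan_fp(l_ala) == morgan_fp(d_ala)

chiral_cfg = dict(CONFIG, useChirality=True)
chiral_same = morgan_fp(l_ala, chiral_cfg) == morgan_fp(d_ala, chiral_cfg)

print(achiral_same, chiral_same)
```

Leaving `useChirality` off treats stereoisomers as the same compound, which may or may not be what a given model needs.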
Best Practices
- Always check for None after parsing: Chem.MolFromSmiles() returns None for invalid SMILES instead of raising an exception. Check for None before proceeding (for example, if mol is None: continue inside a loop). Log failed molecules for investigation rather than silently skipping them.
- Use canonical SMILES for deduplication: Different SMILES strings can represent the same molecule. Convert all SMILES to canonical form with Chem.MolToSmiles(mol) before comparing or storing. This ensures consistent representation across your dataset.
- Choose the right fingerprint for your task: Morgan (ECFP-like) fingerprints capture circular substructures and work well for activity prediction. MACCS keys encode a fixed set of predefined structural features and suit substructure-based similarity. Topological fingerprints capture path-based features. Benchmark different types on your specific task.
- Sanitize molecules explicitly for custom workflows: When modifying molecules programmatically (adding atoms, changing bonds), call Chem.SanitizeMol(mol) afterward to validate the structure. Unsanitized molecules can have invalid valences that produce incorrect descriptors.
- Use Chem.MolFromSmiles(smi, sanitize=False) for debugging: When a SMILES string fails to parse, disable sanitization to isolate whether the error is in the SMILES syntax or in chemical validation. Then call Chem.SanitizeMol(mol) separately to get a specific error message.
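Several of these practices can be combined in one loading routine. The sketch below logs failures, diagnoses them by re-parsing with sanitize=False, and deduplicates by canonical SMILES. The helper names `diagnose` and `load_unique`, the logger name, and the sample inputs are all hypothetical.

```python
import logging
from rdkit import Chem

logger = logging.getLogger("rdkit_elite_example")  # hypothetical logger name

def diagnose(smi):
    """Return a short reason why a SMILES string fails to load, or None if it loads."""
    mol = Chem.MolFromSmiles(smi, sanitize=False)
    if mol is None:
        return "syntax error"
    try:
        Chem.SanitizeMol(mol)
    except Exception as exc:  # broad catch keeps the sketch version-independent
        return f"sanitization error: {exc}"
    return None

def load_unique(smiles_list):
    """Parse SMILES, log failures, and deduplicate by canonical SMILES."""
    unique, failed = {}, []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            reason = diagnose(smi)
            logger.warning("Skipping %r: %s", smi, reason)
            failed.append((smi, reason))
            continue
        # The first molecule seen for each canonical form is kept.
        unique.setdefault(Chem.MolToSmiles(mol), mol)
    return unique, failed

unique, failed = load_unique([
    "CCO", "OCC",               # same molecule, two spellings
    "c1ccccc1", "C1=CC=CC=C1",  # aromatic and Kekule forms of benzene
    "C(C)(C)(C)(C)C",           # valid syntax, five-valent carbon
    "C1CC",                     # unclosed ring: syntax error
])
print(sorted(unique))
print(failed)
```

Keeping the failure reasons alongside the input strings makes it easier to tell data-entry typos apart from chemically implausible structures.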
Common Issues
Morgan fingerprints give different results than ECFP: RDKit's Morgan fingerprints use a different hashing scheme than Pipeline Pilot's ECFP, so bit positions differ even at the equivalent radius (ECFP names use the diameter, so ECFP4 corresponds to Morgan radius 2). Both encode circular atom environments and generally behave similarly for similarity ranking, but they are not bit-for-bit identical. Don't compare fingerprints generated by different tools.
3D conformer generation fails for some molecules: AllChem.EmbedMolecule() returns -1 when it can't find a valid 3D conformation. Add explicit hydrogens with Chem.AddHs(mol) before embedding, and try AllChem.EmbedMolecule(mol, maxAttempts=100, useRandomCoords=True) for difficult cases. Some molecules (strained rings, unusual valences) may also need force field relaxation with AllChem.MMFFOptimizeMolecule(mol) once a conformer has been generated.
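A minimal sketch of this retry strategy is shown below. The helper name `embed_3d`, the fixed seed, and the `maxIters` value are illustrative assumptions.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

def embed_3d(smi, seed=42):
    """Return a molecule with one 3D conformer, or None if parsing or embedding fails."""
    mol = Chem.MolFromSmiles(smi)
    if mol is None:
        return None
    mol = Chem.AddHs(mol)  # explicit hydrogens before embedding

    conf_id = AllChem.EmbedMolecule(mol, randomSeed=seed)
    if conf_id == -1:
        # Fallback for difficult cases: more attempts and random starting coordinates
        conf_id = AllChem.EmbedMolecule(
            mol, maxAttempts=100, useRandomCoords=True, randomSeed=seed
        )
    if conf_id == -1:
        return None

    # Relax with MMFF only when parameters exist for every atom.
    if AllChem.MMFFHasAllMoleculeParams(mol):
        AllChem.MMFFOptimizeMolecule(mol, maxIters=500)
    return mol

aspirin_3d = embed_3d("CC(=O)Oc1ccccc1C(=O)O")
if aspirin_3d is not None:
    print(aspirin_3d.GetNumConformers(), aspirin_3d.GetConformer().Is3D())
```

Returning None for both parse and embedding failures keeps the caller's check simple; a pipeline that needs to distinguish the two could return a status value instead.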
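Descriptor calculation returns NaN or infinity: Some descriptors can be undefined or overflow for certain molecules. For example, partial-charge descriptors such as MaxPartialCharge can come back as NaN when charge assignment fails, and Ipc can become extremely large for big molecules. Check for non-finite values with np.isnan() or np.isfinite() and handle them before feeding descriptors into ML models. As a quick sanity check, confirm that Descriptors.MolWt(mol) is a finite, plausible value before trusting the other descriptors.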
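One way to guard a descriptor table against non-finite values is sketched below. It uses `math.isfinite` from the standard library so the example has no NumPy dependency. The helper name `descriptor_row` and the chosen descriptor names are illustrative assumptions.

```python
import math
from rdkit import Chem
from rdkit.Chem import Descriptors

def descriptor_row(mol, names=("MolWt", "MolLogP", "TPSA", "MaxPartialCharge")):
    """Compute selected descriptors, replacing NaN or infinite values with None."""
    calculators = dict(Descriptors.descList)  # name -> function
    row = {}
    for name in names:
        value = calculators[name](mol)
        row[name] = value if math.isfinite(value) else None
    return row

benzene = Chem.MolFromSmiles("c1ccccc1")
row = descriptor_row(benzene)
missing = [name for name, value in row.items() if value is None]

print(row)
print("Non-finite descriptors:", missing)
```

Recording which descriptors were non-finite, rather than silently imputing them, makes it possible to decide per descriptor whether to drop the column, drop the molecule, or fill in a value.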