Zinc Database Smart
Production-ready skill for accessing the ZINC database to find purchasable compounds. Includes structured workflows, validation checks, and reusable patterns for scientific work.
Search and retrieve chemical compounds from the ZINC database, a freely accessible repository of 230M+ commercially available molecules for virtual screening and drug discovery. This skill covers ZINC API queries, SMILES-based search, substructure and similarity filtering, 3D conformer retrieval, and integration with docking workflows.
When to Use This Skill
Choose Zinc Database Smart when you need to:
- Search for purchasable compounds by structure, similarity, or molecular properties
- Download 3D-ready conformers for molecular docking campaigns
- Filter compounds by drug-likeness (Lipinski rules), reactivity, or availability
- Build compound libraries for high-throughput virtual screening
Consider alternatives when:
- You need bioactivity data for known compounds (use PubChem or ChEMBL)
- You need protein-ligand interaction prediction (use docking software directly)
- You need compound synthesis routes (use retrosynthesis tools)
Quick Start
```bash
pip install requests rdkit pandas
```

```python
import requests
from rdkit import Chem
from rdkit.Chem import Descriptors

ZINC_API = "https://zinc15.docking.org"


def search_zinc_by_smiles(smiles, similarity=0.7, max_results=20):
    """Search ZINC for compounds similar to a query molecule.

    The query parameter names below are assumptions; verify them against
    the ZINC15 API documentation before relying on the results.
    """
    params = {
        "smiles": smiles,
        "similarity": similarity,
        "count": max_results,
        "output_format": "json",
    }
    response = requests.get(f"{ZINC_API}/substances/search/", params=params, timeout=30)
    response.raise_for_status()
    return response.json()


def get_compound(zinc_id):
    """Retrieve detailed compound information by ZINC ID."""
    response = requests.get(f"{ZINC_API}/substances/{zinc_id}.json", timeout=30)
    response.raise_for_status()
    return response.json()


# Search for compounds similar to aspirin (network call left commented out)
aspirin_smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"
print("Searching for compounds similar to aspirin...")
# results = search_zinc_by_smiles(aspirin_smiles, similarity=0.6)


def check_lipinski(smiles):
    """Drug-likeness filter (Lipinski's Rule of Five); strict: any violation fails."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None  # unparseable SMILES
    props = {
        "MW": Descriptors.MolWt(mol),
        "LogP": Descriptors.MolLogP(mol),
        "HBD": Descriptors.NumHDonors(mol),
        "HBA": Descriptors.NumHAcceptors(mol),
    }
    props["passes"] = (props["MW"] <= 500 and props["LogP"] <= 5
                       and props["HBD"] <= 5 and props["HBA"] <= 10)
    return props


props = check_lipinski(aspirin_smiles)
print(f"Aspirin: MW={props['MW']:.1f}, LogP={props['LogP']:.2f}, "
      f"HBD={props['HBD']}, HBA={props['HBA']}, Lipinski={props['passes']}")
```
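The check_lipinski helper above rejects a compound on any single violation. Lipinski's original Rule of Five is looser: it flags likely poor absorption when two or more criteria are out of range. A minimal sketch that counts violations from precomputed descriptor values; the function names and the max_violations parameter are hypothetical, not part of RDKit or ZINC:

```python
def lipinski_violations(mw, logp, hbd, hba):
    """Count Rule-of-Five violations from precomputed descriptors."""
    rules = [
        mw > 500,   # molecular weight above 500 Da
        logp > 5,   # octanol-water partition coefficient above 5
        hbd > 5,    # more than 5 hydrogen-bond donors
        hba > 10,   # more than 10 hydrogen-bond acceptors
    ]
    return sum(rules)


def passes_ro5(mw, logp, hbd, hba, max_violations=1):
    """True if at most `max_violations` rules are broken (1 = Lipinski's tolerance)."""
    return lipinski_violations(mw, logp, hbd, hba) <= max_violations


# Aspirin-like values pass; a large, lipophilic compound breaks two rules
print(passes_ro5(180.2, 1.3, 1, 3))                                         # True
print(lipinski_violations(650.0, 6.2, 2, 8), passes_ro5(650.0, 6.2, 2, 8))  # 2 False
```

Setting max_violations=0 reproduces the strict behaviour of check_lipinski.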
Core Concepts
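ZINC Subsets and Tranches
ZINC partitions its catalog into tranches, slices defined primarily by molecular weight and LogP that can be downloaded individually. The predefined subsets below select compounds by property windows: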
| Subset | Description | Size |
|---|---|---|
| ZINC-All | Complete database | ~230M compounds |
| ZINC-In-Stock | Immediately purchasable | ~10M |
| ZINC-Drug-Like | Lipinski-compliant | ~20M |
| ZINC-Lead-Like | 250 < MW < 350, LogP < 3.5 | ~6M |
| ZINC-Fragment-Like | MW < 250, LogP < 2.5 | ~1M |
| ZINC-Goldilocks | 200 < MW < 500, -2 < LogP < 5 | ~15M |
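To check locally which of these property-based subsets a compound would fall into, the windows in the table can be encoded directly. This is a sketch using only the thresholds listed above (ZINC's own subset definitions may apply further criteria, and Drug-Like is approximated here by the strict Rule of Five); SUBSET_RULES and matching_subsets are hypothetical names:

```python
# Property windows transcribed from the subset table (strict inequalities as written there)
SUBSET_RULES = {
    "ZINC-Lead-Like": lambda mw, logp: 250 < mw < 350 and logp < 3.5,
    "ZINC-Fragment-Like": lambda mw, logp: mw < 250 and logp < 2.5,
    "ZINC-Goldilocks": lambda mw, logp: 200 < mw < 500 and -2 < logp < 5,
}


def matching_subsets(mw, logp, hbd=None, hba=None):
    """Return the table's subsets whose property window contains this compound."""
    hits = [name for name, rule in SUBSET_RULES.items() if rule(mw, logp)]
    # Drug-Like is approximated by the strict Rule of Five, so it needs HBD/HBA counts
    if hbd is not None and hba is not None:
        if mw <= 500 and logp <= 5 and hbd <= 5 and hba <= 10:
            hits.append("ZINC-Drug-Like")
    return hits


print(matching_subsets(180.2, 1.3, hbd=1, hba=3))  # ['ZINC-Fragment-Like', 'ZINC-Drug-Like']
print(matching_subsets(320.0, 2.8))                # ['ZINC-Lead-Like', 'ZINC-Goldilocks']
```

The subsets overlap, so a compound can match several of them at once.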
Virtual Screening Library Builder
```python
from rdkit import Chem, DataStructs
from rdkit.Chem import Descriptors, rdFingerprintGenerator

# Morgan radius 2 (ECFP4-like), 2048 bits; the generator API replaces the
# deprecated AllChem.GetMorganFingerprintAsBitVect in recent RDKit releases.
_MORGAN = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)


class ScreeningLibrary:
    """Build and filter compound libraries for virtual screening."""

    def __init__(self):
        self.compounds = []

    def add_smiles(self, smiles_list, ids=None):
        for i, smi in enumerate(smiles_list):
            mol = Chem.MolFromSmiles(smi)
            if mol is None:
                continue  # skip unparseable SMILES
            self.compounds.append({
                'id': ids[i] if ids else f'CPD-{i}',
                'smiles': Chem.MolToSmiles(mol),  # canonical SMILES
                'mol': mol,
                'mw': Descriptors.MolWt(mol),
                'logp': Descriptors.MolLogP(mol),
                'hbd': Descriptors.NumHDonors(mol),
                'hba': Descriptors.NumHAcceptors(mol),
                'tpsa': Descriptors.TPSA(mol),
                'rotatable_bonds': Descriptors.NumRotatableBonds(mol),
            })

    def filter_druglike(self, rule="lipinski"):
        """Filter by drug-likeness rules."""
        if rule == "lipinski":
            return [c for c in self.compounds
                    if c['mw'] <= 500 and c['logp'] <= 5
                    and c['hbd'] <= 5 and c['hba'] <= 10]
        if rule == "veber":
            return [c for c in self.compounds
                    if c['tpsa'] <= 140 and c['rotatable_bonds'] <= 10]
        raise ValueError(f"Unknown rule: {rule!r}")

    def _fingerprints(self):
        return [_MORGAN.GetFingerprint(c['mol']) for c in self.compounds]

    def deduplicate(self, threshold=0.95):
        """Remove near-duplicate compounds by Tanimoto similarity (O(n^2) comparisons)."""
        fps = self._fingerprints()
        keep = [True] * len(fps)
        for i in range(len(fps)):
            if not keep[i]:
                continue
            for j in range(i + 1, len(fps)):
                if keep[j] and DataStructs.TanimotoSimilarity(fps[i], fps[j]) >= threshold:
                    keep[j] = False
        return [c for c, k in zip(self.compounds, keep) if k]

    def diversity_select(self, n_select, fps=None):
        """Select a diverse subset with a greedy MaxMin pick seeded with compound 0."""
        if fps is None:
            fps = self._fingerprints()
        if n_select <= 0 or not fps:
            return []
        selected = [0]
        remaining = set(range(1, len(fps)))
        while len(selected) < n_select and remaining:
            # Pick the candidate whose nearest already-selected neighbour is farthest away
            best_idx = max(remaining, key=lambda idx: min(
                1 - DataStructs.TanimotoSimilarity(fps[idx], fps[s]) for s in selected))
            selected.append(best_idx)
            remaining.discard(best_idx)
        return [self.compounds[i] for i in selected]


# Usage
lib = ScreeningLibrary()
lib.add_smiles([
    "CC(=O)OC1=CC=CC=C1C(=O)O",   # Aspirin
    "CC1=CC=C(C=C1)C(C)C(=O)O",   # Ibuprofen
    "OC(=O)C1=CC=CC=C1O",         # Salicylic acid
    "CC(=O)NC1=CC=C(C=C1)O",      # Acetaminophen
])
druglike = lib.filter_druglike("lipinski")
print(f"Drug-like compounds: {len(druglike)}/{len(lib.compounds)}")
```
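The deduplicate method compares every pair of fingerprints, which scales quadratically with library size. A cheaper first pass is to collapse exact duplicates by InChIKey; keeping only the first 14-character block (the connectivity layer) additionally merges stereoisomers, which you may or may not want. A sketch, assuming an RDKit build with InChI support (the standard pip wheels include it); dedupe_by_inchikey is a hypothetical helper:

```python
from rdkit import Chem


def dedupe_by_inchikey(smiles_list, ignore_stereo=False):
    """Keep the first occurrence of each structure, keyed by InChIKey."""
    seen, unique = set(), []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue  # skip unparseable input
        key = Chem.MolToInchiKey(mol)
        if ignore_stereo:
            key = key.split("-")[0]  # connectivity (skeleton) block only
        if key not in seen:
            seen.add(key)
            unique.append(smi)
    return unique


smiles = [
    "CC(=O)OC1=CC=CC=C1C(=O)O",  # aspirin
    "CC(=O)Oc1ccccc1C(=O)O",     # aspirin, different SMILES spelling
    "C[C@H](N)C(=O)O",           # L-alanine
    "C[C@@H](N)C(=O)O",          # D-alanine
]
print(len(dedupe_by_inchikey(smiles)))                      # 3
print(len(dedupe_by_inchikey(smiles, ignore_stereo=True)))  # 2
```

Running this before the fingerprint-based pass shrinks the input to the quadratic step.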
Configuration
| Parameter | Description | Default |
|---|---|---|
| api_url | ZINC API base URL | "https://zinc15.docking.org" |
| similarity_threshold | Minimum Tanimoto similarity for search | 0.7 |
| output_format | Response format (json, smi, sdf, mol2) | "json" |
| subset | ZINC subset to search | "drug-like" |
| max_results | Maximum results per query | 100 |
| purchasability | Filter by availability (in-stock, make-on-demand) | "all" |
| mw_range | Molecular weight filter (Da) | [150, 500] |
| logp_range | LogP filter | [-2, 5] |
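One way to carry these settings through a script is a small dataclass whose defaults mirror the table. The class name and its validation rules are illustrative, not part of any ZINC client library; the ranges are tuples rather than lists because dataclasses reject mutable defaults:

```python
from dataclasses import dataclass


@dataclass
class ZincSearchConfig:
    """Search settings; defaults mirror the Configuration table."""
    api_url: str = "https://zinc15.docking.org"
    similarity_threshold: float = 0.7
    output_format: str = "json"
    subset: str = "drug-like"
    max_results: int = 100
    purchasability: str = "all"
    mw_range: tuple = (150, 500)
    logp_range: tuple = (-2, 5)

    def __post_init__(self):
        # Basic sanity checks so a typo fails early instead of at query time
        if not 0.0 <= self.similarity_threshold <= 1.0:
            raise ValueError("similarity_threshold must be a Tanimoto value in [0, 1]")
        if self.output_format not in {"json", "smi", "sdf", "mol2"}:
            raise ValueError(f"unsupported output_format: {self.output_format!r}")
        if self.purchasability not in {"all", "in-stock", "make-on-demand"}:
            raise ValueError(f"unsupported purchasability: {self.purchasability!r}")
        if self.max_results <= 0:
            raise ValueError("max_results must be positive")
        for name in ("mw_range", "logp_range"):
            low, high = getattr(self, name)
            if low > high:
                raise ValueError(f"{name} lower bound exceeds upper bound")

    def in_property_window(self, mw, logp):
        """True if MW and LogP fall inside the configured (inclusive) ranges."""
        return (self.mw_range[0] <= mw <= self.mw_range[1]
                and self.logp_range[0] <= logp <= self.logp_range[1])


cfg = ZincSearchConfig(similarity_threshold=0.6, purchasability="in-stock")
print(cfg.in_property_window(180.2, 1.3))  # True
```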
Best Practices
- Start with lead-like or fragment-like subsets for early discovery: the full ~230M-compound database is too large for exhaustive screening on most compute budgets. ZINC-Lead-Like (~6M) leaves headroom for the MW and LogP increases that usually accompany hit-to-lead optimization, and ZINC-Fragment-Like (~1M) suits fragment-based drug design.
- Deduplicate before virtual screening: ZINC contains many near-identical entries (stereoisomers, salt forms). Removing compounds with a Tanimoto similarity of 0.95 or higher to an already-kept compound avoids docking essentially the same molecule several times. Morgan fingerprints ignore stereochemistry by default, so stereoisomers score a similarity of 1.0; decide deliberately whether to merge them, since they can dock differently.
- Use Morgan fingerprints (radius 2) for similarity searching: radius-2 Morgan fingerprints are RDKit's analogue of ECFP4 and a widely used default for similarity-based virtual screening. Radius 2 encodes atom environments up to 4 bonds in diameter, balancing specificity against generalization.
- Verify purchasability before ordering hits: a computational hit is of little use if the compound cannot be obtained. Filter by "in-stock" for immediate availability or "make-on-demand" for synthesis with lead times of roughly 4-8 weeks, and confirm with vendor catalogs directly, since ZINC's availability data may be out of date.
- Apply diversity selection for screening libraries: rather than screening many close analogs, use MaxMin or another diversity-selection algorithm to pick a maximally diverse subset. A diverse library covers more chemotypes per docked compound, so a diverse set of 10K compounds can rival a much larger redundant library for finding distinct hits.
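The greedy MaxMin in ScreeningLibrary.diversity_select recomputes distances in pure Python. RDKit also ships a C++ MaxMinPicker that computes Tanimoto distances lazily from bit-vector fingerprints instead of building a full distance matrix. A sketch combining it with radius-2 Morgan fingerprints; pick_diverse and the fixed seed are illustrative choices:

```python
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator
from rdkit.SimDivFilters.rdSimDivPickers import MaxMinPicker


def pick_diverse(smiles_list, n_pick, seed=42):
    """Return a MaxMin-diverse subset of SMILES using Morgan (radius 2) fingerprints."""
    gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
    mols = [Chem.MolFromSmiles(s) for s in smiles_list]
    valid = [(s, m) for s, m in zip(smiles_list, mols) if m is not None]
    fps = [gen.GetFingerprint(m) for _, m in valid]
    n_pick = min(n_pick, len(fps))  # cannot pick more than the pool holds
    if n_pick == 0:
        return []
    # LazyBitVectorPick computes Tanimoto distances on demand
    picks = MaxMinPicker().LazyBitVectorPick(fps, len(fps), n_pick, seed=seed)
    return [valid[i][0] for i in picks]


library = ["CCO", "CCCO", "c1ccccc1", "c1ccccc1O", "CC(=O)O", "C1CCCCC1"]
print(pick_diverse(library, 3))
```

A fixed seed makes the first pick, and therefore the whole selection, reproducible between runs.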
Common Issues
ZINC API returns HTTP 503 during peak hours: ZINC is an academic resource with limited server capacity. Retry failed requests with exponential backoff, avoid bulk queries during US business hours, and for large-scale downloads fetch the pre-built tranche files rather than querying the API compound by compound.
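Retry with exponential backoff can be sketched with the standard library alone. fetch_with_backoff and zinc_get are hypothetical helpers: the first calls any fetch function returning (status_code, body), retries on 429/502/503/504, and doubles the wait after each failure (capped, plus up to 10% random jitter so parallel clients do not retry in lockstep); the second shows one way to plug in urllib.

```python
import random
import time
import urllib.error
import urllib.request

TRANSIENT_STATUSES = {429, 502, 503, 504}


def fetch_with_backoff(fetch, max_retries=5, base_delay=1.0, max_delay=60.0,
                       sleep=time.sleep):
    """Call fetch() until it returns a non-transient status or retries run out.

    fetch must return a (status_code, body) tuple. Waits base_delay * 2**attempt
    seconds (capped at max_delay, plus up to 10% jitter) between attempts.
    """
    for attempt in range(max_retries + 1):
        status, body = fetch()
        if status not in TRANSIENT_STATUSES:
            return status, body
        if attempt == max_retries:
            break
        delay = min(max_delay, base_delay * 2 ** attempt)
        sleep(delay + random.uniform(0, 0.1 * delay))
    raise RuntimeError(f"giving up after {max_retries + 1} attempts (last status {status})")


def zinc_get(url):
    """GET a URL with backoff; connection errors (URLError) still propagate."""
    def fetch():
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                return resp.status, resp.read()
        except urllib.error.HTTPError as err:
            return err.code, err.read()
    return fetch_with_backoff(fetch)
```

If your code already uses requests, an equivalent is to mount an HTTPAdapter configured with urllib3's Retry (for example Retry(total=5, backoff_factor=1, status_forcelist=[429, 502, 503, 504])) on a requests.Session, which then retries transparently.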
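SMILES strings don't match between ZINC and other databases: databases canonicalize SMILES differently and may also store different tautomers, protonation states, or salt forms of the same compound. Convert both sides to RDKit canonical SMILES before comparing: Chem.MolToSmiles(Chem.MolFromSmiles(smiles)). This normalizes atom ordering and aromaticity notation but keeps stereochemistry as written and does not change tautomers, charges, or counter-ions, so standardize those separately (for example with RDKit's rdMolStandardize module) when exact matches still fail.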
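Because canonical SMILES alone leaves tautomers and salts untouched, a comparison key can run RDKit's rdMolStandardize steps first. A sketch; comparison_smiles is a hypothetical name, and the chosen sequence (largest-fragment parent, neutralization, canonical tautomer) is one reasonable recipe rather than a standard:

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

_UNCHARGER = rdMolStandardize.Uncharger()
_TAUTOMERS = rdMolStandardize.TautomerEnumerator()


def comparison_smiles(smiles):
    """Canonical SMILES after salt stripping, neutralization and tautomer canonicalization."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    mol = rdMolStandardize.FragmentParent(mol)  # keep the largest organic fragment (drops counter-ions)
    mol = _UNCHARGER.uncharge(mol)              # neutralize charges where possible
    mol = _TAUTOMERS.Canonicalize(mol)          # choose one canonical tautomer
    return Chem.MolToSmiles(mol)


# 2-hydroxypyridine and 2-pyridone are tautomers: plain canonical SMILES differ,
# the standardized keys agree
print(comparison_smiles("Oc1ccccn1") == comparison_smiles("O=c1cccc[nH]1"))
```

Tautomer enumeration is comparatively slow, so for large libraries it is worth caching keys rather than recomputing them per comparison.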
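3D conformers have unrealistic geometry: ZINC's pre-generated 3D conformers are reasonable starting points but are not guaranteed to be low-energy. To rebuild geometry from SMILES before docking, add explicit hydrogens with Chem.AddHs(mol), then call AllChem.EmbedMolecule(mol) followed by AllChem.MMFFOptimizeMolecule(mol). Note that EmbedMolecule replaces any existing coordinates; to refine a conformer downloaded from ZINC instead, load it with hydrogens kept (for example Chem.SDMolSupplier(path, removeHs=False)) and run MMFFOptimizeMolecule on it directly.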
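A minimal RDKit sketch of the rebuild-from-SMILES route follows. The function name is hypothetical, the fixed random seed makes embedding reproducible, and the result is a single MMFF94-minimized conformer rather than a conformer search (most docking programs sample ligand flexibility themselves):

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def prepare_3d(smiles, seed=0xF00D):
    """Embed one 3D conformer with explicit hydrogens and minimize it with MMFF94."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    mol = Chem.AddHs(mol)  # embedding and force fields need explicit hydrogens
    # ETKDG is the default embedding method in recent RDKit releases; -1 means failure
    if AllChem.EmbedMolecule(mol, randomSeed=seed) != 0:
        raise RuntimeError("3D embedding failed")
    # Returns 0 when converged, 1 if more iterations are needed, -1 if MMFF setup failed
    if AllChem.MMFFOptimizeMolecule(mol, maxIters=2000) == -1:
        raise RuntimeError("MMFF parameters unavailable for this molecule")
    return mol


mol = prepare_3d("CC(=O)OC1=CC=CC=C1C(=O)O")  # aspirin
molblock = Chem.MolToMolBlock(mol)           # MDL mol block for downstream docking prep
print(mol.GetNumAtoms(), "atoms including hydrogens")
```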
Similar Templates
Full-Stack Code Reviewer
Comprehensive code review skill that checks for security vulnerabilities, performance issues, accessibility, and best practices across frontend and backend code.
Test Suite Generator
Generates comprehensive test suites with unit tests, integration tests, and edge cases. Supports Jest, Vitest, Pytest, and Go testing.
Pro Architecture Workspace
Battle-tested skill for architectural decision-making frameworks. Includes structured workflows, validation checks, and reusable patterns for development.