RDKit Reference

Q: What is RDKit and what is it used for?

RDKit is an open-source cheminformatics library for Python (and C++) used for molecular manipulation, descriptor calculation, fingerprint generation, similarity searching, and visualization. It is one of the most widely used toolkits in computational chemistry, drug discovery, and chemical data science, with typical applications in virtual screening, QSAR modeling, chemical library analysis, and lead optimization.

Q: How do I create molecules from SMILES in RDKit?

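Use Chem.MolFromSmiles() to parse a SMILES string into a molecule object; for example, Chem.MolFromSmiles("c1ccccc1") returns benzene. The function returns None (rather than raising an exception) when the SMILES cannot be parsed, so check the result before using it. For canonical SMILES output, use Chem.MolToSmiles(mol). For SMARTS-based query patterns, use Chem.MolFromSmarts("[OH]"). To read from files, use Chem.MolFromMolFile() for single molecules or Chem.SDMolSupplier() for multi-molecule SDF files.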
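A minimal sketch of the parsing calls described above. The example molecules (ethanol, benzene), the deliberately broken SMILES, and the temporary file name are illustrative choices, not part of the original reference. It shows canonicalization, the None-on-failure behavior, a SMARTS query, and reading an SDF file back with Chem.SDMolSupplier().

```python
import os
import tempfile

from rdkit import Chem

# Parse SMILES into Mol objects; Kekulé and aromatic benzene inputs are equivalent.
benzene = Chem.MolFromSmiles("C1=CC=CC=C1")
ethanol = Chem.MolFromSmiles("OCC")

# MolToSmiles() emits canonical SMILES, so equivalent inputs give the same string.
print(Chem.MolToSmiles(benzene))  # c1ccccc1
print(Chem.MolToSmiles(ethanol))  # CCO

# An invalid SMILES (here, an unclosed ring) returns None and logs an error.
bad = Chem.MolFromSmiles("C1CC")
print(bad is None)  # True

# SMARTS strings produce query molecules for substructure matching.
hydroxyl = Chem.MolFromSmarts("[OH]")
print(ethanol.HasSubstructMatch(hydroxyl))  # True

# Round-trip through a temporary SDF file (hypothetical name) to show multi-molecule reading.
sdf_path = os.path.join(tempfile.mkdtemp(), "example.sdf")
writer = Chem.SDWriter(sdf_path)
for mol in (ethanol, benzene):
    writer.write(mol)
writer.close()

read_back = []
for mol in Chem.SDMolSupplier(sdf_path):
    if mol is None:  # records that fail to parse are returned as None
        continue
    read_back.append(Chem.MolToSmiles(mol))
print(read_back)  # ['CCO', 'c1ccccc1']
```

Skipping None records in the supplier loop is the usual pattern for real-world SDF files, where a few malformed entries are common.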

Q: What molecular fingerprints are available in RDKit?

RDKit provides Morgan fingerprints (ECFP-like: radius=2 corresponds to ECFP4, radius=3 to ECFP6) via AllChem.GetMorganFingerprintAsBitVect(), which recent releases deprecate in favor of rdFingerprintGenerator.GetMorganGenerator(); RDKit topological fingerprints via Chem.RDKFingerprint(); and the 166 MACCS structural keys via MACCSkeys.GenMACCSKeys(), which returns a 167-bit vector (bit 0 is unused). Morgan fingerprints are the most common choice for similarity searching and machine learning in drug discovery.
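A short sketch generating all three fingerprint types for one molecule, assuming an RDKit version that ships rdFingerprintGenerator (the legacy call AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048) remains available in many versions). Aspirin is an arbitrary example input.

```python
from rdkit import Chem
from rdkit.Chem import MACCSkeys, rdFingerprintGenerator

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin, an arbitrary example

# Morgan fingerprint with radius 2 (ECFP4-like), folded to 2048 bits.
morgan_gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
morgan_fp = morgan_gen.GetFingerprint(mol)

# RDKit topological (path-based) fingerprint; default length is 2048 bits.
rdk_fp = Chem.RDKFingerprint(mol)

# MACCS keys: 166 keys stored in a 167-bit vector (bit 0 is unused).
maccs_fp = MACCSkeys.GenMACCSKeys(mol)

for name, fp in [("Morgan", morgan_fp), ("RDKit", rdk_fp), ("MACCS", maccs_fp)]:
    # Print total length and number of set bits for each fingerprint.
    print(name, fp.GetNumBits(), fp.GetNumOnBits())
```

Creating the generator once and reusing it for many molecules is cheaper than rebuilding it per call.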

Q: How do I calculate molecular similarity in RDKit?

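Generate fingerprints for both molecules (typically Morgan with radius=2 and nBits=2048), then use DataStructs.TanimotoSimilarity(fp1, fp2) to compute the Tanimoto coefficient: the number of bits set in both fingerprints divided by the number of bits set in either. It ranges from 0.0 (no bits in common) to 1.0 (identical fingerprints; structurally different molecules can occasionally share a fingerprint, so 1.0 does not guarantee identical molecules). Tanimoto similarity is the most common metric for comparing chemical structures in virtual screening and clustering workflows.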
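A minimal sketch of pairwise and one-to-many Tanimoto similarity, assuming Morgan fingerprints from rdFingerprintGenerator. The three molecules are illustrative. It also recomputes the coefficient by hand from the set bits to show what TanimotoSimilarity() measures.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)

smiles = {
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
    "salicylic acid": "O=C(O)c1ccccc1O",
    "ethanol": "CCO",
}
fps = {name: gen.GetFingerprint(Chem.MolFromSmiles(s)) for name, s in smiles.items()}

query = fps["aspirin"]
sim = DataStructs.TanimotoSimilarity(query, fps["salicylic acid"])

# Tanimoto = |A ∩ B| / |A ∪ B| over the set bits of the two fingerprints.
a, b = set(query.GetOnBits()), set(fps["salicylic acid"].GetOnBits())
manual = len(a & b) / len(a | b)
print(round(sim, 3), round(manual, 3))

# Screen one query against many fingerprints in a single call.
names = list(fps)
scores = DataStructs.BulkTanimotoSimilarity(query, [fps[n] for n in names])
for name, score in sorted(zip(names, scores), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
```

BulkTanimotoSimilarity() is typically preferred for virtual screening loops because it avoids one Python call per comparison.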

Q: What molecular descriptors can RDKit calculate?

RDKit calculates over 200 molecular descriptors. Key ones include Descriptors.MolWt() for average molecular weight, Descriptors.ExactMolWt() for monoisotopic mass, Descriptors.MolLogP() for the Wildman-Crippen LogP estimate, Descriptors.TPSA() for topological polar surface area, Descriptors.NumHAcceptors() and Descriptors.NumHDonors() for hydrogen bond acceptor and donor counts, and rdMolDescriptors.CalcMolFormula() for the molecular formula. These are widely used in drug-likeness assessment and as features in QSAR models.
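A short sketch computing the descriptors listed above for one molecule. Aspirin is an illustrative choice, and the dictionary keys are just labels for printing.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors, rdMolDescriptors

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin

props = {
    "MolWt": Descriptors.MolWt(mol),            # average molecular weight
    "ExactMolWt": Descriptors.ExactMolWt(mol),  # monoisotopic mass
    "MolLogP": Descriptors.MolLogP(mol),        # Wildman-Crippen LogP estimate
    "TPSA": Descriptors.TPSA(mol),              # topological polar surface area
    "HBA": Descriptors.NumHAcceptors(mol),      # hydrogen bond acceptors
    "HBD": Descriptors.NumHDonors(mol),         # hydrogen bond donors
    "Formula": rdMolDescriptors.CalcMolFormula(mol),
}

for key, value in props.items():
    # Round floats for display; integers and strings print as-is.
    print(key, round(value, 2) if isinstance(value, float) else value)
```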

Q: How does the Lipinski Rule of Five work in RDKit?

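The Lipinski Rule of Five is a rule of thumb for flagging compounds likely to have poor oral absorption, based on four criteria (all multiples of five): molecular weight <= 500, LogP <= 5, hydrogen bond acceptors <= 10, and hydrogen bond donors <= 5. In RDKit, compute each using Descriptors.MolWt(), Descriptors.MolLogP(), Descriptors.NumHAcceptors(), and Descriptors.NumHDonors(). Lipinski's original counts were simpler (N + O atoms for acceptors, NH + OH groups for donors) and are available as Lipinski.NOCount() and Lipinski.NHOHCount(), so results can differ slightly depending on which definition you use. A compound violating two or more rules is less likely to be orally bioavailable.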
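A minimal Rule-of-Five filter, assuming RDKit's Descriptors definitions for all four criteria. The helper names lipinski_violations() and passes_ro5() are hypothetical, not RDKit functions. A 40-carbon alkane is included as an example that should fail on both weight and LogP.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors


def lipinski_violations(mol):
    """Hypothetical helper: count Rule-of-Five violations using RDKit descriptors."""
    rules = [
        Descriptors.MolWt(mol) > 500,
        Descriptors.MolLogP(mol) > 5,
        Descriptors.NumHAcceptors(mol) > 10,
        Descriptors.NumHDonors(mol) > 5,
    ]
    return sum(rules)  # True counts as 1


def passes_ro5(mol, max_violations=1):
    # Hypothetical helper: two or more violations flags likely poor oral absorption.
    return lipinski_violations(mol) <= max_violations


aspirin = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")
long_alkane = Chem.MolFromSmiles("C" * 40)  # tetracontane: MW ~563, very lipophilic

for name, mol in [("aspirin", aspirin), ("C40 alkane", long_alkane)]:
    print(name, lipinski_violations(mol), passes_ro5(mol))
```

Exposing max_violations as a parameter makes it easy to apply a stricter "zero violations" filter when needed.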

Q: How do I generate 3D structures in RDKit?

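First add explicit hydrogens with Chem.AddHs(mol), then call AllChem.EmbedMolecule(mol, randomSeed=42) to generate initial 3D coordinates; it returns the new conformer's ID, or -1 if embedding failed. Optimize the geometry with AllChem.MMFFOptimizeMolecule(mol), which uses the MMFF94 force field, or with AllChem.UFFOptimizeMolecule(mol) for UFF. For conformational analysis, use AllChem.EmbedMultipleConfs() to generate multiple conformers, then optimize them all with AllChem.MMFFOptimizeMoleculeConfs(), which returns a (not_converged, energy) pair per conformer.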
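A sketch of the single- and multi-conformer workflows described above, using 1-butanol as an illustrative molecule. The conformer count (10) and maxIters=500 are arbitrary settings, not recommendations from the original reference.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.AddHs(Chem.MolFromSmiles("CCCCO"))  # 1-butanol with explicit hydrogens

# Single conformer: EmbedMolecule returns the conformer ID (0) or -1 on failure.
status = AllChem.EmbedMolecule(mol, randomSeed=42)
if status == -1:
    raise RuntimeError("embedding failed")

# MMFF94 optimization: 0 = converged, 1 = needs more iterations, -1 = setup failed.
# AllChem.UFFOptimizeMolecule(mol) is the UFF alternative.
mmff_status = AllChem.MMFFOptimizeMolecule(mol)

# Multiple conformers, optimized in one call; each result is (not_converged, energy).
multi = Chem.AddHs(Chem.MolFromSmiles("CCCCO"))
conf_ids = AllChem.EmbedMultipleConfs(multi, numConfs=10, randomSeed=42)
results = AllChem.MMFFOptimizeMoleculeConfs(multi, maxIters=500)

energies = [energy for _, energy in results]
best_index = min(range(len(energies)), key=lambda i: energies[i])
print(f"{len(conf_ids)} conformers; lowest MMFF energy {energies[best_index]:.2f} kcal/mol "
      f"(conformer {list(conf_ids)[best_index]})")

# Coordinates of the first atom in the optimized single conformer.
pos = mol.GetConformer().GetAtomPosition(0)
print(pos.x, pos.y, pos.z)
```

Fixing randomSeed makes the embedding reproducible between runs, which helps when comparing conformer energies.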

Q: Is this RDKit reference free to use?

Yes, this RDKit reference is completely free with no usage limits and no account required, and browsing it needs no software installation (running the code examples requires RDKit itself). All content is rendered client-side in your browser. It is designed as a practical daily companion for computational chemists and cheminformatics researchers working with the RDKit Python library.


About RDKit Reference

The RDKit Reference is a comprehensive, searchable cheat sheet for the RDKit cheminformatics library in Python. It covers molecule creation from SMILES and SMARTS strings, reading MOL/SDF files, substructure matching with GetSubstructMatches() and HasSubstructMatch(), molecular fingerprint generation (Morgan/ECFP, RDKit topological, MACCS keys), Tanimoto similarity calculation, and molecular descriptor computation including molecular weight, LogP, TPSA, and hydrogen bond donor/acceptor counts.

This reference also covers 3D structure generation with EmbedMolecule() and EmbedMultipleConfs(), force field optimization using MMFF94 and UFF, the Lipinski Rule of Five drug-likeness filter, molecule visualization with Draw.MolToImage() and MolsToGridImage(), PandasTools integration for DataFrame-based workflows, and functional group counting with Chem.Fragments. Each entry includes copy-ready Python code with expected output.
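A small sketch of functional group counting with Chem.Fragments. Each fr_* function counts matches of a predefined SMARTS pattern; the three functions chosen here and the aspirin input are illustrative picks from the module, not a complete list.

```python
from rdkit import Chem
from rdkit.Chem import Fragments

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin

# Each fr_* function returns the number of matches of its functional-group pattern.
counts = {
    "benzene rings": Fragments.fr_benzene(mol),
    "esters": Fragments.fr_ester(mol),
    "carboxylic acids": Fragments.fr_COO(mol),
}
print(counts)
```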

Designed for computational chemists, medicinal chemists, drug discovery scientists, cheminformatics researchers, and bioinformatics students, this reference is organized into SMILES/Molecules, Substructure, Fingerprints, Descriptors, 3D Structure, and Visualization categories. All content runs entirely in your browser with no server processing.

Key Features

SMILES/SMARTS molecule parsing — MolFromSmiles(), MolFromSmarts(), MolToSmiles() canonical conversion, and SDF file reading
Substructure search — GetSubstructMatches(), HasSubstructMatch(), and SMARTS pattern syntax for functional group queries
Molecular fingerprints — Morgan/ECFP4/ECFP6 (radius=2/3), RDKit topological, and MACCS 166-bit structural keys
Tanimoto similarity calculation between fingerprints with DataStructs.TanimotoSimilarity()
Molecular descriptors — MolWt, ExactMolWt, MolLogP, TPSA, NumHAcceptors, NumHDonors, and molecular formula
Lipinski Rule of Five implementation for drug-likeness filtering (MW<=500, LogP<=5, HBA<=10, HBD<=5)
3D conformer generation with EmbedMolecule/EmbedMultipleConfs and MMFF94/UFF force field optimization
Molecule visualization — MolToImage(), MolsToGridImage(), and PandasTools DataFrame integration
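The substructure search feature can be sketched as follows. The SMARTS patterns are simple illustrative definitions chosen for this example (real functional-group filters are often stricter), and aspirin is an arbitrary input.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin

# Illustrative SMARTS queries for a few functional groups.
patterns = {
    "carboxylic acid": Chem.MolFromSmarts("C(=O)[OH]"),
    "ester": Chem.MolFromSmarts("C(=O)O[#6]"),
    "benzene ring": Chem.MolFromSmarts("c1ccccc1"),
    "primary amine": Chem.MolFromSmarts("[NX3;H2]"),
}

for name, patt in patterns.items():
    if mol.HasSubstructMatch(patt):
        # Each match is a tuple of atom indices, ordered like the pattern's atoms.
        print(name, mol.GetSubstructMatches(patt))
    else:
        print(name, "not found")
```

HasSubstructMatch() stops at the first hit, so it is the cheaper call when only a yes/no answer is needed.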
