PDB Reference
Free reference guide: PDB Reference
About PDB Reference
The PDB Reference is a searchable quick-reference guide for the Protein Data Bank (PDB), the global repository of experimentally determined 3D structures of biological macromolecules. It covers PDB basics: the 4-character alphanumeric ID system; the legacy PDB flat-file format, with detailed ATOM and HETATM record field layouts (serial number, atom name, residue, chain, coordinates, occupancy, B-factor); the PDBx/mmCIF format, which is now the PDB's standard archive format; resolution interpretation from ultra-high (below 1 Angstrom) to very low (above 4 Angstrom); B-factor meaning and ranges; experimental methods (X-ray crystallography, NMR, cryo-EM); and the distinction between the biological assembly and the asymmetric unit.
The reference provides comprehensive guidance for searching and downloading structures from the RCSB PDB, including text, sequence (BLAST), structural similarity, and sequence motif searches, REST API endpoints for programmatic access, batch download via rsync, and UniProt cross-referencing through SIFTS mapping. Visualization tool entries cover three widely used molecular graphics programs: PyMOL (fetch, select, show, color, distance measurement, and ray tracing commands), UCSF ChimeraX (style, color by B-factor, surface, matchmaker alignment), and VMD (mol load, mol modstyle, and Tcl scripting for atom selection).
For structure analysis and validation, the reference covers secondary structure elements (alpha helix with 3.6 residues per turn and i to i+4 hydrogen bonds, beta sheets, 3₁₀ helix, turns) with DSSP assignment codes, Ramachandran plot phi/psi angle analysis with allowed regions (alpha helix at phi=-60/psi=-45, beta sheet at phi=-135/psi=135), R-factor and R-free validation metrics, contact maps for residue distance visualization, RMSD calculation and interpretation for structural comparison, and structure prediction resources including AlphaFold DB with pLDDT confidence scores, Swiss-Model homology modeling, and AutoDock Vina molecular docking with binding energy assessment.
Key Features
- PDB file format with detailed ATOM/HETATM record field layout: serial number, atom name, residue, chain ID, coordinates, occupancy, B-factor
- Resolution interpretation scale from ultra-high (<1 Angstrom) to very low (>4 Angstrom) and B-factor ranges for ordered/flexible regions
- RCSB PDB search methods (text, BLAST, structural similarity, motif), REST API, batch download, and UniProt SIFTS cross-referencing
- PyMOL commands: fetch, select, show/hide, color, distance measurement, ray tracing, and image export
- ChimeraX and VMD visualization: cartoon styles, B-factor coloring, surface rendering, matchmaker alignment, and Tcl scripting
- Ramachandran plot validation with phi/psi allowed regions and R-factor/R-free quality metrics with overfitting detection
- DSSP secondary structure codes, contact maps, RMSD structural comparison with interpretation thresholds (<1 to >3 Angstrom)
- AlphaFold DB with pLDDT confidence scoring, Swiss-Model homology modeling workflow, and AutoDock Vina docking syntax
Frequently Asked Questions
What information is in a PDB ATOM record?
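Each ATOM record is a fixed-width line; the column ranges below are 1-based and inclusive: columns 1-6 (record type), 7-11 (atom serial number), 13-16 (atom name such as CA, N, C), 17 (alternate location indicator), 18-20 (residue name such as ALA, GLY), 22 (chain ID), 23-26 (residue sequence number), 27 (insertion code), 31-38/39-46/47-54 (X/Y/Z coordinates in Angstroms), 55-60 (occupancy, usually 1.00 and lower for atoms with alternate conformations or partial occupancy), 61-66 (B-factor/temperature factor), and 77-78 (element symbol). HETATM records use the same layout for non-standard residues and non-polymer groups such as ligands, metal ions, and water.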
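Because the legacy format is fixed-width, records are best parsed by column position rather than by splitting on whitespace, since wide numeric fields can run together. The following Python sketch turns the 1-based column ranges above into string slices. The function name parse_atom_line and the made-up SAMPLE_LINE are illustrative only, and the sketch assumes plain decimal serial and residue numbers (very large structures can overflow these fields, which is one reason such entries are distributed only as PDBx/mmCIF).

```python
from typing import Any, Dict


def parse_atom_line(line: str) -> Dict[str, Any]:
    """Parse one ATOM/HETATM record using the fixed PDB column layout.

    PDB column ranges are 1-based and inclusive, so columns a-b
    correspond to the Python slice line[a-1:b].
    """
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError(f"not an ATOM/HETATM record: {record!r}")
    line = line.ljust(80)  # pad short lines so optional trailing fields slice safely
    return {
        "record": record,
        "serial": int(line[6:11]),        # cols 7-11
        "name": line[12:16].strip(),      # cols 13-16
        "alt_loc": line[16].strip(),      # col 17
        "res_name": line[17:20].strip(),  # cols 18-20
        "chain": line[21].strip(),        # col 22
        "res_seq": int(line[22:26]),      # cols 23-26
        "i_code": line[26].strip(),       # col 27 (insertion code)
        "x": float(line[30:38]),          # cols 31-38
        "y": float(line[38:46]),          # cols 39-46
        "z": float(line[46:54]),          # cols 47-54
        "occupancy": float(line[54:60]),  # cols 55-60
        "b_factor": float(line[60:66]),   # cols 61-66
        "element": line[76:78].strip(),   # cols 77-78 (blank in some old files)
    }


# A made-up record, built with field widths that follow the column layout.
SAMPLE_LINE = (
    f"{'ATOM':<6}{1:>5} {' N  ':4} {'VAL':>3} {'A'}{1:>4}    "
    f"{10.0:>8.3f}{20.0:>8.3f}{30.0:>8.3f}{1.0:>6.2f}{25.0:>6.2f}"
    f"{'':10}{'N':>2}"
)

if __name__ == "__main__":
    print(parse_atom_line(SAMPLE_LINE))
```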
How do I interpret resolution and B-factor values?
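Resolution (in Angstroms) indicates the level of structural detail, and smaller numbers mean finer detail: below 1 Angstrom is atomic-level (ultra-high), 1-2 is high, 2-3 is medium, 3-4 is low, and above 4 is very low. Many cryo-EM structures fall in the 3-4 Angstrom range, although the best cryo-EM maps now reach well below 2 Angstrom. The B-factor (in square Angstroms) reflects atomic displacement, that is, thermal motion plus static disorder. As a rough guide, below 20 is well-ordered, 20-40 is typical, 40-60 is flexible, and above 60 is very flexible or disordered; absolute values also depend on resolution and refinement. Loops typically have higher B-factors than helices, which in turn tend to be higher than beta sheets.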
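The bands above can be encoded as simple lookup functions. This is a sketch: the function names are hypothetical, and the assignment of exact band edges (2.0, 3.0, 4.0 Angstrom; 40 and 60 for B-factors) to the better band is our own choice, since the text does not specify it. It also converts an isotropic B-factor into a root-mean-square displacement using the standard relation B = 8π²U, where U is the mean-square displacement along one direction.

```python
import math


def resolution_class(d_angstrom: float) -> str:
    """Resolution band used in this reference (smaller value = finer detail).
    Exact edges 2.0, 3.0 and 4.0 go to the better band (an assumption)."""
    if d_angstrom < 1.0:
        return "ultra-high"
    if d_angstrom <= 2.0:
        return "high"
    if d_angstrom <= 3.0:
        return "medium"
    if d_angstrom <= 4.0:
        return "low"
    return "very low"


def bfactor_class(b: float) -> str:
    """Rough B-factor band (square Angstroms); edge handling is an assumption."""
    if b < 20:
        return "well-ordered"
    if b <= 40:
        return "typical"
    if b <= 60:
        return "flexible"
    return "very flexible"


def rms_displacement(b: float) -> float:
    """RMS displacement along one direction, from isotropic B = 8*pi^2*U."""
    return math.sqrt(b / (8 * math.pi ** 2))
```

For example, a B-factor of 20 corresponds to an RMS displacement of roughly 0.5 Angstrom along any one direction.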
How do I search for structures on RCSB PDB?
RCSB PDB at rcsb.org supports multiple search methods: text search (e.g., "human hemoglobin"), sequence similarity search (BLAST-style; the current RCSB service uses MMseqs2), structural similarity search, and sequence motif search using PROSITE-style patterns such as C-x(2,4)-C-x(3)-[LIVMFYWC]. The Advanced Search Builder combines multiple criteria in one query. For programmatic access, the Search API (search.rcsb.org) returns matching identifiers, the Data API (data.rcsb.org) returns entry metadata, and coordinate files can be downloaded from files.rcsb.org.
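As a hedged sketch of programmatic access, the Python code below builds request bodies for the RCSB Search API and URLs for the Data API and file server using only the standard library. The endpoint paths, the JSON field names (such as the "full_text" and "seqmotif" services), and the empty-body response when nothing matches reflect our understanding of the current APIs and should be checked against the official RCSB documentation; all helper names are hypothetical.

```python
import json
import urllib.request
from typing import List

# Endpoints as we understand the current RCSB services (verify before use).
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"
ENTRY_URL = "https://data.rcsb.org/rest/v1/core/entry/{pdb_id}"
FILE_URL = "https://files.rcsb.org/download/{pdb_id}.cif"


def full_text_query(text: str, rows: int = 10) -> dict:
    """Search API request body for a free-text search returning entry IDs."""
    return {
        "query": {
            "type": "terminal",
            "service": "full_text",
            "parameters": {"value": text},
        },
        "return_type": "entry",
        "request_options": {"paginate": {"start": 0, "rows": rows}},
    }


def motif_query(pattern: str) -> dict:
    """Search API request body for a PROSITE-style protein sequence motif."""
    return {
        "query": {
            "type": "terminal",
            "service": "seqmotif",
            "parameters": {
                "value": pattern,
                "pattern_type": "prosite",
                "sequence_type": "protein",
            },
        },
        "return_type": "entry",
    }


def run_search(body: dict, timeout: float = 30.0) -> List[str]:
    """POST a query body and return matching identifiers (needs network)."""
    req = urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = resp.read()
    if not payload:  # assumed: an empty body means no hits
        return []
    return [hit["identifier"] for hit in json.loads(payload)["result_set"]]


def entry_url(pdb_id: str) -> str:
    """Data API URL for an entry's JSON metadata."""
    return ENTRY_URL.format(pdb_id=pdb_id.upper())


def mmcif_url(pdb_id: str) -> str:
    """Download URL for an entry's PDBx/mmCIF coordinate file."""
    return FILE_URL.format(pdb_id=pdb_id.upper())
```

For example, run_search(full_text_query("human hemoglobin")) would return a list of entry IDs, and urllib.request.urlopen(entry_url("1HHO")) would fetch that entry's metadata; both calls require network access.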
What PyMOL commands are included in this reference?
The reference covers essential PyMOL commands: fetch (load PDB by ID), select (atom selection syntax), show/hide (cartoon, lines, sticks, surface), color (by chain, element, B-factor), distance (measure atom-to-atom distances), ray (render high-quality images), and png (save images). Example: "fetch 1HHO; select chain_A, chain A; show cartoon, chain_A; color red, chain_A; ray 2400, 1800; png output.png".
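The one-line example can also be kept as a reusable PyMOL script. The sketch below assembles the same commands into the text of a .pml file, optionally adding distance measurements; build_pml and its parameters are hypothetical names, and any atom selections passed for distances are placeholders for your own.

```python
from typing import Optional, Sequence, Tuple


def build_pml(
    pdb_id: str,
    chain: str,
    color: str,
    out_png: str,
    width: int = 2400,
    height: int = 1800,
    distances: Optional[Sequence[Tuple[str, str, str]]] = None,
) -> str:
    """Return PyMOL script text mirroring the fetch/select/show/color/ray/png example.

    `distances` holds (object_name, selection1, selection2) triples that are
    passed to PyMOL's `distance` command.
    """
    sel = f"chain_{chain}"
    cmds = [
        f"fetch {pdb_id}",
        f"select {sel}, chain {chain}",
        f"show cartoon, {sel}",
        f"color {color}, {sel}",
    ]
    for name, sel1, sel2 in distances or []:
        cmds.append(f"distance {name}, {sel1}, {sel2}")
    cmds += [f"ray {width}, {height}", f"png {out_png}"]
    return "\n".join(cmds) + "\n"


if __name__ == "__main__":
    print(build_pml("1HHO", "A", "red", "output.png"), end="")
```

Save the printed text as render.pml and run it without the GUI, for example with pymol -cq render.pml (-c for command-line mode, -q to suppress the startup messages).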
What does the Ramachandran plot show?
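The Ramachandran plot maps each residue's backbone dihedral angles phi (C(i-1)-N-CA-C) and psi (N-CA-C-N(i+1)), both ranging from -180 to 180 degrees. Populated regions correspond to secondary structures: right-handed alpha helix near phi=-60, psi=-45; beta sheet near phi=-135, psi=135; left-handed helix near phi=60, psi=45. Glycine and proline have their own characteristic distributions. By commonly used MolProbity criteria, a well-refined structure has over 98% of residues in the favored regions and almost none in disallowed (outlier) regions. Outliers may indicate model errors and should be checked against the experimental density.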
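Phi and psi are ordinary torsion angles, so they can be computed from four atom positions. The pure-Python sketch below uses the common atan2 formulation, with the convention that a clockwise rotation viewed along the central bond is positive; the function names are illustrative, and in practice tools such as Biopython or DSSP compute these angles for you.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle p0-p1-p2-p3 in degrees, within [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)  # unit vector along the central bond
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev: Vec, n: Vec, ca: Vec, c: Vec, n_next: Vec) -> Tuple[float, float]:
    """phi = C(i-1)-N-CA-C and psi = N-CA-C-N(i+1) for residue i."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)
```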
How do I assess structure quality using R-factor?
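R-factor measures agreement between the crystallographic model and the experimental data: R = sum| |Fobs| - |Fcalc| | / sum|Fobs|, where Fobs and Fcalc are the observed and calculated structure-factor amplitudes. The conventional R-factor (R-work) is calculated over the working set of reflections used in refinement, while R-free is calculated over a test set, typically about 5% of reflections, withheld from refinement. For structures around 2 Angstrom resolution, typical values are an R-work of 0.15-0.20 and an R-free of 0.20-0.25; expected values rise at lower resolution. If R-free minus R-work exceeds about 0.05, overfitting of the model to the data is suspected.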
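The formula translates directly into code. This sketch assumes Fcalc has already been put on the same scale as Fobs (refinement programs fit their own scale factor, which is omitted here) and that each reflection carries a boolean test-set flag; the function names are hypothetical, and the 0.05 gap is the rule of thumb quoted above.

```python
from typing import Sequence, Tuple


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over the given reflections."""
    if len(f_obs) != len(f_calc) or len(f_obs) == 0:
        raise ValueError("need equal-length, non-empty amplitude lists")
    num = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return num / sum(abs(o) for o in f_obs)


def r_work_r_free(
    f_obs: Sequence[float], f_calc: Sequence[float], free_flags: Sequence[bool]
) -> Tuple[float, float]:
    """Split reflections by test-set flag and return (R_work, R_free)."""
    work = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if not free]
    test = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if free]
    r_work = r_factor([o for o, _ in work], [c for _, c in work])
    r_free = r_factor([o for o, _ in test], [c for _, c in test])
    return r_work, r_free


def overfitting_suspected(r_work: float, r_free: float, max_gap: float = 0.05) -> bool:
    """Flag a gap between R-free and R-work larger than max_gap."""
    return (r_free - r_work) > max_gap
```

For example, r_work_r_free([100, 200, 300, 400], [90, 210, 300, 360], [False, False, False, True]) gives R-work = 20/600 (about 0.033) and R-free = 40/400 = 0.10, a gap large enough to trigger the overfitting flag.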
How do I interpret RMSD values when comparing structures?
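RMSD (Root Mean Square Deviation) quantifies structural similarity by comparing equivalent atom positions (typically CA atoms) after optimal superposition. As a rough guide for CA RMSD: below 1 Angstrom indicates very similar (nearly identical) structures, 1-2 Angstrom means similar, 2-3 Angstrom is consistent with the same fold, and above 3 Angstrom suggests different structures. RMSD also depends on how many atoms are compared, so compare values only across alignments of similar length. In PyMOL use "align obj1, obj2" (by default it rejects outlier atoms during refinement, so the reported RMSD covers only the retained atoms), and in ChimeraX use "matchmaker #1 to #2" to compute RMSD.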
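PyMOL and ChimeraX perform the superposition for you, but the underlying calculation is short. The sketch below implements the Kabsch algorithm with NumPy (a third-party dependency): both coordinate sets are centered, the optimal proper rotation is taken from a singular value decomposition of their covariance matrix, and the RMSD is computed after rotating. It assumes the two arrays are already paired atom-for-atom, for example matched CA atoms; unlike PyMOL's align, it performs no sequence alignment or outlier rejection, so its value can differ. The function name is hypothetical.

```python
import numpy as np


def kabsch_rmsd(mobile, target) -> float:
    """RMSD between two paired (N, 3) coordinate sets after optimal
    superposition (translation plus proper rotation) via the Kabsch algorithm."""
    p = np.asarray(mobile, dtype=float)
    q = np.asarray(target, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two arrays of shape (N, 3)")
    p = p - p.mean(axis=0)  # remove translation by centering both sets
    q = q - q.mean(axis=0)
    h = p.T @ q  # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))
```

A rigidly rotated and translated copy of a structure gives an RMSD of essentially zero, whereas the raw, unsuperposed RMSD between the same two copies can be several Angstroms.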
What structure prediction tools are covered?
The reference covers three prediction approaches: AlphaFold DB (alphafold.ebi.ac.uk) with pLDDT confidence interpretation (>90 very high, 70-90 confident, 50-70 low, <50 very low and possibly disordered), Swiss-Model homology modeling (template search, alignment, model building, QMEAN quality score), and AutoDock Vina molecular docking (receptor and ligand PDBQT input, grid box definition, and a predicted binding affinity in kcal/mol, where more negative values indicate stronger predicted binding).
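AlphaFold DB models store per-residue pLDDT in the B-factor column, so the fixed-column layout described earlier applies directly. The sketch below reads pLDDT from CA atoms of a PDB-format model and sorts residues into the bands listed above; band labels follow AlphaFold DB's wording, assigning exact boundary values such as 70 to the lower band is our own choice, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def plddt_band(score: float) -> str:
    """Confidence band for a pLDDT value (0-100); edge handling is an assumption."""
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


def ca_plddt(pdb_text: str) -> List[Tuple[str, int, float]]:
    """(chain, residue number, pLDDT) for each CA atom of a PDB-format model.

    AlphaFold models repeat each residue's pLDDT in the B-factor column
    (cols 61-66), so one CA atom per residue is enough.
    """
    out = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            out.append((line[21], int(line[22:26]), float(line[60:66])))
    return out


def band_counts(pdb_text: str) -> Dict[str, int]:
    """Number of residues falling in each pLDDT band."""
    return dict(Counter(plddt_band(score) for _, _, score in ca_plddt(pdb_text)))
```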