AlphaFold DB Reference
About AlphaFold DB Reference
The AlphaFold DB Reference is a searchable guide to the AlphaFold Protein Structure Database, the resource developed by Google DeepMind and EMBL-EBI that contains over 200 million predicted protein structures. It covers what a structural biologist, computational chemist, or bioinformatics researcher needs to use AlphaFold predictions effectively: understanding the confidence metrics (pLDDT, PAE, pTM, ipTM), accessing structures via the REST API, downloading proteome-scale datasets, and visualizing results in molecular graphics software.
The reference is organized into six categories:
- DB Overview: database access, the identifier format AF-{UniProt}-F1, experimental comparison methods, UniProt integration, and use cases
- Confidence Metrics: per-residue pLDDT scores with color coding, Predicted Aligned Error matrices, predicted TM-score, interface pTM for multimers, and intrinsically disordered region identification
- File Formats: PDB with pLDDT in the B-factor column, mmCIF with quality metrics, and PAE JSON matrices
- API: REST endpoints for structure queries, UniProt accession lookup, and FTP bulk proteome downloads
- Run/Install: Google Colab, local AlphaFold2 installation, ColabFold with MMseqs2, AlphaFold-Multimer for complexes, and AlphaFold3 for nucleic acids and ligands
- Visualization: PyMOL pLDDT coloring, ChimeraX rendering, PAE heatmap plotting, and domain boundary analysis
Each entry provides the exact URL, command, or code snippet needed to accomplish the task. Whether you are looking up a predicted structure by UniProt accession, interpreting a PAE matrix to determine domain-domain orientation confidence, running ColabFold on Google Colab for a quick prediction, setting up a local AlphaFold2 instance on GPU hardware, or coloring a structure by pLDDT in PyMOL, this reference provides the precise syntax and interpretation guidelines without navigating multiple documentation sources.
Key Features
- Confidence metric interpretation guide for pLDDT (per-residue, 0-100 scale with blue/cyan/yellow/orange color coding), PAE (residue-pair position error in Angstroms), pTM (overall topology), and ipTM (interface confidence for multimers)
- File format specifications for PDB (pLDDT stored in B-factor column), mmCIF (quality metric annotations), and PAE JSON (predicted aligned error matrix with max error values)
- REST API endpoint reference for querying structure metadata by UniProt accession, downloading PDB/CIF/PAE files via direct URL patterns, and FTP bulk download of entire proteomes
- ColabFold and AlphaFold2 setup guides for Google Colab (MMseqs2-based MSA, ~10 min/protein), local installation (NVIDIA GPU, ~3TB database), and AlphaFold-Multimer complex prediction with ranking confidence formula
- AlphaFold3 capabilities reference covering protein-protein, protein-DNA/RNA, protein-ligand complexes, ions, and modified residues via the AlphaFold Server
- Molecular visualization workflows for PyMOL (spectrum b coloring), ChimeraX (bfactor palette alphafold), and Python matplotlib PAE heatmap generation with color maps
- Domain boundary analysis using PAE matrix block patterns to identify well-defined domains (low intra-block PAE) vs uncertain inter-domain orientations (high inter-block PAE)
- Intrinsically disordered region (IDR) identification through low pLDDT scores (below 50), extended conformations without secondary structure, and high PAE values as structural biology indicators rather than prediction failures
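The low-pLDDT criterion in the last item can be sketched as a scan for contiguous runs of residues scoring below 50. This is a minimal illustration, not a validated disorder predictor: the function name and the `min_length` filter are hypothetical additions, and only the threshold of 50 comes from the text above.

```python
from typing import List, Tuple


def low_confidence_segments(
    plddt: List[float],
    threshold: float = 50.0,  # cutoff taken from the text above
    min_length: int = 5,      # hypothetical filter to ignore very short runs
) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs where pLDDT < threshold."""
    segments = []
    start = None
    for index, score in enumerate(plddt, start=1):
        if score < threshold:
            if start is None:
                start = index  # a new low-confidence run begins here
        elif start is not None:
            if index - start >= min_length:
                segments.append((start, index - 1))
            start = None
    # Close a run that extends to the C-terminus.
    if start is not None and len(plddt) - start + 1 >= min_length:
        segments.append((start, len(plddt)))
    return segments


# Toy per-residue scores: 10 confident residues, a 6-residue low run,
# 4 confident residues, then a 3-residue low tail.
scores = [95.0] * 10 + [40.0] * 6 + [80.0] * 4 + [30.0] * 3
candidate_idrs = low_confidence_segments(scores)
```

Candidate segments found this way are better read alongside the PAE matrix and secondary structure, as the item above suggests, than treated as disordered on pLDDT alone.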
Frequently Asked Questions
What does the pLDDT score mean and how should I interpret the color coding?
pLDDT (predicted Local Distance Difference Test) is a per-residue confidence score from 0 to 100. Scores above 90 (blue) indicate very high confidence where the backbone and side chains are reliable. Scores 70-90 (cyan) mean the backbone is well-predicted. Scores 50-70 (yellow) suggest low confidence where only the fold topology may be correct. Scores below 50 (orange) often indicate intrinsically disordered regions that genuinely lack stable structure, rather than prediction errors.
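The four bands above can be expressed as a small lookup. This is a sketch: the function name and band labels are illustrative, and the handling of scores exactly at 50, 70, and 90 is an assumption, since the text does not say which band the boundary values belong to.

```python
from typing import Tuple


def plddt_band(score: float) -> Tuple[str, str]:
    """Map a pLDDT score to an (illustrative label, color) pair."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"pLDDT must be within 0-100, got {score}")
    if score > 90.0:
        return ("very high", "blue")
    if score >= 70.0:   # assumption: 70 and 90 fall in the cyan band
        return ("confident", "cyan")
    if score >= 50.0:   # assumption: 50 falls in the yellow band
        return ("low", "yellow")
    return ("very low", "orange")


label, color = plddt_band(72.5)
```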
How do I read a PAE (Predicted Aligned Error) matrix?
The PAE matrix is an NxN grid where entry (i,j) represents the expected positional error (in Angstroms) of residue j when the structure is aligned on residue i. The matrix is not necessarily symmetric, so (i,j) and (j,i) can differ. Low values (below 5 Angstroms) between two residues mean their relative position is predicted with high confidence. The colors depend on the colormap: with a blue-white-red map, look for dark blue diagonal blocks indicating well-defined domains and light or red off-diagonal regions indicating uncertain inter-domain orientations (the AlphaFold DB web viewer shows low error in dark green instead). PAE is critical for assessing whether domain-domain arrangements are meaningful.
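The block pattern described above can be checked numerically by averaging PAE within and between candidate domains. The sketch below uses a made-up 4x4 matrix and hypothetical domain ranges. In practice the domain boundaries would come from inspecting the heatmap or from prior annotation.

```python
from typing import Iterable, List


def mean_pae(pae: List[List[float]], rows: Iterable[int], cols: Iterable[int]) -> float:
    """Average PAE over the block defined by 0-based row and column indices."""
    cols = list(cols)
    values = [pae[i][j] for i in rows for j in cols]
    return sum(values) / len(values)


# Toy matrix (Angstroms): residues 0-1 and 2-3 form two tight blocks
# whose relative orientation is uncertain.
toy_pae = [
    [0.0, 2.0, 20.0, 22.0],
    [2.0, 0.0, 21.0, 23.0],
    [20.0, 21.0, 0.0, 3.0],
    [22.0, 23.0, 3.0, 0.0],
]
domain_a = range(0, 2)  # hypothetical domain boundaries
domain_b = range(2, 4)

intra_a = mean_pae(toy_pae, domain_a, domain_a)
intra_b = mean_pae(toy_pae, domain_b, domain_b)
inter_ab = mean_pae(toy_pae, domain_a, domain_b)

# Using the 5 Angstrom guideline from the text above.
domains_well_defined = intra_a < 5.0 and intra_b < 5.0
orientation_confident = inter_ab < 5.0
```

Here both diagonal blocks fall under the 5 Angstrom guideline while the off-diagonal block does not, which is the signature of two well-defined domains with an uncertain relative arrangement.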
What is the difference between pTM and ipTM scores?
pTM (predicted TM-score) estimates the overall topology accuracy of the predicted structure as a whole (a single chain or, in AlphaFold-Multimer, the full complex), where values above 0.8 indicate high confidence in the global fold. ipTM (interface predicted TM-score) is specific to multimer predictions and assesses confidence in the protein-protein interface. The ranking confidence for multimer predictions is calculated as 0.8*ipTM + 0.2*pTM. Both metrics help determine whether the predicted structure is trustworthy enough for downstream applications like drug design or interaction analysis.
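The ranking formula above is a weighted sum, shown here as a one-line helper. The function name and the example scores are illustrative only.

```python
def ranking_confidence(iptm: float, ptm: float) -> float:
    """Multimer ranking confidence as stated above: 0.8*ipTM + 0.2*pTM."""
    return 0.8 * iptm + 0.2 * ptm


# Example with made-up scores for one model of a complex.
score = ranking_confidence(iptm=0.75, ptm=0.85)
```

Because ipTM carries four times the weight of pTM, a model with a well-predicted fold but a poor interface still ranks low.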
How do I download an AlphaFold structure using the REST API?
Query the API at https://alphafold.ebi.ac.uk/api/prediction/{UniProt_accession} (e.g., /api/prediction/P04637 for p53). The JSON response includes direct download URLs for the PDB file, mmCIF file, and PAE JSON. You can also construct file URLs directly: https://alphafold.ebi.ac.uk/files/AF-P04637-F1-model_v4.pdb for PDB, replacing .pdb with .cif for mmCIF, or replacing model_v4.pdb with predicted_aligned_error_v4.json for PAE data. The version suffix (v4 here) changes between database releases, so prefer the URLs returned by the API when a hand-built URL fails.
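The URL patterns above can be assembled programmatically. The sketch below builds them from an accession. The helper names are hypothetical, the default `version=4` simply mirrors the examples in the text, and the fetch function is defined but not called because it needs network access.

```python
import json
import urllib.request
from typing import Dict

BASE_URL = "https://alphafold.ebi.ac.uk"


def prediction_api_url(accession: str) -> str:
    """Metadata endpoint for one UniProt accession."""
    return f"{BASE_URL}/api/prediction/{accession}"


def file_urls(accession: str, version: int = 4, fragment: int = 1) -> Dict[str, str]:
    """Direct file URLs following the AF-{UniProt}-F1 naming pattern."""
    stem = f"{BASE_URL}/files/AF-{accession}-F{fragment}"
    return {
        "pdb": f"{stem}-model_v{version}.pdb",
        "cif": f"{stem}-model_v{version}.cif",
        "pae": f"{stem}-predicted_aligned_error_v{version}.json",
    }


def fetch_prediction_metadata(accession: str, timeout: float = 30.0):
    """Download and parse the JSON metadata (requires network access)."""
    with urllib.request.urlopen(prediction_api_url(accession), timeout=timeout) as response:
        return json.load(response)


p53_urls = file_urls("P04637")
```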
How can I run AlphaFold quickly without local GPU hardware?
Use ColabFold on Google Colab, which combines MMseqs2-based MSA generation (no large database download needed) with AlphaFold2 prediction. A typical single-chain protein takes about 10 minutes on a Colab GPU. The simplest approach is the ColabFold notebook at github.com/sokrypton/ColabFold, which provides a form-based interface for sequence input and result visualization. For command-line use, install with pip install "colabfold[alphafold]" (the quotes stop shells such as zsh from interpreting the brackets) and run colabfold_batch input.fasta output_dir/.
Why does the PDB file store pLDDT in the B-factor column?
AlphaFold repurposes the B-factor (temperature factor) column in PDB files to store pLDDT confidence scores, since predicted structures do not have experimental B-factors. The value 72.50 in the last numeric column of an ATOM record represents pLDDT=72.50, not crystallographic thermal motion. This is important to remember when using analysis tools that interpret B-factors, as you need to treat these values as confidence scores. The mmCIF format stores pLDDT in dedicated _ma_qa_metric fields instead.
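Reading pLDDT back out of a PDB file means slicing the fixed-width B-factor field (columns 61-66 of an ATOM record). The sketch below takes one value per residue from the CA atom. The function name and the two sample records are made up for illustration, and it assumes every atom in a residue carries the same pLDDT.

```python
from typing import Dict, Tuple

# Two illustrative ATOM records in fixed-column PDB layout (not real coordinates).
SAMPLE_PDB = (
    "ATOM      1  N   MET A   1      -7.212  10.123   3.456  1.00 72.50           N\n"
    "ATOM      2  CA  MET A   1      -6.001  10.900   3.800  1.00 72.50           C\n"
)


def residue_plddt(pdb_text: str) -> Dict[Tuple[str, int], float]:
    """Return {(chain, residue number): pLDDT} using each residue's CA atom."""
    scores = {}
    for line in pdb_text.splitlines():
        # Atom name occupies columns 13-16 (0-based slice 12:16).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]                 # column 22
            residue_number = int(line[22:26])  # columns 23-26
            scores[(chain, residue_number)] = float(line[60:66])  # columns 61-66
    return scores


plddt_by_residue = residue_plddt(SAMPLE_PDB)
```

Slicing by column rather than splitting on whitespace matters because adjacent PDB fields can run together without a separating space.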
What can AlphaFold3 predict that AlphaFold2 cannot?
AlphaFold3 extends predictions beyond single proteins and protein complexes to include protein-DNA complexes, protein-RNA complexes, protein-ligand interactions, ions, and modified residues. It uses a diffusion-based architecture instead of the structure module used in AlphaFold2. Access is available through the AlphaFold Server (alphafoldserver.com). This enables modeling of transcription factor-DNA binding, ribonucleoprotein complexes, and drug-target interactions that were not possible with AlphaFold2.
How do I visualize AlphaFold structures with pLDDT coloring in PyMOL?
Load the PDB file with "load AF-P04637-F1-model_v4.pdb", then apply pLDDT coloring with "spectrum b, red_white_blue, minimum=50, maximum=100". PyMOL assigns the first palette color to the minimum value, so this maps low-confidence regions (pLDDT of 50 or below) to red and high-confidence regions (pLDDT near 100) to blue. The reversed palette, blue_white_red, would color confident regions red. In ChimeraX, use "open alphafold:P04637" followed by "color bfactor palette alphafold", which applies the standard AlphaFold blue-cyan-yellow-orange palette. For PAE visualization, use Python with matplotlib: load the JSON, extract the predicted_aligned_error array, and display it with plt.imshow using a blue-white-red colormap.
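The matplotlib step can be sketched as follows. The function names and output path are hypothetical. The parser accepts the record either as a bare object or wrapped in a one-element list, which is an assumption about how the PAE JSON may be laid out. matplotlib is imported inside the plotting function so the parser works without it.

```python
import json
from typing import List


def parse_pae(json_text: str) -> List[List[float]]:
    """Extract the predicted_aligned_error matrix from PAE JSON text."""
    data = json.loads(json_text)
    # Assumption: the record may be wrapped in a one-element list.
    record = data[0] if isinstance(data, list) else data
    return record["predicted_aligned_error"]


def plot_pae(pae: List[List[float]], out_path: str = "pae.png") -> None:
    """Save a blue-white-red PAE heatmap (requires matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    image = ax.imshow(pae, cmap="bwr", vmin=0)  # low error blue, high error red
    ax.set_xlabel("Scored residue j")
    ax.set_ylabel("Aligned residue i")
    fig.colorbar(image, ax=ax, label="Predicted aligned error (Angstroms)")
    fig.savefig(out_path, dpi=150)
    plt.close(fig)


# Tiny made-up example in place of a downloaded file.
example_json = '[{"predicted_aligned_error": [[0, 4], [5, 0]], "max_predicted_aligned_error": 31.75}]'
matrix = parse_pae(example_json)
```

For a downloaded file, something like `plot_pae(parse_pae(open("AF-P04637-F1-predicted_aligned_error_v4.json").read()))` would produce the heatmap.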