BLAST Reference
About BLAST Reference
The BLAST Reference is a comprehensive command-line cheat sheet for NCBI BLAST+ (Basic Local Alignment Search Tool) covering all five main programs: blastn (nucleotide vs nucleotide), blastp (protein vs protein), blastx (translated nucleotide vs protein with 6-frame translation), tblastn (protein vs translated nucleotide database), and PSI-BLAST (position-specific iterated search for remote homolog detection with PSSM generation).
Key parameter sections detail E-value thresholds with interpretation guidelines (from <1e-50, very high homology, to >0.01, low significance), word size tuning for the sensitivity versus speed tradeoff, output format options (pairwise, and tabular format 6 with custom fields such as qseqid, sseqid, pident, evalue, and bitscore), max_target_seqs control, and multi-threading configuration.
Database management covers makeblastdb for creating custom nucleotide and protein databases, blastdbcmd for sequence extraction, and update_blastdb.pl for downloading NCBI databases (nt, nr, swissprot, refseq_rna, pdbaa). Advanced options include algorithm variants (-task megablast/blastn/blastn-short), gap penalties, BLOSUM/PAM substitution matrices, low-complexity filtering (-dust/-seg), query region selection, and remote BLAST execution on NCBI servers.
Key Features
- All five BLAST programs: blastn, blastp, blastx, tblastn, and PSI-BLAST with complete command-line syntax and practical examples
- Key parameters: -evalue thresholds with significance interpretation, -word_size for sensitivity tuning, -outfmt 6 with custom column specifications, -max_target_seqs, and -num_threads
- Database tools: makeblastdb for custom database creation (nucl/prot), blastdbcmd for sequence extraction and DB info, update_blastdb.pl for NCBI database downloads
- Standard databases reference: nt, nr, swissprot, refseq_rna, pdbaa, est_human with usage examples for each
- Advanced options: -task variants (megablast, blastn-short, dc-megablast), gap penalties (-gapopen/-gapextend), substitution matrices (BLOSUM45/62/80, PAM250), and low-complexity filtering
- E-value and scoring interpretation: bit score thresholds, E-value calculation formula (E = K*m*n*e^(-lambda*S)), percent identity ranges, query coverage metrics, and HSP concepts
- Remote BLAST execution with -remote flag for searches without local database installation
- Searchable by command name, parameter flag, or keyword with instant category filtering
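The E-value formula listed under Key Features can be evaluated directly. The following is a minimal sketch, not how BLAST+ computes its reported values: `blast_evalue` is a hypothetical helper, the input numbers are made up for illustration, and real searches use effective (edge-corrected) query and database lengths rather than the raw lengths assumed here.

```shell
# Karlin-Altschul expected value: E = K * m * n * exp(-lambda * S)
#   K, lambda : statistical parameters of the scoring system
#   S         : raw alignment score
#   m, n      : query length and total database length
# blast_evalue is a hypothetical helper name, not part of BLAST+.
blast_evalue() {
    awk -v K="$1" -v lambda="$2" -v S="$3" -v m="$4" -v n="$5" \
        'BEGIN { printf "%g\n", K * m * n * exp(-lambda * S) }'
}

# Illustrative inputs only (assumed values, not taken from a real search).
blast_evalue 0.041 0.267 120 300 1000000
```

Because E grows linearly with n, the same raw score yields a larger E-value against a larger database.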
Frequently Asked Questions
What is the difference between blastn, blastp, blastx, and tblastn?
blastn compares nucleotide queries against nucleotide databases. blastp compares protein queries against protein databases. blastx translates a nucleotide query in all 6 reading frames and searches against a protein database (useful for finding protein homologs of a DNA sequence). tblastn searches a protein query against a nucleotide database translated in 6 frames (useful for finding unannotated genes).
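The mapping described above can be written as a small lookup. This is only a sketch: `pick_blast_program` is a hypothetical helper, and it covers just the four programs discussed here (it ignores tblastx, which compares a translated nucleotide query against a translated nucleotide database).

```shell
# Choose a BLAST program from the query and database sequence types.
# pick_blast_program is a hypothetical helper name, not part of BLAST+.
pick_blast_program() {
    # $1 = query type, $2 = database type; each is "nucl" or "prot"
    case "$1:$2" in
        nucl:nucl) echo blastn ;;   # nucleotide vs nucleotide
        prot:prot) echo blastp ;;   # protein vs protein
        nucl:prot) echo blastx ;;   # translated query vs protein database
        prot:nucl) echo tblastn ;;  # protein query vs translated database
        *) echo "unknown combination: $1 vs $2" >&2; return 1 ;;
    esac
}

pick_blast_program nucl prot   # prints: blastx
```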
What E-value threshold should I use for BLAST searches?
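Common thresholds depend on your goal: E-value <1e-50 indicates very high sequence homology, 1e-50 to 1e-10 suggests strong homology suitable for functional annotation, 1e-10 to 1e-5 is moderate and useful for detecting related sequences, and >0.01 has low statistical significance. Hits between 1e-5 and 0.01 are borderline and are best checked against other evidence such as coverage and alignment length. Because the E-value scales with the size of the database searched, the same alignment can receive different E-values against different databases. For most homology searches, -evalue 1e-5 is a reasonable starting point.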
How do I create a custom BLAST database from my own sequences?
Use makeblastdb: `makeblastdb -in sequences.fasta -dbtype nucl -out my_db -title "My Custom DB" -parse_seqids` for nucleotide sequences, or `-dbtype prot` for protein sequences. The -parse_seqids flag enables sequence retrieval by ID. You can then search against it with `blastn -query query.fasta -db my_db`.
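The build step can be wrapped with a few sanity checks before makeblastdb is called. This is a sketch under stated assumptions: `build_custom_db` is a hypothetical wrapper, and the file and database names in the usage line are placeholders. It only uses the makeblastdb options shown above plus `blastdbcmd -info` to print a summary of the new database.

```shell
# Build a BLAST database after validating the inputs.
# build_custom_db is a hypothetical wrapper name, not part of BLAST+.
build_custom_db() {
    fasta="$1"; dbtype="$2"; dbname="$3"

    # makeblastdb accepts nucl or prot for -dbtype
    case "$dbtype" in
        nucl|prot) ;;
        *) echo "dbtype must be nucl or prot" >&2; return 2 ;;
    esac

    # Refuse to run on a missing or empty FASTA file
    [ -s "$fasta" ] || { echo "missing or empty FASTA: $fasta" >&2; return 1; }

    # Make sure BLAST+ is installed and on PATH
    command -v makeblastdb >/dev/null 2>&1 || {
        echo "makeblastdb not found on PATH" >&2; return 127; }

    makeblastdb -in "$fasta" -dbtype "$dbtype" -out "$dbname" -parse_seqids &&
        blastdbcmd -db "$dbname" -info
}

# Usage (placeholder names): build_custom_db sequences.fasta nucl my_db
```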
How do I get tabular BLAST output with specific columns?
Use outfmt 6 with custom field specifiers: `blastn -query input.fasta -db nt -outfmt "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore"`. Common fields include qseqid (query ID), sseqid (subject ID), pident (percent identity), evalue, bitscore, qcovs (query coverage), and stitle (subject title).
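Tabular output is tab-separated, so it can be filtered with standard tools. The sketch below assumes the twelve-column order used in the command above (pident in column 3, evalue in column 11); `filter_hits` and `sample_hits` are hypothetical helpers, and the sample rows are made up for illustration rather than taken from a real search.

```shell
# Keep hits with evalue <= max_e and pident >= min_id.
# Assumes columns: qseqid sseqid pident length mismatch gapopen
#                  qstart qend sstart send evalue bitscore
filter_hits() {
    awk -F'\t' -v max_e="$1" -v min_id="$2" \
        '($11 + 0) <= (max_e + 0) && ($3 + 0) >= (min_id + 0)'
}

# Made-up rows standing in for real BLAST output.
sample_hits() {
    printf 'q1\ts1\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-100\t350\n'
    printf 'q1\ts2\t72.00\t150\t42\t0\t10\t159\t1\t150\t2e-08\t60.2\n'
    printf 'q2\ts3\t45.00\t80\t44\t0\t1\t80\t20\t99\t0.5\t28.1\n'
}

# Keeps the s1 and s2 rows; the s3 row fails both cutoffs.
sample_hits | filter_hits 1e-5 70
```

With real data, replace `sample_hits` with the BLAST output file, for example `filter_hits 1e-5 70 < results.tsv` (a placeholder file name). If you request a different set of fields, adjust the column numbers to match.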
What is the difference between megablast and regular blastn?
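Megablast (the default task for blastn, or explicit with -task megablast) uses a word size of 28 and is optimized for finding highly similar sequences quickly, such as matches within the same species. Regular blastn (-task blastn) uses word size 11 and is more sensitive for detecting moderate similarity. blastn-short (-task blastn-short) uses word size 7 and is optimized for short sequences like primers. dc-megablast (-task dc-megablast, discontiguous megablast) seeds alignments from a template that tolerates mismatches at some positions, which makes it better suited to cross-species comparisons than megablast.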
How do I interpret BLAST bit scores and percent identity?
As rough guides, bit scores above 200 indicate very strong homology, 80-200 strong homology, 50-80 moderate, and below 50 weak. Percent identity above 70% suggests high homology, 30-70% is an intermediate range where homology may or may not reflect shared function, and below 30% indicates very distant relationships. For proteins, the range of roughly 20-35% identity is commonly called the twilight zone, where true homologs become hard to distinguish from chance matches. Always consider E-value, coverage, and alignment length together.
What is PSI-BLAST and when should I use it?
PSI-BLAST (Position-Specific Iterated BLAST) performs iterative searches where each round builds a position-specific scoring matrix (PSSM) from the significant alignments found, then uses that PSSM as the query for the next round. Use it when regular blastp fails to detect remote homologs: `psiblast -query protein.fasta -db nr -num_iterations 5 -out_pssm result.pssm -evalue 0.001`. Here -evalue limits which hits are reported; the cutoff for including a hit in the PSSM is set separately with -inclusion_ethresh. PSI-BLAST is particularly powerful for detecting distant evolutionary relationships in protein families.
Is this BLAST reference free to use?
Yes, this BLAST command reference is completely free: no account, no download, and no usage limits. It is a browser-based searchable cheat sheet designed for bioinformaticians, genomics researchers, and biology students. Note that running BLAST itself requires installing NCBI BLAST+ locally or using the NCBI web interface; this reference only helps you find the right command syntax and parameters quickly.