Biopython Reference
About Biopython Reference
The Biopython Reference is a searchable cheat sheet covering 27 essential Biopython modules and functions organized into seven categories. The Sequence Parsing section covers SeqIO.parse(), SeqIO.read(), SeqIO.write(), SeqIO.convert() for FASTA and GenBank formats, plus GenBank feature/qualifier extraction. Sequence Analysis includes Seq object operations (complement, reverse_complement, transcribe, translate), GC content calculation via gc_fraction(), molecular weight computation, and JASPAR motif searching.
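A minimal sketch of the Sequence Analysis calls named above. It assumes Biopython 1.80 or later, where gc_fraction() in Bio.SeqUtils replaced the older GC() function. The sequences are made-up examples.

```python
from Bio.Seq import Seq
from Bio.SeqUtils import gc_fraction, molecular_weight

dna = Seq("ATGCGCTA")  # illustrative sequence: 4 of its 8 bases are G or C

# gc_fraction() returns a fraction between 0 and 1, not a percentage.
gc = gc_fraction(dna)
print(f"GC content: {gc:.2%}")

# molecular_weight() needs to know which kind of molecule the letters describe.
dna_mw = molecular_weight(dna, seq_type="DNA")
rna_mw = molecular_weight(dna.transcribe(), seq_type="RNA")
protein_mw = molecular_weight(Seq("MAIVMGR"), seq_type="protein")
print(f"DNA: {dna_mw:.1f} Da, RNA: {rna_mw:.1f} Da, protein: {protein_mw:.1f} Da")
```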
The Sequence Alignment section demonstrates both the legacy pairwise2.align and the newer PairwiseAligner class, AlignIO for reading Clustal/Stockholm MSA files, and BLOSUM62/PAM substitution matrix loading. The NCBI API section covers Entrez.esearch(), Entrez.efetch(), Entrez.einfo(), and Entrez.elink() for programmatic access to NCBI databases including nucleotide, protein, and PubMed.
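A sketch of matrix loading and MSA reading. It loads BLOSUM62 from Bio.Align.substitution_matrices (PAM250 loads the same way), uses it to score a protein alignment, and reads a small made-up Clustal alignment from memory with AlignIO. The gap penalties (-10 open, -0.5 extend) are illustrative choices, not recommendations.

```python
from io import StringIO

from Bio import AlignIO
from Bio.Align import PairwiseAligner, substitution_matrices

# Load a bundled substitution matrix by name ("PAM250" works the same way).
blosum62 = substitution_matrices.load("BLOSUM62")
print("W/W score:", blosum62["W", "W"])

# Score protein alignments with the matrix plus affine gap penalties.
aligner = PairwiseAligner()
aligner.substitution_matrix = blosum62
aligner.open_gap_score = -10
aligner.extend_gap_score = -0.5
ww_score = aligner.score("WW", "WW")

# A tiny, made-up Clustal alignment read from a string instead of a file.
clustal_text = """CLUSTAL W (1.83) multiple sequence alignment


seq1      MKV-LA
seq2      MKVQLA
seq3      MRV-LA
"""
msa = AlignIO.read(StringIO(clustal_text), "clustal")
print(len(msa), "sequences,", msa.get_alignment_length(), "columns")
print("Column 2:", msa[:, 1])  # one character per sequence
```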
BLAST integration includes NcbiblastnCommandline for local BLAST execution and NCBIWWW.qblast() for online searches, with NCBIXML parsing for result analysis. The Phylogenetics section covers Phylo.read() and Phylo.draw() for Newick tree visualization. Structure Analysis demonstrates PDBParser for 3D structure extraction and DSSP, which runs an external DSSP executable, for secondary structure assignment.
Key Features
- SeqIO module: parse(), read(), write(), convert() for FASTA, GenBank, and other sequence formats with complete code examples
- Seq object operations: complement(), reverse_complement(), transcribe(), translate() with codon table support including mitochondrial tables
- Sequence analysis utilities: gc_fraction() for GC content, molecular_weight() for DNA/RNA/protein, and JASPAR motif consensus searching
- Pairwise and multiple sequence alignment: PairwiseAligner with custom scoring, AlignIO for Clustal/Stockholm, and BLOSUM62/PAM matrix loading
- NCBI Entrez API: esearch() for database queries, efetch() for record downloads, einfo() for database metadata, and elink() for cross-database links
- BLAST integration: local execution via NcbiblastnCommandline, online NCBIWWW.qblast(), and NCBIXML result parsing with score and E-value extraction
- Phylogenetics: Phylo.read() for Newick/Nexus tree files, draw_ascii() for terminal output, and Phylo.draw() with matplotlib for publication-quality figures
- Structure analysis: PDBParser for PDB file parsing with model/chain/residue hierarchy, and DSSP for secondary structure assignment
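The motif-searching feature can be sketched with Bio.motifs. Matrices downloaded from JASPAR can be read through Bio.motifs (its jaspar submodule handles the JASPAR formats). Here, however, the motif is built from made-up binding-site instances so the example runs offline. The pseudocount (0.5) and score threshold (3.0) are illustrative assumptions.

```python
from Bio import motifs
from Bio.Seq import Seq

# Illustrative binding-site instances (not a real JASPAR entry).
instances = [Seq(s) for s in ["TACAA", "TACGC", "TACAC", "TACCC", "AACCC", "AATGC", "AATGC"]]
motif = motifs.create(instances)
print("Consensus:", motif.consensus)

# Counts -> position weight matrix (with pseudocounts) -> log-odds PSSM.
pwm = motif.counts.normalize(pseudocounts=0.5)
pssm = pwm.log_odds()  # uniform background by default

# search() scans both strands by default; negative positions are reverse-strand hits.
target = Seq("GGTACGCGG")
hits = list(pssm.search(target, threshold=3.0))
for position, score in hits:
    print(position, round(float(score), 2))
```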
Frequently Asked Questions
How do I read a multi-sequence FASTA file with Biopython?
Use SeqIO.parse(), which returns an iterator of SeqRecord objects: `from Bio import SeqIO` followed by `for record in SeqIO.parse("sequences.fasta", "fasta"): print(record.id, len(record.seq))`. For a single-sequence file, use SeqIO.read() instead. It returns one SeqRecord directly and raises an error if the file does not contain exactly one record. Supported formats include fasta, genbank, swiss, embl, and many more.
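The SeqIO functions also accept open handles, so they can be tried without files on disk. This sketch uses a made-up two-record FASTA string. It also shows SeqIO.write() and SeqIO.convert(). Here "tab" (a simple id-tab-sequence format) is just one possible conversion target.

```python
from io import StringIO

from Bio import SeqIO

fasta_text = """>seq1 first example
ATGGCCATTGTAATGGGCC
>seq2 second example
ATGCGTACGTTAG
"""

# SeqIO.parse() yields one SeqRecord per entry; a handle works as well as a filename.
records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))
for record in records:
    print(record.id, len(record.seq))

# SeqIO.read() expects exactly one record and raises ValueError otherwise.
single = SeqIO.read(StringIO(">only\nACGT\n"), "fasta")

# SeqIO.write() returns the number of records written.
fasta_out = StringIO()
count = SeqIO.write(records, fasta_out, "fasta")

# SeqIO.convert() goes straight from one format to another and returns a count.
tab_out = StringIO()
n_converted = SeqIO.convert(StringIO(fasta_text), "fasta", tab_out, "tab")
print(tab_out.getvalue())
```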
How do I translate a DNA sequence to protein in Biopython?
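Create a Seq object and call translate(): `from Bio.Seq import Seq; seq = Seq("ATGGCCATTGTAATGGGCCGCTGA"); protein = seq.translate()`. This returns MAIVMGR*, where the asterisk marks the stop codon; pass to_stop=True to stop translation there instead. For non-standard codon tables, specify the table parameter: `seq.translate(table="Vertebrate Mitochondrial")`. translate() works directly on DNA, so transcription is not required. Use `seq.transcribe()` only when you need the mRNA sequence itself.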
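The Seq operations from the answer can be run side by side on the same example sequence. The mitochondrial translation differs from the standard one because, in the vertebrate mitochondrial code (NCBI table 2), TGA encodes tryptophan instead of a stop.

```python
from Bio.Seq import Seq

coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGA")

mrna = coding_dna.transcribe()                 # T -> U
standard = coding_dna.translate()              # "*" marks the stop codon
trimmed = coding_dna.translate(to_stop=True)   # stop at the first stop codon
mito = coding_dna.translate(table="Vertebrate Mitochondrial")  # TGA -> W here

print(mrna)
print(standard, trimmed, mito)
print(coding_dna.complement())
print(coding_dna.reverse_complement())
```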
How do I search NCBI databases using Biopython Entrez?
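Set your email first with `Entrez.email = "you@example.com"`, since NCBI asks Entrez users to identify themselves. Then search with `handle = Entrez.esearch(db="nucleotide", term="BRCA1[Gene] AND Homo sapiens[Organism]", retmax=10)` and parse the XML reply with `Entrez.read(handle)`; the matching record IDs are in its "IdList" entry. Fetch a record with `Entrez.efetch(db="nucleotide", id="NM_007294", rettype="gb", retmode="text")` and parse the returned handle with `SeqIO.read(handle, "genbank")`.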
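The search-then-fetch workflow can be wrapped in small helpers. The names gene_query, search_ids, and fetch_genbank are hypothetical, chosen for this sketch. The two network helpers are only defined here, not called, because they need internet access to NCBI.

```python
from Bio import Entrez, SeqIO

# NCBI asks every Entrez user to identify themselves; replace with your address.
Entrez.email = "you@example.com"


def gene_query(gene, organism):
    """Hypothetical helper: build a search term restricted by gene and organism fields."""
    return f"{gene}[Gene] AND {organism}[Organism]"


def search_ids(term, db="nucleotide", retmax=10):
    """Hypothetical helper: return matching UIDs (requires network access)."""
    handle = Entrez.esearch(db=db, term=term, retmax=retmax)
    try:
        result = Entrez.read(handle)  # parses the XML reply into dict-like objects
    finally:
        handle.close()
    return list(result["IdList"])


def fetch_genbank(accession, db="nucleotide"):
    """Hypothetical helper: download one GenBank record as a SeqRecord (requires network access)."""
    handle = Entrez.efetch(db=db, id=accession, rettype="gb", retmode="text")
    try:
        return SeqIO.read(handle, "genbank")
    finally:
        handle.close()
```

With network access, usage would look like `ids = search_ids(gene_query("BRCA1", "Homo sapiens"))` followed by `record = fetch_genbank("NM_007294")`. The record's `features` list then gives access to GenBank features and their `qualifiers`.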
How do I run BLAST from Biopython?
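For local BLAST, BLAST+ must be installed and `db` must name a local BLAST database: `from Bio.Blast.Applications import NcbiblastnCommandline; cmd = NcbiblastnCommandline(query="query.fasta", db="nt", evalue=0.001, outfmt=6); stdout, stderr = cmd()`. With outfmt=6, stdout holds tab-separated text rather than XML; use outfmt=5 if you want XML output. For online BLAST: `from Bio.Blast import NCBIWWW; result = NCBIWWW.qblast("blastn", "nt", sequence_string)`, which returns XML by default. Parse XML results with NCBIXML.read() for a single query or NCBIXML.parse() for several queries.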
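The two result types need different parsing, as this sketch shows. parse_outfmt6 and online_hits are hypothetical helper names. The first assumes the 12 default columns of BLAST+ tabular output (outfmt 6), such as the stdout string returned by the local command in the answer. The second reads qblast's XML with NCBIXML and is only defined here because it needs network access. The E-value cutoff is illustrative.

```python
from Bio.Blast import NCBIWWW, NCBIXML

# Default column order of BLAST+ tabular output (outfmt 6).
OUTFMT6_COLUMNS = ["query", "subject", "pident", "length", "mismatch", "gapopen",
                   "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_outfmt6(text):
    """Hypothetical helper: turn tabular BLAST output into a list of dicts."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        row = dict(zip(OUTFMT6_COLUMNS, line.split("\t")))
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        hits.append(row)
    return hits


def online_hits(sequence, max_evalue=0.001):
    """Hypothetical helper: run blastn at NCBI and yield (title, score, E-value) per HSP."""
    result_handle = NCBIWWW.qblast("blastn", "nt", sequence)
    record = NCBIXML.read(result_handle)  # one query -> one record
    for alignment in record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect <= max_evalue:
                yield alignment.title, hsp.score, hsp.expect


# A made-up line in outfmt 6 layout.
example = "q1\ts1\t98.50\t100\t1\t0\t1\t100\t5\t104\t1e-50\t185\n"
hits = parse_outfmt6(example)
print(hits[0]["subject"], hits[0]["evalue"], hits[0]["bitscore"])
```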
How do I perform pairwise sequence alignment with Biopython?
Use the newer PairwiseAligner class: `from Bio.Align import PairwiseAligner; aligner = PairwiseAligner(); aligner.mode = "global"; aligner.match_score = 2; aligner.mismatch_score = -1; alignments = aligner.align("ATCGATCG", "ATGGATCG"); print(alignments[0])`. Gap scores default to 0, so also set `aligner.open_gap_score` and `aligner.extend_gap_score` when gaps should be penalized. The older Bio.pairwise2 module (pairwise2.align) is still available but is deprecated in recent versions.
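This sketch extends the answer's aligner with gap penalties of -2 (open) and -1 (extend), which are illustrative values. Without them, the zero default gap scores let a gapped alignment of these two sequences (score 14) beat the ungapped one with a single mismatch (score 13). It also shows aligner.score(), which computes the score without building alignments, and local mode.

```python
from Bio.Align import PairwiseAligner

aligner = PairwiseAligner()
aligner.mode = "global"
aligner.match_score = 2
aligner.mismatch_score = -1
# Gap scores default to 0; penalize gaps so the single mismatch is preferred.
aligner.open_gap_score = -2
aligner.extend_gap_score = -1

seq_a, seq_b = "ATCGATCG", "ATGGATCG"
best_score = aligner.score(seq_a, seq_b)  # score only, no alignment objects
best = aligner.align(seq_a, seq_b)[0]
print(best_score)
print(best)

# Local mode reports only the best-scoring sub-alignment (here the shared "ATCG").
aligner.mode = "local"
local_score = aligner.score("TTTTATCGAAAA", "GGGGATCGCCCC")
print(local_score)
```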
How do I visualize a phylogenetic tree with Biopython?
Read the tree with `from Bio import Phylo; tree = Phylo.read("tree.nwk", "newick")`. For a quick ASCII view: `Phylo.draw_ascii(tree)`. For publication-quality graphics: `import matplotlib.pyplot as plt; fig, ax = plt.subplots(figsize=(10, 8)); Phylo.draw(tree, axes=ax, do_show=False); plt.savefig("tree.png")`. Passing do_show=False keeps Phylo.draw() from calling plt.show() before the figure is saved. Supports Newick, Nexus, and PhyloXML formats.
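Phylo.read() and Phylo.draw_ascii() also work with in-memory handles, which is handy for quick checks. This sketch uses a made-up four-taxon Newick string with branch lengths and sends the ASCII drawing to a StringIO instead of standard output.

```python
from io import StringIO

from Bio import Phylo

# A made-up four-taxon tree with branch lengths, read from a string instead of a file.
newick = "((A:0.1,B:0.2):0.3,(C:0.4,D:0.5):0.1);"
tree = Phylo.read(StringIO(newick), "newick")

names = [leaf.name for leaf in tree.get_terminals()]
print(names, tree.count_terminals())

# draw_ascii() accepts any file-like object; the default is standard output.
ascii_art = StringIO()
Phylo.draw_ascii(tree, file=ascii_art)
print(ascii_art.getvalue())
```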
How do I parse PDB structure files with Biopython?
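Use PDBParser: `from Bio.PDB import PDBParser; parser = PDBParser(QUIET=True); structure = parser.get_structure("protein", "1abc.pdb")`. Navigate the hierarchy: structure > model > chain > residue > atom, starting with `model = structure[0]` for the first model. For secondary structure, run DSSP on a model: `from Bio.PDB.DSSP import DSSP; dssp = DSSP(model, "1abc.pdb")`. This calls an external DSSP program (dssp or mkdssp), which must be installed separately. Each residue receives a DSSP code such as H (α-helix), E (β-strand), G (3-10 helix), T (turn), or - (no regular secondary structure).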
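The hierarchy can be explored offline by parsing a tiny, made-up two-residue PDB text from memory. The atom_line and secondary_structure helpers are hypothetical. The coordinates are invented so that the two CA atoms are 3.8 Å apart. The DSSP helper is only defined, not called, because it needs a PDB file on disk and a DSSP executable (assumed here to be named mkdssp).

```python
from io import StringIO

from Bio.PDB import PDBParser


def atom_line(serial, name, res_name, chain_id, res_seq, x, y, z, element):
    """Hypothetical helper: format one fixed-column PDB ATOM record."""
    return (f"ATOM  {serial:5d}  {name:<3s} {res_name:>3s} {chain_id}{res_seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2s}")


# (atom name, residue name, residue number, x, y, z, element); made-up coordinates
atoms = [
    ("N", "ALA", 1, -1.2, 1.0, 0.0, "N"), ("CA", "ALA", 1, 0.0, 0.0, 0.0, "C"),
    ("C", "ALA", 1, 1.3, 0.6, 0.0, "C"), ("O", "ALA", 1, 1.4, 1.8, 0.0, "O"),
    ("N", "GLY", 2, 2.4, -0.2, 0.0, "N"), ("CA", "GLY", 2, 3.8, 0.0, 0.0, "C"),
    ("C", "GLY", 2, 4.6, -1.2, 0.0, "C"), ("O", "GLY", 2, 4.1, -2.3, 0.0, "O"),
]
pdb_text = "\n".join(
    atom_line(i, name, res, "A", num, x, y, z, el)
    for i, (name, res, num, x, y, z, el) in enumerate(atoms, start=1)
) + "\nEND\n"

structure = PDBParser(QUIET=True).get_structure("toy", StringIO(pdb_text))

# Hierarchy: structure -> model -> chain -> residue -> atom
model = structure[0]
chain = model["A"]
for residue in chain:
    print(residue.id[1], residue.get_resname(), [atom.get_id() for atom in residue])

# Subtracting two Atom objects gives their distance in angstroms.
ca_distance = chain[1]["CA"] - chain[2]["CA"]
print(f"CA-CA distance: {ca_distance:.2f} A")


def secondary_structure(model, pdb_path, dssp_executable="mkdssp"):
    """Hypothetical helper: map (chain id, residue id) -> DSSP code via an external program."""
    from Bio.PDB.DSSP import DSSP
    dssp = DSSP(model, pdb_path, dssp=dssp_executable)
    return {key: dssp[key][2] for key in dssp.keys()}  # index 2 holds the SS code
```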
Is this Biopython reference free and does it require installation?
Yes, this web-based reference is completely free with no account or download required. It is a searchable cheat sheet with code examples you can browse from any device. Note that to actually run Biopython code, you need to install the library with `pip install biopython` in your Python environment. This reference helps you find the right syntax and examples quickly.