NumPy Reference
About NumPy Reference
The NumPy Reference is a cheat sheet for scientific Python computing, covering the core ndarray API that underlies most data science, machine learning, and numerical simulation workflows. The Array Creation section documents np.array() for building arrays from Python lists, np.zeros() and np.ones() for initialized arrays, np.arange() for evenly stepped sequences in the style of range(), np.linspace() for a fixed number of evenly spaced values, np.eye() for identity matrices, and np.full() for constant-value arrays. These are the entry points for most NumPy computations.
Indexing and Operations are at the heart of NumPy's power. Basic slicing with a[i:j], boolean indexing with condition arrays (a[a > 3] returns elements greater than 3), fancy indexing with row/column index arrays, and np.where() for conditional selection are all covered. The Operations section documents element-wise arithmetic (+, -, *, /), matrix multiplication with np.dot() and the @ operator, axis-aware aggregations like np.sum() and np.mean(), np.max() and np.argmax() for finding values and positions, and broadcasting rules that let differently shaped arrays operate together. The Linear Algebra section covers np.linalg.inv() for matrix inversion, np.linalg.det() for determinant, np.linalg.eig() for eigenvalues and eigenvectors, np.linalg.svd() for singular value decomposition, np.linalg.solve() for systems of linear equations, and np.linalg.norm() for vector and matrix norms.
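A short sketch of a few of the operations named above: fancy indexing, np.where(), axis-aware aggregation, and np.linalg.solve(). The arrays a, M, and b are made-up illustration values, not data from the reference itself.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Fancy indexing: take column 1 from rows 0 and 2.
picked = a[[0, 2], 1]              # array([2, 8])

# np.where as a conditional select: keep values above 3, use 0 elsewhere.
clipped = np.where(a > 3, a, 0)

# Axis-aware aggregation: axis=0 collapses the rows, giving one sum per column.
col_sums = a.sum(axis=0)           # array([12, 15, 18])
# axis=1 works along each row: position of the largest value in every row.
row_argmax = a.argmax(axis=1)      # array([2, 2, 2])

# Solve the linear system M x = b without forming the inverse of M.
M = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(M, b)          # x is approximately [2, 3]
```

For a single right-hand side, np.linalg.solve() is usually a better choice than multiplying by np.linalg.inv(M), since it avoids computing the full inverse.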
Random sampling, array reshaping, statistical functions, and file I/O complete the reference. The Random section covers np.random.rand() for uniform samples on [0, 1), np.random.randn() for standard normal samples, np.random.randint() for integer sampling, np.random.choice() for sampling from arrays, np.random.seed() for reproducibility, and np.random.shuffle() for in-place shuffling. Reshaping covers a.reshape(), a.flatten(), a.T for transpose, np.concatenate(), np.vstack()/np.hstack(), and np.expand_dims() for adding batch dimensions. Statistical functions include np.mean(), np.std(), np.var(), np.median(), np.percentile(), np.corrcoef(), and np.histogram(). File I/O covers np.save()/np.load() for the .npy binary format, np.savetxt()/np.loadtxt() for CSV, and np.savez() for multi-array .npz archives (np.savez_compressed() writes the compressed variant). This reference is used by data scientists, ML engineers, quantitative analysts, and Python developers working with numerical data.
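As a rough illustration of the stacking and statistics functions listed above, the following sketch uses two small made-up arrays. The variable names are only for this example.

```python
import numpy as np

a = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.array([[5.0, 6.0]])

# Stack b underneath a; both calls give the same (3, 2) result here.
stacked = np.vstack([a, b])
joined = np.concatenate([a, b], axis=0)

# Add a leading batch dimension: (2, 2) becomes (1, 2, 2).
batched = np.expand_dims(a, axis=0)

# Statistics over all six values 1..6.
median = np.median(stacked)                    # 3.5
p90 = np.percentile(stacked, 90)               # 90th percentile
counts, edges = np.histogram(stacked, bins=3)  # 3 bins need 4 bin edges
```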
Key Features
- Array creation — np.array, np.zeros, np.ones, np.arange, np.linspace, np.eye, np.full
- Boolean indexing (a[a > 3]), fancy indexing (a[[0,2], 1]), and np.where() conditional select
- Element-wise arithmetic and matrix multiplication with np.dot() and the @ operator
- Axis-aware np.sum, np.mean, np.max, np.argmax with broadcasting rules
- Full np.linalg suite — inv, det, eig, svd, solve, norm for linear algebra
- Random sampling — rand (uniform), randn (normal), randint, choice, seed, shuffle
- Array reshaping — reshape, flatten, .T transpose, concatenate, vstack/hstack, expand_dims
- Statistics (std, var, median, percentile, corrcoef) and file I/O (npy, csv, npz)
Frequently Asked Questions
What is the difference between np.arange() and np.linspace()?
np.arange(start, stop, step) works like Python range() but returns a NumPy array and also accepts float steps. The stop value is excluded, and the number of elements follows from the step. With a non-integer step, floating-point rounding can make the length differ by one from what you expect, so linspace is the safer choice there. np.linspace(start, stop, num) generates exactly num evenly spaced values from start to stop, with stop included by default (pass endpoint=False to exclude it). Use arange when you know the step size and linspace when you know how many points you want, for example when sampling a function for plotting or numerical integration.
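A minimal side-by-side sketch of the two functions, with illustrative values:

```python
import numpy as np

# arange: fixed step, stop is excluded.
steps = np.arange(0, 10, 2)            # array([0, 2, 4, 6, 8])

# linspace: fixed number of points, stop is included by default.
points = np.linspace(0.0, 1.0, 5)      # 0.0, 0.25, 0.5, 0.75, 1.0

# endpoint=False drops the stop value, giving a half-open interval like arange.
half_open = np.linspace(0.0, 1.0, 5, endpoint=False)  # 0.0, 0.2, 0.4, 0.6, 0.8
```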
How does NumPy broadcasting work?
Broadcasting lets NumPy operate on arrays of different shapes by virtually stretching the smaller array to match the larger one without copying data. Shapes are compared from the trailing (rightmost) dimension backwards, and missing leading dimensions are treated as 1. Two dimensions are compatible if they are equal or if one of them is 1; otherwise NumPy raises a ValueError. For example, a shape (3,1) array combined with a (1,4) array produces a (3,4) result. This enables efficient vectorized operations, such as subtracting a mean vector from each row of a matrix without a Python loop.
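The two cases described above can be sketched as follows. The matrix X is a made-up example, not data from the reference.

```python
import numpy as np

# (3, 1) combined with (1, 4) broadcasts to (3, 4).
col = np.arange(3).reshape(3, 1)
row = np.arange(4).reshape(1, 4)
grid = col + row                     # grid[i, j] == i + j

# Subtract the per-column mean from every row, with no Python loop.
X = np.array([[1.0, 10.0],
              [3.0, 20.0],
              [5.0, 30.0]])
col_means = X.mean(axis=0)           # shape (2,), lines up with X's last axis
centered = X - col_means             # shape (3, 2); every column now has mean 0
```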
What is the difference between np.dot() and the @ operator?
For 2D arrays (matrices), np.dot(A, B) and A @ B both compute matrix multiplication, and for two 1D arrays both compute the inner (dot) product. The @ operator (np.matmul) is generally preferred in modern NumPy code because it is more readable and matches mathematical notation. The two differ for higher-dimensional arrays: np.dot sums over the last axis of the first array and the second-to-last axis of the second, combining every matrix in one stack with every matrix in the other, while @ treats the leading axes as batch dimensions and broadcasts them. np.dot also accepts scalar operands, which @ rejects.
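A small sketch of where the two agree and where they diverge. The array shapes are chosen only to make the difference visible.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# 2D case: identical results.
same_2d = np.array_equal(np.dot(A, B), A @ B)              # True

# 1D case: inner product, 1*4 + 2*5 + 3*6.
inner = np.dot(np.array([1, 2, 3]), np.array([4, 5, 6]))   # 32

# Stacks of matrices: the results have different shapes.
S = np.ones((2, 3, 4))
T = np.ones((2, 4, 5))
matmul_shape = (S @ T).shape         # (2, 3, 5): leading axis treated as a batch
dot_shape = np.dot(S, T).shape       # (2, 3, 2, 5): every matrix pair is combined
```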
How does boolean indexing work in NumPy?
Boolean indexing lets you select elements by passing an array of True/False values as the index. When you write a[a > 3], NumPy first evaluates a > 3 to create a boolean array of the same shape, then returns only the elements where the condition is True as a 1D array. You can combine conditions with & (and), | (or), and ~ (not): a[(a > 2) & (a < 8)]. The parentheses are required because & and | bind more tightly than comparison operators, and Python's and/or keywords do not work element-wise. You can also assign to boolean-indexed selections: a[a < 0] = 0 sets all negative elements to zero.
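The selection and assignment patterns above, sketched on a small made-up array:

```python
import numpy as np

a = np.array([[-1, 4, 9],
              [2, -7, 5]])

mask = a > 3                       # boolean array with the same shape as a
selected = a[mask]                 # 1D result in row-major order: [4, 9, 5]

# Combined condition; each comparison needs its own parentheses.
between = a[(a > 2) & (a < 8)]     # [4, 5]

# Assignment through a boolean index modifies a in place.
a[a < 0] = 0                       # a is now [[0, 4, 9], [2, 0, 5]]
```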
What is SVD (Singular Value Decomposition) and when is it used?
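SVD factors a matrix A into three parts: A = U @ np.diag(S) @ Vt, where the columns of U and the rows of Vt are orthonormal and S holds the non-negative singular values in descending order. np.linalg.svd(A) returns U, S, Vt, with S as a 1D array. For a non-square A, pass full_matrices=False so that the shapes line up and the product above reconstructs A directly. SVD is used in dimensionality reduction (PCA), computing the pseudoinverse of non-square or rank-deficient matrices, image compression (keeping only the largest singular values), and solving least-squares problems. Unlike eigendecomposition, it exists for every matrix, including non-square ones, and it is generally well behaved numerically.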
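A minimal sketch of the decomposition and of a truncated (low-rank) reconstruction, which is the idea behind the compression use case. The matrix A and the rank k are illustrative choices.

```python
import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])   # non-square, shape (2, 3)

# Reduced SVD: U is (2, 2), S is (2,), Vt is (2, 3).
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# The three factors multiply back to A (up to floating-point error).
reconstructed = U @ np.diag(S) @ Vt

# Keep only the k largest singular values for a rank-k approximation.
k = 1
A_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]
```

With the default full_matrices=True, Vt would have shape (3, 3) for this A, and the product U @ np.diag(S) @ Vt would fail on mismatched shapes.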
How do I use np.random.seed() for reproducibility?
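Call np.random.seed(integer) before any random operations to initialize the legacy global random number generator to a known state. All subsequent calls to the np.random.* functions in that process then produce the same sequence of numbers every time you run the script. For example, np.random.seed(42) followed by np.random.randn(3) will always produce the same three numbers. In NumPy 1.17 and later, the preferred approach is a Generator: rng = np.random.default_rng(42) followed by rng.standard_normal(3). A Generator keeps its state in an explicit object rather than in hidden global state, so it can be passed around and each thread or worker can own its own instance. Its default bit generator, PCG64, also has better statistical properties than the Mersenne Twister used by the legacy functions.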
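Both styles can be sketched as below. The seed value 42 is arbitrary, and the example checks reproducibility by comparing two draws rather than by quoting specific numbers.

```python
import numpy as np

# Legacy global state: reseeding replays the same sequence.
np.random.seed(42)
first = np.random.randn(3)
np.random.seed(42)
second = np.random.randn(3)            # identical to first

# Generator API: each generator carries its own independent state.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
draw_a = rng_a.standard_normal(3)
draw_b = rng_b.standard_normal(3)      # identical to draw_a
```

The legacy and Generator streams use different algorithms, so the same seed does not give the same numbers across the two APIs.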
What is the difference between reshape() and flatten()?
reshape(shape) returns an array with the given shape. When the memory layout allows it, which includes any contiguous array, the result is a view, so modifying it modifies the original; otherwise NumPy returns a copy. Use -1 for one dimension to let NumPy infer it: a.reshape(2, -1). flatten() always returns a copy of the array collapsed to 1D. ravel() also collapses to 1D but returns a view when possible and a copy otherwise. Use reshape for changing dimensions, flatten when you need a guaranteed independent 1D copy, and ravel for a potentially zero-copy 1D view.
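The view-versus-copy behavior can be checked with np.shares_memory(), as in this small sketch on a made-up array:

```python
import numpy as np

a = np.arange(6)

r = a.reshape(2, -1)             # shape (2, 3); a view, since a is contiguous
r[0, 0] = 100                    # writes through to a[0]

f = r.flatten()                  # always an independent copy
f[1] = -1                        # a[1] is unaffected

v = r.ravel()                    # a view here, because r is contiguous
ravel_is_view = np.shares_memory(v, a)

# A transposed array is not C-contiguous, so flattening it forces a copy.
t_flat = r.T.reshape(-1)
reshape_is_view = np.shares_memory(t_flat, a)
```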
How do I save and load NumPy arrays?
Use np.save("file.npy", array) to save a single array in binary format and np.load("file.npy") to load it back; this is fast and preserves dtype and shape exactly. For text-based CSV output, use np.savetxt("file.csv", array, delimiter=",") and np.loadtxt("file.csv", delimiter=","), keeping in mind that loadtxt returns floats by default. For multiple arrays in one file, use np.savez("file.npz", x=a, y=b) and access them with loaded = np.load("file.npz"); loaded["x"]. np.savez writes an uncompressed archive; use np.savez_compressed with the same arguments when file size matters. The .npz format is well suited to storing datasets with multiple named arrays.
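A round-trip sketch of the three formats. It writes into a temporary directory so it leaves no files behind; the file names and array contents are placeholders for illustration.

```python
import os
import tempfile

import numpy as np

a = np.arange(6, dtype=np.int32).reshape(2, 3)
b = np.linspace(0.0, 1.0, 4)

with tempfile.TemporaryDirectory() as tmp:
    # Binary .npy: dtype and shape survive the round trip.
    npy_path = os.path.join(tmp, "a.npy")
    np.save(npy_path, a)
    a_back = np.load(npy_path)

    # CSV text: values survive, but loadtxt returns float64 by default.
    csv_path = os.path.join(tmp, "a.csv")
    np.savetxt(csv_path, a, delimiter=",")
    a_csv = np.loadtxt(csv_path, delimiter=",")

    # Compressed .npz archive holding several named arrays.
    npz_path = os.path.join(tmp, "data.npz")
    np.savez_compressed(npz_path, x=a, y=b)
    with np.load(npz_path) as loaded:   # arrays are read lazily, on access
        x_back = loaded["x"]
        y_back = loaded["y"]
```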