NumPy
Array creation, indexing, reshaping, math, linear algebra, random module, broadcasting — a complete implementation reference for ML engineering.
Array Creation
Indexing, Slicing & Boolean Masking
View vs Copy — critical for memory efficiency:
- View: basic slicing (a[1:4]) returns a view — no data is copied. Modifying the view modifies the original; they share memory (a.base is original evaluates to True).
- Copy: fancy indexing (a[[1,2,3]]) and boolean masking (a[a>0]) always return copies. Mutations don't affect the original.
- Use a.flags['OWNDATA'] or a.base is None to check ownership.
- Force a copy: a[1:4].copy().
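A quick sanity check of the rules above (a minimal sketch):

```python
import numpy as np

a = np.arange(10)

# Basic slice: a view that shares memory with a
v = a[1:4]
assert v.base is a           # view points back to its parent
v[0] = 99
assert a[1] == 99            # mutation through the view is visible in the original

# Fancy indexing: an independent copy
c = a[[1, 2, 3]]
assert c.base is None and c.flags['OWNDATA']
c[0] = -1
assert a[1] == 99            # original untouched
```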
X = np.arange(24).reshape(6, 4)
# Slice first 4 rows, all columns — VIEW
X_train = X[:4, :]
X_test = X[4:, :]
# Select specific features by column index
feature_idx = [0, 2]
X_subset = X[:, feature_idx] # fancy index — copy
# Boolean masking: select rows where target > 0
y = np.array([1, 0, 1, 1, 0, 1])
X_pos = X[y == 1] # only positive class samples
# np.where: clip outliers
X_clean = np.where(np.abs(X) > 10, np.sign(X) * 10, X)
# np.where as argwhere: find indices of non-zero
nonzero_rows = np.where(y == 1)[0]
# Argmax along axis
preds = np.argmax(logits, axis=1) # multiclass predictions
# Advanced: select upper triangle (covariance matrices)
mask = np.triu(np.ones((4, 4), dtype=bool), k=1)
upper_vals = corr_matrix[mask]
- Chained indexing on assignment: X[X>0][0] = 99 silently fails — the boolean mask creates a temporary copy, so the write is lost. Assign through the mask directly (X[X>0] = 99) or through explicit indices: i, j = np.where(X > 0); X[i[0], j[0]] = 99.
- View mutation: a = X[1:3]; a[:] = 0 zeros out rows 1-2 of X. To avoid this, use a = X[1:3].copy().
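The silent-failure pattern is easy to demonstrate (a minimal sketch):

```python
import numpy as np

X = np.arange(6).reshape(2, 3).astype(float)

# Chained indexing writes into a temporary copy -- the original is unchanged
X[X > 2][0] = 99
assert X.max() == 5          # the write was silently lost

# Assign through the mask to modify in place
X[X > 2] = 99
assert X.max() == 99
```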
X = np.random.randn(100, 28, 28) # 100 grayscale images 28x28
# Flatten for Dense layer input
X_flat = X.reshape(100, -1) # (100, 784); -1 infers 784
# Add channel dim for Conv2D input
X_conv = X[..., np.newaxis] # (100, 28, 28, 1)
# Stack list of 1D arrays into a 2D matrix
features_list = [np.array([1,2,3]), np.array([4,5,6])]
X_matrix = np.stack(features_list, axis=0) # (2, 3)
# Concatenate along feature axis
X_combined = np.concatenate([X_flat, extra_feats], axis=1)
# Split into k folds manually
folds = np.array_split(X, 5) # handles uneven splits
# Remove singleton dims (e.g. from model output)
y_pred = model_out.squeeze() # (100, 1) → (100,)
- np.stack vs np.concatenate: stack creates a new axis; concatenate joins along an existing axis. For 1D arrays, np.stack([a, b], axis=0) gives (2, N); np.concatenate([a, b]) gives (2N,).
- ravel vs flatten: ravel returns a view when possible (no copy); flatten always copies. Prefer ravel for performance.
Math, Statistics & Sorting
X = np.random.randn(1000, 20) # 1000 samples, 20 features
# Per-feature statistics (axis=0 collapses rows)
mean = X.mean(axis=0) # (20,)
std = X.std(axis=0, ddof=1) # sample std (same as pandas default)
# keepdims for broadcasting back to original shape
X_normed = (X - X.mean(axis=0, keepdims=True)) / X.std(axis=0, keepdims=True)
# Class distribution
y = np.array([0,1,0,2,1,0])
labels, counts = np.unique(y, return_counts=True)
class_weights = counts.sum() / (len(labels) * counts)
# Top-k predictions (argsort descending)
logits = np.random.randn(100, 10)
top3 = np.argsort(logits, axis=1)[:, -3:][:, ::-1] # top-3 class indices
# IQR for outlier detection
q1, q3 = np.percentile(X, [25, 75], axis=0)
iqr = q3 - q1
lower, upper = q1 - 1.5*iqr, q3 + 1.5*iqr
mask = np.all((X >= lower) & (X <= upper), axis=1)
X_no_outliers = X[mask]
- ddof=0 (default) = population std; ddof=1 = sample std (Bessel's correction). For ML normalization either is fine, but note that library defaults differ: pandas uses ddof=1, while sklearn's StandardScaler uses ddof=0.
- a.sort() sorts in place (no copy); np.sort(a) returns a new array. Use in-place sorting to save memory on large arrays.
Linear Algebra (np.linalg)
SVD β backbone of PCA and dimensionality reduction:
SVD decomposes X = UΣVᵀ. U: left singular vectors (scores); Σ: diagonal matrix of singular values; V: right singular vectors (loadings = principal components). PCA = SVD of the centered X.
linalg.solve vs inv:
np.linalg.inv(A) @ b requires computing the inverse explicitly — numerically unstable and roughly 2× slower. Always prefer np.linalg.solve(A, b), which uses LU factorization.
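A quick sketch comparing the two on a small, well-conditioned system (both agree here; solve remains the stable choice as conditioning worsens):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)) + 5 * np.eye(5)  # diagonally boosted -> well-conditioned
b = rng.standard_normal(5)

x_solve = np.linalg.solve(A, b)       # LU factorization -- preferred
x_inv = np.linalg.inv(A) @ b          # explicit inverse -- avoid

assert np.allclose(x_solve, x_inv)    # same solution on a benign system
assert np.allclose(A @ x_solve, b)    # residual check: A x = b
```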
# OLS with linalg
X = np.random.randn(100, 5)
y = np.random.randn(100)
# β = (XᵀX)⁻¹Xᵀy — via lstsq (numerically stable)
beta, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
# Manual PCA via SVD
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
# Project to first 2 components
X_pca = U[:, :2] * S[:2] # equivalent to X_centered @ Vt[:2].T
explained_var = (S**2) / (S**2).sum()
print(f'Explained variance ratio: {explained_var[:2]}')
# L2 norm of each row (for cosine similarity preprocessing)
norms = np.linalg.norm(X, axis=1, keepdims=True)
X_unit = X / norms
# Cosine similarity matrix
cos_sim = X_unit @ X_unit.T # (100, 100)
# Matrix multiply with @ operator (preferred)
W1 = np.random.randn(784, 128)
b1 = np.zeros(128)
hidden = X_flat @ W1 + b1 # forward pass
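The @ operator also batches over leading dimensions, which is worth checking explicitly (a minimal sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 3, 4))   # batch of 8 matrices, each 3x4
B = rng.standard_normal((8, 4, 2))   # batch of 8 matrices, each 4x2

C = A @ B                            # one matmul per batch item
assert C.shape == (8, 3, 2)
assert np.allclose(C[0], A[0] @ B[0])  # matches an explicit per-item product
```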
- Check which BLAS backend NumPy is linked against with np.show_config(). MKL-linked NumPy (Anaconda) is typically fastest.
- Check memory layout with a.flags['C_CONTIGUOUS']. Use np.ascontiguousarray(a) if needed.
- np.matmul supports batched matrix multiplication for 3D+ arrays: A @ B where A is (batch, n, m) and B is (batch, m, k) → (batch, n, k).
Random Module
rng = np.random.default_rng(42)
# Xavier / Glorot initialization
n_in, n_out = 784, 128
W = rng.normal(0, scale=np.sqrt(2 / (n_in + n_out)), size=(n_in, n_out))
# He initialization (for ReLU networks)
W_he = rng.standard_normal((n_in, n_out)) * np.sqrt(2 / n_in)
# Reproducible train/test shuffle
idx = np.arange(10000)
rng.shuffle(idx)
train_idx, test_idx = idx[:8000], idx[8000:]
# Bootstrap resampling (sample with replacement)
def bootstrap(X, y, n, rng):
idx = rng.choice(len(X), size=n, replace=True)
return X[idx], y[idx]
# Weighted sampling (imbalanced datasets)
class_counts = np.array([900, 100])
weights = 1 / class_counts
sample_weights = weights[y_train]
sample_weights /= sample_weights.sum()
sampled_idx = rng.choice(len(X_train), size=1000,
replace=True, p=sample_weights)
# Monte Carlo dropout (uncertainty estimation)
n_mc = 50
mc_preds = np.stack([model_forward(X, training=True) for _ in range(n_mc)])
mean_pred = mc_preds.mean(axis=0)
uncertainty = mc_preds.std(axis=0)
- Legacy np.random.seed(): the global seed affects all code. The modern default_rng(seed) creates an isolated RNG — instances don't interfere.
- rng.shuffle vs rng.permutation: shuffle is in-place; permutation returns a new array. Both shuffle only the first axis of 2D arrays.
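Both points are easy to verify (a minimal sketch):

```python
import numpy as np

# Two RNGs with the same seed produce identical, isolated streams
r1 = np.random.default_rng(42)
r2 = np.random.default_rng(42)
assert np.allclose(r1.normal(size=3), r2.normal(size=3))

# shuffle is in-place; permutation returns a new array
rng = np.random.default_rng(0)
a = np.arange(5)
p = rng.permutation(a)                 # new array; a unchanged
assert np.array_equal(a, np.arange(5))
rng.shuffle(a)                         # mutates a directly
assert sorted(a) == [0, 1, 2, 3, 4]    # same elements, possibly reordered
```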
Broadcasting & Vectorization
# Manual StandardScaler (broadcasting)
X = np.random.randn(1000, 20)
mean = X.mean(axis=0) # (20,)
std = X.std(axis=0) # (20,)
X_scaled = (X - mean) / std # (1000, 20) - (20,) broadcasts per column
# Pairwise Euclidean distance matrix (no loops!)
# ||xi - xj||Β² = ||xi||Β² + ||xj||Β² - 2Β·xiΒ·xj
sq_norms = (X**2).sum(axis=1, keepdims=True) # (n,1)
dist_sq = sq_norms + sq_norms.T - 2 * (X @ X.T) # (n,n)
dist_sq = np.maximum(dist_sq, 0) # clamp tiny negatives from float error
dist = np.sqrt(dist_sq)
# Softmax (numerically stable)
def softmax(logits):
exp = np.exp(logits - logits.max(axis=1, keepdims=True)) # stability
return exp / exp.sum(axis=1, keepdims=True)
# Outer product via broadcasting
a = np.array([1,2,3])
b = np.array([4,5,6])
outer = a[:, np.newaxis] * b[np.newaxis, :] # (3,1)*(1,3) → (3,3)
# einsum examples β concise, often faster than explicit ops
A = np.random.randn(32, 10) # batch of 32, 10-dim vectors
B = np.random.randn(10, 5) # weight matrix
# Matrix multiply
out = np.einsum('ij,jk->ik', A, B) # (32, 5)
# Batch dot product (pair-wise)
dots = np.einsum('bi,bi->b', A, A) # (32,) — row-wise squared norms
# Trace of matrix
M = np.random.randn(5, 5)
trace = np.einsum('ii->', M) # scalar
# Attention scores (Q·Kᵀ / √d_k)
Q = np.random.randn(8, 64, 32) # (batch, seq, d_k)
K = np.random.randn(8, 64, 32)
scores = np.einsum('bqd,bkd->bqk', Q, K) / np.sqrt(32) # (batch,q,k)
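The einsum expressions above can be checked against their explicit equivalents (a quick sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((32, 10))
B = rng.standard_normal((10, 5))
M = rng.standard_normal((5, 5))

# einsum matmul == @ operator
assert np.allclose(np.einsum('ij,jk->ik', A, B), A @ B)
# batch dot product == row-wise sum of squares
assert np.allclose(np.einsum('bi,bi->b', A, A), (A**2).sum(axis=1))
# einsum trace == np.trace
assert np.allclose(np.einsum('ii->', M), np.trace(M))
```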
Utility Operations
# Numerically stable log-sum-exp (for softmax/cross-entropy)
def logsumexp(x):
c = x.max(axis=-1, keepdims=True)
return c + np.log(np.exp(x - c).sum(axis=-1, keepdims=True))
# Gradient clipping (prevent exploding gradients in manual backprop)
def clip_gradients(grads, max_norm=1.0):
total_norm = np.sqrt(sum(np.sum(g**2) for g in grads))
clip_coef = max_norm / (total_norm + 1e-6)
if clip_coef < 1:
grads = [g * clip_coef for g in grads]
return grads
# NaN detection and imputation in raw data
X_raw = np.array([[1, np.nan, 3], [4, 5, np.nan]])
nan_mask = np.isnan(X_raw)
col_means = np.nanmean(X_raw, axis=0)
X_raw[nan_mask] = np.take(col_means, np.where(nan_mask)[1])
# Binary cross-entropy (clip for numerical stability)
def bce_loss(y_true, y_prob):
y_prob = np.clip(y_prob, 1e-7, 1 - 1e-7) # avoid log(0)
return -np.mean(y_true * np.log(y_prob) + (1-y_true) * np.log(1-y_prob))
# Test floating point equality
np.allclose(a, b) # NEVER use == for floats
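The classic example of why == fails for floats (a minimal sketch):

```python
import numpy as np

a = np.array([0.1 + 0.2])
b = np.array([0.3])

assert not (a == b).all()     # exact equality fails: 0.1 + 0.2 != 0.3
assert np.allclose(a, b)      # tolerance-based comparison succeeds
# Tolerances are configurable: rtol (relative), atol (absolute)
assert np.allclose(a, b, rtol=1e-9, atol=1e-12)
```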