Array Creation

🧱
np.array / np.asarray / np.zeros / np.ones / np.full
Core array construction β€” dtype, copy semantics, and memory safety
Tags: array creation, dtype, copy
Syntax
Internals
Example
Mistakes
import numpy as np

np.array(object, dtype=None, copy=True, order='K')
np.asarray(a, dtype=None)                # no copy if already an array of matching dtype
np.zeros(shape, dtype=float)
np.ones(shape, dtype=float)
np.full(shape, fill_value, dtype=None)
np.empty(shape, dtype=float)             # uninitialized β€” fastest, do NOT read
np.eye(N, M=None, k=0, dtype=float)      # identity / off-diagonal
np.arange(start, stop, step, dtype=None)
np.linspace(start, stop, num=50, endpoint=True)

dtype hierarchy & ML implications:

  • float64 β€” NumPy's default float. Most sklearn estimators convert input to this.
  • float32 β€” half the memory; preferred by TensorFlow/PyTorch for GPU training.
  • int32 / int64 β€” for labels and indices; framework loss functions typically expect integer class labels.
  • bool β€” for masks; stored as 1 byte (not 1 bit).
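
The memory cost of each dtype is easy to verify directly β€” a quick sketch:

```python
import numpy as np

# Same 1000x1000 shape, three dtypes β€” memory scales with itemsize
x64 = np.zeros((1000, 1000), dtype=np.float64)
x32 = x64.astype(np.float32)            # halves memory
mask = np.zeros((1000, 1000), dtype=bool)

print(x64.nbytes)     # 8000000 bytes (8 per element)
print(x32.nbytes)     # 4000000 bytes (4 per element)
print(mask.itemsize)  # 1 β€” bool costs a full byte, not a bit
```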

np.array vs np.asarray:

np.array() copies by default (copy=True). np.asarray() returns the input object itself when it is already an ndarray of a compatible dtype β€” zero-copy. Use asarray in library code to avoid accidental copies.

python
import numpy as np

# Float32 for GPU-compatible features
X = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)

# Avoid copy when safe
X_view = np.asarray(X, dtype=np.float32)
print(X_view is X)  # True β€” no copy!

# Weight matrix init (zeros)
W = np.zeros((128, 64), dtype=np.float32)

# Bias init
b = np.full(64, 0.01, dtype=np.float32)

# Identity (useful for transformations)
I = np.eye(4)

# Evenly spaced samples (e.g. for ROC threshold grid)
thresholds = np.linspace(0, 1, num=101)

# Integer ranges for batch indices
indices = np.arange(0, 10000, step=32)  # batch start positions

# Check dtype and memory
print(X.dtype, X.nbytes, X.itemsize)
  • np.empty is NOT zeroed: Contains whatever was in memory. Never use to initialize weight arrays β€” use np.zeros or np.random.
  • Floating-point step pitfall: np.arange with a fractional step (e.g. np.arange(0, 1, 0.1)) can produce one element more or fewer than expected due to rounding of the step. Use np.linspace(0, 1, 11) when you need an exact count.
  • Nested list β†’ ragged: np.array([[1,2],[3]]) raises ValueError in modern NumPy (older versions silently produced a dtype=object array). Always ensure all rows are the same length.
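
The arange pitfall can be checked directly; linspace guarantees both the element count and exact endpoints:

```python
import numpy as np

# arange with a float step: element count depends on rounding of 0.1
grid_a = np.arange(0, 1, 0.1)     # may or may not land exactly where you expect
grid_l = np.linspace(0, 1, 11)    # exactly 11 points, endpoints exact

print(len(grid_l))                # 11
print(grid_l[0], grid_l[-1])      # 0.0 1.0 β€” endpoint included exactly
```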

Indexing, Slicing & Boolean Masking

🎯
Basic slicing / Fancy indexing / Boolean masking
Select, filter, and extract subarrays β€” the most common NumPy operations in ML pipelines
Tags: indexing, slicing, boolean mask, fancy indexing
Syntax
Internals
Example
Mistakes
# Basic slicing β€” returns a VIEW
a[start:stop:step]
a[2:5]                     # rows 2, 3, 4
a[::-1]                    # reverse
a[1:5, 2:4]                # 2D slice
a[:, np.newaxis]           # add axis (= a[:, None])

# Fancy indexing β€” returns a COPY
a[[0, 3, 5]]               # select rows by list
a[np.array([0, 3])]

# Boolean masking β€” returns a COPY
mask = a > 0
a[mask]
a[a % 2 == 0]              # even elements
np.where(condition, x, y)  # element-wise ternary

View vs Copy β€” critical for memory efficiency:

  • View: Basic slicing (a[1:4]) returns a view β€” no data is copied. Modifying the view modifies the original, and the view shares memory with it (the view's .base is the original array).
  • Copy: Fancy indexing (a[[1,2,3]]) and boolean masking (a[a>0]) always return copies. Mutations don't affect original.
  • Use a.flags['OWNDATA'] or a.base is None to check ownership.
  • Force a copy: a[1:4].copy().
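A minimal check of these ownership rules:

```python
import numpy as np

X = np.arange(12).reshape(3, 4).copy()  # .copy() so X owns its data

view = X[1:3]                  # basic slice β†’ view
fancy = X[[1, 2]]              # fancy index β†’ copy

print(view.base is X)                  # True β€” shares memory with X
print(fancy.flags['OWNDATA'])          # True β€” owns its own buffer
print(np.shares_memory(view, X))       # True
print(np.shares_memory(fancy, X))      # False

view[0, 0] = -1
print(X[1, 0])                 # -1 β€” mutation through the view is visible in X
```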
python
X = np.arange(24).reshape(6, 4)

# Slice first 4 rows, all columns β€” VIEW
X_train = X[:4, :]
X_test  = X[4:, :]

# Select specific features by column index
feature_idx = [0, 2]
X_subset = X[:, feature_idx]           # fancy index β†’ copy

# Boolean masking: select rows where target > 0
y = np.array([1, 0, 1, 1, 0, 1])
X_pos = X[y == 1]                      # only positive class samples

# np.where: clip outliers
X_clean = np.where(np.abs(X) > 10, np.sign(X) * 10, X)

# np.where as argwhere: find indices of non-zero
nonzero_rows = np.where(y == 1)[0]

# Argmax along axis
preds = np.argmax(logits, axis=1)     # multiclass predictions

# Advanced: select upper triangle (covariance matrices)
mask = np.triu(np.ones((4, 4), dtype=bool), k=1)
upper_vals = corr_matrix[mask]
  • Chained indexing on assignment: X[X>0][0] = 99 silently fails β€” the boolean mask creates a temporary copy, and that copy is what gets modified. Assign through a single indexing operation instead, e.g. X[X > 0] = 99 or X.flat[np.flatnonzero(X > 0)[0]] = 99.
  • View mutation: a = X[1:3]; a[:] = 0 zeros out rows 1-2 of X. To avoid, use a = X[1:3].copy().
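The chained-indexing failure mode is easy to reproduce:

```python
import numpy as np

X = np.array([[1.0, -2.0], [3.0, -4.0]])

X[X > 0][0] = 99          # modifies a temporary copy β€” X is unchanged
print(X[0, 0])            # 1.0

X[X > 0] = 99             # single indexing operation β€” works
print(X[0, 0])            # 99.0
```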
πŸ”„
reshape / ravel / stack / concatenate / split
Reshape, combine, and split arrays β€” batch processing and data pipelining
Tags: reshape, stack, concatenate, split
Syntax
Example
Mistakes
a.reshape(shape)               # -1 infers dimension
a.ravel()                      # flatten to 1D (view if contiguous)
a.flatten()                    # always returns a copy
a.squeeze()                    # remove dims of size 1
np.expand_dims(a, axis)        # insert axis at position
np.concatenate([a, b], axis=0) # join along existing axis
np.stack([a, b], axis=0)       # join along NEW axis
np.vstack([a, b])              # stack row-wise (axis=0)
np.hstack([a, b])              # stack col-wise (axis=1)
np.split(a, indices, axis=0)   # split into sub-arrays
python
X = np.random.randn(100, 28, 28)  # 100 grayscale images 28x28

# Flatten for Dense layer input
X_flat = X.reshape(100, -1)       # (100, 784) β€” -1 infers 784

# Add channel dim for Conv2D input
X_conv = X[..., np.newaxis]       # (100, 28, 28, 1)

# Stack list of 1D arrays into a 2D matrix
features_list = [np.array([1,2,3]), np.array([4,5,6])]
X_matrix = np.stack(features_list, axis=0)  # (2, 3)

# Concatenate along feature axis
X_combined = np.concatenate([X_flat, extra_feats], axis=1)

# Split into k folds manually
folds = np.array_split(X, 5)     # handles uneven splits

# Remove singleton dims (e.g. from model output)
y_pred = model_out.squeeze()     # (100,1) β†’ (100,)
  • np.stack vs np.concatenate: stack creates a new axis; concatenate joins along existing axis. For 1D arrays: np.stack([a,b], axis=0) gives (2, N); np.concatenate([a,b]) gives (2N,).
  • ravel vs flatten: ravel returns a view when possible (no copy); flatten always copies. Use ravel for performance.
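Verifying the ravel/flatten distinction:

```python
import numpy as np

X = np.arange(6).reshape(2, 3)     # C-contiguous

r = X.ravel()                      # view β€” shares memory
f = X.flatten()                    # always a fresh copy

print(np.shares_memory(r, X))      # True
print(np.shares_memory(f, X))      # False

r[0] = -1
print(X[0, 0])                     # -1 β€” the ravel view writes through
```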

Math, Statistics & Sorting

πŸ“Š
mean / std / var / sum / cumsum / sort / argsort
Reduction operations and sorting β€” understanding axis, keepdims, and ddof
Tags: statistics, axis, sorting
Syntax
Example
Performance
np.mean(a, axis=None, keepdims=False)
np.std(a, axis=None, ddof=0, keepdims=False)
np.var(a, axis=None, ddof=0)
np.sum(a, axis=None)
np.cumsum(a, axis=None)
np.min(a) / np.max(a) / np.ptp(a)   # ptp = peak-to-peak (max - min)
np.percentile(a, q)                 # q in 0-100
np.quantile(a, q)                   # q in 0.0-1.0
np.median(a, axis=None)

# Sorting
np.sort(a, axis=-1)                 # returns sorted copy
np.argsort(a)                       # indices that sort a
np.argmax(a, axis=None)             # index of max
np.argmin(a, axis=None)
np.unique(a, return_counts=False)
python
X = np.random.randn(1000, 20)  # 1000 samples, 20 features

# Per-feature statistics (axis=0 collapses rows)
mean = X.mean(axis=0)   # (20,)
std  = X.std(axis=0, ddof=1)  # sample std (same as pandas default)

# keepdims for broadcasting back to original shape
X_normed = (X - X.mean(axis=0, keepdims=True)) / X.std(axis=0, keepdims=True)

# Class distribution
y = np.array([0,1,0,2,1,0])
labels, counts = np.unique(y, return_counts=True)
class_weights = counts.sum() / (len(labels) * counts)

# Top-k predictions (argsort descending)
logits = np.random.randn(100, 10)
top3 = np.argsort(logits, axis=1)[:, -3:][:, ::-1]  # top-3 class indices

# IQR for outlier detection
q1, q3 = np.percentile(X, [25, 75], axis=0)
iqr = q3 - q1
lower, upper = q1 - 1.5*iqr, q3 + 1.5*iqr
mask = np.all((X >= lower) & (X <= upper), axis=1)
X_no_outliers = X[mask]
ddof
ddof=0 (default) = population std; ddof=1 = sample std (Bessel's correction). For ML normalization either is fine, but note that sklearn's StandardScaler uses ddof=0, while pandas defaults to ddof=1.
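
A quick numeric check of the two conventions on [1, 2, 3, 4]:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])

pop = a.std(ddof=0)      # population: sqrt(5/4)  β‰ˆ 1.118
samp = a.std(ddof=1)     # sample:     sqrt(5/3)  β‰ˆ 1.291

print(round(pop, 3), round(samp, 3))
```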
In-place sort
a.sort() sorts in-place (no copy). np.sort(a) returns a new array. Use in-place to save memory on large arrays.

Linear Algebra (np.linalg)

πŸ“
dot / matmul / @ / linalg.inv / svd / eig / norm / solve
Matrix operations that underpin every ML algorithm β€” from OLS to PCA to backprop
Tags: linalg, SVD, matmul, eigenvalues
Syntax
Internals
Example
Performance
np.dot(a, b)             # scalar/1D/2D dot product
np.matmul(a, b)          # = a @ b; scalars (0D) not allowed
np.inner(a, b)
np.outer(a, b)
np.tensordot(a, b, axes)

# numpy.linalg
np.linalg.inv(a)                       # matrix inverse β€” avoid when possible!
np.linalg.solve(A, b)                  # solve Ax=b (more stable than inv(A) @ b)
np.linalg.lstsq(A, b, rcond=None)      # least-squares (OLS solution)
np.linalg.svd(a, full_matrices=True)   # SVD decomposition
np.linalg.eig(a)                       # eigenvalues + eigenvectors
np.linalg.eigh(a)                      # for symmetric/Hermitian β€” more stable
np.linalg.norm(a, ord=None, axis=None)
np.linalg.det(a)
np.linalg.matrix_rank(a)
np.linalg.pinv(a)                      # Moore-Penrose pseudo-inverse

SVD β€” backbone of PCA and dimensionality reduction:

SVD decomposes X = UΞ£Vα΅€. U: left singular vectors (scores); Ξ£: diagonal of singular values; V: right singular vectors (loadings = principal components). PCA = SVD of the centered X.

linalg.solve vs inv:

np.linalg.inv(A) @ b computes the inverse explicitly β€” numerically less stable and typically slower. Prefer np.linalg.solve(A, b), which uses LU factorization.
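
On a small well-conditioned system both routes agree; solve is still the one to reach for:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x_solve = np.linalg.solve(A, b)      # LU factorization
x_inv = np.linalg.inv(A) @ b         # explicit inverse β€” avoid in practice

print(x_solve)                       # [2. 3.]
print(np.allclose(x_solve, x_inv))   # True
print(np.allclose(A @ x_solve, b))   # True β€” residual check
```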

python
# OLS with linalg
X = np.random.randn(100, 5)
y = np.random.randn(100)

# Ξ² = (Xα΅€X)⁻¹Xα΅€y β€” via lstsq (numerically stable)
beta, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)

# Manual PCA via SVD
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
# Project to first 2 components
X_pca = U[:, :2] * S[:2]   # equivalent to X_centered @ Vt[:2].T
explained_var = (S**2) / (S**2).sum()
print(f'Explained variance ratio: {explained_var[:2]}')

# L2 norm of each row (for cosine similarity preprocessing)
norms = np.linalg.norm(X, axis=1, keepdims=True)
X_unit = X / norms

# Cosine similarity matrix
cos_sim = X_unit @ X_unit.T  # (100, 100)

# Matrix multiply with @ operator (preferred)
W1 = np.random.randn(784, 128)
b1 = np.zeros(128)
hidden = X_flat @ W1 + b1    # forward pass (assumes X_flat is (n_samples, 784))
BLAS Backend
np.dot / matmul delegate to BLAS (MKL, OpenBLAS). Ensure NumPy is built against optimized BLAS: np.show_config(). MKL-linked NumPy (Anaconda) is typically fastest.
Contiguous Arrays
BLAS kernels work fastest on contiguous arrays; non-contiguous inputs trigger internal copies. Check: a.flags['C_CONTIGUOUS']. Use np.ascontiguousarray(a) if needed.
Batched matmul
np.matmul supports batched matrix multiplication for 3D+ arrays: A @ B where A is (batch, n, m) and B is (batch, m, k) β†’ (batch, n, k).
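A shape-level sketch of batched matmul, checked against an explicit loop:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3, 4))   # (batch, n, m)
B = rng.standard_normal((8, 4, 5))   # (batch, m, k)

C = A @ B                            # batched: (8, 3, 5)
print(C.shape)

# Equivalent to multiplying each pair along the batch axis
C_loop = np.stack([A[i] @ B[i] for i in range(8)])
print(np.allclose(C, C_loop))        # True
```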

Random Module

🎲
np.random.default_rng β€” Modern Random API
Reproducible experiments, data augmentation, shuffling, and sampling
Tags: random, reproducibility, sampling
Syntax
Example
Mistakes
# Modern API (NumPy 1.17+) β€” prefer over np.random.*
rng = np.random.default_rng(seed=42)
rng.random((m, n))                    # uniform [0, 1)
rng.standard_normal((m, n))           # N(0, 1)
rng.normal(loc=0, scale=1, size=(...))
rng.uniform(low=0, high=1, size=(...))
rng.integers(low, high, size=(...))   # exclusive high
rng.choice(a, size, replace=True, p=None)
rng.shuffle(x)                        # in-place shuffle
rng.permutation(x)                    # returns shuffled copy
python
rng = np.random.default_rng(42)

# Xavier / Glorot initialization
n_in, n_out = 784, 128
W = rng.normal(0, scale=np.sqrt(2 / (n_in + n_out)), size=(n_in, n_out))

# He initialization (for ReLU networks)
W_he = rng.standard_normal((n_in, n_out)) * np.sqrt(2 / n_in)

# Reproducible train/test shuffle
idx = np.arange(10000)
rng.shuffle(idx)
train_idx, test_idx = idx[:8000], idx[8000:]

# Stratified bootstrap sampling
def bootstrap(X, y, n, rng):
    idx = rng.choice(len(X), size=n, replace=True)
    return X[idx], y[idx]

# Weighted sampling (imbalanced datasets)
class_counts = np.array([900, 100])
weights = 1 / class_counts
sample_weights = weights[y_train]
sample_weights /= sample_weights.sum()
sampled_idx = rng.choice(len(X_train), size=1000,
                           replace=True, p=sample_weights)

# Monte Carlo dropout (uncertainty estimation)
n_mc = 50
mc_preds = np.stack([model_forward(X, training=True) for _ in range(n_mc)])
mean_pred = mc_preds.mean(axis=0)
uncertainty = mc_preds.std(axis=0)
  • Legacy np.random.seed(): Global seed affects all code. The modern default_rng(seed) creates an isolated RNG β€” instances don't interfere.
  • rng.shuffle vs rng.permutation: shuffle is in-place; permutation returns a new array. For multi-dimensional arrays, both operate along the first axis only.
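The isolation claim is straightforward to verify β€” two generators with the same seed produce bit-identical streams, and activity on one generator never disturbs another:

```python
import numpy as np

rng_a = np.random.default_rng(7)
rng_b = np.random.default_rng(7)

# Same seed β†’ identical streams
print(np.array_equal(rng_a.random(3), rng_b.random(3)))   # True

# Drawing from an unrelated generator leaves others untouched
rng_c = np.random.default_rng(7)
_ = np.random.default_rng(123).random(1000)               # unrelated RNG activity
print(np.array_equal(rng_c.random(3), np.random.default_rng(7).random(3)))  # True
```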

Broadcasting & Vectorization

πŸ“‘
Broadcasting Rules + np.einsum
Eliminate Python loops β€” broadcasting semantics, common patterns, and einsum for arbitrary contractions
Tags: broadcasting, einsum, vectorization
Rules
Example
einsum
# Broadcasting rules (align from right, expand where needed):
# 1. Prepend 1s to the smaller shape until lengths match
# 2. Dimensions of size 1 stretch to match the other
# 3. If no dimension is 1 and sizes differ β†’ error
#
# (3, 4) + (4,)   β†’ (4,) treated as (1, 4) β†’ (3, 4) βœ“
# (3, 4) + (3,)   β†’ ERROR: (3,) treated as (1, 3) βœ— β€” use (3, 1)
# (3, 1) + (1, 4) β†’ (3, 4) βœ“

np.einsum(subscripts, *operands)
# Subscript notation:
# 'ij,jk->ik'  : matrix multiply
# 'ij->i'      : row sum
# 'ii->i'      : diagonal
# 'ij,ij->ij'  : element-wise multiply
# 'ijk,ik->ij' : batched dot product
python
# Manual StandardScaler (broadcasting)
X = np.random.randn(1000, 20)
mean = X.mean(axis=0)     # (20,)
std  = X.std(axis=0)      # (20,)
X_scaled = (X - mean) / std  # (1000,20) - (20,) β†’ broadcasts βœ“

# Pairwise Euclidean distance matrix (no loops!)
# ||xi - xj||Β² = ||xi||Β² + ||xj||Β² - 2Β·xiΒ·xj
sq_norms = (X**2).sum(axis=1, keepdims=True)  # (n,1)
dist_sq = sq_norms + sq_norms.T - 2 * (X @ X.T)   # (n,n)

# Softmax (numerically stable)
def softmax(logits):
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))  # stability
    return exp / exp.sum(axis=1, keepdims=True)

# Outer product via broadcasting
a = np.array([1,2,3])
b = np.array([4,5,6])
outer = a[:, np.newaxis] * b[np.newaxis, :]  # (3,1)*(1,3) β†’ (3,3)
python
# einsum examples β€” concise, often faster than explicit ops
A = np.random.randn(32, 10)   # batch of 32, 10-dim vectors
B = np.random.randn(10, 5)   # weight matrix

# Matrix multiply
out = np.einsum('ij,jk->ik', A, B)            # (32, 5)

# Batch dot product (pair-wise)
dots = np.einsum('bi,bi->b', A, A)            # (32,) β€” row-wise normsΒ²

# Trace of matrix
M = np.random.randn(5, 5)
trace = np.einsum('ii->', M)                  # scalar

# Attention scores (QΒ·Kα΅€ / √d)
Q = np.random.randn(8, 64, 32)  # (batch, seq, d_k)
K = np.random.randn(8, 64, 32)
scores = np.einsum('bqd,bkd->bqk', Q, K) / np.sqrt(32)  # (batch,q,k)
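
The attention-score contraction above is equivalent to a batched matmul against a transposed K β€” a quick sanity check:

```python
import numpy as np

rng = np.random.default_rng(0)
Q = rng.standard_normal((8, 64, 32))   # (batch, seq, d_k)
K = rng.standard_normal((8, 64, 32))

scores_einsum = np.einsum('bqd,bkd->bqk', Q, K)
scores_matmul = Q @ K.transpose(0, 2, 1)   # swap the last two axes of K

print(scores_einsum.shape)                        # (8, 64, 64)
print(np.allclose(scores_einsum, scores_matmul))  # True
```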

Utility Operations

πŸ”§
clip / abs / sign / log / exp / power / isnan / nanmean
Element-wise math, NaN handling, and numerical stability utilities
Tags: clip, NaN, numerical stability
Syntax
Example
np.clip(a, a_min, a_max)
np.abs(a) / np.sign(a)
np.log(a) / np.log2(a) / np.log1p(a)
np.exp(a) / np.expm1(a)
np.power(a, exp) / np.sqrt(a)

# NaN utilities
np.isnan(a) / np.isinf(a) / np.isfinite(a)
np.nanmean(a) / np.nanstd(a) / np.nansum(a)
np.nan_to_num(a, nan=0.0, posinf=0, neginf=0)

# Comparisons & logical
np.allclose(a, b, rtol=1e-5, atol=1e-8)   # float comparison
np.any(a) / np.all(a)
python
# Numerically stable log-sum-exp (for softmax/cross-entropy)
def logsumexp(x):
    c = x.max(axis=-1, keepdims=True)
    return c + np.log(np.exp(x - c).sum(axis=-1, keepdims=True))

# Gradient clipping (prevent exploding gradients in manual backprop)
def clip_gradients(grads, max_norm=1.0):
    total_norm = np.sqrt(sum(np.sum(g**2) for g in grads))
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1:
        grads = [g * clip_coef for g in grads]
    return grads

# NaN detection and imputation in raw data
X_raw = np.array([[1, np.nan, 3], [4, 5, np.nan]])
nan_mask = np.isnan(X_raw)
col_means = np.nanmean(X_raw, axis=0)
X_raw[nan_mask] = np.take(col_means, np.where(nan_mask)[1])

# Binary cross-entropy (clip for numerical stability)
def bce_loss(y_true, y_prob):
    y_prob = np.clip(y_prob, 1e-7, 1 - 1e-7)  # avoid log(0)
    return -np.mean(y_true * np.log(y_prob) + (1-y_true) * np.log(1-y_prob))

# Test floating point equality
np.allclose(a, b)  # NEVER use == for floats
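
The float-equality point in three lines:

```python
import numpy as np

s = 0.1 + 0.2
print(s == 0.3)              # False β€” binary rounding (s is 0.30000000000000004)
print(np.allclose(s, 0.3))   # True β€” tolerance-based comparison
```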