NumPy
Array creation, indexing, reshaping, math, linear algebra, random module, broadcasting: a complete implementation reference for ML engineering.
Array Creation
The np.array() function is the core foundation of NumPy. It converts Python
lists or tuples into high-performance, multi-dimensional NumPy arrays (ndarrays). These
arrays are the fundamental data structure used across all Python ML libraries, solving the
problem of Python's slow native lists by storing data in contiguous memory blocks with
homogeneous types.
Imagine a Python list as a flexible box that can hold anything (a book, an apple, a shoe). Because it's flexible, the computer has to carefully inspect each item when processing them, which is slow. A NumPy array is like an egg carton: every slot holds exactly one egg of the exact same size. Because the computer knows exactly what is in every slot without checking, it can process the entire carton instantly.
import numpy as np
# Creating a 2D array (matrix) from a nested list
data_list = [[1.0, 2.0], [3.0, 4.0]]
X = np.array(data_list, dtype=np.float32)
print(X)
| Code Line | Explanation |
|---|---|
| `import numpy as np` | Imports the library under the standard alias `np`. |
| `data_list = [...]` | Defines a standard Python list of lists containing floats. |
| `X = np.array(...)` | Calls the core function, passing the list and explicitly forcing a 32-bit float data type (preferred for ML). |
| `print(X)` | Outputs the constructed ndarray to the console. |
Input: [[1.0, 2.0], [3.0, 4.0]] (Nested Python List)
Output:
[[1. 2.]
[3. 4.]]
Transformation: The irregular Python objects in memory are serialized into a strict C-array structure.
Under the hood, np.array() creates a C struct that contains a pointer to a raw
block of memory, along with metadata describing the shape, strides (how many bytes to jump
to get to the next element), and data type. Memory is allocated in C, and subsequent
element-wise operations run in compiled loops that avoid per-element Python object overhead
(and can release the GIL, the Global Interpreter Lock).
In memory, the 2D matrix [[1, 2], [3, 4]] is stored flat as
[1.0, 2.0, 3.0, 4.0]. The object just uses stride arithmetic to simulate a
grid when you index by row and column.
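The strides described above can be inspected directly. A minimal sketch, reusing the 2x2 float32 matrix from the example:

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)

# Each float32 is 4 bytes: jump 8 bytes to the next row, 4 to the next column
print(X.strides)    # (8, 4)

# The same buffer viewed flat, in row-major order
print(X.ravel())    # [1. 2. 3. 4.]

# Transposing only swaps the strides -- no data moves
print(X.T.strides)  # (4, 8)
```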
A nested list with 2 elements, where each element has 2 elements, yields a shape of
(2, 2).
Before: Nested Lists.
After: X.shape = (batch_size=2, features=2)
Object Type: numpy.ndarray
Data Type: Follows the dtype parameter (e.g.,
float32).
Shape: Tuple of integers representing dimension sizes.
Ragged Arrays: If inner lists have different lengths (e.g.,
[[1], [2, 3]]), NumPy cannot build a grid. Older versions fell back to an array of Python
objects (dtype=object), which ruins performance entirely; NumPy 1.24+ raises a
ValueError unless you pass dtype=object explicitly.
Empty Inputs: np.array([]) evaluates safely but defaults to
float64.
np.asarray(obj): Use this instead of np.array()
inside functions! If obj is already an array, it returns a 0-copy view, saving
memory.
np.zeros(shape) / np.ones(shape): Use when the
data isn't known yet (e.g. initializing model weights).
Mistake: Not specifying dtype when feeding a GPU model like
PyTorch. Default is float64, but GPUs operate best on float32,
causing slow automatic casting later.
Fix: Always write np.array(data, dtype=np.float32) during
feature engineering pipelines.
Creating an array loops over the Python list and type-converts every element in
C: an O(N) operation. When the final size is known, pre-allocate with np.zeros or
np.empty and fill in place, rather than growing a giant Python list for millions of rows
and calling np.array() at the end.
Challenge: Write a single line of code to create a 3-element 1D array of type 8-bit integer containing values 10, 20, 30.
Expected Output: [10 20 30] with dtype('int8')
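One possible answer to the challenge:

```python
import numpy as np

# 3-element 1D array of 8-bit integers, in one line
a = np.array([10, 20, 30], dtype=np.int8)
print(a)        # [10 20 30]
print(a.dtype)  # int8
```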
The order parameter determines memory layout; the default, order='K', keeps the
input's layout where possible, and NumPy otherwise prefers
C-contiguous (row-major) order. Fortran-contiguous (column-major) order
(order='F') is faster for column-wise operations and is the memory layout
expected by certain BLAS/LAPACK Fortran routines solving linear algebra equations in
Scikit-Learn.
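A short sketch of checking and converting layout (array sizes invented here):

```python
import numpy as np

X_c = np.ones((1000, 3))          # row-major (C order) by default
X_f = np.asfortranarray(X_c)      # column-major copy

print(X_c.flags['C_CONTIGUOUS'])  # True
print(X_f.flags['F_CONTIGUOUS'])  # True

# Column access walks contiguous memory in the Fortran layout
col = X_f[:, 0]
```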
Indexing, Slicing & Boolean Masking
Indexing and slicing allow you to access, modify, or extract specific elements and subarrays from a larger NumPy array. This is critical for tasks like selecting training batches, isolating features in a dataset, or creating validation splits without copying unnecessary memory.
Think of an array like a spreadsheet. Basic slicing ([1:4]) is like dragging
your mouse over rows 1 through 3: you just "look" at them (a View). Fancy indexing (passing a
list of specific rows like [[0, 5, 8]]) or boolean masking
([a > 0]) is like using the Filter tool to copy specific rows into a brand new
spreadsheet (a Copy).
import numpy as np
X = np.arange(24).reshape(6, 4)
# 1. Basic Slicing (first 4 rows, all columns)
X_train = X[:4, :]
# 2. Fancy Indexing (columns 0 and 2)
X_subset = X[:, [0, 2]]
# 3. Boolean Masking (values greater than 10)
X_large = X[X > 10]
| Code Line | Explanation |
|---|---|
| `X = np.arange(24).reshape(6, 4)` | Creates a 6x4 matrix containing numbers 0 through 23. |
| `X_train = X[:4, :]` | Slices rows index 0 to 3. The `:` means "all columns". This returns a memory view. |
| `X_subset = X[:, [0, 2]]` | Selects all rows but specifically copies out columns 0 and 2 using a list. |
| `X_large = X[X > 10]` | Evaluates `X > 10` (creates a T/F mask) and filters the array, collapsing it into 1D. |
Input: Array of shape (6,4)
Applying Slicing: X[:2, 1:3]
Output:
[[1 2]
[5 6]]
Transformation: Extracts the intersection of the first two rows and the middle two columns.
Basic slices only alter the strides and offset metadata of the C-struct. No data is duplicated in RAM. Fancy indexing requires NumPy to allocate a new memory block and iterate through the requested indices to populate it.
A boolean mask X > 10 is itself an array of 8-bit booleans:
[False, False, ..., True, True]. Filtering iterates over this mask and copies the
elements at the True positions into a new array.
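Whether an operation produced a view or a copy can be checked with np.shares_memory. A sketch using the 6x4 matrix from the example:

```python
import numpy as np

X = np.arange(24).reshape(6, 4)

view = X[:4, :]      # basic slice  -> view (shared buffer)
copy = X[:, [0, 2]]  # fancy index  -> fresh allocation

print(np.shares_memory(X, view))  # True
print(np.shares_memory(X, copy))  # False
```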
Basic 2D Slice: (6,4) sliced with [:4, :] →
(4, 4)
Boolean Mask: Masking a multi-dimensional array ALWAYS flattens the result
to 1D, e.g., (N,) where N is True count.
Object Type: numpy.ndarray
Note: Either a new object pointing to shared memory (View) or entirely new memory (Copy).
Out of Bounds: a[10:20] on a 5-element array doesn't crash; it
returns an empty array [].
Negative Step: a[::-1] reverses the array instantly by
flipping the memory strides to negative values (a view, no copy).
np.where(cond, x, y): A variation of boolean masking providing
ternary logic (if condition then x else y), often used to cap outliers:
np.where(X > 100, 100, X).
Mistake: Chained assignment like X[X > 0][0] = 99. This
silently modifies a temporary copy, leaving X unchanged.
Fix: Use a single boolean index with np.where or direct
assignment: X[X > 0] = 99.
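The chained-assignment trap above can be demonstrated in a few lines (a tiny invented array for illustration):

```python
import numpy as np

X = np.array([-1.0, 2.0, 3.0])

# Chained indexing writes into a temporary copy -- X is unchanged
X[X > 0][0] = 99.0
print(X)  # [-1.  2.  3.]

# A single boolean assignment writes through to X itself
X[X > 0] = 99.0
print(X)  # [-1. 99. 99.]
```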
Always favor basic slices over fancy indexing when dealing with massive datasets (like images or batches): creating a view is O(1) and allocates no extra memory.
Challenge: Given a 1D array a = np.array([1, 2, 3, 4, 5]),
extract only the odd numbers without using a loop.
Expected Output: [1 3 5]
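One possible answer to the challenge, using a boolean mask:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
odds = a[a % 2 == 1]  # elementwise modulo builds the mask, no Python loop
print(odds)           # [1 3 5]
```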
Adding new axes via np.newaxis (or None) during slicing simply
inserts a length-1 dimension into the shape metadata; no data is touched.
X[:, np.newaxis] turns a shape of (5,) into (5, 1) instantly,
preparing arrays for broadcasting. (reshape on a contiguous array is equally cheap;
both are metadata-only operations.)
X = np.random.randn(100, 28, 28) # 100 grayscale images 28x28
# Flatten for Dense layer input
X_flat = X.reshape(100, -1) # (100, 784) → -1 infers 784
# Add channel dim for Conv2D input
X_conv = X[..., np.newaxis] # (100, 28, 28, 1)
# Stack list of 1D arrays into a 2D matrix
features_list = [np.array([1,2,3]), np.array([4,5,6])]
X_matrix = np.stack(features_list, axis=0) # (2, 3)
# Concatenate along feature axis
extra_feats = np.random.randn(100, 4) # e.g. handcrafted features
X_combined = np.concatenate([X_flat, extra_feats], axis=1) # (100, 788)
# Split into k folds manually
folds = np.array_split(X, 5) # handles uneven splits
# Remove singleton dims (model_out is a placeholder of shape (100, 1))
y_pred = model_out.squeeze() # (100,1) → (100,)
- np.stack vs np.concatenate: `stack` creates a new axis; `concatenate` joins along an existing axis. For 1D arrays: `np.stack([a,b], axis=0)` gives (2, N); `np.concatenate([a,b])` gives (2N,).
- ravel vs flatten: `ravel` returns a view when possible (no copy); `flatten` always copies. Use `ravel` for performance.
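The stack/concatenate and ravel/flatten distinctions can be verified directly (small invented arrays):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(np.stack([a, b], axis=0).shape)    # (2, 3) -- new leading axis
print(np.concatenate([a, b]).shape)      # (6,)   -- joined along axis 0

M = np.arange(6).reshape(2, 3)
print(np.shares_memory(M, M.ravel()))    # True  (view, contiguous input)
print(np.shares_memory(M, M.flatten()))  # False (always a copy)
```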
Math, Statistics & Sorting
X = np.random.randn(1000, 20) # 1000 samples, 20 features
# Per-feature statistics (axis=0 collapses rows)
mean = X.mean(axis=0) # (20,)
std = X.std(axis=0, ddof=1) # sample std (same as pandas default)
# keepdims for broadcasting back to original shape
X_normed = (X - X.mean(axis=0, keepdims=True)) / X.std(axis=0, keepdims=True)
# Class distribution
y = np.array([0,1,0,2,1,0])
labels, counts = np.unique(y, return_counts=True)
class_weights = counts.sum() / (len(labels) * counts)
# Top-k predictions (argsort descending)
logits = np.random.randn(100, 10)
top3 = np.argsort(logits, axis=1)[:, -3:][:, ::-1] # top-3 class indices
# IQR for outlier detection
q1, q3 = np.percentile(X, [25, 75], axis=0)
iqr = q3 - q1
lower, upper = q1 - 1.5*iqr, q3 + 1.5*iqr
mask = np.all((X >= lower) & (X <= upper), axis=1)
X_no_outliers = X[mask]
ddof=0 (default) = population std;
ddof=1 = sample std (Bessel's correction). For ML normalization
either is fine; note that sklearn's StandardScaler uses ddof=0,
while pandas' .std() defaults to ddof=1.
a.sort() sorts in-place (no copy).
np.sort(a) returns a new array. Use in-place to save memory on
large arrays.
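The in-place vs copy distinction for sorting, sketched on a tiny array:

```python
import numpy as np

a = np.array([3, 1, 2])

b = np.sort(a)  # new sorted array; a is untouched
print(a)        # [3 1 2]
print(b)        # [1 2 3]

a.sort()        # sorts a in place, no extra allocation
print(a)        # [1 2 3]
```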
Linear Algebra (np.linalg)
SVD β backbone of PCA and dimensionality reduction:
SVD decomposes X = UΣVᵀ. U: left singular vectors (scores); Σ: diagonal of singular values; V: right singular vectors (loadings = principal components). PCA = SVD of the centered X.
linalg.solve vs inv:
np.linalg.inv(A) @ b requires computing the inverse
explicitly: numerically unstable and roughly 2× slower. Always prefer
np.linalg.solve(A, b), which uses LU factorization.
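A small sketch comparing the two approaches on an invented, well-conditioned system (on ill-conditioned matrices the explicit inverse degrades much further):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)) + 5 * np.eye(5)  # well-conditioned
b = rng.standard_normal(5)

x_solve = np.linalg.solve(A, b)  # LU factorization, preferred
x_inv = np.linalg.inv(A) @ b     # explicit inverse, avoid in practice

print(np.allclose(x_solve, x_inv))  # True (they agree here)
```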
# OLS with linalg
X = np.random.randn(100, 5)
y = np.random.randn(100)
# β = (XᵀX)⁻¹Xᵀy, computed via lstsq (numerically stable)
beta, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
# Manual PCA via SVD
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
# Project to first 2 components
X_pca = U[:, :2] * S[:2] # equivalent to X_centered @ Vt[:2].T
explained_var = (S**2) / (S**2).sum()
print(f'Explained variance ratio: {explained_var[:2]}')
# L2 norm of each row (for cosine similarity preprocessing)
norms = np.linalg.norm(X, axis=1, keepdims=True)
X_unit = X / norms
# Cosine similarity matrix
cos_sim = X_unit @ X_unit.T # (100, 100)
# Matrix multiply with @ operator (preferred)
W1 = np.random.randn(784, 128)
b1 = np.zeros(128)
hidden = X_flat @ W1 + b1 # forward pass
- Check which BLAS backend NumPy is linked against with np.show_config(). MKL-linked NumPy (Anaconda) is typically fastest.
- Matrix multiplication is fastest on contiguous arrays; check a.flags['C_CONTIGUOUS']. Use np.ascontiguousarray(a) if needed.
- np.matmul supports batched matrix multiplication for 3D+ arrays: A @ B where A is (batch, n, m) and B is (batch, m, k) → (batch, n, k).
Random Module
rng = np.random.default_rng(42)
# Xavier / Glorot initialization
n_in, n_out = 784, 128
W = rng.normal(0, scale=np.sqrt(2 / (n_in + n_out)), size=(n_in, n_out))
# He initialization (for ReLU networks)
W_he = rng.standard_normal((n_in, n_out)) * np.sqrt(2 / n_in)
# Reproducible train/test shuffle
idx = np.arange(10000)
rng.shuffle(idx)
train_idx, test_idx = idx[:8000], idx[8000:]
# Bootstrap sampling (draw n rows with replacement)
def bootstrap(X, y, n, rng):
idx = rng.choice(len(X), size=n, replace=True)
return X[idx], y[idx]
# Weighted sampling (imbalanced datasets)
class_counts = np.array([900, 100])
weights = 1 / class_counts
sample_weights = weights[y_train]
sample_weights /= sample_weights.sum()
sampled_idx = rng.choice(len(X_train), size=1000,
replace=True, p=sample_weights)
# Monte Carlo dropout (uncertainty estimation)
n_mc = 50
mc_preds = np.stack([model_forward(X, training=True) for _ in range(n_mc)])
mean_pred = mc_preds.mean(axis=0)
uncertainty = mc_preds.std(axis=0)
- Legacy np.random.seed(): Global seed affects all code. The modern `default_rng(seed)` creates an isolated RNG; instances don't interfere.
- rng.shuffle vs rng.permutation: `shuffle` is in-place; `permutation` returns a new array. Both shuffle the first axis only for 2D arrays.
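A short sketch of both points: seeded generators are reproducible and isolated, and permutation vs shuffle differ in mutation (arrays here are invented):

```python
import numpy as np

# Same seed -> identical, independent streams
r1 = np.random.default_rng(42)
r2 = np.random.default_rng(42)
print(np.array_equal(r1.integers(0, 10, 5), r2.integers(0, 10, 5)))  # True

# permutation returns a new array; shuffle reorders in place
rng = np.random.default_rng(0)
a = np.arange(5)
p = rng.permutation(a)
print(a)        # [0 1 2 3 4] -- a untouched
rng.shuffle(a)  # a itself is now reordered
```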
Broadcasting & Vectorization
# Manual StandardScaler (broadcasting)
X = np.random.randn(1000, 20)
mean = X.mean(axis=0) # (20,)
std = X.std(axis=0) # (20,)
X_scaled = (X - mean) / std # (1000,20) - (20,) → broadcasts row-wise
# Pairwise Euclidean distance matrix (no loops!)
# ||xi - xj||² = ||xi||² + ||xj||² - 2·xi·xj
sq_norms = (X**2).sum(axis=1, keepdims=True) # (n,1)
dist_sq = sq_norms + sq_norms.T - 2 * (X @ X.T) # (n,n)
# Softmax (numerically stable)
def softmax(logits):
exp = np.exp(logits - logits.max(axis=1, keepdims=True)) # stability
return exp / exp.sum(axis=1, keepdims=True)
# Outer product via broadcasting
a = np.array([1,2,3])
b = np.array([4,5,6])
outer = a[:, np.newaxis] * b[np.newaxis, :] # (3,1)*(1,3) → (3,3)
# einsum examples: concise, often faster than explicit ops
A = np.random.randn(32, 10) # batch of 32, 10-dim vectors
B = np.random.randn(10, 5) # weight matrix
# Matrix multiply
out = np.einsum('ij,jk->ik', A, B) # (32, 5)
# Batch dot product (pair-wise)
dots = np.einsum('bi,bi->b', A, A) # (32,) → row-wise norms²
# Trace of matrix
M = np.random.randn(5, 5)
trace = np.einsum('ii->', M) # scalar
# Attention scores (Q·Kᵀ / √d)
Q = np.random.randn(8, 64, 32) # (batch, seq, d_k)
K = np.random.randn(8, 64, 32)
scores = np.einsum('bqd,bkd->bqk', Q, K) / np.sqrt(32) # (batch,q,k)
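A quick sanity check of the broadcasting patterns above; this sketch re-creates small inputs rather than reusing the names from the listing:

```python
import numpy as np

def softmax(logits):
    # Subtracting the row max keeps exp() from overflowing
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

logits = np.array([[1.0, 2.0, 3.0], [1000.0, 1000.0, 1000.0]])
probs = softmax(logits)
print(np.allclose(probs.sum(axis=1), 1.0))  # True, even for huge logits

# Broadcast distance trick agrees with an explicit double loop
X = np.random.default_rng(1).standard_normal((4, 3))
sq = (X**2).sum(axis=1, keepdims=True)            # (4,1)
D = sq + sq.T - 2 * (X @ X.T)                     # (4,4) squared distances
loop = np.array([[((X[i] - X[j])**2).sum() for j in range(4)]
                 for i in range(4)])
print(np.allclose(D, loop))                       # True
```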
Utility Operations
# Numerically stable log-sum-exp (for softmax/cross-entropy)
def logsumexp(x):
c = x.max(axis=-1, keepdims=True)
return c + np.log(np.exp(x - c).sum(axis=-1, keepdims=True))
# Gradient clipping (prevent exploding gradients in manual backprop)
def clip_gradients(grads, max_norm=1.0):
total_norm = np.sqrt(sum(np.sum(g**2) for g in grads))
clip_coef = max_norm / (total_norm + 1e-6)
if clip_coef < 1:
grads = [g * clip_coef for g in grads]
return grads
# NaN detection and imputation in raw data
X_raw = np.array([[1, np.nan, 3], [4, 5, np.nan]])
nan_mask = np.isnan(X_raw)
col_means = np.nanmean(X_raw, axis=0)
X_raw[nan_mask] = np.take(col_means, np.where(nan_mask)[1])
# Binary cross-entropy (clip for numerical stability)
def bce_loss(y_true, y_prob):
y_prob = np.clip(y_prob, 1e-7, 1 - 1e-7) # avoid log(0)
return -np.mean(y_true * np.log(y_prob) + (1-y_true) * np.log(1-y_prob))
# Test floating point equality
np.allclose(a, b) # NEVER use == for floats
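A brief usage sketch of the stability helpers above, with tiny invented inputs to show why the clipping and max-subtraction matter:

```python
import numpy as np

def logsumexp(x):
    # Factor out the max so exp() never overflows
    c = x.max(axis=-1, keepdims=True)
    return c + np.log(np.exp(x - c).sum(axis=-1, keepdims=True))

# Naive log(exp(1000) + exp(1000)) overflows; the stable form gives 1000 + ln 2
x = np.array([1000.0, 1000.0])
print(logsumexp(x).item())  # 1000.6931...

def bce_loss(y_true, y_prob):
    y_prob = np.clip(y_prob, 1e-7, 1 - 1e-7)  # avoid log(0)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

# Perfect predictions -> loss is (near) zero thanks to clipping, not -inf
print(bce_loss(np.array([1.0, 0.0]), np.array([1.0, 0.0])) < 1e-6)  # True
```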