NumPy
Array creation, indexing, reshaping, math, linear algebra, random module, broadcasting — a complete implementation reference for ML engineering.
Array Creation
Indexing, Slicing & Boolean Masking
View vs Copy — critical for memory efficiency:
- View: basic slicing (a[1:4]) returns a view — no data is copied. Modifying the view modifies the original; they share memory (a.base is original evaluates to True).
- Copy: fancy indexing (a[[1,2,3]]) and boolean masking (a[a>0]) always return copies. Mutations don't affect the original.
- Use a.flags['OWNDATA'] or a.base is None to check ownership.
- Force a copy: a[1:4].copy().
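A quick sanity check of the rules above (a minimal sketch):

```python
import numpy as np

a = np.arange(10)

# Basic slice: a view that shares memory with a
v = a[1:4]
assert v.base is a           # view points back to its parent
v[0] = 99
assert a[1] == 99            # mutation through the view is visible in the original

# Fancy indexing: an independent copy
c = a[[1, 2, 3]]
assert c.base is None and c.flags['OWNDATA']
c[0] = -1
assert a[1] == 99            # original untouched
```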
X = np.arange(24).reshape(6, 4)
# Slice first 4 rows, all columns — VIEW
X_train = X[:4, :]
X_test = X[4:, :]
# Select specific features by column index
feature_idx = [0, 2]
X_subset = X[:, feature_idx] # fancy index — copy
# Boolean masking: select rows where target > 0
y = np.array([1, 0, 1, 1, 0, 1])
X_pos = X[y == 1] # only positive class samples
# np.where: clip outliers
X_clean = np.where(np.abs(X) > 10, np.sign(X) * 10, X)
# np.where as argwhere: find indices of non-zero
nonzero_rows = np.where(y == 1)[0]
# Argmax along axis
preds = np.argmax(logits, axis=1) # multiclass predictions
# Advanced: select upper triangle (covariance matrices)
mask = np.triu(np.ones((4, 4), dtype=bool), k=1)
upper_vals = corr_matrix[mask]
- Chained indexing on assignment: X[X>0][0] = 99 silently fails — the boolean mask creates a temporary copy, so the write is lost. Assign through the mask directly (X[X>0] = 99) or through explicit indices: i, j = np.where(X > 0); X[i[0], j[0]] = 99.
- View mutation: a = X[1:3]; a[:] = 0 zeros out rows 1-2 of X. To avoid this, use a = X[1:3].copy().
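The silent-failure pattern is easy to demonstrate (a minimal sketch):

```python
import numpy as np

X = np.arange(6).reshape(2, 3).astype(float)

# Chained indexing writes into a temporary copy -- the original is unchanged
X[X > 2][0] = 99
assert X.max() == 5          # the write was silently lost

# Assign through the mask to modify in place
X[X > 2] = 99
assert X.max() == 99
```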
X = np.random.randn(100, 28, 28) # 100 grayscale images 28x28
# Flatten for Dense layer input
X_flat = X.reshape(100, -1) # (100, 784); -1 infers 784
# Add channel dim for Conv2D input
X_conv = X[..., np.newaxis] # (100, 28, 28, 1)
# Stack list of 1D arrays into a 2D matrix
features_list = [np.array([1,2,3]), np.array([4,5,6])]
X_matrix = np.stack(features_list, axis=0) # (2, 3)
# Concatenate along feature axis
X_combined = np.concatenate([X_flat, extra_feats], axis=1)
# Split into k folds manually
folds = np.array_split(X, 5) # handles uneven splits
# Remove singleton dims (e.g. from model output)
y_pred = model_out.squeeze() # (100, 1) → (100,)
- np.stack vs np.concatenate: stack creates a new axis; concatenate joins along an existing axis. For 1D arrays, np.stack([a, b], axis=0) gives (2, N); np.concatenate([a, b]) gives (2N,).
- ravel vs flatten: ravel returns a view when possible (no copy); flatten always copies. Prefer ravel for performance.
Math, Statistics & Sorting
X = np.random.randn(1000, 20) # 1000 samples, 20 features
# Per-feature statistics (axis=0 collapses rows)
mean = X.mean(axis=0) # (20,)
std = X.std(axis=0, ddof=1) # sample std (same as pandas default)
# keepdims for broadcasting back to original shape
X_normed = (X - X.mean(axis=0, keepdims=True)) / X.std(axis=0, keepdims=True)
# Class distribution
y = np.array([0,1,0,2,1,0])
labels, counts = np.unique(y, return_counts=True)
class_weights = counts.sum() / (len(labels) * counts)
# Top-k predictions (argsort descending)
logits = np.random.randn(100, 10)
top3 = np.argsort(logits, axis=1)[:, -3:][:, ::-1] # top-3 class indices
# IQR for outlier detection
q1, q3 = np.percentile(X, [25, 75], axis=0)
iqr = q3 - q1
lower, upper = q1 - 1.5*iqr, q3 + 1.5*iqr
mask = np.all((X >= lower) & (X <= upper), axis=1)
X_no_outliers = X[mask]
- ddof=0 (default) = population std; ddof=1 = sample std (Bessel's correction). For ML normalization either is fine, but note that library defaults differ: pandas uses ddof=1, while sklearn's StandardScaler uses ddof=0.
- a.sort() sorts in place (no copy); np.sort(a) returns a new array. Use in-place sorting to save memory on large arrays.
Linear Algebra (np.linalg)
SVD β backbone of PCA and dimensionality reduction:
SVD decomposes X = UΣVᵀ. U: left singular vectors (scores); Σ: diagonal matrix of singular values; V: right singular vectors (loadings = principal components). PCA = SVD of the centered X.
linalg.solve vs inv:
np.linalg.inv(A) @ b requires computing the inverse explicitly — numerically unstable and roughly 2× slower. Always prefer np.linalg.solve(A, b), which uses LU factorization.
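A quick sketch comparing the two on a small, well-conditioned system (both agree here; solve remains the stable choice as conditioning worsens):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)) + 5 * np.eye(5)  # diagonally boosted -> well-conditioned
b = rng.standard_normal(5)

x_solve = np.linalg.solve(A, b)       # LU factorization -- preferred
x_inv = np.linalg.inv(A) @ b          # explicit inverse -- avoid

assert np.allclose(x_solve, x_inv)    # same solution on a benign system
assert np.allclose(A @ x_solve, b)    # residual check: A x = b
```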
# OLS with linalg
X = np.random.randn(100, 5)
y = np.random.randn(100)
# β = (XᵀX)⁻¹Xᵀy — via lstsq (numerically stable)
beta, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
# Manual PCA via SVD
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
# Project to first 2 components
X_pca = U[:, :2] * S[:2] # equivalent to X_centered @ Vt[:2].T
explained_var = (S**2) / (S**2).sum()
print(f'Explained variance ratio: {explained_var[:2]}')
# L2 norm of each row (for cosine similarity preprocessing)
norms = np.linalg.norm(X, axis=1, keepdims=True)
X_unit = X / norms
# Cosine similarity matrix
cos_sim = X_unit @ X_unit.T # (100, 100)
# Matrix multiply with @ operator (preferred)
W1 = np.random.randn(784, 128)
b1 = np.zeros(128)
hidden = X_flat @ W1 + b1 # forward pass
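The @ operator also batches over leading dimensions, which is worth checking explicitly (a minimal sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 3, 4))   # batch of 8 matrices, each 3x4
B = rng.standard_normal((8, 4, 2))   # batch of 8 matrices, each 4x2

C = A @ B                            # one matmul per batch item
assert C.shape == (8, 3, 2)
assert np.allclose(C[0], A[0] @ B[0])  # matches an explicit per-item product
```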
- Check which BLAS backend NumPy is linked against with np.show_config(). MKL-linked NumPy (Anaconda) is typically fastest.
- Check memory layout with a.flags['C_CONTIGUOUS']. Use np.ascontiguousarray(a) if needed.
- np.matmul supports batched matrix multiplication for 3D+ arrays: A @ B where A is (batch, n, m) and B is (batch, m, k) → (batch, n, k).
Random Module
rng = np.random.default_rng(42)
# Xavier / Glorot initialization
n_in, n_out = 784, 128
W = rng.normal(0, scale=np.sqrt(2 / (n_in + n_out)), size=(n_in, n_out))
# He initialization (for ReLU networks)
W_he = rng.standard_normal((n_in, n_out)) * np.sqrt(2 / n_in)
# Reproducible train/test shuffle
idx = np.arange(10000)
rng.shuffle(idx)
train_idx, test_idx = idx[:8000], idx[8000:]
# Bootstrap resampling (sample with replacement)
def bootstrap(X, y, n, rng):
idx = rng.choice(len(X), size=n, replace=True)
return X[idx], y[idx]
# Weighted sampling (imbalanced datasets)
class_counts = np.array([900, 100])
weights = 1 / class_counts
sample_weights = weights[y_train]
sample_weights /= sample_weights.sum()
sampled_idx = rng.choice(len(X_train), size=1000,
replace=True, p=sample_weights)
# Monte Carlo dropout (uncertainty estimation)
n_mc = 50
mc_preds = np.stack([model_forward(X, training=True) for _ in range(n_mc)])
mean_pred = mc_preds.mean(axis=0)
uncertainty = mc_preds.std(axis=0)
- Legacy np.random.seed(): the global seed affects all code. The modern default_rng(seed) creates an isolated RNG — instances don't interfere.
- rng.shuffle vs rng.permutation: shuffle is in-place; permutation returns a new array. Both shuffle only the first axis of 2D arrays.
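Both points are easy to verify (a minimal sketch):

```python
import numpy as np

# Two RNGs with the same seed produce identical, isolated streams
r1 = np.random.default_rng(42)
r2 = np.random.default_rng(42)
assert np.allclose(r1.normal(size=3), r2.normal(size=3))

# shuffle is in-place; permutation returns a new array
rng = np.random.default_rng(0)
a = np.arange(5)
p = rng.permutation(a)                 # new array; a unchanged
assert np.array_equal(a, np.arange(5))
rng.shuffle(a)                         # mutates a directly
assert sorted(a) == [0, 1, 2, 3, 4]    # same elements, possibly reordered
```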
Broadcasting & Vectorization
# Manual StandardScaler (broadcasting)
X = np.random.randn(1000, 20)
mean = X.mean(axis=0) # (20,)
std = X.std(axis=0) # (20,)
X_scaled = (X - mean) / std # (1000, 20) - (20,) broadcasts per column
# Pairwise Euclidean distance matrix (no loops!)
# ||xi - xj||Β² = ||xi||Β² + ||xj||Β² - 2Β·xiΒ·xj
sq_norms = (X**2).sum(axis=1, keepdims=True) # (n,1)
dist_sq = sq_norms + sq_norms.T - 2 * (X @ X.T) # (n,n)
dist_sq = np.maximum(dist_sq, 0) # clamp tiny negatives from float error
dist = np.sqrt(dist_sq)
# Softmax (numerically stable)
def softmax(logits):
exp = np.exp(logits - logits.max(axis=1, keepdims=True)) # stability
return exp / exp.sum(axis=1, keepdims=True)
# Outer product via broadcasting
a = np.array([1,2,3])
b = np.array([4,5,6])
outer = a[:, np.newaxis] * b[np.newaxis, :] # (3,1)*(1,3) → (3,3)
# einsum examples β concise, often faster than explicit ops
A = np.random.randn(32, 10) # batch of 32, 10-dim vectors
B = np.random.randn(10, 5) # weight matrix
# Matrix multiply
out = np.einsum('ij,jk->ik', A, B) # (32, 5)
# Batch dot product (pair-wise)
dots = np.einsum('bi,bi->b', A, A) # (32,) — row-wise squared norms
# Trace of matrix
M = np.random.randn(5, 5)
trace = np.einsum('ii->', M) # scalar
# Attention scores (Q·Kᵀ / √d_k)
Q = np.random.randn(8, 64, 32) # (batch, seq, d_k)
K = np.random.randn(8, 64, 32)
scores = np.einsum('bqd,bkd->bqk', Q, K) / np.sqrt(32) # (batch,q,k)
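The einsum expressions above can be checked against their explicit equivalents (a quick sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((32, 10))
B = rng.standard_normal((10, 5))
M = rng.standard_normal((5, 5))

# einsum matmul == @ operator
assert np.allclose(np.einsum('ij,jk->ik', A, B), A @ B)
# batch dot product == row-wise sum of squares
assert np.allclose(np.einsum('bi,bi->b', A, A), (A**2).sum(axis=1))
# einsum trace == np.trace
assert np.allclose(np.einsum('ii->', M), np.trace(M))
```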
Utility Operations
# Numerically stable log-sum-exp (for softmax/cross-entropy)
def logsumexp(x):
c = x.max(axis=-1, keepdims=True)
return c + np.log(np.exp(x - c).sum(axis=-1, keepdims=True))
# Gradient clipping (prevent exploding gradients in manual backprop)
def clip_gradients(grads, max_norm=1.0):
total_norm = np.sqrt(sum(np.sum(g**2) for g in grads))
clip_coef = max_norm / (total_norm + 1e-6)
if clip_coef < 1:
grads = [g * clip_coef for g in grads]
return grads
# NaN detection and imputation in raw data
X_raw = np.array([[1, np.nan, 3], [4, 5, np.nan]])
nan_mask = np.isnan(X_raw)
col_means = np.nanmean(X_raw, axis=0)
X_raw[nan_mask] = np.take(col_means, np.where(nan_mask)[1])
# Binary cross-entropy (clip for numerical stability)
def bce_loss(y_true, y_prob):
y_prob = np.clip(y_prob, 1e-7, 1 - 1e-7) # avoid log(0)
return -np.mean(y_true * np.log(y_prob) + (1-y_true) * np.log(1-y_prob))
# Test floating point equality
np.allclose(a, b) # NEVER use == for floats
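The classic example of why == fails for floats (a minimal sketch):

```python
import numpy as np

a = np.array([0.1 + 0.2])
b = np.array([0.3])

assert not (a == b).all()     # exact equality fails: 0.1 + 0.2 != 0.3
assert np.allclose(a, b)      # tolerance-based comparison succeeds
# Tolerances are configurable: rtol (relative), atol (absolute)
assert np.allclose(a, b, rtol=1e-9, atol=1e-12)
```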