TensorFlow / Keras
Model building via Sequential & Functional API, Dense/Conv2D/LSTM/Embedding layers, tf.data pipelines, callbacks, custom training loops, and deployment.
Building Models: Sequential vs Functional API
Core Layers: Dense / Conv2D / LSTM / Embedding
Conv2D / MaxPooling2D / GlobalAveragePooling2D
Spatial feature extraction for image classification and computer vision tasks
Syntax
Example
Internals
keras.layers.Conv2D(
    filters=32,             # number of learnable kernels
    kernel_size=(3, 3),     # (height, width) of filter
    strides=(1, 1),
    padding='same',         # 'valid'=no pad; 'same'=output=input size
    activation='relu',
    kernel_regularizer=keras.regularizers.l2(1e-4)
)
keras.layers.MaxPooling2D(pool_size=(2, 2))
keras.layers.GlobalAveragePooling2D()  # replaces Flatten+Dense for efficiency
python
# CNN for 32x32 RGB image classification (CIFAR-10 style)
inputs = keras.Input(shape=(32, 32, 3))
# Block 1
x = keras.layers.Conv2D(32, (3,3), padding='same', activation='relu')(inputs)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.Conv2D(32, (3,3), padding='same', activation='relu')(x)
x = keras.layers.MaxPooling2D((2,2))(x)
x = keras.layers.Dropout(0.25)(x)
# Block 2
x = keras.layers.Conv2D(64, (3,3), padding='same', activation='relu')(x)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.Conv2D(64, (3,3), padding='same', activation='relu')(x)
x = keras.layers.GlobalAveragePooling2D()(x) # 64-dim vector
x = keras.layers.Dropout(0.4)(x)
outputs = keras.layers.Dense(10, activation='softmax')(x)
model = keras.Model(inputs, outputs, name='cnn_cifar')
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
- Output shape: With 'same' padding: H_out = ⌈H_in / stride⌉. With 'valid': H_out = ⌈(H_in - kernel + 1) / stride⌉.
- GlobalAveragePooling2D vs Flatten: GAP reduces spatial dims to a single mean per channel (no params). Flatten preserves all spatial info but creates a large Dense layer.
- Receptive field: Stack multiple 3×3 convolutions instead of one large kernel. Two 3×3 convolutions have the same receptive field as one 5×5, but fewer parameters (18 vs 25 weights per filter position) and more non-linearity.
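The shape formulas and the GAP vs Flatten trade-off can be sanity-checked in plain Python (a sketch; `conv_out` and the 8×8×64 feature map are illustrative assumptions, not Keras API):

```python
import math

def conv_out(h_in, kernel, stride, padding):
    # Output size along one spatial axis, per the formulas above
    if padding == 'same':
        return math.ceil(h_in / stride)
    if padding == 'valid':
        return math.ceil((h_in - kernel + 1) / stride)
    raise ValueError(padding)

# 32x32 input with a 3x3 kernel, as in the CIFAR example
assert conv_out(32, 3, 1, 'same') == 32    # 'same' preserves size at stride 1
assert conv_out(32, 3, 1, 'valid') == 30   # 'valid' shrinks by kernel - 1
assert conv_out(32, 3, 2, 'same') == 16    # stride 2 halves (rounding up)

# GAP vs Flatten ahead of Dense(10), on a hypothetical 8x8x64 feature map
flatten_params = 8 * 8 * 64 * 10 + 10   # weights + biases after Flatten
gap_params = 64 * 10 + 10               # weights + biases after GAP
assert flatten_params == 40970
assert gap_params == 650
```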
LSTM / GRU / Bidirectional
Sequential modelling for text, time-series, and event data: return sequences and states
Syntax
Example
Internals
keras.layers.LSTM(
    units=64,                # hidden state dimension
    return_sequences=False,  # True for stacked LSTM / seq2seq
    return_state=False,      # True to access h_t and c_t
    dropout=0.2,
    recurrent_dropout=0.0    # dropout on recurrent connections
)
keras.layers.Bidirectional(
    keras.layers.LSTM(64, return_sequences=True),
    merge_mode='concat'      # or 'sum', 'mul', 'ave'
)
python
# Sentiment analysis: Embedding + BiLSTM + Dense
VOCAB_SIZE, EMB_DIM, MAX_LEN = 20000, 64, 200
inputs = keras.Input(shape=(MAX_LEN,))
x = keras.layers.Embedding(VOCAB_SIZE, EMB_DIM, mask_zero=True)(inputs)
x = keras.layers.Bidirectional(
keras.layers.LSTM(64, return_sequences=True))(x)
x = keras.layers.Bidirectional(
keras.layers.LSTM(32))(x) # last layer: return_sequences=False
x = keras.layers.Dense(32, activation='relu')(x)
x = keras.layers.Dropout(0.4)(x)
outputs = keras.layers.Dense(1, activation='sigmoid')(x)
model = keras.Model(inputs, outputs)
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy', keras.metrics.AUC(name='auc')])
# Time-series: multi-step prediction
ts_input = keras.Input(shape=(30, 5)) # 30 timesteps, 5 features
ts_out = keras.layers.LSTM(64, return_sequences=False)(ts_input)
ts_out = keras.layers.Dense(7)(ts_out) # predict next 7 steps
- LSTM vs GRU: GRU has fewer parameters (2 gates vs 3 in LSTM) and is faster. Performance is often comparable. Use GRU as default; LSTM when long-range dependencies matter.
- return_sequences=True: Returns output at every timestep; required for stacked LSTMs or attention. False returns only the final timestep (for classification).
- mask_zero=True: In Embedding, tells downstream layers to ignore padded 0s. Critical for variable-length sequences.
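The LSTM/GRU parameter gap can be sketched in plain Python, counting the standard weight sets (input weights, recurrent weights, bias per gate or candidate; Keras's GRU default reset_after=True adds a small extra bias, ignored here):

```python
def lstm_params(input_dim, units):
    # 4 weight sets: input/forget/output gates + cell candidate
    return 4 * (units * input_dim + units * units + units)

def gru_params(input_dim, units):
    # 3 weight sets: update/reset gates + candidate state
    return 3 * (units * input_dim + units * units + units)

# For a 64-unit layer over 64-dim embeddings, as in the sentiment model above
assert lstm_params(64, 64) == 33024
assert gru_params(64, 64) == 24768   # exactly 3/4 of the LSTM's count
```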
tf.data Pipeline
tf.data.Dataset → from_tensor_slices / map / batch / prefetch / cache
High-performance data loading and augmentation pipeline: eliminate GPU idle time
Syntax
Example
Performance
dataset = (tf.data.Dataset.from_tensor_slices((X, y))
           .shuffle(buffer_size, seed=42)
           .map(preprocess_fn, num_parallel_calls=tf.data.AUTOTUNE)
           .batch(batch_size, drop_remainder=False)
           .prefetch(tf.data.AUTOTUNE)   # overlap CPU prep + GPU train
           .cache())                     # cache after expensive map
python
# Full training pipeline with augmentation
AUTOTUNE = tf.data.AUTOTUNE
def augment_image(image, label):
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.1)
    image = tf.image.random_contrast(image, lower=0.9, upper=1.1)
    image = tf.clip_by_value(image, 0.0, 1.0)
    return image, label

def make_dataset(X, y, batch_size=32, training=False):
    ds = tf.data.Dataset.from_tensor_slices((X, y))
    ds = ds.cache()  # cache BEFORE augmentation, so augmentation stays random
    if training:
        ds = ds.shuffle(len(X), seed=42)
        ds = ds.map(augment_image, num_parallel_calls=AUTOTUNE)
    ds = (ds
          .batch(batch_size)
          .prefetch(AUTOTUNE))
    return ds
train_ds = make_dataset(X_train, y_train, training=True)
val_ds = make_dataset(X_val, y_val, training=False)
# Load from files (TFRecord)
raw_ds = tf.data.TFRecordDataset(filenames)
ds = raw_ds.map(parse_fn, num_parallel_calls=AUTOTUNE)
model.fit(train_ds, validation_data=val_ds, epochs=30)
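Why the shuffle buffer size matters can be shown without TensorFlow; `buffered_shuffle` below is a hypothetical plain-Python mimic of Dataset.shuffle's buffer-and-sample behavior:

```python
import random

def buffered_shuffle(items, buffer_size, seed=42):
    # Mimic of tf.data's shuffle: keep a fixed-size buffer, emit a
    # uniformly random element from it, refill from the input stream
    rng = random.Random(seed)
    buf, out = [], []
    for x in items:
        buf.append(x)
        if len(buf) > buffer_size:
            out.append(buf.pop(rng.randrange(len(buf))))
    while buf:
        out.append(buf.pop(rng.randrange(len(buf))))
    return out

data = list(range(1000))
small = buffered_shuffle(data, buffer_size=10)
# The last element cannot move earlier than position len(data) - buffer - 1,
# so a small buffer leaves the ordering heavily biased toward the input order
assert small.index(999) >= 1000 - 10 - 1
full = buffered_shuffle(data, buffer_size=1000)  # full buffer: true shuffle
assert sorted(full) == data
```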
- AUTOTUNE: Let TF automatically tune parallelism. Always use num_parallel_calls=tf.data.AUTOTUNE in map() and prefetch(tf.data.AUTOTUNE).
- cache() placement: Place cache() after decode/resize but before augmentation (augmentation should be random each epoch). If data fits in RAM, cache after map.
- shuffle buffer: For proper shuffling, buffer_size should equal dataset size. Smaller buffers = biased ordering. For large datasets, use the full size or a large fraction.
Compile, Training & Callbacks
model.compile / model.fit / EarlyStopping / ReduceLROnPlateau / ModelCheckpoint
Configure training, prevent overfitting with patience, and save best weights automatically
Syntax
Example
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss='binary_crossentropy',
    metrics=['accuracy', keras.metrics.AUC()]
)
# Core callbacks
keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=10,
    restore_best_weights=True, min_delta=1e-4
)
keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.5, patience=5,
    min_lr=1e-6, verbose=1
)
keras.callbacks.ModelCheckpoint(
    filepath='best_model.keras',
    monitor='val_auc', mode='max',
    save_best_only=True
)
python
callbacks = [
    keras.callbacks.EarlyStopping(
        monitor='val_auc', mode='max',
        patience=15, restore_best_weights=True
    ),
    keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss', factor=0.5, patience=7, min_lr=1e-7
    ),
    keras.callbacks.ModelCheckpoint(
        'checkpoints/best.keras', save_best_only=True,
        monitor='val_auc', mode='max'
    ),
    keras.callbacks.TensorBoard(log_dir='logs/', histogram_freq=1)
]
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=200,                     # EarlyStopping will stop early
    callbacks=callbacks,
    class_weight={0: 1.0, 1: 5.0},  # imbalanced: penalize false negatives
    verbose=1
)
# Load best saved model
best_model = keras.models.load_model('checkpoints/best.keras')
y_prob = best_model.predict(X_test, batch_size=256)[:, 0]
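The patience / restore_best_weights bookkeeping can be sketched in plain Python; `early_stop_epoch` is a hypothetical helper mirroring EarlyStopping's logic, not a Keras API:

```python
def early_stop_epoch(val_losses, patience, min_delta=0.0):
    # Returns (epoch training stops at, best epoch), where the best
    # epoch is what restore_best_weights would roll back to
    best, best_epoch, wait = float('inf'), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.74]
stop, best = early_stop_epoch(losses, patience=3)
assert best == 2   # val_loss bottomed out at epoch 2 (0.7)
assert stop == 5   # stopped after 3 epochs without improvement
```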
Custom Training Loop
GradientTape → Custom Training Loop
Full control over forward pass, gradient computation, and metric updates, for research
Example
python
optimizer = keras.optimizers.Adam(1e-3)
loss_fn = keras.losses.BinaryCrossentropy()
train_loss = keras.metrics.Mean(name='train_loss')
train_auc = keras.metrics.AUC(name='train_auc')
@tf.function  # compile to graph for speed
def train_step(X_batch, y_batch):
    with tf.GradientTape() as tape:
        y_pred = model(X_batch, training=True)
        loss = loss_fn(y_batch, y_pred)
        loss += sum(model.losses)  # L2 reg losses
    grads = tape.gradient(loss, model.trainable_variables)
    # Gradient clipping (prevents exploding gradients in RNNs)
    grads, _ = tf.clip_by_global_norm(grads, clip_norm=1.0)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    train_loss.update_state(loss)
    train_auc.update_state(y_batch, y_pred)

for epoch in range(NUM_EPOCHS):
    train_loss.reset_state(); train_auc.reset_state()
    for X_batch, y_batch in train_ds:
        train_step(X_batch, y_batch)
    print(f'Epoch {epoch+1}: loss={train_loss.result():.4f}, '
          f'AUC={train_auc.result():.4f}')
Transfer Learning & Fine-tuning
keras.applications → Transfer Learning Pattern
Freeze pretrained weights, add classification head, unfreeze for fine-tuning
Example
python
# Step 1: Load pretrained base without top
base_model = keras.applications.MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,   # drop classifier
    weights='imagenet'
)
base_model.trainable = False # freeze base weights
# Step 2: Add custom classification head
inputs = keras.Input(shape=(224, 224, 3))
x = keras.applications.mobilenet_v2.preprocess_input(inputs)
x = base_model(x, training=False) # BN in eval mode
x = keras.layers.GlobalAveragePooling2D()(x)
x = keras.layers.Dropout(0.2)(x)
outputs = keras.layers.Dense(num_classes, activation='softmax')(x)
model = keras.Model(inputs, outputs)
# Step 3: Train head only (fast)
model.compile('adam', 'sparse_categorical_crossentropy', ['accuracy'])
model.fit(train_ds, epochs=10, validation_data=val_ds)
# Step 4: Unfreeze top layers for fine-tuning
base_model.trainable = True
# Only fine-tune top 30 layers
for layer in base_model.layers[:-30]:
    layer.trainable = False
# Use lower LR for fine-tuning
model.compile(optimizer=keras.optimizers.Adam(1e-5),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_ds, epochs=20, validation_data=val_ds)
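A common slip in Step 4 is reading layers[:-30] backwards: it freezes everything except the last 30 layers. A toy stand-in (FakeLayer is hypothetical, not a Keras class) makes the counting explicit:

```python
class FakeLayer:
    # Minimal stand-in for a Keras layer with a trainable flag
    def __init__(self, n_params):
        self.n_params = n_params
        self.trainable = True

layers = [FakeLayer(1000) for _ in range(100)]  # pretend base has 100 layers

for layer in layers[:-30]:    # freeze all but the top 30
    layer.trainable = False

frozen = sum(1 for l in layers if not l.trainable)
trainable_params = sum(l.n_params for l in layers if l.trainable)
assert frozen == 70
assert trainable_params == 30 * 1000
```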