Two-Stage Fine-Tuning for Transfer Learning with ResNet-18
I wrote an app that classifies objects using transfer learning, and I trained the model myself on the CIFAR-10 dataset.
What Is Two-Stage Fine-Tuning?
Two-stage fine-tuning is a structured approach to adapting a pretrained model to
a new task. Instead of unfreezing all layers at once, we fine-tune the network in
stages, starting from the top (task-specific layers) and gradually moving
deeper into the backbone.
For a
ResNet-18 pretrained on ImageNet, this method works especially
well because different layers learn features of different abstraction levels.
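As a concrete starting point, a minimal sketch of loading the pretrained backbone and replacing its classification head might look like this (assuming a recent torchvision; num_classes = 10 is an illustrative value for a CIFAR-10-sized task):
import torch.nn as nn
from torchvision import models

# Load ResNet-18 with ImageNet-pretrained weights
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Replace the classification head with one sized for the new task
num_classes = 10  # assumption: e.g. CIFAR-10
model.fc = nn.Linear(model.fc.in_features, num_classes)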
Why Fine-Tuning in Stages Matters
If all layers are unfrozen immediately:
- Gradients from a small dataset can overwrite pretrained features
- Training becomes unstable
- Validation accuracy may stagnate or degrade
Two-stage fine-tuning mitigates these issues by controlling
where learning
happens first.
ResNet-18 Layer Hierarchy
Conceptually, ResNet-18 can be divided into:
- Early layers: conv1, layer1 → edges, textures
- Middle layers: layer2, layer3 → shapes, parts
- Late layers: layer4, fc → semantics, classes
Two-stage fine-tuning leverages this natural separation.
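You can see this hierarchy directly by listing the top-level modules of a torchvision ResNet-18 (a quick inspection sketch, reusing the model object loaded earlier):
# Print the top-level blocks of ResNet-18
for name, module in model.named_children():
    print(name)
# conv1, bn1, relu, maxpool, layer1, layer2, layer3, layer4, avgpool, fc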
Stage 1: Fine-Tune the Head and Top Layers
Objective
Adapt the pretrained model to the new dataset while preserving general visual
features learned from ImageNet.
Layers to Unfreeze
In Stage 1, we typically unfreeze:
- fc (classification head)
- layer4 (highest residual block)
All earlier layers remain frozen.
Why This Works
The classifier head is randomly initialized and must be trained from scratch.
layer4 already contains high-level semantic features that can be safely
adapted without destabilizing the network.
Example: Freezing and Unfreezing Layers
# Freeze all parameters
for param in model.parameters():
    param.requires_grad = False

# Unfreeze layer4
for param in model.layer4.parameters():
    param.requires_grad = True

# Unfreeze the fully connected layer
for param in model.fc.parameters():
    param.requires_grad = True
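With only layer4 and fc trainable, the optimizer should receive just those parameters. A minimal sketch (the choice of Adam and the learning rate of 1e-3 are illustrative assumptions):
import torch

# Collect only the parameters that will be updated in Stage 1
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable_params, lr=1e-3)  # lr chosen for illustration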
Typical Training Behavior
- Rapid improvement in validation accuracy
- Stable loss curves
- Minimal overfitting early on
Stage 1 establishes a strong task-specific baseline.
Stage 2: Fine-Tune Mid-Level Features
Objective
Refine feature representations by allowing limited adaptation of deeper layers
without damaging low-level visual primitives.
Layers to Unfreeze
In Stage 2, we commonly unfreeze:
- layer3 (in addition to layer4 and fc from Stage 1)
Lower layers (layer1, layer2) usually remain frozen for small datasets.
Why Not Unfreeze Everything?
Lower layers encode universal features like edges and color contrasts. Updating
them with limited data:
- Provides little performance gain
- Increases overfitting risk
- Can hurt generalization
Example: Unfreezing an Additional Layer
# Unfreeze layer3
for param in model.layer3.parameters():
    param.requires_grad = True
Learning Rate Strategy
Stage 2 typically uses:
- A lower learning rate
- Or a scheduler such as CosineAnnealingLR
This ensures gradual, controlled updates and prevents catastrophic forgetting.
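One possible Stage 2 setup looks like the sketch below (the lower learning rate and epoch count are illustrative assumptions, not values from this post):
import torch

# Rebuild the optimizer so it also covers the newly unfrozen layer3 parameters
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable_params, lr=1e-4)  # lower than Stage 1 (assumption)

# Cosine annealing decays the learning rate smoothly over Stage 2
stage2_epochs = 10  # assumption
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=stage2_epochs)

# Call scheduler.step() once per epoch inside the Stage 2 training loop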
Why Two-Stage Fine-Tuning Improves Validation Accuracy
From an Optimization Perspective
Stage 1 quickly moves the model into a good region of the loss landscape.
Stage 2 performs smaller, more precise adjustments that improve generalization
rather than memorization.
From a Generalization Perspective
This approach:
- Preserves low-level visual knowledge
- Encourages smoother decision boundaries
- Reduces over-confident predictions
Typical Gains
For ResNet-18 on small to medium datasets:
- ~1–3% improvement in validation accuracy
- More stable results across runs
- Better resistance to overfitting
Key Takeaways
- Two-stage fine-tuning emphasizes control and stability
- Train task-specific layers first, then refine mid-level features
- Lower layers are usually best left frozen for limited data
- ResNet-18 benefits significantly from this structured approach
In practice, two-stage fine-tuning is one of the most reliable ways to extract
maximum performance from a pretrained ResNet-18 without increasing model size or
data requirements.