ResNet
0. Idea:
ResNet addresses the observation that simply making a network deeper does not automatically give better results. In a plain network, a deeper stack can perform worse because of the vanishing/exploding gradient problem. ResNet adds skip/shortcut connections to overcome this. In the worst case, a residual block can learn an identity mapping and effectively skip its layers, so the deeper ResNet behaves like a shallower network and at least maintains the performance.
1. Architecture (ResNet-34, 34-layer plain, VGG-19)

The three networks are:
Top: 34-layer ResNet with skip/shortcut connections: the plain network with skip/shortcut connections added.
Middle: 34-layer plain network: a deeper plain network built in the style of VGG-19.
Bottom: VGG-19 (19 layers).
2. Motivation of ResNet
2.1. Problems of a plain network: vanishing/exploding gradients
In a plain network with no skip/shortcut connections, vanishing/exploding gradients occur as the network gets deeper. In back-propagation, the partial derivative of the error function with respect to a weight in an early layer is a product of per-layer terms ==> computing the gradients of the front layers has the effect of multiplying n of these small/large numbers together.
Vanishing: multiplying n small numbers ==> the gradient goes to 0
Exploding: multiplying n large numbers ==> the gradient becomes too large
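A minimal numeric sketch of this compounding effect, assuming arbitrary per-layer gradient factors of 0.5 and 1.5 and a depth of 50 layers:

```python
# Illustration: repeated multiplication of per-layer gradient factors.
# Factors below 1 shrink the product toward 0 (vanishing);
# factors above 1 blow it up (exploding).
n = 50                      # number of layers the gradient passes through
small, large = 0.5, 1.5     # example per-layer gradient magnitudes

vanished = small ** n       # ~8.9e-16, effectively zero
exploded = large ** n       # ~6.4e+08, far too large

print(vanished, exploded)
```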
Solutions:
ResNet: skip/shortcut connections
A smaller batch size
LSTM: gated neuron structures (for recurrent networks)
Gradient clipping: when the gradient norm exceeds a threshold, rescale it down to the clipping value, often 0.5 (see the sketch after this list)
Weight regularization: add an L1 or L2 penalty to help control exploding gradients
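A minimal PyTorch sketch of gradient-norm clipping, assuming a hypothetical linear model and the 0.5 threshold mentioned above:

```python
import torch
import torch.nn as nn

# Placeholder model and optimizer, just to show where clipping happens.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(4, 10), torch.randn(4, 1)

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(x), y)
loss.backward()

# Rescale gradients so their total norm does not exceed the threshold.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.5)

optimizer.step()
```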
2.2. Skip/shortcut connection in ResNet

The output is H(x) = F(x) + x, so the weight layers only need to learn the residual mapping F(x) = H(x) - x instead of the full mapping H(x).
If the gradient through the weight layers vanishes ==> the identity path still carries the gradient back to earlier layers ==> the vanished gradients are added back.
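A minimal PyTorch sketch of a basic residual block with an identity shortcut; the channel count and input size are illustrative, not the exact ResNet-34 configuration:

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """y = F(x) + x, where F is two 3x3 conv layers (the residual mapping)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))   # F(x): first weight layer
        out = self.bn2(self.conv2(out))            # F(x): second weight layer
        out = out + x                              # skip connection adds the identity
        return self.relu(out)

x = torch.randn(1, 64, 56, 56)
print(BasicBlock(64)(x).shape)   # torch.Size([1, 64, 56, 56])
```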
2.3. Two types of residual connections
Identity shortcut (x): when the input and output have the same dimensions ==> no extra parameters
When the input/output dimensions change, there are two options (see the sketch below):
Perform identity mapping with extra zero entries padded for the increased dimension ==> no extra parameters
Projection shortcut with a 1x1 CONV layer to match the dimensions ==> adds extra parameters
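A minimal PyTorch sketch of the projection shortcut, assuming an illustrative 64-to-128 channel change with stride 2; the 1x1 CONV lets the shortcut match the residual branch so the two tensors can still be added:

```python
import torch
import torch.nn as nn

in_ch, out_ch, stride = 64, 128, 2

# Residual branch that changes both channel count and spatial size.
residual = nn.Sequential(
    nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1, bias=False),
    nn.BatchNorm2d(out_ch),
)

# Projection shortcut: 1x1 conv matches the dimensions of the residual branch.
projection = nn.Sequential(
    nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=stride, bias=False),
    nn.BatchNorm2d(out_ch),
)

x = torch.randn(1, in_ch, 56, 56)
y = residual(x) + projection(x)   # both are (1, 128, 28, 28), so addition works
print(y.shape)
```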
3. Bottleneck design
Since the network is now very deep, the time complexity is high. A bottleneck design is used to reduce the complexity.

How to add it?
1x1 CONV layers are added at the start and end of each residual block.
Why?
A 1×1 CONV can reduce the number of connections (parameters) without degrading the performance of the network too much.
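A rough sketch of the parameter savings, assuming the common 256 -> 64 -> 64 -> 256 bottleneck pattern compared against two plain 3x3 layers at full width:

```python
import torch.nn as nn

def count_params(module):
    return sum(p.numel() for p in module.parameters())

# Bottleneck: 1x1 conv reduces channels, the 3x3 conv works on the reduced width,
# and a second 1x1 conv restores the original channel count.
bottleneck = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=1, bias=False),             # reduce
    nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),   # main 3x3
    nn.Conv2d(64, 256, kernel_size=1, bias=False),             # restore
)

# Plain alternative: two 3x3 convs at full width.
plain = nn.Sequential(
    nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False),
    nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False),
)

print(count_params(bottleneck))  # ~69.6k parameters
print(count_params(plain))       # ~1.18M parameters
```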
After replacing each basic block with a bottleneck block, ResNet-34 becomes ResNet-50:
