1. Architecture
GoogLeNet can go very deep. It relies on the following techniques to make this possible:
2. 1x1 Convolution
The 1x1 convolution layer is used as a dimension-reduction module to cut down on computation. With this computational bottleneck reduced, both the depth and the width of the network can be increased.
Without the 1x1 convolution (a 5x5 convolution applied directly to the 14x14x480 input to produce 48 feature maps):
Number of operations = (14x14x48) x (5x5x480) = 112.9 M
With a 1x1 convolution inserted before the 5x5 convolution:
Number of operations for the 1x1 kernel = (14x14x16) x (1x1x480) = 1.5 M
Number of operations for the 5x5 kernel = (14x14x48) x (5x5x16) = 3.8 M
Total = 1.5 M + 3.8 M = 5.3 M, far fewer than the 112.9 M needed without the reduction.
==> In effect, the 1x1 convolution maps the features from a high-dimensional space to a lower-dimensional one, and does so non-linearly, since a ReLU follows each convolution.
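As a quick sanity check of the arithmetic above, here is a small plain-Python sketch that counts one multiply-accumulate per kernel weight per output element (the function name is just for illustration):

```python
# Operations for one conv layer:
# (output H x output W x output channels) x (kernel H x kernel W x input channels)
def conv_ops(out_h, out_w, out_c, k_h, k_w, in_c):
    return out_h * out_w * out_c * k_h * k_w * in_c

# Direct 5x5 convolution: 480 -> 48 channels on a 14x14 map
direct = conv_ops(14, 14, 48, 5, 5, 480)

# With a 1x1 bottleneck: 480 -> 16 channels, then 5x5: 16 -> 48 channels
reduce_1x1 = conv_ops(14, 14, 16, 1, 1, 480)
conv_5x5 = conv_ops(14, 14, 48, 5, 5, 16)

print(f"direct 5x5:   {direct / 1e6:.1f} M")                    # 112.9 M
print(f"1x1 then 5x5: {(reduce_1x1 + conv_5x5) / 1e6:.1f} M")   # 5.3 M
```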
3. Inception Module
The inception module is used to extract different kinds of features from the same layer.
Inception module (no 1x1 convolution)
The 1×1 conv, 3×3 conv, 5×5 conv, and 3×3 max pooling are all applied in parallel to the output of the previous layer, and their outputs are stacked together again at the output.
==> As the input comes in, convolutions of different kernel sizes, as well as max pooling, are tried in parallel on the same feature maps.
To reduce the number of operations in the inception module, 1x1 convolution layers are inserted before the 3x3 and 5x5 convolutions (and after the max pooling):
Inception module (with 1x1 convolution)
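A minimal PyTorch sketch of such a module is given below (the class name is illustrative; the channel counts in the usage example correspond to the 14x14x480 input used earlier, and each convolution is followed by a ReLU):

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Inception module with 1x1 dimension-reduction layers (sketch)."""
    def __init__(self, in_c, c1, c3_reduce, c3, c5_reduce, c5, pool_proj):
        super().__init__()
        # Branch 1: 1x1 convolution
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_c, c1, 1), nn.ReLU(inplace=True))
        # Branch 2: 1x1 reduction, then 3x3 convolution
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_c, c3_reduce, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c3_reduce, c3, 3, padding=1), nn.ReLU(inplace=True))
        # Branch 3: 1x1 reduction, then 5x5 convolution
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_c, c5_reduce, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c5_reduce, c5, 5, padding=2), nn.ReLU(inplace=True))
        # Branch 4: 3x3 max pooling, then 1x1 projection
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_c, pool_proj, 1), nn.ReLU(inplace=True))

    def forward(self, x):
        # Every branch sees the same input; outputs are concatenated along channels
        return torch.cat([self.branch1(x), self.branch2(x),
                          self.branch3(x), self.branch4(x)], dim=1)

# Example: a 14x14x480 input with a 16-channel reduction before the 5x5 branch
block = InceptionModule(480, c1=192, c3_reduce=96, c3=208,
                        c5_reduce=16, c5=48, pool_proj=64)
out = block(torch.randn(1, 480, 14, 14))
print(out.shape)  # torch.Size([1, 512, 14, 14])
```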
4. Global Average Pooling (GAP)
Fully Connected Layer vs. Global Average Pooling
For a fully connected (FC) layer, the number of weights = 7x7x1024x1024 = 51.3 M
In GoogLeNet, global average pooling is used instead: each feature map is averaged from 7x7 down to 1x1. The number of weights = 0
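The difference is easy to check in PyTorch (a short illustrative comparison, assuming the 7x7x1024 feature maps and a 1024-unit FC layer from the example above):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1024, 7, 7)    # final 7x7x1024 feature maps

# FC head: flatten the 7x7x1024 maps, then a 1024-unit fully connected layer
fc = nn.Linear(7 * 7 * 1024, 1024)
print(fc.weight.numel())          # 51,380,224 weights (~51.3 M)

# GAP head: average each 7x7 map down to a single value -> no weights at all
gap = nn.AdaptiveAvgPool2d(1)
print(gap(x).shape)               # torch.Size([1, 1024, 1, 1])
print(sum(p.numel() for p in gap.parameters()))  # 0
```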
5. Auxiliary Classifiers for Training
==> These combat the vanishing-gradient problem and provide extra regularization.
The softmax branches in the middle of the network are used during training only. Each classifier consists of:
5x5 average pooling (stride 3)
1x1 convolution with 128 filters
a fully connected layer with 1024 units
dropout (70%)
a 1000-way fully connected layer with softmax
==> The loss is added to the total loss with weight 0.3
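A minimal PyTorch sketch of one auxiliary head built from the layers listed above (the class name and the 512-channel input in the usage line are illustrative):

```python
import torch
import torch.nn as nn

class AuxClassifier(nn.Module):
    """Auxiliary softmax branch, used only during training (sketch)."""
    def __init__(self, in_c, num_classes=1000):
        super().__init__()
        self.pool = nn.AvgPool2d(5, stride=3)     # 5x5 average pooling, stride 3
        self.conv = nn.Conv2d(in_c, 128, 1)       # 1x1 convolution, 128 filters
        self.fc1 = nn.Linear(128 * 4 * 4, 1024)   # FC 1024 (14x14 maps pool down to 4x4)
        self.drop = nn.Dropout(0.7)               # 70% dropout
        self.fc2 = nn.Linear(1024, num_classes)   # 1000-way classifier (softmax applied in the loss)

    def forward(self, x):
        x = torch.relu(self.conv(self.pool(x)))
        x = torch.flatten(x, 1)
        x = self.drop(torch.relu(self.fc1(x)))
        return self.fc2(x)

# Usage during training: the auxiliary losses are added with weight 0.3
aux = AuxClassifier(512)
logits = aux(torch.randn(1, 512, 14, 14))
# total_loss = main_loss + 0.3 * aux1_loss + 0.3 * aux2_loss
print(logits.shape)  # torch.Size([1, 1000])
```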