GoogLeNet

1. Architecture

GoogLeNet can go very deep. It relies on three techniques to make this possible:

  • 1x1 Convolution layer

  • Inception Module

  • Global Average Pooling

2. 1x1 Convolution

The 1x1 convolution layer is used as a dimension-reduction module to cut down computation. By removing this computational bottleneck, the depth and width of the network can be increased.

  • 5x5 convolution without a 1x1 CONV reduction:

Number of operations = (14x14x48) x (5x5x480) = 112.9 M

  • With a 1x1 CONV reduction:

    • Number of operations for 1x1 kernel = (14x14x16) x (1x1x480) = 1.5 M

    • Number of operations for 5x5 kernel = (14x14x48) x (5x5x16) = 3.8 M

==> In effect, the 1x1 convolution maps the features from a high-dimensional space to a low-dimensional one in a non-linear way (it is followed by a ReLU), cutting the total operation count from 112.9 M to about 5.3 M.
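A quick back-of-the-envelope check of these numbers (a minimal Python sketch; the 14x14 spatial size, 480 input channels, and 16/48 filter counts are the figures used in the example above):

```python
# Operation counts for a 14x14x480 input, reproducing the numbers above.
H, W, C_in = 14, 14, 480

# Direct 5x5 convolution producing 48 output channels.
direct = (H * W * 48) * (5 * 5 * C_in)          # 112,896,000 ≈ 112.9 M

# 1x1 bottleneck down to 16 channels, then 5x5 convolution up to 48 channels.
reduce_1x1 = (H * W * 16) * (1 * 1 * C_in)      # 1,505,280 ≈ 1.5 M
conv_5x5   = (H * W * 48) * (5 * 5 * 16)        # 3,763,200 ≈ 3.8 M

print(f"direct 5x5:      {direct / 1e6:.1f} M")
print(f"1x1 + 5x5 total: {(reduce_1x1 + conv_5x5) / 1e6:.1f} M")
```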

3. Inception Module

The inception module is used to extract different kinds of features from the same layer.

Inception module (without 1x1 convolution)

The 1×1 conv, 3×3 conv, 5×5 conv, and 3×3 max pooling are applied in parallel to the previous layer's output, and their outputs are concatenated (stacked along the channel dimension) to form the module's output.

==> As the input comes in, convolutions of different sizes, as well as max pooling, are all tried in parallel.

To reduce the number of operations in the Inception module, 1x1 convolution layers are added before the 3×3 and 5×5 convolutions (and after the 3×3 max pooling):

Inception module (with 1x1 convolution)
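As a concrete illustration, here is a minimal PyTorch sketch of an Inception module with 1x1 reductions. The class name and argument names are my own, the channel counts in the usage example follow inception (3a) from the paper, and details such as batch normalization are omitted.

```python
import torch
import torch.nn as nn

class Inception(nn.Module):
    """Inception module with 1x1 dimension reductions (channel counts are illustrative)."""
    def __init__(self, in_ch, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj):
        super().__init__()
        # Branch 1: 1x1 convolution.
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_ch, ch1x1, kernel_size=1), nn.ReLU(inplace=True))
        # Branch 2: 1x1 reduction, then 3x3 convolution.
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_ch, ch3x3red, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch3x3red, ch3x3, kernel_size=3, padding=1), nn.ReLU(inplace=True))
        # Branch 3: 1x1 reduction, then 5x5 convolution.
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, ch5x5red, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch5x5red, ch5x5, kernel_size=5, padding=2), nn.ReLU(inplace=True))
        # Branch 4: 3x3 max pooling, then 1x1 projection.
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, kernel_size=1), nn.ReLU(inplace=True))

    def forward(self, x):
        # Every branch keeps the spatial size; concatenate along the channel axis.
        return torch.cat([self.branch1(x), self.branch2(x),
                          self.branch3(x), self.branch4(x)], dim=1)

# Example with inception (3a) channel counts: 64 + 128 + 32 + 32 = 256 output channels.
block = Inception(192, 64, 96, 128, 16, 32, 32)
out = block(torch.randn(1, 192, 28, 28))   # -> torch.Size([1, 256, 28, 28])
```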

4. Global Average Pooling (GAP)

Fully Connected Layer vs. Global Average Pooling

For a fully connected (FC) layer, the number of weights = 7x7x1024x1024 = 51.3 M

In GoogLeNet, global average pooling replaces the FC layer by averaging each feature map from 7x7 down to 1x1. The number of weights = 0.
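A minimal PyTorch sketch comparing the two heads (the 7x7x1024 feature-map shape is the one used above; the FC bias terms add a little on top of the 51.3 M weights):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1024, 7, 7)                   # final 7x7 feature maps, 1024 channels

# Fully connected head: flatten 7x7x1024, then a 1024-unit FC layer.
fc = nn.Linear(7 * 7 * 1024, 1024)
print(sum(p.numel() for p in fc.parameters()))   # 51,381,248 (51.3 M weights + 1024 biases)

# Global average pooling: average each 7x7 map down to 1x1 -- no weights at all.
gap = nn.AdaptiveAvgPool2d(1)
print(sum(p.numel() for p in gap.parameters()))  # 0
print(gap(x).shape)                              # torch.Size([1, 1024, 1, 1])
```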

5. Auxiliary Classifiers for Training

==> Used to combat the vanishing gradient problem and to provide regularization.

The softmax branches in the middle of the network are used during training only. Each auxiliary classifier consists of:

  • 5x5 average pooling (stride 3)

  • 1x1 CONV (128 filters)

  • 1024 FC

  • 1000 FC

  • Softmax

==> Each auxiliary loss is added to the total loss with weight 0.3.
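A minimal PyTorch sketch of one auxiliary branch following the layer list above. The in_ch value of 512 (the branch attached after inception 4a) and the 0.7 dropout rate are assumptions drawn from the original paper and common reimplementations; the loss combination at the bottom uses the 0.3 weight stated above.

```python
import torch
import torch.nn as nn

class AuxClassifier(nn.Module):
    """Auxiliary classifier branch: avg pool -> 1x1 conv -> FC 1024 -> FC 1000."""
    def __init__(self, in_ch, num_classes=1000):
        super().__init__()
        self.pool = nn.AvgPool2d(kernel_size=5, stride=3)   # 5x5 average pooling: 14x14 -> 4x4
        self.conv = nn.Conv2d(in_ch, 128, kernel_size=1)    # 1x1 CONV, 128 filters
        self.fc1 = nn.Linear(128 * 4 * 4, 1024)             # 1024 FC
        self.fc2 = nn.Linear(1024, num_classes)             # 1000 FC
        self.relu = nn.ReLU(inplace=True)
        self.dropout = nn.Dropout(0.7)                      # assumed dropout rate from the paper

    def forward(self, x):
        x = self.relu(self.conv(self.pool(x)))
        x = torch.flatten(x, 1)
        x = self.dropout(self.relu(self.fc1(x)))
        return self.fc2(x)                                  # logits; softmax is applied in the loss

# Usage: the 4a-style branch sees 14x14 feature maps with 512 channels (assumed).
aux = AuxClassifier(in_ch=512)
logits = aux(torch.randn(1, 512, 14, 14))                   # -> torch.Size([1, 1000])

# During training, each auxiliary loss is added to the main loss with weight 0.3:
criterion = nn.CrossEntropyLoss()
# total_loss = criterion(main_logits, target) \
#            + 0.3 * criterion(aux1_logits, target) \
#            + 0.3 * criterion(aux2_logits, target)
```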
