The experiments showed that the tradeoff between accuracy and quantization differs from layer to layer. We call this layer sensitivity. For example, the middle layers are much more robust to quantization whilst also housing the majority of the weights.
As shown in the figure, the majority of the MAC operations are performed in the second, third, and fourth convolutional layers.
With the mixed2 scheme, more than a 3x saving is achieved in MAC operations. Hence, energy consumption and latency can also be reduced significantly while sacrificing only 0.2% accuracy.
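As a rough illustration, the relative MAC cost under a mixed-precision scheme can be estimated by weighting each layer's MAC count by its assigned bit width, assuming cost scales roughly linearly with operand precision (as on bit-serial or precision-scalable hardware). This is only a sketch: the layer names, MAC counts, and bit assignments below are hypothetical placeholders, not the actual profile from these experiments.

```python
# Hypothetical per-layer MAC counts and assigned weight bit widths.
# Real numbers would come from profiling the network and the chosen
# mixed-precision (e.g. mixed2) scheme.
layers = {
    "conv1":  {"macs": 2.0e6,  "bits": 8},
    "conv2":  {"macs": 25.0e6, "bits": 2},  # middle layers: robust, quantized deeply
    "conv3":  {"macs": 30.0e6, "bits": 2},
    "conv4":  {"macs": 20.0e6, "bits": 4},
    "dense1": {"macs": 4.0e6,  "bits": 4},
}

BASELINE_BITS = 8  # uniform-precision baseline for comparison

# Assume MAC cost scales roughly linearly with operand bit width.
baseline_cost = sum(l["macs"] * BASELINE_BITS for l in layers.values())
mixed_cost = sum(l["macs"] * l["bits"] for l in layers.values())

print(f"Relative MAC cost: {mixed_cost / baseline_cost:.2f}")
print(f"Savings factor:    {baseline_cost / mixed_cost:.1f}x")
```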
The next figure shows the distribution of weights per layer. It is clear that the majority of the weights reside in the first dense layer, followed by the fourth convolutional layer. These are also among the least sensitive layers, meaning that by deeply quantizing them, the majority of the memory space can be saved.
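A similar back-of-the-envelope calculation gives the memory saving: weight each layer's parameter count by its assigned bit width and compare against a full-precision baseline. Again, the parameter counts and bit assignments here are assumptions for illustration, not the measured distribution.

```python
# Hypothetical parameter counts per layer; in the experiments most of the
# weights sit in the first dense layer and the fourth convolutional layer.
params = {"conv1": 5e3, "conv2": 50e3, "conv3": 80e3,
          "conv4": 300e3, "dense1": 1.2e6}
bits = {"conv1": 8, "conv2": 4, "conv3": 4, "conv4": 2, "dense1": 2}

baseline_bytes = sum(n * 32 for n in params.values()) / 8  # 32-bit float baseline
mixed_bytes = sum(params[name] * bits[name] for name in params) / 8

print(f"Weight memory: {baseline_bytes / 1e6:.2f} MB -> {mixed_bytes / 1e6:.2f} MB")
print(f"Compression:   {baseline_bytes / mixed_bytes:.1f}x")
```

Because the largest layers tolerate the deepest quantization, most of the model's footprint can be compressed aggressively while the more sensitive layers keep higher precision.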