
Tuesday 25 April 2023

Quantized Neural Networks (QNN)

Quantization is a common way to reduce the compute and memory demands a neural network places on hardware.

When the activations are quantized, the number of MAC operations is vastly reduced, resulting in better latency and lower energy consumption. Weight quantization, on the other hand, decreases both the memory footprint and the number of MAC operations, which also helps reduce silicon area.
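As a back-of-the-envelope illustration (the layer size here is a made-up example, not from the cited experiments), the memory saving from weight quantization follows directly from the bit width:

# Memory saving from weight quantization, using a hypothetical
# layer with one million weights. Numbers are illustrative only.
num_weights = 1_000_000

fp32_bytes = num_weights * 4          # FP32: 4 bytes per weight
int4_bytes = num_weights * 4 // 8     # INT4: 0.5 bytes per weight

print(f"FP32: {fp32_bytes / 1e6:.1f} MB, INT4: {int4_bytes / 1e6:.1f} MB, "
      f"reduction: {fp32_bytes / int4_bytes:.0f}x")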
To obtain independent quantization of the trainable parameters, the QKeras library is used. Mathematically, the mantissa quantization for a given input x is: [3]
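In its standard fixed-point form (given here as a common formulation, not quoted from [3]), mantissa quantization rounds x onto a grid with m fractional bits and clips it to the representable range:

Q(x) = clip(round(x * 2^m) / 2^m, x_min, x_max)

In QKeras this corresponds to the quantized_bits quantizer, where m follows from the total bit width and the number of integer bits. A minimal sketch of its behaviour, with bit widths chosen purely for illustration:

import numpy as np
from qkeras import quantized_bits

# quantized_bits(bits, integer, alpha): fixed-point quantizer with
# `bits` total bits and `integer` integer bits (the sign bit is
# handled separately), leaving m = bits - integer - 1 fractional bits.
q = quantized_bits(bits=4, integer=0, alpha=1)

x = np.array([-1.2, -0.4, 0.13, 0.7, 0.99], dtype=np.float32)
print(q(x))  # inputs snapped onto the 4-bit fixed-point grid and clipped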
Previous studies have focused on 8-bit quantization schemes and other fixed, lower-precision levels [4].
Experiments have been conducted using a light-weight network on the CIFAR-10 dataset [5].
Adopting an intra-layer mixed-quantization training technique for both weights and activations, with bit widths assigned according to layer sensitivities, achieves a memory reduction of 2x and 8x, and a MAC-operation reduction of 2x and 30x, relative to the 8-bit and FP32 counterparts respectively, while sacrificing virtually no accuracy against the 8-bit model and around 2% against the FP32 model. A sketch of such a mixed-precision model follows below.
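The sketch below shows what a mixed-precision QKeras model for CIFAR-10 can look like. The architecture and per-layer bit widths are hand-picked assumptions for illustration, not the configuration from [5]; in practice they would be chosen from a per-layer sensitivity analysis.

import tensorflow as tf
from qkeras import QConv2D, QDense, QActivation, quantized_bits, quantized_relu

# Illustrative mixed-precision CNN for CIFAR-10. Each layer carries
# its own quantizer, so bit widths can differ per layer.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 3)),
    QConv2D(16, (3, 3), padding="same",
            kernel_quantizer=quantized_bits(8, 0, alpha=1),  # first layer kept wider
            bias_quantizer=quantized_bits(8, 0, alpha=1)),
    QActivation(quantized_relu(8)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    QConv2D(32, (3, 3), padding="same",
            kernel_quantizer=quantized_bits(4, 0, alpha=1),  # less sensitive: 4 bits
            bias_quantizer=quantized_bits(4, 0, alpha=1)),
    QActivation(quantized_relu(4)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    QDense(10,
           kernel_quantizer=quantized_bits(8, 0, alpha=1),   # last layer kept wider
           bias_quantizer=quantized_bits(8, 0, alpha=1)),
])

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, validation_data=(x_test, y_test))

Keeping the first and last layers at higher precision is a common heuristic, since those layers tend to be the most sensitive to quantization error.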
