Tuesday, 25 April 2023

Quantized Neural Networks (QNN)

Quantization is a common way to reduce a neural network's demands on hardware.

When the activations are quantized, the number of MAC operations is vastly reduced, resulting in better latency and energy consumption. Weight quantization, on the other hand, decreases both the memory footprint and the number of MAC operations, which also helps with area reduction.
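As a toy illustration of the memory-footprint side of this (plain NumPy with a symmetric 8-bit scheme; this is not the QKeras quantizer used below):

import numpy as np

w = np.random.randn(256, 256).astype(np.float32)  # FP32 weight matrix
scale = np.abs(w).max() / 127.0                   # symmetric 8-bit scale
w_q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)

# Same tensor at a quarter of the memory footprint:
print(w.nbytes // 1024, "KiB ->", w_q.nbytes // 1024, "KiB")  # 256 KiB -> 64 KiB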
To quantize each trainable parameter independently, the QKeras library is used. Mathematically, the mantissa quantization for a given input x is: [3]
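The formula itself appears to have been lost from the post (it was most likely an image). As a sketch of the fixed-point quantizer that QKeras's quantized_bits implements, assuming b total bits, i integer bits, and a signed (keep_negative) representation:

q(x) = 2^{i-b+1} \cdot \mathrm{clip}\left(\mathrm{round}\left(x \cdot 2^{b-1-i}\right),\ -2^{b-1},\ 2^{b-1}-1\right)

Independent quantization of trainable parameters then simply means attaching a separate such quantizer to each weight, bias, and activation tensor. A minimal QKeras sketch, with made-up layer sizes and bit widths:

from tensorflow.keras.models import Sequential
from qkeras import QDense, QActivation, quantized_bits, quantized_relu

# Each kernel, bias, and activation gets its own quantizer, so the
# precision of every trainable tensor can be chosen independently.
model = Sequential([
    QDense(64, input_shape=(32,),
           kernel_quantizer=quantized_bits(bits=4, integer=0, keep_negative=1),
           bias_quantizer=quantized_bits(bits=4, integer=0, keep_negative=1)),
    QActivation(quantized_relu(bits=4)),
    QDense(10,
           kernel_quantizer=quantized_bits(bits=8, integer=0, keep_negative=1),
           bias_quantizer=quantized_bits(bits=8, integer=0, keep_negative=1)),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")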
Previous studies have examined 8-bit quantization schemes and other fixed lower-precision levels. [4]
Experiments have been conducted using a lightweight network on the CIFAR10 dataset [5].
By adapting an intra-layer mixed-quantization training technique for both weights and activations, with bit widths chosen according to layer sensitivities, a memory reduction of 2x/8x and a reduction in the number of MAC operations of 2x/30x can be achieved relative to the 8-bit/FP32 counterparts, while sacrificing virtually no accuracy against the 8-bit model and only around 2% against the FP32 model.
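The core idea, allocating fewer bits to less sensitive layers, can be sketched in a few lines of Python. The layer names and sensitivity scores below are hypothetical placeholders; the actual sensitivity analysis and training procedure are the ones described above.

# Hypothetical per-layer sensitivity scores in [0, 1] (higher = more
# accuracy-critical); real values would come from a sensitivity analysis.
layer_sensitivity = {"conv1": 0.9, "conv2": 0.4, "conv3": 0.2, "dense": 0.7}

def bits_for(sensitivity, low=2, high=8):
    # Linearly map sensitivity to a bit width: sensitive layers keep
    # more precision, insensitive ones are quantized more aggressively.
    return max(low, round(low + sensitivity * (high - low)))

bit_widths = {name: bits_for(s) for name, s in layer_sensitivity.items()}
print(bit_widths)  # {'conv1': 7, 'conv2': 4, 'conv3': 3, 'dense': 6}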
