bitorch_engine.layers.qlinear.nbit.layer
Classes
| Class | Description |
| --- | --- |
| `MPQLinearBase` | Base class for mixed precision quantized (MPQ) linear layers, designed to support the computational needs of large language models (LLMs) with mixed precision quantization, such as 16-bit activations and 4-bit weights for efficient inference. |
| `MPQWeightParameter` | A custom parameter class for quantized weights, extending `torch.nn.Parameter`, with additional attributes specific to quantization. |
| `nBitLinearBase` | A base class for n-bit Quantization-Aware Training (QAT) linear layers. |
| `nBitLinearParameter` | A custom parameter class for n-bit linear layers, extending `torch.nn.Parameter`. |