bitorch_engine.layers.qlinear.nbit.layer

Classes

MPQLinearBase(in_channels, out_channels[, ...])

Base class for mixed-precision quantized (MPQ) linear layers, designed to support the computational needs of large language models (LLMs) by combining precisions, such as 16-bit activations with 4-bit weights, for efficient inference.
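
As a rough illustration of what such a layer computes, the sketch below dequantizes 4-bit integer weights to the activation dtype and performs an ordinary matmul in plain PyTorch. This is a conceptual reference only; the function name, the per-output-channel scale/zero-point scheme, and the toy shapes are assumptions for demonstration, not MPQLinearBase's actual interface or implementation.

    import torch

    def mpq_linear_reference(x, w_q4, scale, zero_point):
        """Reference mixed-precision linear (illustrative only).
        x: (batch, in_channels) activations.
        w_q4: (out_channels, in_channels) integers in [0, 15] (4-bit values).
        scale, zero_point: per-output-channel quantization parameters."""
        # Dequantize the 4-bit weights to the activation dtype, then matmul.
        w = scale[:, None] * (w_q4.to(x.dtype) - zero_point[:, None])
        return x @ w.t()

    x = torch.randn(2, 8)                    # float32 here; 16-bit dtypes on GPU in practice
    w_q4 = torch.randint(0, 16, (4, 8))
    scale = torch.full((4,), 0.05)
    zero_point = torch.full((4,), 8.0)
    y = mpq_linear_reference(x, w_q4, scale, zero_point)  # shape (2, 4)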

MPQWeightParameter([data, requires_grad, ...])

A custom parameter class for quantized weights, extending torch.nn.Parameter, with additional attributes specific to quantization.
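
A parameter class of this kind typically overrides __new__ on torch.nn.Parameter and attaches quantization metadata as attributes. The minimal sketch below shows that general pattern; the class name and the scale/zero_point attribute names are hypothetical and are not taken from MPQWeightParameter.

    import torch

    class QuantWeightParam(torch.nn.Parameter):
        """Hypothetical quantized-weight parameter carrying quantization metadata."""
        def __new__(cls, data=None, requires_grad=False, scale=None, zero_point=None):
            if data is None:
                data = torch.empty(0)
            param = super().__new__(cls, data, requires_grad)
            param.scale = scale            # e.g. per-channel quantization scales (assumed attribute)
            param.zero_point = zero_point  # e.g. per-channel zero points (assumed attribute)
            return param

    packed = torch.randint(0, 16, (4, 8), dtype=torch.uint8)  # stand-in for packed 4-bit data
    w = QuantWeightParam(packed, scale=torch.full((4,), 0.05))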

nBitLinearBase(in_channels, out_channels[, ...])

A base class for n-bit Quantization-Aware Training (QAT) linear layers.
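
For context, QAT linear layers generally "fake-quantize" their weights in the forward pass while letting gradients pass through unchanged via a straight-through estimator. The sketch below shows that generic mechanism in plain PyTorch; it describes the common technique, not nBitLinearBase's actual quantizer.

    import torch

    def fake_quantize(w, bits=4):
        """Symmetric uniform fake quantization with a straight-through estimator:
        the forward value is quantized, gradients flow through unchanged."""
        qmax = 2 ** (bits - 1) - 1
        scale = w.detach().abs().max() / qmax
        w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
        return w + (w_q - w).detach()

    w = torch.randn(4, 8, requires_grad=True)
    loss = fake_quantize(w, bits=4).sum()
    loss.backward()   # gradients reach w via the straight-through estimator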

nBitLinearParameter([data, requires_grad])

A custom parameter class for n-bit linear layers, extending torch.nn.Parameter.