bitorch_engine.utils.quant_operators.q4_quantization

bitorch_engine.utils.quant_operators.q4_quantization(input: Tensor, scale_a: Tensor | None = None, eps: Tensor | None = None) → Tensor

Quantizes an input tensor to 4-bit integers using uniform quantization.

The function first ensures that the input tensor is of floating-point type. It then adjusts the scale factor scale_a to avoid division by values too close to zero, applying a lower threshold defined by eps. The quantization process scales the input tensor by the inverse of scale_a, rounds the result to the nearest integer, and clamps the values to the 4-bit range [-8, 7].

Parameters:
  • input (torch.Tensor) – The input tensor to be quantized. Should ideally be of floating-point type.

  • scale_a (torch.Tensor, optional) – The scale factor for quantization. Each element in scale_a scales the corresponding element in input. Defaults to None.

  • eps (torch.Tensor, optional) – A small positive tensor used as the lower bound on scale_a, preventing division by zero or by values too close to zero. Defaults to None.

Returns:

The quantized tensor, with values rounded and clamped to fit within the 4-bit integer range [-8, 7].

Return type:

torch.Tensor
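
Example:

The steps described above can be sketched in plain PyTorch as follows. This is a minimal illustration, not the library's implementation: the helper name q4_quantize_sketch is hypothetical, it assumes scale_a and eps are both passed as tensors, and it assumes the eps threshold is applied as an element-wise maximum. The dtype returned by the real function may differ.

```python
import torch


def q4_quantize_sketch(x: torch.Tensor, scale: torch.Tensor, eps: torch.Tensor) -> torch.Tensor:
    """Hypothetical re-implementation of the documented steps (illustration only)."""
    # Step 1: make sure the input is floating point before dividing.
    if not x.is_floating_point():
        x = x.float()
    # Step 2: keep the scale at or above eps (assumed: element-wise maximum).
    scale = torch.maximum(scale, eps)
    # Step 3: scale, round to the nearest integer, clamp to the signed 4-bit range.
    return torch.clamp(torch.round(x / scale), -8, 7)


x = torch.tensor([0.26, -1.0, 3.9, 0.05])
q = q4_quantize_sketch(x, torch.tensor(0.5), torch.tensor(1e-5))
print(q)  # 3.9 / 0.5 = 7.8 rounds to 8 and is clamped to 7
```

In this sketch the result is still a floating-point tensor that holds integer values; packing two 4-bit values into one byte would be a separate step.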