bitorch_engine.layers.qlinear.nbit.cuda.utils.pack_fp_weight
- bitorch_engine.layers.qlinear.nbit.cuda.utils.pack_fp_weight(weight: Tensor, qweight: MPQWeightParameter, unpacked_zeros: Tensor | None = None) → Tensor [source]
Packs the fp16 weight tensor into a quantized weight format using the attributes defined in the MPQWeightParameter.
- This function handles three main scenarios (a conceptual packing sketch follows this list):
GPTQ style quantization with group index (g_index).
GPTQ style quantization without g_index.
Mixed-bit quantization (currently not implemented).
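For intuition, the sketch below shows a generic row-wise 4-bit packing scheme of the kind commonly used by GPTQ-style packers, operating on already-quantized integer values. It is illustrative only: the actual quantization and packing layout produced by pack_fp_weight is determined by the attributes of the MPQWeightParameter and may differ.

```python
import torch

# Illustrative sketch, not the library's exact layout: pack eight 4-bit
# values per int32 word along the row dimension.
def pack_4bit_rows(int_weight: torch.Tensor) -> torch.Tensor:
    # int_weight: (in_features, out_features), values already quantized to [0, 15]
    in_features, out_features = int_weight.shape
    assert in_features % 8 == 0, "rows must be divisible by 8 for 4-bit packing"
    w = int_weight.to(torch.int32).reshape(in_features // 8, 8, out_features)
    packed = torch.zeros(in_features // 8, out_features,
                         dtype=torch.int32, device=int_weight.device)
    for i in range(8):
        # Place each 4-bit value into its slot; only the bit pattern matters,
        # so int32 overflow in the top slot is expected and harmless.
        packed |= w[:, i, :] << (4 * i)
    return packed
```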
- Parameters:
weight (torch.Tensor) – The floating-point weights to be quantized and packed.
qweight (MPQWeightParameter) – An object containing the quantization parameters, such as the bit width, scales, zero points, and the optional group index (g_index).
unpacked_zeros (torch.Tensor, optional) – Optional tensor of unpacked zero points to use during packing. Defaults to None.
- Returns:
The packed integer tensor representing the quantized weights.
- Return type:
torch.Tensor
- Raises:
ValueError – If ‘layer_type’ attribute is invalid or not present.
NotImplementedError – For unimplemented quantization methods, like mixed-bit quantization.
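A minimal usage sketch, assuming `qlayer` is an already-initialized quantized linear layer whose `qweight` attribute is an MPQWeightParameter (the layer object and attribute name are illustrative; construction of the layer is not shown):

```python
import torch
from bitorch_engine.layers.qlinear.nbit.cuda.utils import pack_fp_weight

# Assumption: `qlayer.qweight` is an MPQWeightParameter carrying the
# quantization metadata (bit width, scales, zero points, optional g_index).
fp_weight = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")
packed = pack_fp_weight(fp_weight, qlayer.qweight)
print(packed.dtype, packed.shape)  # packed integer tensor
```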