bitorch_engine.layers.qlinear.nbit.cuda.utils.pack_fp_weight
- bitorch_engine.layers.qlinear.nbit.cuda.utils.pack_fp_weight(weight: Tensor, qweight: MPQWeightParameter, unpacked_zeros: Tensor | None = None) → Tensor [source]
Packs the fp16 weight tensor into a quantized weight format using the attributes defined in the MPQWeightParameter.
- This function handles three main scenarios (a conceptual packing sketch follows this list):
GPTQ style quantization with group index (g_index).
GPTQ style quantization without g_index.
Mixed-bit quantization (currently not implemented).
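For intuition, the sketch below shows a generic row-wise 4-bit packing scheme of the kind commonly used by GPTQ-style packers, operating on already-quantized integer values. It is illustrative only: the actual quantization and packing layout produced by pack_fp_weight is determined by the attributes of the MPQWeightParameter and may differ.

```python
import torch

# Illustrative sketch, not the library's exact layout: pack eight 4-bit
# values per int32 word along the row dimension.
def pack_4bit_rows(int_weight: torch.Tensor) -> torch.Tensor:
    # int_weight: (in_features, out_features), values already quantized to [0, 15]
    in_features, out_features = int_weight.shape
    assert in_features % 8 == 0, "rows must be divisible by 8 for 4-bit packing"
    w = int_weight.to(torch.int32).reshape(in_features // 8, 8, out_features)
    packed = torch.zeros(in_features // 8, out_features,
                         dtype=torch.int32, device=int_weight.device)
    for i in range(8):
        # Place each 4-bit value into its slot; only the bit pattern matters,
        # so int32 overflow in the top slot is expected and harmless.
        packed |= w[:, i, :] << (4 * i)
    return packed
```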
- Parameters:
weight (torch.Tensor) – The floating-point weights to be quantized and packed.
qweight (MPQWeightParameter) – An object containing the quantization parameters, such as the bit width, scales, zero points, and the optional group index (g_index).
unpacked_zeros (torch.Tensor, optional) – Optional tensor of unpacked zero points to use during packing. Defaults to None.
- Returns:
The packed integer tensor representing the quantized weights.
- Return type:
torch.Tensor
- Raises:
ValueError – If ‘layer_type’ attribute is invalid or not present.
NotImplementedError – For unimplemented quantization methods, like mixed-bit quantization.
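A minimal usage sketch, assuming `qlayer` is an already-initialized quantized linear layer whose `qweight` attribute is an MPQWeightParameter (the layer object and attribute name are illustrative; construction of the layer is not shown):

```python
import torch
from bitorch_engine.layers.qlinear.nbit.cuda.utils import pack_fp_weight

# Assumption: `qlayer.qweight` is an MPQWeightParameter carrying the
# quantization metadata (bit width, scales, zero points, optional g_index).
fp_weight = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")
packed = pack_fp_weight(fp_weight, qlayer.qweight)
print(packed.dtype, packed.shape)  # packed integer tensor
```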