bitorch_engine.layers.qlinear.nbit.cuda.utils.pack_fp_weight

bitorch_engine.layers.qlinear.nbit.cuda.utils.pack_fp_weight(weight: Tensor, qweight: MPQWeightParameter, unpacked_zeros: Tensor | None = None) → Tensor

Packs fp16 weights into a quantized weight format using the attributes defined in the given MPQWeightParameter.

This function handles three main scenarios:
  1. GPTQ-style quantization with a group index (g_idx); a simplified packing sketch follows this list.

  2. GPTQ-style quantization without g_idx.

  3. Mixed-bit quantization (currently not implemented).
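
The GPTQ-style paths re-quantize every weight with its group's scale and zero point and then pack 32 // bits quantized values into each integer of the output tensor. The following is only a minimal sketch of that idea, not the library's implementation; the attribute names (scales, zeros, g_idx, w_bit) and the low-bits-first packing order are assumptions for illustration:

import torch

def pack_fp_weight_sketch(weight: torch.Tensor,
                          scales: torch.Tensor,
                          zeros: torch.Tensor,
                          g_idx: torch.Tensor,
                          w_bit: int = 4) -> torch.Tensor:
    # weight: (in_features, out_features) fp16/fp32 weights
    # scales: (num_groups, out_features) per-group scales
    # zeros:  (num_groups, out_features) unpacked per-group zero points
    # g_idx:  (in_features,) group index of every input row
    max_q = (1 << w_bit) - 1
    # Quantize each row with the scale/zero point of its group.
    q = torch.round(weight / scales[g_idx]) + zeros[g_idx]
    q = q.clamp(0, max_q).to(torch.int32)

    # Pack 32 // w_bit quantized values into every int32 along the input dim.
    vals_per_int32 = 32 // w_bit  # in_features must be divisible by this
    q = q.reshape(-1, vals_per_int32, weight.shape[1])
    packed = torch.zeros(q.shape[0], q.shape[2], dtype=torch.int32, device=weight.device)
    for i in range(vals_per_int32):
        packed |= q[:, i, :] << (i * w_bit)
    return packed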

Parameters:
  • weight (torch.Tensor) – The floating-point weights to be quantized and packed.

  • qweight (MPQWeightParameter) – An object containing the quantization parameters (e.g. scales, zero points, bit width, layer_type).

  • unpacked_zeros (torch.Tensor, optional) – Already unpacked zero points to use during packing, if available.

Returns:

The packed integer tensor representing the quantized weights.

Return type:

torch.Tensor

Raises:
  • ValueError – If the ‘layer_type’ attribute is invalid or missing.

  • NotImplementedError – For quantization methods that are not yet implemented, such as mixed-bit quantization.
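
A typical use is re-packing updated floating-point weights back into an existing quantized layer. The snippet below is only a usage sketch under assumptions stated in the comments; the layer object, its construction, and the weight shape are not part of this API:

import torch
from bitorch_engine.layers.qlinear.nbit.cuda.utils import pack_fp_weight

# Assumptions for this sketch: `layer` is a quantized linear layer from this
# package whose `weight` attribute is an MPQWeightParameter (construction
# omitted here), and the fp16 tensor below matches the layer's original
# (unquantized) weight shape.
fp_weight = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")

packed = pack_fp_weight(fp_weight, layer.weight)  # packed integer tensor
layer.weight.data = packed                        # store the re-packed weights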