bitorch_engine.layers.qlinear.nbit.cuda.utils.unpack_qweight

bitorch_engine.layers.qlinear.nbit.cuda.utils.unpack_qweight(qweight: MPQWeightParameter) → Tensor

Reconstructs the fp16 weight tensor from the input quantized weight parameter.

Parameters:

qweight (MPQWeightParameter) – The quantized weight parameter object containing all necessary quantization information.

Returns:

The reconstructed weight tensor in fp16 format.

Return type:

torch.Tensor

Raises:
  • ValueError – If essential attributes are missing from the input qweight parameter.

  • NotImplementedError – For quantization types that are not yet supported.

Supported quantization styles:
  1. GPTQ style with g_index.

  2. GPTQ style without g_index.

  3. Mixed-bit quantization.
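
The GPTQ-style layouts above pack several low-bit integers into each 32-bit word; unpacking extracts each field and dequantizes it with a scale and zero point. The following is a minimal pure-Python sketch of that per-element arithmetic for the 4-bit case. The function name `unpack_4bit` and its signature are illustrative, not the library's actual API, and the real CUDA kernel operates on whole tensors and additionally handles g_index grouping:

```python
def unpack_4bit(packed: int, scale: float, zero: int) -> list:
    """Unpack eight 4-bit values from one 32-bit word and dequantize.

    Illustrative only: sketches the arithmetic behind GPTQ-style
    unpacking, not the bitorch_engine implementation.
    """
    out = []
    for i in range(8):
        q = (packed >> (4 * i)) & 0xF   # extract the i-th nibble
        out.append((q - zero) * scale)  # dequantize to a float weight
    return out

# Example: pack the quantized values 0..7 into one 32-bit word,
# then recover the dequantized weights.
packed = 0
for i, q in enumerate(range(8)):
    packed |= q << (4 * i)

weights = unpack_4bit(packed, scale=0.5, zero=4)
# → [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
```

The mixed-bit case follows the same idea with a per-group bit width, so the field width and mask vary across the tensor instead of being a fixed 4 bits.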