bitorch_engine.layers.qlinear.nbit.cuda.utils

Functions

make_group_map

Creates a mapping of quantization groups for handling irregular group sizes in quantized models.
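The exact signature and group layout are internal to the CUDA kernels. As a rough illustration only, a group map can be thought of as a flat lookup from each weight row to the index of the quantization group it falls in, even when group sizes are uneven; the function name and argument below are hypothetical:

```python
import torch

# Hypothetical sketch -- not the library's actual signature or storage layout.
# Given per-group row counts that may be irregular (e.g. a smaller final group),
# build a flat lookup that maps every weight row to its quantization group index.
def make_group_map_sketch(group_row_counts: list[int]) -> torch.Tensor:
    group_map = torch.empty(sum(group_row_counts), dtype=torch.int16)
    row = 0
    for group_idx, rows in enumerate(group_row_counts):
        group_map[row:row + rows] = group_idx
        row += rows
    return group_map

# Rows 0-127 -> group 0, rows 128-255 -> group 1, the last 64 rows -> group 2.
print(make_group_map_sketch([128, 128, 64]))
```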

pack_fp_weight

Packs an fp16 weight tensor into a quantized weight format using the attributes defined in the QweightParameter.
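The packed layout, bit width, and the QweightParameter attributes it consumes are defined by the library. The sketch below only illustrates the general idea of converting an fp16 weight into low-bit integer codes plus per-group scale/zero values; the function name, the 4-bit width, the byte-level packing, and the group_size argument are assumptions for illustration:

```python
import torch

# Illustrative sketch only: quantize an fp16 weight to unsigned 4-bit codes with one
# scale/zero value per group along the input dimension, then pack two codes per byte.
# The real packed layout and bit width are defined by the QweightParameter attributes.
def pack_fp_weight_sketch(weight: torch.Tensor, group_size: int = 128):
    out_features, in_features = weight.shape
    w = weight.float().reshape(out_features, in_features // group_size, group_size)
    w_min = w.amin(dim=-1, keepdim=True)
    w_max = w.amax(dim=-1, keepdim=True)
    scale = ((w_max - w_min) / 15.0).clamp(min=1e-8)   # 4 bits -> 16 levels
    q = ((w - w_min) / scale).round().clamp(0, 15).to(torch.uint8)
    q = q.reshape(out_features, in_features)
    packed = q[:, 0::2] | (q[:, 1::2] << 4)            # two 4-bit codes per byte
    return packed, scale.squeeze(-1), w_min.squeeze(-1)

weight = torch.randn(64, 256, dtype=torch.float16)
packed, scales, zeros = pack_fp_weight_sketch(weight)
print(packed.shape, scales.shape)   # torch.Size([64, 128]) torch.Size([64, 2])
```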

unpack_qweight

Reconstructs the fp16 weight tensor from the input quantized weight parameter.
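As a rough counterpart to the packing sketch above, the following shows how low-bit codes plus per-group scale/zero values can be expanded back into an fp16 tensor. The real unpack_qweight reads this information from the quantized weight parameter itself, so the explicit arguments here are hypothetical:

```python
import torch

# Illustrative inverse of the packing sketch: split each byte back into two 4-bit
# codes and apply the per-group scale/zero values to recover an fp16 approximation
# of the original weight. Argument names and layout are assumptions, not the API.
def unpack_qweight_sketch(packed: torch.Tensor, scale: torch.Tensor,
                          zero: torch.Tensor, group_size: int = 128) -> torch.Tensor:
    out_features = packed.shape[0]
    q = torch.empty(out_features, packed.shape[1] * 2, dtype=torch.uint8)
    q[:, 0::2] = packed & 0x0F                         # low nibble
    q[:, 1::2] = packed >> 4                           # high nibble
    w = q.float().reshape(out_features, -1, group_size)
    w = w * scale.unsqueeze(-1) + zero.unsqueeze(-1)   # dequantize per group
    return w.reshape(out_features, -1).to(torch.float16)

packed = torch.randint(0, 256, (64, 128), dtype=torch.uint8)
scale = torch.rand(64, 2)
zero = torch.rand(64, 2)
print(unpack_qweight_sketch(packed, scale, zero).shape)   # torch.Size([64, 256])
```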