
class bitorch_engine.layers.qlinear.nbit.layer.MPQWeightParameter(data=None, requires_grad: bool = True, privileged_grad: Tensor | None = None, scales: Tensor | None = None, zeros: Tensor | None = None, g_idx: Tensor | None = None, w_bit: int = -1, asym: bool = False, group_size: int = -1, layer_type: int = -1, q_perm: Tensor | None = None, qscales_zeros: Tensor | None = None, qscales_scales: Tensor | None = None, qzeros_zeros: Tensor | None = None, qzeros_scales: Tensor | None = None, q_group_map: Tensor | None = None, rows: list | None = None)

A custom parameter class for quantized weights, extending torch.nn.Parameter, with additional attributes specific to quantization.


privileged_grad

Optional tensor for privileged gradients (not used in standard backpropagation).

scales, zeros

Quantization scales and zero points for the affine quantization.

g_idx

Group index for weight quantization.

w_bit

Bit-width for weight quantization.

asym

Flag indicating whether asymmetric quantization is used.

group_size

The size of quantization groups.

layer_type

Type of layer (e.g., MPQLinear: 1, MBWQLinear: 2).

q_perm

Permutation indices for quantization groups.

qscales_zeros, qscales_scales, qzeros_zeros, qzeros_scales

Additional quantization parameters for calculating (q)scales and (q)zeros.

q_group_map

Mapping from weights to quantization groups.

rows

Row information for each bit-width in the quantized weight matrix.

  • data (Tensor, optional) – Parameter tensor.

  • requires_grad (bool, optional) – If the parameter requires gradient. Default: True.

  • The rest of the parameters are specific to the quantization process and are optional.
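The relationship between the scales, zeros, w_bit and group_size attributes can be illustrated with a small pure-Python sketch. It assumes the conventional group-wise affine scheme, w ≈ scale * (q - zero), with one scale and zero point per group; the helper names quantize_groups and dequantize are hypothetical, and the library's actual kernels, weight packing and rounding rules may differ.

```python
def quantize_groups(weights, w_bit, group_size):
    """Asymmetric affine quantization, one (scale, zero) pair per group.

    Illustrative only: not bitorch_engine's implementation.
    """
    qmax = 2 ** w_bit - 1  # largest integer code, e.g. 15 for 4 bits
    q, scales, zeros = [], [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        # Fall back to a unit scale when the group is constant.
        scale = (hi - lo) / qmax if hi > lo else 1.0
        zero = round(-lo / scale)  # integer code that represents 0.0
        scales.append(scale)
        zeros.append(zero)
        # Clamp each code into the representable range [0, qmax].
        q.extend(min(qmax, max(0, round(w / scale) + zero)) for w in group)
    return q, scales, zeros


def dequantize(q, scales, zeros, group_size):
    """Reconstruct approximate floats: w ≈ scale * (q - zero)."""
    return [
        scales[i // group_size] * (code - zeros[i // group_size])
        for i, code in enumerate(q)
    ]


weights = [0.0, 0.5, 1.0, 1.5, -1.0, 0.0, 1.0, 2.0]
q, scales, zeros = quantize_groups(weights, w_bit=4, group_size=4)
restored = dequantize(q, scales, zeros, group_size=4)
```

With 4-bit codes and a group size of 4, each group of four weights shares one scale and one zero point, and the reconstruction error of every weight is bounded by half of its group's scale.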




update

Defines how quantized weights are updated with quantized gradients.


__init__(data: Tensor | None = None, requires_grad: bool = True, privileged_grad: Tensor | None = None, scales: Tensor | None = None, zeros: Tensor | None = None, g_idx: Tensor | None = None, w_bit: int = -1, asym: bool = False, group_size: int = -1, layer_type: int = -1, q_perm: Tensor | None = None, qscales_zeros: Tensor | None = None, qscales_scales: Tensor | None = None, qzeros_zeros: Tensor | None = None, qzeros_scales: Tensor | None = None, q_group_map: Tensor | None = None, rows: list | None = None)

static update(qweight: Parameter, exp_avg_s: Tensor | None = None, exp_avg_l: Tensor | None = None, step: Tensor | None = None, lr: float = 0.0001, weight_decay: float = 0.0, beta1: float = 0.99, beta2: float = 0.9999, eps: float = 1e-06, dtype=torch.float16, correct_bias=None, projector=None, grad: Tensor | None = None) -> None

This method defines how to update quantized weights with quantized gradients. It may involve operations such as applying momentum or adjusting weights based on some optimization algorithm.

  • qweight (torch.nn.Parameter) – The current quantized weight parameter to be updated.

  • exp_avg_s (torch.Tensor, optional) – Exponential moving average of squared gradients. Used in optimization algorithms like Adam.

  • exp_avg_l (torch.Tensor, optional) – Exponential moving average of the gradients. Also used in optimizers like Adam.

  • step (torch.Tensor, optional) – The current step or iteration in the optimization process. Can be used to adjust learning rate or for other conditional operations in the update process.

  • lr (float, optional) – Learning rate. A hyperparameter that determines the step size at each iteration while moving toward a minimum of a loss function.

  • weight_decay (float, optional) – Weight decay (L2 penalty). A regularization term that helps to prevent overfitting by penalizing large weights.

  • beta1 (float, optional) – The exponential decay rate for the first moment estimates. A hyperparameter for optimizers like Adam.

  • beta2 (float, optional) – The exponential decay rate for the second-moment estimates. Another hyperparameter for Adam-like optimizers.

  • eps (float, optional) – A small constant for numerical stability.

  • dtype (torch.dtype, optional) – The data type to be used for computations.

  • correct_bias (optional) – Whether to apply bias correction (specific to certain models like BERT).

  • projector (optional) – Whether to use a gradient projector.

  • grad (optional) – Gradient tensor; used if a projector is used.


The function is expected to update qweight in place and does not return anything.

Return type:

None

Raises:

NotImplementedError – Indicates that the function has not yet been implemented.
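To show how the optimizer-state arguments of update relate to one another, here is a minimal sketch of a generic Adam-style step on plain Python lists. It is not the library's update for quantized weights: adam_step is a hypothetical helper, weight decay is applied as an L2 term added to the gradient, and dtype, projector and the quantized representation are ignored. Only the parameter names and default values are taken from the signature above.

```python
import math


def adam_step(weight, grad, exp_avg_l, exp_avg_s, step,
              lr=1e-4, weight_decay=0.0, beta1=0.99, beta2=0.9999,
              eps=1e-6, correct_bias=True):
    """One Adam-style update; weight and both averages are modified in place.

    Illustrative only: operates on lists of floats, not quantized tensors.
    Returns the incremented step count.
    """
    step += 1
    for i in range(len(weight)):
        # L2 penalty: fold the decay term into the gradient.
        g = grad[i] + weight_decay * weight[i]
        # exp_avg_l tracks the gradient, exp_avg_s the squared gradient.
        exp_avg_l[i] = beta1 * exp_avg_l[i] + (1 - beta1) * g
        exp_avg_s[i] = beta2 * exp_avg_s[i] + (1 - beta2) * g * g
        m, v = exp_avg_l[i], exp_avg_s[i]
        if correct_bias:
            # Compensate for the zero initialization of both averages.
            m /= 1 - beta1 ** step
            v /= 1 - beta2 ** step
        weight[i] -= lr * m / (math.sqrt(v) + eps)
    return step


weight, grad = [1.0], [0.5]
exp_avg_l, exp_avg_s = [0.0], [0.0]
step = adam_step(weight, grad, exp_avg_l, exp_avg_s, step=0)
```

On the first step with bias correction, the corrected averages equal the gradient and its square, so the weight moves by roughly lr in the direction opposite to the gradient's sign.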