bitorch_engine.layers.qlinear.layer.QLinearInf

class bitorch_engine.layers.qlinear.layer.QLinearInf(input_features: int, out_features: int, device=None, a_bit: int = 1, w_bit: int = 1, bias=False)[source]

QLinearInf is a quantized linear layer optimized for inference. It inherits from QLinearImplementationMixin and BinaryLinearBase, reusing their quantization functionality and binary linear operations.

This class specifically handles inference operations with quantized weights, potentially using different bit widths for activations and weights.

Methods

__init__

Initializes the QLinearInf layer with specified input and output feature dimensions, quantization bit widths, and device.

create_clone_from

Creates a clone of the layer from a given recipe, adjusting input feature dimensions and setting up quantization parameters based on the recipe's specifications.

forward

Forwards the input tensor x through the quantized linear layer, performing the linear operation with quantized weights.

generate_quantized_weight

Generates and sets the quantized weights for the layer; optionally generates only the quantized weights, leaving the original weights untouched.

prepare_params

Prepares the parameters of the layer for quantization and inference, calling the corresponding method of the underlying binary or n-bit linear layer.

set_quantized_weight_data

Sets the quantized weight data for the layer.

set_weight_data

Sets the weight data for the layer.

Attributes

opt_weight

Property to access the optimized weight tensor of the layer, which may include quantized or otherwise transformed weights for efficient inference.

weight

Property to access the weight tensor of the layer.

training

Boolean flag inherited from torch.nn.Module indicating whether the layer is in training or evaluation mode.

__init__(input_features: int, out_features: int, device=None, a_bit: int = 1, w_bit: int = 1, bias=False) None[source]

Initializes the QLinearInf layer with specified input and output feature dimensions, quantization bit widths, and device. Currently, bias is not supported and must be False.

Parameters:
  • input_features (int) – The dimension of input features after bit-packing.

  • out_features (int) – The dimension of output features (hidden states).

  • device (optional) – The device on which to initialize the layer. Defaults to None.

  • a_bit (int, optional) – Bit width for activation quantization. Defaults to 1.

  • w_bit (int, optional) – Bit width for weight quantization. Defaults to 1.

  • bias (bool, optional) – Indicates if bias is used. Currently must be False.

Raises:

AssertionError – If bias is set to True.

classmethod create_clone_from(recipe: LayerRecipe, device: device | None = None) Any[source]

Creates a clone of the layer from a given recipe, adjusting input feature dimensions and setting up quantization parameters based on the recipe’s specifications.

Parameters:
  • recipe (LayerRecipe) – A configuration object containing layer specifications.

  • device (torch.device, optional) – The device on which to create the layer. Defaults to None.

Returns:

An instance of the cloned layer with quantization applied.

Return type:

Any

forward(x: Tensor) Tensor[source]

Forwards the input tensor x through the quantized linear layer, performing the linear operation with quantized weights.

Parameters:

x (torch.Tensor) – The input tensor to forward through the layer.

Returns:

The output tensor after passing through the layer.

Return type:

torch.Tensor
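As a conceptual illustration of what the forward pass computes when a_bit=1 and w_bit=1, the sketch below binarizes activations and weights to ±1 with the sign function and multiplies them as in a standard linear layer. This is a minimal NumPy sketch of the underlying math, not the engine's optimized bit-packed kernel; all function names here are hypothetical.

```python
import numpy as np

def sign_binarize(t: np.ndarray) -> np.ndarray:
    """Binarize to {-1, +1}; zeros map to +1, as is common in binary networks."""
    return np.where(t >= 0, 1.0, -1.0)

def binary_linear_forward(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Conceptual 1-bit linear op: y = sign(x) @ sign(W)^T (no bias, matching QLinearInf)."""
    xb = sign_binarize(x)       # quantized activations (a_bit = 1)
    wb = sign_binarize(weight)  # quantized weights (w_bit = 1)
    return xb @ wb.T

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64))    # batch of 4, 64 input features
w = rng.standard_normal((128, 64))  # 128 output features
y = binary_linear_forward(x, w)
print(y.shape)  # (4, 128)
```

Because every product term is ±1, each output entry is a sum of 64 such terms; optimized engines exploit this by replacing the matmul with XNOR and popcount instructions on packed bits.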

generate_quantized_weight(qweight_only: bool = False) None[source]

Generates and sets the quantized weights for the layer. When qweight_only is True, only the quantized weights are generated and the original weights are left untouched.

Parameters:

qweight_only (bool, optional) – If True, only quantized weights are generated. Defaults to False.

property opt_weight

Property to access the optimized weight tensor of the layer, which may include quantized or otherwise transformed weights for efficient inference.

Returns:

The optimized weight tensor.

Return type:

torch.Tensor

prepare_params() None[source]

Prepares the parameters of the layer for quantization and inference, calling the corresponding method of the underlying binary or n-bit linear layer.

set_quantized_weight_data(x: Tensor)[source]

Sets the quantized weight data for the layer.

Parameters:

x (torch.Tensor) – The tensor containing the quantized weight data.

set_weight_data(x: Tensor)[source]

Sets the weight data for the layer.

Parameters:

x (torch.Tensor) – The tensor containing the weight data.

property weight

Property to access the weight tensor of the layer.

Returns:

The weight tensor.

Return type:

torch.Tensor