bitorch_engine.layers.qlinear.layer.QLinearInf
- class bitorch_engine.layers.qlinear.layer.QLinearInf(input_features: int, out_features: int, device=None, a_bit: int = 1, w_bit: int = 1, bias=False)[source]
QLinearInf is a quantized linear layer optimized for inference. It inherits from QLinearImplementationMixin and BinaryLinearBase, combining quantization utilities with binary linear operations.
This class specifically handles inference operations with quantized weights, potentially using different bit widths for activations and weights.
Methods

__init__(input_features, out_features[, device, a_bit, w_bit, bias]) – Initializes the QLinearInf layer with specified input and output feature dimensions, quantization bit widths, and device.
create_clone_from(recipe[, device]) – Creates a clone of the layer from a given recipe, adjusting input feature dimensions and setting up quantization parameters based on the recipe's specifications.
forward(x) – Forwards the input tensor x through the quantized linear layer, performing the linear operation with quantized weights.
generate_quantized_weight([qweight_only]) – Generates and sets the quantized weights for the layer, optionally generating only the quantized weights without affecting the original weights.
prepare_params() – Prepares the parameters of the layer for quantization and inference, calling the corresponding method of the underlying binary or n-bit linear layer.
set_quantized_weight_data(x) – Sets the quantized weight data for the layer.
set_weight_data(x) – Sets the weight data for the layer.
Attributes

opt_weight – Property to access the optimized weight tensor of the layer, which may include quantized or otherwise transformed weights for efficient inference.
weight – Property to access the weight tensor of the layer.
training
- __init__(input_features: int, out_features: int, device=None, a_bit: int = 1, w_bit: int = 1, bias=False) → None [source]
Initializes the QLinearInf layer with specified input and output feature dimensions, quantization bit widths, and device. Currently, bias is not supported and must be False.
- Parameters:
input_features (int) – The dimension of input features after bit-packing.
out_features (int) – The dimension of output features (hidden states).
device (optional) – The device on which to initialize the layer. Defaults to None.
a_bit (int, optional) – Bit width for activation quantization. Defaults to 1.
w_bit (int, optional) – Bit width for weight quantization. Defaults to 1.
bias (bool, optional) – Indicates if bias is used. Currently must be False.
- Raises:
AssertionError – If bias is set to True.
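Because input_features counts packed values rather than the raw model dimension, the relation between a model's hidden size and this argument depends on the packing word width. A minimal sketch of that relation (the packed_features helper and the 32-bit word size are illustrative assumptions, not part of bitorch_engine):

```python
def packed_features(model_dim: int, a_bit: int = 1, word_bits: int = 32) -> int:
    """Hypothetical helper: how many packed words hold model_dim
    activation values at a_bit bits each (the word size is an assumption)."""
    values_per_word = word_bits // a_bit
    if model_dim % values_per_word != 0:
        raise ValueError("model_dim must pack evenly into words")
    return model_dim // values_per_word

# A hidden size of 1024 at 1 bit per value packs into 32 words,
# so input_features would be 32 under this assumed layout.
print(packed_features(1024, a_bit=1))  # → 32
```

Under this sketch, wider bit widths (a_bit > 1) pack fewer values per word and therefore yield a larger input_features for the same hidden size.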
- classmethod create_clone_from(recipe: LayerRecipe, device: device | None = None) → Any [source]
Creates a clone of the layer from a given recipe, adjusting input feature dimensions and setting up quantization parameters based on the recipe’s specifications.
- Parameters:
recipe (LayerRecipe) – A configuration object containing layer specifications.
device (torch.device, optional) – The device on which to create the layer. Defaults to None.
- Returns:
An instance of the cloned layer with quantization applied.
- Return type:
Any
- forward(x: Tensor) → Tensor [source]
Forwards the input tensor x through the quantized linear layer, performing the linear operation with quantized weights.
- Parameters:
x (torch.Tensor) – The input tensor to forward through the layer.
- Returns:
The output tensor after passing through the layer.
- Return type:
torch.Tensor
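Conceptually, a 1-bit linear forward binarizes both activations and weights to ±1 and accumulates their products; optimized kernels realize the same arithmetic with XNOR and popcount on bit-packed operands. A pure-Python sketch of the math (illustrative only, not the engine's actual kernel):

```python
def binary_linear(x, w):
    """Conceptual 1-bit linear op: binarize activations and weights to
    ±1 with a sign function, then multiply-accumulate per output row.
    Real inference kernels compute this with XNOR + popcount instead."""
    def sign(v):
        return 1 if v >= 0 else -1

    xb = [sign(v) for v in x]
    return [sum(sign(w_ij) * x_i for w_ij, x_i in zip(row, xb)) for row in w]

# Two output neurons over three binarized inputs.
print(binary_linear([0.3, -1.2, 0.7], [[0.5, -0.4, 2.0], [-1.0, 0.1, 0.2]]))  # → [3, -1]
```

Each output element is then just the count of sign agreements minus disagreements, which is what makes the popcount formulation possible.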
- generate_quantized_weight(qweight_only: bool = False) → None [source]
Generates and sets the quantized weights for the layer, optionally generating only the quantized weights while leaving the original weights unchanged.
- Parameters:
qweight_only (bool, optional) – If True, only quantized weights are generated. Defaults to False.
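Quantized 1-bit weights are typically stored bit-packed. A toy sketch of sign-bit packing (the pack_signs helper, its MSB-first bit layout, and the 8-bit word size are hypothetical, for illustration only; the engine's real storage format may differ):

```python
def pack_signs(weights, word_bits=8):
    """Hypothetical sign-bit packing: each weight contributes one bit
    (1 for non-negative, 0 for negative), packed MSB-first into
    word_bits-wide integers."""
    bits = [1 if w >= 0 else 0 for w in weights]
    words = []
    for i in range(0, len(bits), word_bits):
        word = 0
        for b in bits[i:i + word_bits]:
            word = (word << 1) | b
        words.append(word)
    return words

# Signs 1,0,1,0,1,1,0,1 pack into the single byte 0b10101101 = 173.
print(pack_signs([0.5, -0.2, 1.0, -0.7, 0.1, 0.3, -2.0, 0.9]))  # → [173]
```

With qweight_only=True, only this packed representation would be produced, which is useful when the floating-point weights are no longer needed at inference time.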
- property opt_weight
Property to access the optimized weight tensor of the layer, which may include quantized or otherwise transformed weights for efficient inference.
- Returns:
The optimized weight tensor.
- Return type:
torch.Tensor
- prepare_params() → None [source]
Prepares the parameters of the layer for quantization and inference, calling the corresponding method of the underlying binary or n-bit linear layer.
- set_quantized_weight_data(x: Tensor)[source]
Sets the quantized weight data for the layer.
- Parameters:
x (torch.Tensor) – The tensor containing the quantized weight data.
- set_weight_data(x: Tensor)[source]
Sets the weight data for the layer.
- Parameters:
x (torch.Tensor) – The tensor containing the weight data.
- property weight
Property to access the weight tensor of the layer.
- Returns:
The weight tensor.
- Return type:
torch.Tensor