bitorch_engine.layers.qconv.nbit.cutlass.layer.Q4Conv2dCutlass
- class bitorch_engine.layers.qconv.nbit.cutlass.layer.Q4Conv2dCutlass(*args, **kwargs)[source]
A specialized 4-bit quantized convolutional layer using CUTLASS kernels.
This class extends nBitConv2dBase to implement a 4-bit quantized convolution layer backed by CUTLASS kernels. It quantizes both weights and activations, aiming to reduce model size and improve computational efficiency while maintaining accuracy.
- bias_a
Bias parameter for activation quantization.
- Type:
torch.nn.Parameter
- scale_a
Scale parameter for activation quantization.
- Type:
torch.nn.Parameter
- scale_w
Scale parameter for weight quantization.
- Type:
torch.nn.Parameter
- eps
A small epsilon value to prevent division by zero in calculations.
- Type:
torch.Tensor
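For orientation, the sketch below shows how quantization parameters like these are commonly applied to activations: the input is shifted by bias_a, normalized by scale_a (clamped with eps to avoid division by zero), and rounded into the signed 4-bit range. This is an illustrative assumption about the general scheme, not the actual CUTLASS kernel path.

```python
import torch

def illustrate_4bit_activation_quant(x: torch.Tensor,
                                     scale_a: torch.Tensor,
                                     bias_a: torch.Tensor,
                                     eps: torch.Tensor) -> torch.Tensor:
    """Conceptual 4-bit activation quantization; not the layer's real kernel code."""
    # Shift the activation by the learnable bias, then normalize by the scale.
    # eps guards against division by zero, as described for the eps attribute above.
    shifted = x + bias_a
    q = torch.round(shifted / torch.clamp(scale_a, min=eps.item()))
    # Clamp into the signed 4-bit integer range [-8, 7].
    return torch.clamp(q, -8, 7)
```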
Methods
Initializes the Q4Conv2dCutlass layer with provided arguments.
Defines the forward pass of the 4-bit quantized convolution using CUTLASS.
Performs weight quantization and optionally releases the floating-point weights.
Prepares and initializes the model parameters for training.
Calculates the scale of the input and shifts the input using the learnable bias_a.
Attributes
training
- __init__(*args, **kwargs)[source]
Initializes the Q4Conv2dCutlass layer with provided arguments.
- Parameters:
*args – Variable length argument list for base class.
**kwargs – Arbitrary keyword arguments for base class.
- forward(x: Tensor) Tensor [source]
Defines the forward pass of the 4-bit quantized convolution using CUTLASS.
- Parameters:
x (torch.Tensor) – Input tensor of shape (N, C, H, W).
- Returns:
The output tensor of the convolution operation.
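A minimal usage sketch is shown below. The constructor arguments are assumptions following the usual Conv2d-style interface (in_channels, out_channels, kernel_size, stride, padding); the exact signature and expected input dtype are defined by nBitConv2dBase and should be checked there.

```python
import torch
from bitorch_engine.layers.qconv.nbit.cutlass.layer import Q4Conv2dCutlass

# Hypothetical constructor arguments; the exact signature comes from nBitConv2dBase.
conv = Q4Conv2dCutlass(in_channels=64, out_channels=128, kernel_size=3,
                       stride=1, padding=1).cuda()
conv.prepare_params()  # must be called before training, see prepare_params() below

x = torch.randn(8, 64, 32, 32, device="cuda")  # input of shape (N, C, H, W)
y = conv(x)                                    # 4-bit quantized convolution via CUTLASS
```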
- generate_quantized_weight(qweight_only: bool = False) None [source]
Performs weight quantization and optionally releases the floating-point weights.
This method should be called before saving the model weights, especially for inference.
- Parameters:
qweight_only (bool) – If True, releases the floating-point weight after quantization. It will save runtime memory for inference.
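Continuing the usage sketch above, a typical save-for-inference flow might look like this (the file name is illustrative):

```python
import torch

# Quantize the weights and release the floating-point copy to save runtime memory,
# then persist the state dict for later inference.
conv.generate_quantized_weight(qweight_only=True)
torch.save(conv.state_dict(), "q4_conv_cutlass.pth")  # illustrative file name
```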
- prepare_params() None [source]
Prepares and initializes the model parameters for training.
Note
This method MUST be called after model initialization and before training starts to ensure the weights are properly prepared for efficient computation.
One can use the “prepare_bie_layers” method from project_root.utils.model_helper to call this function, as sketched below.
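For example, instead of calling prepare_params() on each layer by hand, the helper can be applied to a whole model. The “project_root” prefix is a placeholder taken from the note above (substitute your project’s actual package path), and build_model() is an illustrative stand-in for your own model construction code.

```python
from project_root.utils.model_helper import prepare_bie_layers  # placeholder import path

model = build_model()       # illustrative: any model containing Q4Conv2dCutlass layers
prepare_bie_layers(model)   # calls prepare_params() on the bitorch-engine layers
model.train()
```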