bitorch_engine.functions.cuda.functions.q4_pack_tensor

bitorch_engine.functions.cuda.functions.q4_pack_tensor(input: Tensor, is_transpose: bool = False) → Tensor[source]

Packs a tensor into a 4-bit packed format using CUDA accelerated functions.

This function takes an input tensor and optionally transposes it before packing. The packing process reduces storage requirements by representing each value in the tensor with only 4 bits. This is particularly useful for quantized neural network weights and other scenarios where precision can be traded for storage efficiency without significantly affecting the application’s performance.

The actual packing is performed by a CUDA-accelerated function for efficiency, making this function suitable for large tensors.

Parameters:
  • input (torch.Tensor) – The input tensor to be packed. This tensor should be in a compatible format (int32) where each value can be represented in 4 bits.

  • is_transpose (bool) – If True, the tensor will be transposed before packing. This is useful if the packed tensor needs to be in a specific orientation for subsequent operations.

Returns:

A tensor containing the 4-bit packed representation of the input tensor. The returned tensor has a dtype of int8 and, if is_transpose is False, half the number of elements in the last dimension of the input tensor. If is_transpose is True and the transposition changes the tensor’s shape, the returned tensor’s shape is adjusted accordingly.

Return type:

torch.Tensor
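
To make the semantics concrete, the sketch below implements one plausible packing scheme in NumPy: two 4-bit values per byte along the last dimension, with the even-indexed element in the low nibble and the odd-indexed element in the high nibble. This is an illustrative reference only; the actual bit layout used by the CUDA kernel may differ, and `q4_pack_reference` is a hypothetical helper, not part of bitorch_engine.

```python
import numpy as np

def q4_pack_reference(x: np.ndarray) -> np.ndarray:
    """Pack an int32 array of 4-bit values (0..15) into int8, halving the
    last dimension. Assumed layout: even index -> low nibble, odd -> high."""
    assert x.shape[-1] % 2 == 0, "last dimension must be even"
    lo = x[..., 0::2] & 0xF          # values going into the low nibble
    hi = x[..., 1::2] & 0xF          # values going into the high nibble
    return ((hi << 4) | lo).astype(np.int8)

def q4_unpack_reference(packed: np.ndarray) -> np.ndarray:
    """Inverse of q4_pack_reference: recover the original 4-bit values."""
    p = packed.astype(np.uint8)      # avoid sign-extension on the shift
    out = np.empty(p.shape[:-1] + (p.shape[-1] * 2,), dtype=np.int32)
    out[..., 0::2] = p & 0xF         # low nibble
    out[..., 1::2] = p >> 4          # high nibble
    return out
```

Under this assumed layout, packing `[[1, 2, 3, 4]]` yields `[[33, 67]]` (since `(2 << 4) | 1 == 33` and `(4 << 4) | 3 == 67`), and unpacking restores the original values. A call to the real function would look like `q4_pack_tensor(weight.int(), is_transpose=False)` on a CUDA tensor.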