bitorch_engine.functions.cuda.functions.fp32toint4

bitorch_engine.functions.cuda.functions.fp32toint4(input: Tensor) → Tensor

Converts a 32-bit floating point tensor to a 4-bit integer representation.

This function takes an input tensor of floating point numbers and compresses it into a tensor of 4-bit integers, effectively reducing the memory footprint by a factor of 8. The conversion process involves finding the minimum and maximum values of the input to normalize the data range, and then quantizing the normalized values into 4-bit integers.

Parameters:

input (Tensor) – A tensor of 32-bit floating point numbers to compress.

Returns:

A tensor of 4-bit integers representing the quantized version of the input tensor.

The output tensor uses a 64-bit integer data type to store the 4-bit values, so each 64-bit element holds sixteen packed 4-bit values.

Return type:

  • Tensor

Note

  • The input tensor is assumed to be a flat 1D tensor, and the output tensor will also be a 1D tensor.

  • This function is designed to be executed on CUDA-enabled devices and utilizes custom CUDA kernels for the quantization process.

  • The function allocates temporary memory on the GPU for intermediate computations, which is freed before returning the output tensor.
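The min-max quantization and nibble-packing scheme described above can be sketched in pure Python. This is an illustrative model of the conversion, not the library's actual CUDA kernel; the helper names `quantize_to_int4` and `pack_int4` are hypothetical.

```python
def quantize_to_int4(values):
    """Min-max normalize floats to [0, 15] and round to 4-bit integers."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) or 1.0  # guard against division by zero for constant input
    return [round((v - lo) / scale * 15) for v in values]

def pack_int4(nibbles):
    """Pack groups of sixteen 4-bit values into 64-bit integers
    (low nibble first; the real kernel's bit layout may differ)."""
    packed = []
    for i in range(0, len(nibbles), 16):
        word = 0
        for j, n in enumerate(nibbles[i:i + 16]):
            word |= (n & 0xF) << (4 * j)
        packed.append(word)
    return packed

q = quantize_to_int4([0.0, 1.0, 2.0, 3.0])  # → [0, 5, 10, 15]
packed = pack_int4(q)  # one int64 word holding four used nibbles
```

Note that the minimum and maximum would need to be stored alongside the packed tensor to dequantize later; this sketch only shows the forward compression step.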