bitorch_engine.functions.cuda.functions

Functions

fp32toint4

Converts a 32-bit floating point tensor to a 4-bit integer representation.
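
A minimal pure-Python sketch of one common way to convert floats to 4-bit integers. It assumes symmetric quantization with a caller-supplied positive scale and the signed range [-8, 7]. The function name `fp32_to_int4_reference` and its `scale` parameter are hypothetical. This is not the signature, rounding mode, or value range of `fp32toint4` itself, which runs on CUDA tensors.

```python
def fp32_to_int4_reference(values, scale):
    """Quantize floats to signed 4-bit integers (hypothetical reference, not the CUDA kernel)."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    quantized = []
    for v in values:
        q = round(v / scale)      # Python's round(): round half to even
        q = max(-8, min(7, q))    # clamp to the signed 4-bit range [-8, 7]
        quantized.append(q)
    return quantized


print(fp32_to_int4_reference([0.0, 0.25, -1.0, 10.0, -100.0], 0.25))  # [0, 1, -4, 7, -8]
```

Values outside the representable range saturate at -8 or 7 rather than wrapping around.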

q4_pack_tensor

Packs a tensor into a 4-bit packed format using CUDA-accelerated functions.
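
The core idea of 4-bit packing is storing two 4-bit values in each byte. The sketch below assumes signed values in [-8, 7] stored as two's-complement nibbles, with the first value of each pair in the low nibble. These layout choices and the helper name `pack_int4_reference` are assumptions for illustration. The layout that `q4_pack_tensor` actually produces (including its container dtype) may differ.

```python
def pack_int4_reference(values):
    """Pack pairs of signed 4-bit integers into bytes (hypothetical layout)."""
    if len(values) % 2:
        raise ValueError("expected an even number of values")
    packed = bytearray()
    for low, high in zip(values[0::2], values[1::2]):
        if not (-8 <= low <= 7 and -8 <= high <= 7):
            raise ValueError("values must lie in [-8, 7]")
        # & 0xF keeps the two's-complement low nibble, e.g. -1 -> 0b1111
        packed.append((low & 0xF) | ((high & 0xF) << 4))
    return bytes(packed)


print(pack_int4_reference([1, 2, -1, 7, -8, 0]).hex())  # 217f08
```

Packing halves the storage of 8-bit values and cuts float32 storage by a factor of eight.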

q4_unpack_and_scaling_tensor

Unpacks a tensor that was previously packed using 4-bit quantization and applies scaling to the unpacked values.
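
A minimal sketch of unpacking combined with rescaling. It assumes two signed 4-bit values per byte (low nibble first, two's complement) and one scale per consecutive group of `group_size` values. The group-wise scaling scheme, the parameter names, and the helper name `unpack_int4_and_scale_reference` are all hypothetical and are not taken from `q4_unpack_and_scaling_tensor`.

```python
def unpack_int4_and_scale_reference(packed, scales, group_size):
    """Unpack signed nibbles and multiply each group by its scale (hypothetical scheme)."""
    values = []
    for byte in packed:
        for nibble in (byte & 0x0F, (byte >> 4) & 0x0F):
            # Sign-extend the 4-bit two's-complement value: 8..15 -> -8..-1
            values.append(nibble - 16 if nibble & 0x8 else nibble)
    if group_size <= 0 or len(values) != len(scales) * group_size:
        raise ValueError("len(scales) * group_size must equal the number of unpacked values")
    # Value i belongs to group i // group_size and is multiplied by that group's scale
    return [q * scales[i // group_size] for i, q in enumerate(values)]


print(unpack_int4_and_scale_reference(bytes([0x21, 0x7F]), [2.0, 0.5], 2))
# [2.0, 4.0, -0.5, 3.5]
```

Fusing the scale multiplication into the unpack step avoids materializing an intermediate integer tensor.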

q4_unpack_tensor

Unpacks a tensor that was previously packed using 4-bit quantization into its original format, without applying any scaling.
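
Unpacking reverses the nibble packing: each byte yields two values, and the top bit of each nibble is treated as a sign bit. This pure-Python sketch assumes the low nibble holds the first value and values are two's-complement signed 4-bit integers. Both assumptions and the name `unpack_int4_reference` are illustrative and not the documented behavior of `q4_unpack_tensor`.

```python
def unpack_int4_reference(packed):
    """Expand each byte into two signed 4-bit integers (hypothetical layout)."""
    values = []
    for byte in packed:
        low, high = byte & 0x0F, (byte >> 4) & 0x0F
        for nibble in (low, high):
            # Nibbles 8..15 encode -8..-1 in two's complement
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return values


print(unpack_int4_reference(bytes.fromhex("217f08")))  # [1, 2, -1, 7, -8, 0]
```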

tensor_to_packed_uint8

Packs the given tensor into an 8-bit unsigned integer tensor representation on CUDA.
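
One common use of uint8 packing in binarized networks is to store eight binary values per byte. The sketch below assumes that scheme: each value is binarized by its sign (non-negative maps to 1, negative to 0), and the first value of every group of eight lands in the least-significant bit. The binarization rule, bit order, and the name `pack_signs_to_uint8_reference` are assumptions. `tensor_to_packed_uint8` may use a different layout.

```python
def pack_signs_to_uint8_reference(values):
    """Pack the signs of 8 consecutive values into one byte (hypothetical layout)."""
    if len(values) % 8:
        raise ValueError("expected a multiple of 8 values")
    packed = bytearray()
    for start in range(0, len(values), 8):
        byte = 0
        for bit, v in enumerate(values[start:start + 8]):
            if v >= 0:            # assumption: non-negative -> bit 1, negative -> bit 0
                byte |= 1 << bit  # first value of the group goes to bit 0 (LSB)
        packed.append(byte)
    return bytes(packed)


print(pack_signs_to_uint8_reference([1.0, -1.0] * 4 + [0.5] * 8).hex())  # 55ff
```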

unpack_uint8_tensor

Unpacks an 8-bit unsigned integer tensor into a floating-point tensor using CUDA, scaling the unpacked values by a provided scale tensor.
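
A minimal sketch of unpacking with scaling, assuming each bit encodes a sign (1 maps to +1.0, 0 to -1.0), bit 0 is the first value of each byte, and one scale applies to each row of `row_length` values. The per-row scaling, the parameter names, and the helper name `unpack_uint8_and_scale_reference` are hypothetical. The real `unpack_uint8_tensor` takes a scale tensor whose exact shape and semantics may differ.

```python
def unpack_uint8_and_scale_reference(packed, scales, row_length):
    """Expand bits to +/-1.0 and multiply row r by scales[r] (hypothetical scheme)."""
    if row_length <= 0 or row_length % 8:
        raise ValueError("row_length must be a positive multiple of 8")
    signs = []
    for byte in packed:
        for bit in range(8):  # bit 0 (LSB) is the first value of each byte
            signs.append(1.0 if (byte >> bit) & 1 else -1.0)
    if len(signs) != len(scales) * row_length:
        raise ValueError("len(scales) * row_length must equal the number of unpacked values")
    # Value i belongs to row i // row_length and takes that row's scale
    return [s * scales[i // row_length] for i, s in enumerate(signs)]


print(unpack_uint8_and_scale_reference(bytes([0x55, 0xFF]), [0.5, 2.0], 8))
```

Scaling restores magnitude information that pure sign bits cannot carry, which is why binary layers commonly pair packed bits with a floating-point scale.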