bitorch_engine.functions.cuda.functions

Functions

`fp32toint4`	Converts a 32-bit floating point tensor to a 4-bit integer representation.
`q4_pack_tensor`	Packs a tensor into a 4-bit packed format using CUDA accelerated functions.
`q4_unpack_and_scaling_tensor`	Unpacks a tensor that has been previously packed using 4-bit quantization and applies scaling to the unpacked values.
`q4_unpack_tensor`	Unpacks a tensor that has been previously packed using 4-bit quantization into its original format.
`tensor_to_packed_uint8`	Packs the given tensor into an 8-bit unsigned integer tensor representation on CUDA.
`unpack_uint8_tensor`	Unpacks an 8-bit unsigned integer tensor into a floating-point tensor using CUDA, scaling the unpacked values by a provided scale tensor.
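
The summaries above do not show what 4-bit packing looks like at the byte level, so the following is a minimal CPU-only sketch of the general idea in plain Python. It is not the CUDA implementation behind these functions. The helper names (`pack_int4`, `unpack_int4`, `unpack_and_scale`), the low-nibble-first layout, and the `scale` / `zero_point` parameters are all assumptions made for illustration; the actual memory layout and signatures used by `bitorch_engine` may differ.

```python
def pack_int4(values):
    """Pack unsigned 4-bit values (0..15) two per byte.

    Assumed layout (illustrative only): the first value of each pair goes
    into the low nibble, the second into the high nibble.
    """
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of values")
    if any(not 0 <= v <= 15 for v in values):
        raise ValueError("values must fit in 4 bits (0..15)")
    return bytes(lo | (hi << 4) for lo, hi in zip(values[0::2], values[1::2]))


def unpack_int4(packed):
    """Recover the 4-bit values from bytes produced by pack_int4."""
    out = []
    for byte in packed:
        out.append(byte & 0x0F)  # low nibble: first value of the pair
        out.append(byte >> 4)    # high nibble: second value of the pair
    return out


def unpack_and_scale(packed, scale, zero_point=0):
    """Unpack, then dequantize each value as (q - zero_point) * scale.

    A single scalar scale and zero point are assumed here for simplicity.
    """
    return [(q - zero_point) * scale for q in unpack_int4(packed)]


quantized = [1, 2, 15, 0]
packed = pack_int4(quantized)            # 4 values stored in 2 bytes
restored = unpack_int4(packed)           # round-trips to the original values
dequantized = unpack_and_scale(packed, scale=0.5, zero_point=8)
```

Packing two values per byte halves the storage compared with one value per `uint8`. The scaling step is what maps the small integer codes back to approximate floating-point values.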