Source code for bitorch_engine.layers.qlinear.binary.cuda.bmm
from enum import Enum
[docs]
class BMM(Enum):
"""
Enumeration for selecting the Bit-Matrix-Multiplication (BMM) kernel to be used during operations.
This allows for the choice of different underlying implementations based on the requirements or
optimizations desired for specific hardware or computational constraints.
Attributes:
BSTC32: Software-based Tensor Core implementation. This option utilizes a software-level implementation
to simulate tensor core operations, potentially offering more flexibility at the cost of raw performance.
BTC32: Bit-Matrix-Multiplication using NVIDIA Tensor Cores. This leverages hardware tensor cores for
accelerated computation, suitable for NVIDIA GPUs that support tensor core operations, offering
high performance for matrix multiplications.
ADAPTIVE: Automatically selects the best combination of kernel implementations based on the specific dimension
constraints of the inputs and weights. This option aims to optimize performance by considering
the characteristics of the computation and available hardware capabilities.
The choice of kernel can significantly affect the performance and efficiency of operations that involve
matrix multiplications, especially in deep learning models where such operations are prevalent.
"""
BSTC32 = 1 # software based tensor core implementation
BTC32 = 2 # Bit-Matrix-Multiplication using NVIDIA Tensor Cores
ADAPTIVE = 3 # Chooses the best kernel based on input and weight dimensions