bitorch_engine.optim.diode_beta.DiodeMix
- class bitorch_engine.optim.diode_beta.DiodeMix(params: Iterable[Parameter], lr: float = 0.0001, betas: Tuple[float, float] = (0.99, 0.9999), eps: float = 1e-06, weight_decay: float = 0.0, correct_bias: bool = True, dtype: dtype = torch.float32)[source]
DiodeMix is a custom optimizer that leverages adaptive learning rates and momentum for efficient training. It is particularly suited to deep learning tasks involving parameters with binary, n-bit, or standard floating-point values. This implementation is based on “Diode: Reinventing Binary Neural Networks Training with Sign Descent Optimization” (Guo, Nianhui, et al., 2024).
- params
Iterable of parameters to optimize.
- Type:
Iterable[nn.parameter.Parameter]
- lr
Learning rate. Defaults to 1e-4.
- Type:
float, optional
- betas
Coefficients used for computing running averages of the gradient and its square. Defaults to (0.99, 0.9999).
- Type:
Tuple[float, float], optional
- eps
Term added to the denominator to improve numerical stability. Defaults to 1e-6.
- Type:
float, optional
- weight_decay
Weight decay (L2 penalty). Defaults to 0.0.
- Type:
float, optional
- correct_bias
Whether to apply bias correction to the adaptive learning rate. Defaults to True.
- Type:
bool, optional
- dtype
Data type used in the quantized weight (qweight) update computation. Defaults to torch.float32.
- Type:
torch.dtype
- Raises:
ValueError – If any of the parameters (learning rate, betas, epsilon) are out of their expected range.
Note
This optimizer checks that the learning rate, betas, and epsilon are within their valid ranges, raising ValueError otherwise. Sparse gradients are not supported; if one is encountered, the error message directs the user to SparseAdam instead.
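The example below is a minimal usage sketch. It assumes DiodeMix follows the standard torch.optim.Optimizer interface (so zero_grad() is inherited); the model, data, and hyperparameter values are illustrative and not taken from the library itself.

    import torch
    import torch.nn as nn
    from bitorch_engine.optim.diode_beta import DiodeMix

    # Illustrative model: a plain floating-point layer. Binary or n-bit
    # layers from bitorch_engine would be passed in the same way.
    model = nn.Linear(16, 4)

    # Arguments mirror the signature documented above; values outside the
    # valid ranges (e.g. a negative learning rate) raise ValueError.
    optimizer = DiodeMix(
        model.parameters(),
        lr=1e-4,
        betas=(0.99, 0.9999),
        eps=1e-6,
        weight_decay=0.0,
        correct_bias=True,
        dtype=torch.float32,
    )

    # Standard training-loop usage (assumed torch.optim.Optimizer interface).
    inputs, targets = torch.randn(8, 16), torch.randn(8, 4)
    loss = nn.functional.mse_loss(model(inputs), targets)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()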
Methods
- step(closure) – Performs a single optimization step.
- __init__(params: Iterable[Parameter], lr: float = 0.0001, betas: Tuple[float, float] = (0.99, 0.9999), eps: float = 1e-06, weight_decay: float = 0.0, correct_bias: bool = True, dtype: dtype = torch.float32)[source]
- step(closure: Callable = None)[source]
Performs a single optimization step.
- Parameters:
closure (Callable, optional) – A closure that reevaluates the model and returns the loss. Defaults to None.
- Returns:
The loss from the closure call, if any.
Note
This method updates parameter weights based on their gradients and handles each parameter type appropriately: binary and n-bit parameters have their quantized weights updated directly, while standard floating-point parameters receive Adam-like updates.
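Below is a sketch of passing a closure to step(), following the standard torch.optim closure pattern and reusing the illustrative model, optimizer, and data from the example above. The closure re-evaluates the model and returns the loss, which step() then passes back to the caller.

    def closure():
        # Re-evaluate the model and return the loss, as expected by step().
        optimizer.zero_grad()
        output = model(inputs)
        loss = nn.functional.mse_loss(output, targets)
        loss.backward()
        return loss

    # step() returns the loss from the closure call, if one is provided.
    loss = optimizer.step(closure)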