bitorch_engine.optim.diode_beta.DiodeMix
- class bitorch_engine.optim.diode_beta.DiodeMix(params: Iterable[Parameter], lr: float = 0.0001, betas: Tuple[float, float] = (0.99, 0.9999), eps: float = 1e-06, weight_decay: float = 0.0, correct_bias: bool = True, dtype: dtype = torch.float32)[source]
DiodeMix is a custom optimizer that leverages adaptive learning rates and momentum for efficient training. It is particularly suited to deep learning tasks involving parameters with binary, n-bit, or standard floating-point values. This implementation is based on “Diode: Reinventing Binary Neural Networks Training with Sign Descent Optimization” (Guo, Nianhui, et al., 2024).
- params
Iterable of parameters to optimize.
- Type:
Iterable[nn.parameter.Parameter]
- lr
Learning rate. Defaults to 1e-4.
- Type:
float, optional
- betas
Coefficients used for computing running averages of the gradient and its square. Defaults to (0.99, 0.9999).
- Type:
Tuple[float, float], optional
- eps
Term added to the denominator to improve numerical stability. Defaults to 1e-6.
- Type:
float, optional
- weight_decay
Weight decay (L2 penalty). Defaults to 0.0.
- Type:
float, optional
- correct_bias
Whether to apply bias correction to the adaptive learning rate. Defaults to True.
- Type:
bool, optional
- dtype
Data type used in the quantized weight (qweight) update computation. Defaults to torch.float32.
- Type:
torch.dtype
- Raises:
ValueError – If any of the parameters (learning rate, betas, epsilon) are out of their expected range.
Note
This optimizer checks that the learning rate, betas, and epsilon are within their valid ranges, raising ValueError otherwise. Sparse gradients are not supported; if one is encountered, the error message directs the user to SparseAdam instead.
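The example below is a minimal usage sketch. It assumes DiodeMix follows the standard torch.optim.Optimizer interface (so zero_grad() is inherited); the model, data, and hyperparameter values are illustrative and not taken from the library itself.

    import torch
    import torch.nn as nn
    from bitorch_engine.optim.diode_beta import DiodeMix

    # Illustrative model: a plain floating-point layer. Binary or n-bit
    # layers from bitorch_engine would be passed in the same way.
    model = nn.Linear(16, 4)

    # Arguments mirror the signature documented above; values outside the
    # valid ranges (e.g. a negative learning rate) raise ValueError.
    optimizer = DiodeMix(
        model.parameters(),
        lr=1e-4,
        betas=(0.99, 0.9999),
        eps=1e-6,
        weight_decay=0.0,
        correct_bias=True,
        dtype=torch.float32,
    )

    # Standard training-loop usage (assumed torch.optim.Optimizer interface).
    inputs, targets = torch.randn(8, 16), torch.randn(8, 4)
    loss = nn.functional.mse_loss(model(inputs), targets)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()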
Methods
- step(closure) – Performs a single optimization step.
- __init__(params: Iterable[Parameter], lr: float = 0.0001, betas: Tuple[float, float] = (0.99, 0.9999), eps: float = 1e-06, weight_decay: float = 0.0, correct_bias: bool = True, dtype: dtype = torch.float32)[source]
- step(closure: Callable = None)[source]
Performs a single optimization step.
- Parameters:
closure (Callable, optional) – A closure that reevaluates the model and returns the loss. Defaults to None.
- Returns:
The loss from the closure call, if any.
Note
This method updates parameter weights based on their gradients and handles each parameter type appropriately: binary and n-bit parameters have their quantized weights updated directly, while standard floating-point parameters receive Adam-like updates.
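Below is a sketch of passing a closure to step(), following the standard torch.optim closure pattern and reusing the illustrative model, optimizer, and data from the example above. The closure re-evaluates the model and returns the loss, which step() then passes back to the caller.

    def closure():
        # Re-evaluate the model and return the loss, as expected by step().
        optimizer.zero_grad()
        output = model(inputs)
        loss = nn.functional.mse_loss(output, targets)
        loss.backward()
        return loss

    # step() returns the loss from the closure call, if one is provided.
    loss = optimizer.step(closure)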