A lazy optimizer is an optimization algorithm that updates a model's parameters (and any associated optimizer state) only when they are involved in the current gradient computation. This can reduce the computational cost, and in some implementations the memory traffic, of training, especially for models with sparse features or large embedding tables, where only a small fraction of the parameters receives a gradient at any given step.
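As a minimal sketch of the idea (illustrative only, not any library's API), a lazy sparse update touches just the embedding rows that actually received a gradient in the current step; the function and variable names below are hypothetical:

```python
import numpy as np

def lazy_sgd_step(embeddings, row_indices, row_grads, lr=0.1):
    """Lazily update an embedding table: only the rows listed in
    row_indices are read or written; all other rows are left untouched."""
    for i, g in zip(row_indices, row_grads):
        embeddings[i] -= lr * g
    return embeddings

# Usage: a 5-row embedding table where only rows 1 and 3 appear in the batch.
table = np.zeros((5, 3))
lazy_sgd_step(table, row_indices=[1, 3], row_grads=np.ones((2, 3)))
```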
One example of a lazy optimizer is LazyAdam, a variant of the Adam optimizer that handles sparse gradients more efficiently. It is available as tfa.optimizers.LazyAdam in the TensorFlow Addons package (it originated as tf.contrib.opt.LazyAdamOptimizer in TensorFlow 1.x); it is not part of core tf.keras.optimizers. Instead of updating the first- and second-moment slot variables (the optimizer's auxiliary state) for every parameter at every step, it updates the slots only for the indices that actually appear in the current sparse gradient.
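A short usage sketch, assuming the tensorflow and tensorflow-addons packages are installed (TensorFlow Addons is in maintenance mode, but still ships LazyAdam); the model architecture here is just a placeholder:

```python
import tensorflow as tf
import tensorflow_addons as tfa  # LazyAdam lives in TF Addons, not core Keras

# A toy model dominated by a large embedding table: only the rows hit by
# each batch receive gradients, which is where LazyAdam saves work.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=100_000, output_dim=64),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# LazyAdam updates the moment slots only for the embedding rows present
# in the current sparse gradient, instead of touching every row.
model.compile(optimizer=tfa.optimizers.LazyAdam(learning_rate=1e-3),
              loss="binary_crossentropy")
```

Note that this changes the algorithm's semantics slightly: rows that are not touched in a step also skip their moment decay, so results can differ from standard Adam.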
Another example is the lazy Newton method, a second-order optimization scheme that reuses a previously computed Hessian matrix for several iterations while computing a fresh gradient at every step. Because the Hessian (and its factorization) is the expensive part, this significantly reduces the arithmetic complexity of second-order methods while still achieving fast convergence. The lazy Newton method can be combined with cubic or quadratic regularization to obtain global convergence guarantees or local superlinear convergence.
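The sketch below illustrates the core idea only, without the cubic or quadratic regularization used in the literature; the function names and the reuse period m are assumptions for the example:

```python
import numpy as np

def lazy_newton(grad, hess, x0, m=5, max_iters=50):
    """Newton-type iteration that recomputes the Hessian only every m steps,
    reusing the stale ("lazy") Hessian in between while the gradient is
    evaluated fresh at every iteration."""
    x = np.asarray(x0, dtype=float).copy()
    for t in range(max_iters):
        if t % m == 0:
            # Expensive step: recompute the Hessian (and, in practice,
            # its factorization) only once every m iterations.
            H = hess(x)
        # Cheap step: new gradient, solved against the stale Hessian.
        x = x - np.linalg.solve(H, grad(x))
    return x

# Usage on a toy quadratic f(x) = 0.5 * x^T A x - b^T x.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = lazy_newton(lambda x: A @ x - b, lambda x: A, np.zeros(2))
```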