The field of optimization is vast and complex. In optimization problems, we have to find solutions which are optimal or near-optimal with respect to some goals. Usually, we are not able to solve problems in one step, but we follow some process which guides us through problem-solving. mlpack implements multiple strategies for optimizing different objective functions and provides different strategies for selecting and customizing the appropriate optimization algorithm. This tutorial discusses how to use each of the techniques that mlpack implements.
This visualization allows us to see how many popular optimizers perform on different optimization problems. Select a problem to optimize, then select an optimizer and tune its parameters, and see the steps that the optimizer takes plotted in pink. Note that you can zoom in and rotate the drawing. Also, you can compare how different optimizers perform on a given problem in the second graph. As you try a given problem with more optimizers, the objective function vs. the number of iterations is plotted for each optimizer you have tried.









Step-size: 0.01
Iterations: 4000
The defaults here are not necessarily good for the given problem, so it is suggested that the values used be tailored to the task at hand. (Use the mouse to zoom/drag the function.)
As intuition says, system has higher probability of staying in the states with a smaller stepsize. As the stepsize goes up, imbalance becomes stronger. When the stepsize is close to zero, the system stays in the state(s) with the highest cost.
Evaluations: Dynamic
A plot of the cost reveals distinct properties for each optimizer with its own style of convergence.
AdaDelta - AdaGrad - Adam - AdaGrad - Adam - AdaMax - AMSGrad - Nadam - CNE - IQN - L_BFGS - RMSProp - SMORMS3 - SPALeRASGD - SVRG - SVRG (Barzilai-Borwein) - SARAH - SARAH+ - Katyusha - CMAES
In order to facilitate consistent implementations, we have defined a FunctionType API that describes all the methods that an objective function may implement. mlpack offers a few variations of this API to cover different function characteristics. This leads to several different APIs for different function types:
Each of these types of objective functions require slightly different methods to be implemented. In some cases, methods will be automatically deduced by the optimizers using template metaprogramming and this allows the user to not need to implement every method for a given type of objective function. Each type described above is detailed in the following sections.
A function satisfying the FunctionType API is a general differentiable objective function. It must implement an Evaluate() and a Gradient() method. The interface used for that can be the following two methods:
However, there are some objective functions (like logistic regression) for which it is computationally convenient to calculate both the objective and the gradient at the same time. Therefore, optionally, the following function can be implemented:
It is not a problem to implement all three of these methods, but it is not obligatory to. EvaluateWithGradient() will automatically be inferred if it is not written from the Evaluate() and Gradient() functions; similarly, the Evaluate() and Gradient() functions will be inferred from EvaluateWithGradient() if they are not available. However, any automatically inferred method may be slightly slower.
The following optimizers use the FunctionType API:
A function satisfying the DecomposableFunctionType API is a differentiable objective function that can be decomposed into a number of separable objective functions. Examples of these types of objective functions include those that are a sum of loss on individual data points; so, common machine learning tasks like logistic regression or training a neural network can be expressed as optimizing a decomposable objective function.
Any function implementing the DecomposableFunctionAPI must implement the following four methods:
Note that the decomposable objective functions should support batch computation—this can allow significant speedups. The Shuffle() method shuffles the ordering of the functions. For some optimizers, randomness is an important component, so it is important that it is possible to shuffle the ordering of the decomposable functions.
As with the regular FunctionType API, it is optional to implement an EvaluateWithGradient() method in place of, or in addition to, the Evaluate() and Gradient() methods. The interface used for that should be the following:
The following mlpack optimizers require functions implementing the DecomposableFunctionType API:
A function satisfying the SparseFunctionType API is a decomposable differentiable objective function with the condition that a single individual gradient is sparse. The API is slightly different but similar to the DecomposableFunctionType API; the following methods are necessary:
The Shuffle() method is not needed for the SparseFunctionType API.
Note that it is possible to write a Gradient() method that accepts a template parameter GradType, which may be arma::mat (dense Armadillo matrix) or arma::sp_mat (sparse Armadillo matrix). This allows support for both the DecomposableFunctionType API and the SparseFunctionType API, as below:
The following mlpack optimizers require an objective function satisfying the SparseFunctionType API:
A function satisfying the NonDifferentiableFunctionType API is a general non-differentiable objective function. Only an Evaluate() method must be implemented, with the following signature:
The following mlpack optimizers require an objective function satisfying the NonDifferentiableFunctionType API:
A function satisfying the ConstrainedFunctionType API is a general differentiable objective function that has differentiable constraints for the parameters. This API is more complex than the others and requires five methods to be implemented:
The following mlpack optimizers require a ConstrainedFunctionType:
A function satisfying the NonDifferentiableDecomposableFunctionType API is a decomposable non-differentiable objective function. Only an Evaluate() and a NumFunctions() method must be implemented, with the following signatures:
The following mlpack optimizers require a NonDifferentiableDecomposableFunctionType:
A function satisfying the ResolvableFunctionType API is a partially differentiable objective function. For this API, three methods must be implemented, with the following signatures:
It is not required to templatize so that both sparse and dense gradients can be used, but it can be helpful.
The following mlpack optimizers require a ResolvableFunctionType:
An optimizer must implement only the method:
The Optimize() method optimizes the given function function, and stores the best set of parameters in the matrix parameters and returns the best objective value.
If the optimizer requires a given API from above, the following functions from src/mlpack/optimizers/function/static_checks.hpp can be helpful:
The example below constructs a simple function, where each dimension has a parabola with a distinct minimum. Note, in this example we maintain an ordering with the vector order; in other situations, such as training neural networks, we could simply shuffle the columns of the data matrix in Shuffle().
For the optimization of the defined ObjectiveFunction and other mlpack objective functions, we must implement only an Optimize() method, and a constructor to set some parameters. The code is given below.
Note for the sake of simplicity we omitted checks on the batch size (bs). This optimizer assumes that function.NumFunctions() is a multiple of the batch size, we also omitted other typical parts of real implementations, a more detailed example may be found in the complete API documentation". Still, SimpleOptimizer works with any mlpack objective function which implements Evaluate that can compute a partial objective function, and Gradient that can compute a part of the gradient starting with a separable function.
Finding the minimum of ObjectiveFunction or any other mlpack function can be done as shown below.
The final value of the objective function should be close to the optimal value, which is the sum of values at the vertices of the parabolas.
Further documentation for each Optimizer may be found in the complete API documentation. In addition, more information on the testing functions may be found in its complete API documentation.