AdaDelta is an optimizer that uses two ideas to improve upon the two main drawbacks of the Adagrad method.

Public Member Functions
	AdaDelta (const double stepSize=1.0, const size_t batchSize=32, const double rho=0.95, const double epsilon=1e-6, const size_t maxIterations=100000, const double tolerance=1e-5, const bool shuffle=true)
	Construct the AdaDelta optimizer with the given parameters.

size_t	BatchSize () const
	Get the batch size.

size_t &	BatchSize ()
	Modify the batch size.

double	Epsilon () const
	Get the value used to initialise the mean squared gradient parameter.

double &	Epsilon ()
	Modify the value used to initialise the mean squared gradient parameter.

size_t	MaxIterations () const
	Get the maximum number of iterations (0 indicates no limit).

size_t &	MaxIterations ()
	Modify the maximum number of iterations (0 indicates no limit).

template < typename DecomposableFunctionType >
double	Optimize (DecomposableFunctionType &function, arma::mat &iterate)
	Optimize the given function using AdaDelta.

double	Rho () const
	Get the smoothing parameter.

double &	Rho ()
	Modify the smoothing parameter.

bool	Shuffle () const
	Get whether or not the individual functions are shuffled.

bool &	Shuffle ()
	Modify whether or not the individual functions are shuffled.

double	StepSize () const
	Get the step size.

double &	StepSize ()
	Modify the step size.

double	Tolerance () const
	Get the tolerance for termination.

double &	Tolerance ()
	Modify the tolerance for termination.

Detailed Description

AdaDelta is an optimizer that uses two ideas to improve upon the two main drawbacks of the Adagrad method:

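Accumulate Over Window: squared gradients are accumulated over a window (an exponentially decaying average) instead of over all past iterations, so the effective learning rate does not shrink continually.
Correct Units with Hessian Approximation: each update is rescaled by a running average of past updates, so that its units match those of the parameters and no hand-tuned global learning rate is needed.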
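
The two ideas can be sketched as a single update step. The code below is a minimal illustration following the update rule in the paper cited below, not the mlpack implementation: the names AdaDeltaState and AdaDeltaStep are hypothetical, std::vector stands in for arma::mat, and scaling the update by stepSize is an assumption about how that parameter enters.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Running averages kept between steps; both start at zero.
struct AdaDeltaState
{
  std::vector<double> meanSquaredGradient; // E[g^2]
  std::vector<double> meanSquaredUpdate;   // E[dx^2]

  explicit AdaDeltaState(const std::size_t n) :
      meanSquaredGradient(n, 0.0), meanSquaredUpdate(n, 0.0) { }
};

// Apply one AdaDelta step to `iterate` in place (hypothetical helper).
inline void AdaDeltaStep(std::vector<double>& iterate,
                         const std::vector<double>& gradient,
                         AdaDeltaState& state,
                         const double stepSize = 1.0,
                         const double rho = 0.95,
                         const double epsilon = 1e-6)
{
  assert(iterate.size() == gradient.size());
  assert(iterate.size() == state.meanSquaredGradient.size());

  for (std::size_t j = 0; j < iterate.size(); ++j)
  {
    const double g = gradient[j];

    // Idea 1: accumulate squared gradients over a decaying window.
    state.meanSquaredGradient[j] =
        rho * state.meanSquaredGradient[j] + (1.0 - rho) * g * g;

    // Idea 2: rescale by the RMS of past updates so the units match.
    const double dx = std::sqrt((state.meanSquaredUpdate[j] + epsilon) /
                                (state.meanSquaredGradient[j] + epsilon)) * g;

    // Accumulate squared updates over the same kind of window.
    state.meanSquaredUpdate[j] =
        rho * state.meanSquaredUpdate[j] + (1.0 - rho) * dx * dx;

    // Assumption: stepSize simply scales the computed update.
    iterate[j] -= stepSize * dx;
  }
}
```

Because both averages start at zero, epsilon is what makes the first update non-zero; with a zero gradient the iterate is left unchanged.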

For more information, see the following.

@article{Zeiler2012,
  author  = {Matthew D. Zeiler},
  title   = {{ADADELTA:} An Adaptive Learning Rate Method},
  journal = {CoRR},
  year    = {2012}
}

For AdaDelta to work, a DecomposableFunctionType template parameter is required. This class must implement the following functions:

size_t NumFunctions();
double Evaluate(const arma::mat& coordinates, const size_t i, const size_t batchSize);
void Gradient(const arma::mat& coordinates, const size_t i, arma::mat& gradient, const size_t batchSize);

NumFunctions() should return the number of functions ($n$). In the other two functions, the parameter i refers to which individual function (or gradient) is being evaluated. So, for a data-dependent function such as NCA (see mlpack::nca::NCA), NumFunctions() should return the number of points in the dataset, and Evaluate(coordinates, 0, 1) will evaluate the objective function on the first point in the dataset (presumably, the dataset is held internally in the DecomposableFunctionType).
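
As an illustration of this interface, the sketch below defines a hypothetical SquaredDistanceFunction whose objective is a sum of one squared-difference term per data point. To keep it compilable without Armadillo, a std::vector<double> alias named Mat stands in for arma::mat; that alias, the class name, and the interpretation of batchSize as "evaluate batchSize consecutive functions starting at i" are assumptions, not taken from the mlpack sources.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for arma::mat so the sketch needs no external library (assumption).
using Mat = std::vector<double>;

// Hypothetical objective: f(x) = sum_i (x[0] - points[i])^2, one term per point.
class SquaredDistanceFunction
{
 public:
  explicit SquaredDistanceFunction(std::vector<double> points) :
      points(std::move(points)) { }

  // Number of individual functions: one per data point.
  std::size_t NumFunctions() const { return points.size(); }

  // Sum of the terms i, ..., i + batchSize - 1 (clamped to the dataset).
  double Evaluate(const Mat& coordinates,
                  const std::size_t i,
                  const std::size_t batchSize) const
  {
    assert(!coordinates.empty() && i < points.size());
    const std::size_t end = std::min(i + batchSize, points.size());
    double objective = 0.0;
    for (std::size_t k = i; k < end; ++k)
    {
      const double diff = coordinates[0] - points[k];
      objective += diff * diff;
    }
    return objective;
  }

  // Gradient of the same batch of terms, written into `gradient`.
  void Gradient(const Mat& coordinates,
                const std::size_t i,
                Mat& gradient,
                const std::size_t batchSize) const
  {
    assert(!coordinates.empty() && i < points.size());
    const std::size_t end = std::min(i + batchSize, points.size());
    gradient.assign(coordinates.size(), 0.0);
    for (std::size_t k = i; k < end; ++k)
      gradient[0] += 2.0 * (coordinates[0] - points[k]);
  }

 private:
  std::vector<double> points; // The dataset is held internally.
};
```

With points {1, 3, 5} and coordinates {2}, Evaluate(coordinates, 0, 1) covers only the first point, while a batch size of 3 covers the whole dataset.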

Definition at line 64 of file ada_delta.hpp.

Constructor & Destructor Documentation

◆ AdaDelta()

AdaDelta	(	const double	stepSize = `1.0`,
		const size_t	batchSize = `32`,
		const double	rho = `0.95`,
		const double	epsilon = `1e-6`,
		const size_t	maxIterations = `100000`,
		const double	tolerance = `1e-5`,
		const bool	shuffle = `true`
	)

Construct the AdaDelta optimizer with the given parameters.

The defaults are not necessarily good for a given problem, so the values should be tailored to the task at hand. The maximum number of iterations refers to the maximum number of points that are processed (i.e., one iteration equals one point; one iteration does not equal one pass over the dataset).
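
To make the iteration count concrete, the following sketch converts a point-based iteration limit into batch steps and full passes. The helper names BatchSteps and FullPasses are hypothetical, and the arithmetic assumes each step consumes batchSize iterations, in line with the statement that one iteration equals one point.

```cpp
#include <cassert>
#include <cstddef>

// Number of batch steps needed to process `maxIterations` points, rounding up.
// Does not model the special value 0 (no limit).
inline std::size_t BatchSteps(const std::size_t maxIterations,
                              const std::size_t batchSize)
{
  assert(batchSize > 0);
  return (maxIterations + batchSize - 1) / batchSize;
}

// Number of complete passes over a dataset of `numFunctions` points.
inline std::size_t FullPasses(const std::size_t maxIterations,
                              const std::size_t numFunctions)
{
  assert(numFunctions > 0);
  return maxIterations / numFunctions;
}
```

Under these assumptions, the default limit of 100000 iterations with a batch size of 32 corresponds to 3125 steps, and to 100 passes over a dataset of 1000 points.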

Parameters

stepSize	Step size for each iteration.
batchSize	Number of points to process in one step.
rho	Smoothing constant.
epsilon	Value used to initialise the mean squared gradient parameter.
maxIterations	Maximum number of iterations allowed (0 means no limit).
tolerance	Maximum absolute tolerance to terminate algorithm.
shuffle	If true, the function order is shuffled; otherwise, each function is visited in linear order.

Member Function Documentation

◆ BatchSize() [1/2]

size_t BatchSize ( ) const

inline

Get the batch size.

Definition at line 117 of file ada_delta.hpp.

◆ BatchSize() [2/2]

size_t& BatchSize ( )

inline

Modify the batch size.

Definition at line 119 of file ada_delta.hpp.
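
Each parameter of the class follows this same pair of overloads: a const getter returning a value and a non-const overload returning a reference that can be assigned to. The sketch below mirrors that pattern with a hypothetical BatchSettings class; it is not the mlpack class, only an illustration of how the two overloads behave.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical class mirroring the getter/modifier overload pair.
class BatchSettings
{
 public:
  explicit BatchSettings(const std::size_t batchSize = 32) :
      batchSize(batchSize) { }

  // Get the batch size; callable on const objects.
  std::size_t BatchSize() const { return batchSize; }

  // Modify the batch size through the returned reference.
  std::size_t& BatchSize() { return batchSize; }

 private:
  std::size_t batchSize;
};

// Example of modifying a setting in place via the reference overload.
inline void DoubleBatchSize(BatchSettings& settings)
{
  assert(settings.BatchSize() > 0);
  settings.BatchSize() *= 2;
}
```

On a non-const object, an assignment such as settings.BatchSize() = 64 goes through the reference overload; on a const object only the getter is available.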

◆ Epsilon() [1/2]

double Epsilon ( ) const

inline

Get the value used to initialise the mean squared gradient parameter.

Definition at line 127 of file ada_delta.hpp.

◆ Epsilon() [2/2]

double& Epsilon ( )

inline

Modify the value used to initialise the mean squared gradient parameter.

Definition at line 129 of file ada_delta.hpp.

◆ MaxIterations() [1/2]

size_t MaxIterations ( ) const

inline

Get the maximum number of iterations (0 indicates no limit).

Definition at line 132 of file ada_delta.hpp.

◆ MaxIterations() [2/2]

size_t& MaxIterations ( )

inline

Modify the maximum number of iterations (0 indicates no limit).

Definition at line 134 of file ada_delta.hpp.

◆ Optimize()

double Optimize	(	DecomposableFunctionType &	function,
		arma::mat &	iterate
	)

inline

Optimize the given function using AdaDelta.

The given starting point will be modified to store the finishing point of the algorithm, and the final objective value is returned. The DecomposableFunctionType is checked for API consistency at compile time.

Template Parameters

DecomposableFunctionType Type of the function to optimize.

Parameters

function	Function to optimize.
iterate	Starting point (will be modified).

Returns: Objective value of the final point.

Definition at line 106 of file ada_delta.hpp.

◆ Rho() [1/2]

double Rho ( ) const

inline

Get the smoothing parameter.

Definition at line 122 of file ada_delta.hpp.

◆ Rho() [2/2]

double& Rho ( )

inline

Modify the smoothing parameter.

Definition at line 124 of file ada_delta.hpp.

◆ Shuffle() [1/2]

bool Shuffle ( ) const

inline

Get whether or not the individual functions are shuffled.

Definition at line 142 of file ada_delta.hpp.

◆ Shuffle() [2/2]

bool& Shuffle ( )

inline

Modify whether or not the individual functions are shuffled.

Definition at line 144 of file ada_delta.hpp.

◆ StepSize() [1/2]

double StepSize ( ) const

inline

Get the step size.

Definition at line 112 of file ada_delta.hpp.

◆ StepSize() [2/2]

double& StepSize ( )

inline

Modify the step size.

Definition at line 114 of file ada_delta.hpp.

◆ Tolerance() [1/2]

double Tolerance ( ) const

inline

Get the tolerance for termination.

Definition at line 137 of file ada_delta.hpp.

◆ Tolerance() [2/2]

double& Tolerance ( )

inline

Modify the tolerance for termination.

Definition at line 139 of file ada_delta.hpp.

The documentation for this class was generated from the following file:

src/mlpack/core/optimizers/ada_delta/ada_delta.hpp
