An implementation of parallel stochastic gradient descent using the lock-free HOGWILD! approach.
Public Member Functions
| ParallelSGD (const size_t maxIterations, const size_t threadShareSize, const double tolerance=1e-5, const bool shuffle=true, const DecayPolicyType &decayPolicy=DecayPolicyType()) |
| Construct the parallel SGD optimizer to optimize the given function with the given parameters. |
| DecayPolicyType & | DecayPolicy () const |
| Get the step size decay policy. |
| DecayPolicyType & | DecayPolicy () |
| Modify the step size decay policy. |
| size_t | MaxIterations () const |
| Get the maximum number of iterations (0 indicates no limits). |
| size_t & | MaxIterations () |
| Modify the maximum number of iterations (0 indicates no limits). |
template<typename SparseFunctionType>
| double | Optimize (SparseFunctionType &function, arma::mat &iterate) |
| Optimize the given function using the parallel SGD algorithm. |
template<>
| double | Optimize (mlpack::svd::RegularizedSVDFunction< arma::mat > &function, arma::mat &parameters) |
| bool | Shuffle () const |
| Get whether or not the individual functions are shuffled. |
| bool & | Shuffle () |
| Modify whether or not the individual functions are shuffled. |
| size_t | ThreadShareSize () const |
| Get the number of datapoints to be processed in one iteration by each thread. |
| size_t & | ThreadShareSize () |
| Modify the number of datapoints to be processed in one iteration by each thread. |
| double | Tolerance () const |
| Get the tolerance for termination. |
| double & | Tolerance () |
| Modify the tolerance for termination. |
An implementation of parallel stochastic gradient descent using the lock-free HOGWILD! approach.
For more information, see the following.
Feng Niu, Benjamin Recht, Christopher Re, and Stephen J. Wright. HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent. 2011. arXiv:1106.5730.
For Parallel SGD to work, a SparseFunctionType template parameter is required. This class must implement the following functions:
size_t NumFunctions();
double Evaluate(const arma::mat& coordinates, const size_t i);
void Gradient(const arma::mat& coordinates, const size_t i, arma::sp_mat& gradient);
In these functions, the parameter i refers to which individual function (or gradient) is being evaluated. For a data-dependent function, i is the index of the datapoint (or training example). The data is distributed uniformly among the threads made available to the program by the OpenMP runtime.
The Gradient function interface differs slightly from the DecomposableFunctionType interface: it takes a sparse matrix as the out-param for the gradient, because ParallelSGD is only expected to be relevant when the computed gradient is sparse.
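As an illustration of the interface described above, the following dependency-free sketch mirrors the three required methods for a per-coordinate squared loss. To compile without Armadillo, it substitutes `std::vector<double>` for `arma::mat` and a list of (index, value) pairs for `arma::sp_mat`; the class name `SeparableSquaredLoss` and the `SparseGradient` alias are hypothetical and not part of mlpack. A function actually passed to ParallelSGD would use the Armadillo types listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for arma::sp_mat (assumption): only nonzero (index, value)
// entries are stored.
using SparseGradient = std::vector<std::pair<std::size_t, double>>;

// Hypothetical objective: f(x) = sum_i (x[i] - target[i])^2.
// The i-th individual function depends only on coordinate i, so its
// gradient has a single nonzero entry.
class SeparableSquaredLoss
{
 public:
  explicit SeparableSquaredLoss(std::vector<double> targetValues) :
      targets(std::move(targetValues)) { }

  // Number of individual functions (one per target value).
  std::size_t NumFunctions() const { return targets.size(); }

  // Value of the i-th individual function at the given coordinates.
  double Evaluate(const std::vector<double>& coordinates,
                  const std::size_t i) const
  {
    assert(i < targets.size() && coordinates.size() == targets.size());
    const double diff = coordinates[i] - targets[i];
    return diff * diff;
  }

  // Gradient of the i-th individual function, written to the out-param.
  void Gradient(const std::vector<double>& coordinates,
                const std::size_t i,
                SparseGradient& gradient) const
  {
    assert(i < targets.size() && coordinates.size() == targets.size());
    gradient.clear();
    gradient.emplace_back(i, 2.0 * (coordinates[i] - targets[i]));
  }

 private:
  std::vector<double> targets;
};
```

Because each individual function here touches one coordinate, concurrent updates from different functions never write to the same entry, which is the sparse regime the lock-free approach is aimed at.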
| DecayPolicyType | Step size update policy used by parallel SGD to update the step size after each iteration. |
Definition at line 60 of file parallel_sgd.hpp.
ParallelSGD (const size_t maxIterations,
             const size_t threadShareSize,
             const double tolerance = 1e-5,
             const bool shuffle = true,
             const DecayPolicyType & decayPolicy = DecayPolicyType())
Construct the parallel SGD optimizer to optimize the given function with the given parameters.
One iteration means one batch of datapoints processed by each thread.
The defaults here are not necessarily good for the given problem, so it is suggested that the values used be tailored to the task at hand.
| maxIterations | Maximum number of iterations allowed (0 means no limit). |
| threadShareSize | Number of datapoints to be processed in one iteration by each thread. |
| tolerance | Maximum absolute tolerance to terminate the algorithm. |
| shuffle | If true, the function order is shuffled; otherwise, each function is visited in linear order. |
| decayPolicy | The step size update policy to use. |
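To make the meaning of one iteration concrete, the sketch below computes which positions in the visitation order a given thread would handle in a single iteration, assuming each thread takes a contiguous block of `threadShareSize` entries. The contiguous-block layout and the helper name `ThreadShare` are assumptions made for illustration; the documentation states only that the data is distributed uniformly among the OpenMP threads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper: half-open range [begin, end) of positions in the
// visitation order handled by thread `threadId` during one iteration.
std::pair<std::size_t, std::size_t> ThreadShare(
    const std::size_t threadId,
    const std::size_t threadShareSize,
    const std::size_t numFunctions)
{
  assert(threadShareSize > 0);
  // Clamp both ends so threads past the end of the data get an empty range.
  const std::size_t begin =
      std::min(threadId * threadShareSize, numFunctions);
  const std::size_t end = std::min(begin + threadShareSize, numFunctions);
  return { begin, end };
}
```

Under this assumption, with 10 functions and a `threadShareSize` of 4, threads 0, 1, and 2 receive [0, 4), [4, 8), and [8, 10), and a fourth thread receives an empty range.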
| DecayPolicyType & DecayPolicy () const | inline |
Get the step size decay policy.
Definition at line 123 of file parallel_sgd.hpp.
| DecayPolicyType & DecayPolicy () | inline |
Modify the step size decay policy.
Definition at line 125 of file parallel_sgd.hpp.
| size_t MaxIterations () const | inline |
Get the maximum number of iterations (0 indicates no limits).
Definition at line 101 of file parallel_sgd.hpp.
| size_t & MaxIterations () | inline |
Modify the maximum number of iterations (0 indicates no limits).
Definition at line 103 of file parallel_sgd.hpp.
template<typename SparseFunctionType>
double Optimize (SparseFunctionType & function,
                 arma::mat & iterate)
Optimize the given function using the parallel SGD algorithm.
The given starting point will be modified to store the finishing point of the algorithm, and the value of the loss function at the final point is returned.
| SparseFunctionType | Type of function to be optimized. |
| function | Function to be optimized (minimized). |
| iterate | Starting point (will be modified). |
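The contract described above (the starting point is overwritten with the final point, and the objective at that point is returned) can be sketched as follows. This is a sequential illustration only: it omits the lock-free threading, uses `std::vector<double>` in place of `arma::mat`, and takes a fixed `stepSize` instead of a decay policy. The names `SeparableLoss`, `GradientEntry`, and `OptimizeSketch`, and the exact stopping rule, are assumptions rather than mlpack's implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in objective: f(x) = sum_i (x[i] - target[i])^2.
struct SeparableLoss
{
  std::vector<double> target;

  std::size_t NumFunctions() const { return target.size(); }

  double Evaluate(const std::vector<double>& x, const std::size_t i) const
  {
    const double diff = x[i] - target[i];
    return diff * diff;
  }

  // Function i depends only on x[i], so its gradient has one nonzero entry.
  double GradientEntry(const std::vector<double>& x, const std::size_t i) const
  {
    return 2.0 * (x[i] - target[i]);
  }
};

// Sequential sketch of the Optimize() contract: `iterate` is updated in
// place and the objective at the final point is returned.
double OptimizeSketch(const SeparableLoss& function,
                      std::vector<double>& iterate,
                      const std::size_t maxIterations,
                      const double tolerance,
                      const double stepSize)
{
  assert(iterate.size() == function.NumFunctions());
  double lastObjective = std::numeric_limits<double>::max();

  // maxIterations == 0 means no limit, as documented for the constructor.
  for (std::size_t it = 0; maxIterations == 0 || it < maxIterations; ++it)
  {
    // Full objective at the current point.
    double objective = 0.0;
    for (std::size_t i = 0; i < function.NumFunctions(); ++i)
      objective += function.Evaluate(iterate, i);

    // Assumed stopping rule: objective changed by less than the tolerance.
    if (std::abs(lastObjective - objective) < tolerance)
      return objective;
    lastObjective = objective;

    // One sparse gradient step per individual function.
    for (std::size_t i = 0; i < function.NumFunctions(); ++i)
      iterate[i] -= stepSize * function.GradientEntry(iterate, i);
  }

  // Iteration limit reached: report the objective at the final point.
  double objective = 0.0;
  for (std::size_t i = 0; i < function.NumFunctions(); ++i)
    objective += function.Evaluate(iterate, i);
  return objective;
}
```

A caller would pass its initial guess as `iterate`, read the optimized coordinates back from the same vector afterwards, and use the return value as the final loss.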
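| double Optimize (mlpack::svd::RegularizedSVDFunction< arma::mat > &function, arma::mat &parameters) | inline |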
| bool Shuffle () const | inline |
Get whether or not the individual functions are shuffled.
Definition at line 118 of file parallel_sgd.hpp.
| bool & Shuffle () | inline |
Modify whether or not the individual functions are shuffled.
Definition at line 120 of file parallel_sgd.hpp.
| size_t ThreadShareSize () const | inline |
Get the number of datapoints to be processed in one iteration by each thread.
Definition at line 107 of file parallel_sgd.hpp.
| size_t & ThreadShareSize () | inline |
Modify the number of datapoints to be processed in one iteration by each thread.
Definition at line 110 of file parallel_sgd.hpp.
| double Tolerance () const | inline |
Get the tolerance for termination.
Definition at line 113 of file parallel_sgd.hpp.
| double & Tolerance () | inline |
Modify the tolerance for termination.
Definition at line 115 of file parallel_sgd.hpp.