Implementation of prioritized experience replay.
Public Types | |
| using | ActionType = typename EnvironmentType::Action |
| Convenient typedef for action. | |
| using | StateType = typename EnvironmentType::State |
| Convenient typedef for state. | |
Public Member Functions | |
| PrioritizedReplay () | |
| Default constructor. | |
| PrioritizedReplay (const size_t batchSize, const size_t capacity, const double alpha, const size_t dimension=StateType::dimension) | |
| Construct an instance of the prioritized experience replay class. | |
| void | BetaAnneal () |
| Anneal the beta parameter. | |
| void | Sample (arma::mat &sampledStates, arma::icolvec &sampledActions, arma::colvec &sampledRewards, arma::mat &sampledNextStates, arma::icolvec &isTerminal) |
| Sample some experiences according to their priorities. | |
| arma::ucolvec | SampleProportional () |
| Sample some experiences according to their priorities. | |
| const size_t & | Size () |
| Get the number of transitions in the memory. | |
| void | Store (const StateType &state, ActionType action, double reward, const StateType &nextState, bool isEnd) |
| Store the given experience and set its priority. | |
| void | Update (arma::mat target, arma::icolvec sampledActions, arma::mat nextActionValues, arma::mat &gradients) |
| Update the priorities of transitions and update the gradients. | |
| void | UpdatePriorities (arma::ucolvec &indices, arma::colvec &priorities) |
| Update priorities of sampled transitions. | |
Implementation of prioritized experience replay.
Prioritized experience replay assigns a priority to each stored transition and replays important transitions more frequently, which lets the agent learn more efficiently.
| EnvironmentType | Desired task. |
Definition at line 39 of file prioritized_replay.hpp.
| using ActionType = typename EnvironmentType::Action |
Convenient typedef for action.
Definition at line 43 of file prioritized_replay.hpp.
| using StateType = typename EnvironmentType::State |
Convenient typedef for state.
Definition at line 46 of file prioritized_replay.hpp.
| PrioritizedReplay () | inline |
Default constructor.
Definition at line 51 of file prioritized_replay.hpp.
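| PrioritizedReplay (const size_t batchSize, const size_t capacity, const double alpha, const size_t dimension=StateType::dimension) | inline |
Construct an instance of the prioritized experience replay class.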
| batchSize | Number of examples returned at each sample. |
| capacity | Total memory size in terms of number of examples. |
| alpha | How much prioritization is used. |
| dimension | The dimension of an encoded state. |
Definition at line 62 of file prioritized_replay.hpp.
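The alpha parameter sets how strongly priorities skew sampling. The sketch below illustrates the proportional formulation commonly used for prioritized replay, P(i) = p_i^alpha / sum_k p_k^alpha; it is not taken from prioritized_replay.hpp, and `SamplingProbabilities` is a hypothetical helper introduced only for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of mlpack: converts raw priorities into
// sampling probabilities P(i) = p_i^alpha / sum_k p_k^alpha.
std::vector<double> SamplingProbabilities(
    const std::vector<double>& priorities,
    const double alpha)
{
  assert(!priorities.empty());
  std::vector<double> probs(priorities.size());
  double total = 0.0;
  for (std::size_t i = 0; i < priorities.size(); ++i)
  {
    assert(priorities[i] > 0.0);
    probs[i] = std::pow(priorities[i], alpha);
    total += probs[i];
  }
  // Normalize so that the probabilities sum to one.
  for (double& p : probs)
    p /= total;
  return probs;
}
```

Under this formulation, alpha = 0 makes every transition equally likely, while alpha = 1 samples in direct proportion to the raw priorities.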
| void BetaAnneal () | inline |
Anneal the beta parameter.
Definition at line 203 of file prioritized_replay.hpp.
Referenced by PrioritizedReplay< EnvironmentType >::Sample().
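As a rough illustration of what annealing beta can look like, the sketch below moves beta linearly towards 1 over a fixed number of steps. The linear schedule, the class name `BetaSchedule`, and its parameters are assumptions made for this example; the schedule actually used by BetaAnneal() is defined in prioritized_replay.hpp.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical schedule, not mlpack's code: moves beta linearly from
// initialBeta towards 1 over annealSteps calls, then holds it at 1.
class BetaSchedule
{
 public:
  BetaSchedule(const double initialBeta, const std::size_t annealSteps) :
      beta(initialBeta),
      step(0.0)
  {
    assert(initialBeta >= 0.0 && initialBeta <= 1.0);
    assert(annealSteps > 0);
    step = (1.0 - initialBeta) / static_cast<double>(annealSteps);
  }

  // Advance the schedule by one step and return the new beta.
  double Anneal()
  {
    beta = std::min(1.0, beta + step);
    return beta;
  }

  // Current value of beta.
  double Beta() const { return beta; }

 private:
  double beta;
  double step;
};
```

Calling `Anneal()` once per sampling step mirrors the fact that Sample() references BetaAnneal() in this class.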
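| void Sample (arma::mat &sampledStates, arma::icolvec &sampledActions, arma::colvec &sampledRewards, arma::mat &sampledNextStates, arma::icolvec &isTerminal) | inline |
Sample some experiences according to their priorities.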
| sampledStates | Sampled encoded states. |
| sampledActions | Sampled actions. |
| sampledRewards | Sampled rewards. |
| sampledNextStates | Sampled encoded next states. |
| isTerminal | Indicates whether the corresponding next state is a terminal state. |
Definition at line 149 of file prioritized_replay.hpp.
References PrioritizedReplay< EnvironmentType >::BetaAnneal(), SumTree< T >::Get(), PrioritizedReplay< EnvironmentType >::SampleProportional(), and SumTree< T >::Sum().
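Prioritized sampling is usually paired with importance-sampling weights that correct for the non-uniform draw. The sketch below shows one common form, w_i = (N * P(i))^(-beta), scaled by the largest weight in the batch. Whether Sample() computes exactly these weights is an assumption; `ImportanceWeights` and its parameters are hypothetical names for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of mlpack: importance-sampling weights
// w_i = (N * P(i))^(-beta), scaled so that the largest weight is 1.
std::vector<double> ImportanceWeights(
    const std::vector<double>& sampledPriorities,
    const double prioritySum,
    const std::size_t numStored,
    const double beta)
{
  assert(!sampledPriorities.empty());
  assert(prioritySum > 0.0 && numStored > 0);
  std::vector<double> weights(sampledPriorities.size());
  for (std::size_t i = 0; i < sampledPriorities.size(); ++i)
  {
    assert(sampledPriorities[i] > 0.0);
    // Probability of drawing this transition under proportional sampling.
    const double prob = sampledPriorities[i] / prioritySum;
    weights[i] = std::pow(static_cast<double>(numStored) * prob, -beta);
  }
  // Scale by the largest weight so that updates are only ever scaled down.
  const double maxWeight = *std::max_element(weights.begin(), weights.end());
  for (double& w : weights)
    w /= maxWeight;
  return weights;
}
```

With beta = 0 all weights equal 1 (no correction); as beta approaches 1 the correction becomes complete, which is why beta is annealed during training.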
| arma::ucolvec SampleProportional () | inline |
Sample some experiences according to their priorities.
Definition at line 126 of file prioritized_replay.hpp.
References SumTree< T >::FindPrefixSum(), and SumTree< T >::Sum().
Referenced by PrioritizedReplay< EnvironmentType >::Sample().
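Proportional sampling over a sum tree can be sketched as follows: split the total priority mass into equal segments, draw one point per segment, and look up the leaf that covers that point. `TinySumTree` and `SampleProportionalSketch` are hypothetical, simplified stand-ins written for this example; they are not mlpack's SumTree or SampleProportional(), whose exact behaviour may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical minimal sum tree, not mlpack's SumTree: leaves hold
// priorities and every internal node holds the sum of its two children.
class TinySumTree
{
 public:
  explicit TinySumTree(const std::size_t capacity) : leaves(1)
  {
    assert(capacity > 0);
    // Round the number of leaves up to a power of two.
    while (leaves < capacity)
      leaves *= 2;
    nodes.assign(2 * leaves, 0.0);
  }

  // Set the priority of leaf idx and refresh the sums up to the root.
  void Set(const std::size_t idx, const double priority)
  {
    assert(idx < leaves && priority >= 0.0);
    std::size_t node = idx + leaves;
    nodes[node] = priority;
    for (node /= 2; node >= 1; node /= 2)
      nodes[node] = nodes[2 * node] + nodes[2 * node + 1];
  }

  // Total priority mass, stored at the root.
  double Sum() const { return nodes[1]; }

  // Return the leaf whose cumulative priority range contains mass.
  std::size_t FindPrefixSum(double mass) const
  {
    assert(mass >= 0.0 && Sum() > 0.0);
    std::size_t node = 1;
    while (node < leaves)
    {
      const std::size_t left = 2 * node;
      // Never descend into an empty right subtree, even after rounding.
      if (mass < nodes[left] || nodes[left + 1] <= 0.0)
      {
        node = left;
      }
      else
      {
        mass -= nodes[left];
        node = left + 1;
      }
    }
    return node - leaves;
  }

 private:
  std::size_t leaves;
  std::vector<double> nodes;
};

// Draw batchSize leaf indices: split [0, Sum()) into equal segments and
// sample one point uniformly inside each segment.
std::vector<std::size_t> SampleProportionalSketch(
    const TinySumTree& tree,
    const std::size_t batchSize,
    std::mt19937& rng)
{
  assert(batchSize > 0 && tree.Sum() > 0.0);
  const double segment = tree.Sum() / static_cast<double>(batchSize);
  std::uniform_real_distribution<double> unit(0.0, 1.0);
  std::vector<std::size_t> indices(batchSize);
  for (std::size_t i = 0; i < batchSize; ++i)
  {
    const double mass = (static_cast<double>(i) + unit(rng)) * segment;
    indices[i] = tree.FindPrefixSum(mass);
  }
  return indices;
}
```

Both setting a priority and looking up a prefix sum walk a single root-to-leaf path, so each costs time logarithmic in the capacity.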
| const size_t & Size () | inline |
Get the number of transitions in the memory.
Definition at line 195 of file prioritized_replay.hpp.
| void Store (const StateType &state, ActionType action, double reward, const StateType &nextState, bool isEnd) | inline |
Store the given experience and set its priority.
| state | Given state. |
| action | Given action. |
| reward | Given reward. |
| nextState | Given next state. |
| isEnd | Whether the next state is a terminal state. |
Definition at line 99 of file prioritized_replay.hpp.
References SumTree< T >::Set().
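A fixed-capacity replay memory is often written as a ring buffer in which each new transition receives the largest priority seen so far, so that it is replayed at least once. The sketch below illustrates that idea only: `TransitionSketch`, `RingReplaySketch`, and `SetPriority` are hypothetical names, states are reduced to a single `double`, and the assumption that Store() follows the same policy is not confirmed by this page.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical transition record with scalar states for brevity.
struct TransitionSketch
{
  double state;
  int action;
  double reward;
  double nextState;
  bool isEnd;
};

// Hypothetical fixed-capacity buffer, not mlpack's class. Priorities live in
// a plain vector here; a sum tree would hold them in a real implementation.
class RingReplaySketch
{
 public:
  explicit RingReplaySketch(const std::size_t capacity) :
      transitions(capacity),
      priorities(capacity, 0.0),
      position(0),
      full(false),
      maxPriority(1.0)
  {
    assert(capacity > 0);
  }

  // Write the transition at the current position, overwriting the oldest
  // entry once the buffer is full, and give it the largest priority so far.
  void Store(const TransitionSketch& transition)
  {
    transitions[position] = transition;
    priorities[position] = maxPriority;
    position = (position + 1) % transitions.size();
    if (position == 0)
      full = true;
  }

  // Replace one priority and remember the largest value seen.
  void SetPriority(const std::size_t index, const double priority)
  {
    assert(index < Size() && priority > 0.0);
    priorities[index] = priority;
    maxPriority = std::max(maxPriority, priority);
  }

  // Number of transitions currently held.
  std::size_t Size() const { return full ? transitions.size() : position; }

  double Priority(const std::size_t index) const { return priorities[index]; }

  const TransitionSketch& At(const std::size_t index) const
  {
    return transitions[index];
  }

 private:
  std::vector<TransitionSketch> transitions;
  std::vector<double> priorities;
  std::size_t position;
  bool full;
  double maxPriority;
};
```

Once the buffer wraps around, `Size()` stays at the capacity and each new transition replaces the oldest one.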
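| void Update (arma::mat target, arma::icolvec sampledActions, arma::mat nextActionValues, arma::mat &gradients) | inline |
Update the priorities of transitions and update the gradients.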
| target | The learned value. |
| sampledActions | Agent's sampled action. |
| nextActionValues | Agent's next action values. |
| gradients | The model's gradients. |
Definition at line 216 of file prioritized_replay.hpp.
References PrioritizedReplay< EnvironmentType >::UpdatePriorities().
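In the proportional variant of prioritized replay, new priorities are commonly derived from the magnitude of the temporal-difference error, p_i = |delta_i| + epsilon, where the small constant keeps every transition sampleable. The sketch below shows that rule under stated assumptions: `PrioritiesFromTdErrors` and `epsilon` are hypothetical names, and this page does not confirm that Update() uses the same formula.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of mlpack: maps TD errors to priorities
// p_i = |delta_i| + epsilon so that no priority is ever zero.
std::vector<double> PrioritiesFromTdErrors(
    const std::vector<double>& tdErrors,
    const double epsilon)
{
  assert(epsilon > 0.0);
  std::vector<double> priorities(tdErrors.size());
  for (std::size_t i = 0; i < tdErrors.size(); ++i)
    priorities[i] = std::fabs(tdErrors[i]) + epsilon;
  return priorities;
}
```

The resulting values would then be written back for the sampled indices, which is the role UpdatePriorities() plays in this class.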
| void UpdatePriorities (arma::ucolvec &indices, arma::colvec &priorities) | inline |
Update priorities of sampled transitions.
| indices | The indices of the samples to be updated. |
| priorities | Their corresponding priorities. |
Definition at line 183 of file prioritized_replay.hpp.
References SumTree< T >::BatchUpdate().
Referenced by PrioritizedReplay< EnvironmentType >::Update().