\section{S\+AC$<$ Environment\+Type, Q\+Network\+Type, Policy\+Network\+Type, Updater\+Type, Replay\+Type $>$ Class Template Reference}
\label{classmlpack_1_1rl_1_1SAC}\index{S\+A\+C$<$ Environment\+Type, Q\+Network\+Type, Policy\+Network\+Type, Updater\+Type, Replay\+Type $>$@{S\+A\+C$<$ Environment\+Type, Q\+Network\+Type, Policy\+Network\+Type, Updater\+Type, Replay\+Type $>$}}


Implementation of Soft Actor-\/\+Critic, a model-\/free off-\/policy actor-\/critic based deep reinforcement learning algorithm.  


\subsection*{Public Types}
\begin{DoxyCompactItemize}
\item 
using \textbf{ Action\+Type} = typename Environment\+Type\+::\+Action
\begin{DoxyCompactList}\small\item\em Convenient typedef for action. \end{DoxyCompactList}\item 
using \textbf{ State\+Type} = typename Environment\+Type\+::\+State
\begin{DoxyCompactList}\small\item\em Convenient typedef for state. \end{DoxyCompactList}\end{DoxyCompactItemize}
\subsection*{Public Member Functions}
\begin{DoxyCompactItemize}
\item 
\textbf{ S\+AC} (\textbf{ Training\+Config} \&config, Q\+Network\+Type \&learning\+Q1\+Network, Policy\+Network\+Type \&policy\+Network, Replay\+Type \&replay\+Method, Updater\+Type q\+Network\+Updater=Updater\+Type(), Updater\+Type policy\+Network\+Updater=Updater\+Type(), Environment\+Type environment=Environment\+Type())
\begin{DoxyCompactList}\small\item\em Create the \doxyref{S\+AC}{p.}{classmlpack_1_1rl_1_1SAC} object with given settings. \end{DoxyCompactList}\item 
\textbf{ $\sim$\+S\+AC} ()
\begin{DoxyCompactList}\small\item\em Clean memory. \end{DoxyCompactList}\item 
const \textbf{ Action\+Type} \& \textbf{ Action} () const
\begin{DoxyCompactList}\small\item\em Get the action of the agent. \end{DoxyCompactList}\item 
bool \& \textbf{ Deterministic} ()
\begin{DoxyCompactList}\small\item\em Modify the training mode / test mode indicator. \end{DoxyCompactList}\item 
const bool \& \textbf{ Deterministic} () const
\begin{DoxyCompactList}\small\item\em Get the indicator of training mode / test mode. \end{DoxyCompactList}\item 
double \textbf{ Episode} ()
\begin{DoxyCompactList}\small\item\em Execute an episode. \end{DoxyCompactList}\item 
void \textbf{ Select\+Action} ()
\begin{DoxyCompactList}\small\item\em Select an action for the current state of the agent. \end{DoxyCompactList}\item 
void \textbf{ Soft\+Update} (double rho)
\begin{DoxyCompactList}\small\item\em Softly update the target Q network parameters toward the learning Q network parameters. \end{DoxyCompactList}\item 
\textbf{ State\+Type} \& \textbf{ State} ()
\begin{DoxyCompactList}\small\item\em Modify the state of the agent. \end{DoxyCompactList}\item 
const \textbf{ State\+Type} \& \textbf{ State} () const
\begin{DoxyCompactList}\small\item\em Get the state of the agent. \end{DoxyCompactList}\item 
size\+\_\+t \& \textbf{ Total\+Steps} ()
\begin{DoxyCompactList}\small\item\em Modify the total number of steps taken since the beginning. \end{DoxyCompactList}\item 
const size\+\_\+t \& \textbf{ Total\+Steps} () const
\begin{DoxyCompactList}\small\item\em Get the total number of steps taken since the beginning. \end{DoxyCompactList}\item 
void \textbf{ Update} ()
\begin{DoxyCompactList}\small\item\em Update the Q and policy networks. \end{DoxyCompactList}\end{DoxyCompactItemize}


\subsection{Detailed Description}
\subsubsection*{template$<$typename Environment\+Type, typename Q\+Network\+Type, typename Policy\+Network\+Type, typename Updater\+Type, typename Replay\+Type = Random\+Replay$<$\+Environment\+Type$>$$>$\newline
class mlpack\+::rl\+::\+S\+A\+C$<$ Environment\+Type, Q\+Network\+Type, Policy\+Network\+Type, Updater\+Type, Replay\+Type $>$}

Implementation of Soft Actor-\/\+Critic, a model-\/free off-\/policy actor-\/critic based deep reinforcement learning algorithm. 

For more details, see the following\+: 
\begin{DoxyCode}
@misc\{haarnoja2018soft,
 author    = \{Tuomas Haarnoja and
              Aurick Zhou and
              Kristian Hartikainen and
              George Tucker and
              Sehoon Ha and
              Jie Tan and
              Vikash Kumar and
              Henry Zhu and
              Abhishek Gupta and
              Pieter Abbeel and
              Sergey Levine\},
 title     = \{Soft Actor-Critic Algorithms and Applications\},
 year      = \{2018\},
 url       = \{https://arxiv.org/abs/1812.05905\}
\}
\end{DoxyCode}
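The cited paper trains two Q functions and regresses each one toward a soft Bellman target built from the smaller of two target estimates minus an entropy term weighted by a temperature $\alpha$. The stand-alone C++ sketch below computes that target for a single transition. The function name \texttt{SoftQTarget} and its arguments are hypothetical and are not part of the mlpack API; the class documented here may compute its target differently.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not part of mlpack): soft Bellman target for one
// transition, in the clipped double-Q form described in the paper:
//   y = r + gamma * (1 - done) * (min(q1Next, q2Next) - alpha * logProbNext)
// q1Next, q2Next: target-network Q values for the next state and an action
//                 sampled from the current policy in that state.
// logProbNext:    log-probability of that sampled action under the policy.
// alpha:          entropy temperature (assumed fixed in this sketch).
double SoftQTarget(double reward, bool done, double gamma,
                   double q1Next, double q2Next,
                   double logProbNext, double alpha)
{
  assert(gamma >= 0.0 && gamma <= 1.0);
  // Taking the smaller of the two estimates counters overestimation bias.
  const double minQ = std::min(q1Next, q2Next);
  // Terminal transitions do not bootstrap from the next state.
  const double notDone = done ? 0.0 : 1.0;
  return reward + gamma * notDone * (minQ - alpha * logProbNext);
}
```

Subtracting $\alpha \log \pi$ adds an entropy bonus to the target, which is what distinguishes this soft target from an ordinary clipped double-Q target.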



\begin{DoxyTemplParams}{Template Parameters}
{\em Environment\+Type} & The environment of the reinforcement learning task. \\
\hline
{\em Q\+Network\+Type} & The network to compute action value. \\
\hline
{\em Policy\+Network\+Type} & The network to produce an action given a state. \\
\hline
{\em Updater\+Type} & How to apply gradients when training. \\
\hline
{\em Replay\+Type} & Experience replay method. \\
\hline
\end{DoxyTemplParams}
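The default replay method is \texttt{RandomReplay}, which samples previously stored transitions at random. The self-contained C++ sketch below shows the idea with a fixed-capacity ring buffer that samples uniformly with replacement. The \texttt{Transition} and \texttt{UniformReplayBuffer} names are hypothetical and do not mirror the interface of mlpack's \texttt{RandomReplay}.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical transition record with scalar state and action for brevity.
struct Transition
{
  double state;
  double action;
  double reward;
  double nextState;
  bool done;
};

// Hypothetical uniform experience replay: a fixed-capacity ring buffer that
// overwrites the oldest entry once full and samples uniformly with replacement.
class UniformReplayBuffer
{
 public:
  explicit UniformReplayBuffer(std::size_t maxSize) : capacity(maxSize)
  {
    assert(capacity > 0);
    buffer.reserve(capacity);
  }

  void Store(const Transition& t)
  {
    if (buffer.size() < capacity)
      buffer.push_back(t);
    else
      buffer[position] = t;  // Overwrite the oldest transition.
    position = (position + 1) % capacity;
  }

  std::size_t Size() const { return buffer.size(); }

  // Draw batchSize transitions uniformly at random (with replacement).
  std::vector<Transition> Sample(std::size_t batchSize, std::mt19937& rng) const
  {
    assert(!buffer.empty());
    std::uniform_int_distribution<std::size_t> pick(0, buffer.size() - 1);
    std::vector<Transition> batch;
    batch.reserve(batchSize);
    for (std::size_t i = 0; i < batchSize; ++i)
      batch.push_back(buffer[pick(rng)]);
    return batch;
  }

 private:
  std::size_t capacity;
  std::size_t position = 0;
  std::vector<Transition> buffer;
};
```

Sampling from a buffer rather than learning from consecutive steps breaks the correlation between successive transitions, which is the usual motivation for experience replay in off-policy methods.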


Definition at line 63 of file sac.\+hpp.



\subsection{Member Typedef Documentation}
\mbox{\label{classmlpack_1_1rl_1_1SAC_aaf7b2dc5d49d01961601c7c16be76777}} 
\index{mlpack\+::rl\+::\+S\+AC@{mlpack\+::rl\+::\+S\+AC}!Action\+Type@{Action\+Type}}
\index{Action\+Type@{Action\+Type}!mlpack\+::rl\+::\+S\+AC@{mlpack\+::rl\+::\+S\+AC}}
\subsubsection{Action\+Type}
{\footnotesize\ttfamily using \textbf{ Action\+Type} =  typename Environment\+Type\+::\+Action}



Convenient typedef for action. 



Definition at line 70 of file sac.\+hpp.

\mbox{\label{classmlpack_1_1rl_1_1SAC_ada68ef405b7c331a2bee337614f00088}} 
\index{mlpack\+::rl\+::\+S\+AC@{mlpack\+::rl\+::\+S\+AC}!State\+Type@{State\+Type}}
\index{State\+Type@{State\+Type}!mlpack\+::rl\+::\+S\+AC@{mlpack\+::rl\+::\+S\+AC}}
\subsubsection{State\+Type}
{\footnotesize\ttfamily using \textbf{ State\+Type} =  typename Environment\+Type\+::\+State}



Convenient typedef for state. 



Definition at line 67 of file sac.\+hpp.
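The two typedefs forward the nested \texttt{State} and \texttt{Action} types of the environment. A minimal sketch of that pattern follows. \texttt{ToyEnvironment} and \texttt{ToyAgent} are hypothetical stand-ins; real environments define richer nested types.

```cpp
#include <array>
#include <cassert>
#include <type_traits>

// Hypothetical environment exposing nested State and Action types, which is
// all that the two typedefs below require of EnvironmentType.
struct ToyEnvironment
{
  struct State { std::array<double, 2> data{}; };
  struct Action { double value = 0.0; };
};

// Hypothetical agent template reproducing the two convenience typedefs.
template<typename EnvironmentType>
class ToyAgent
{
 public:
  using ActionType = typename EnvironmentType::Action;
  using StateType = typename EnvironmentType::State;

  StateType& State() { return state; }
  const ActionType& Action() const { return action; }

 private:
  StateType state;
  ActionType action;
};
```

The \texttt{typename} keyword is required because \texttt{EnvironmentType::Action} is a dependent name that the compiler cannot otherwise know to be a type.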



\subsection{Constructor \& Destructor Documentation}
\mbox{\label{classmlpack_1_1rl_1_1SAC_a382013c48f00c0cd5e682edb92a01f16}} 
\index{mlpack\+::rl\+::\+S\+AC@{mlpack\+::rl\+::\+S\+AC}!S\+AC@{S\+AC}}
\index{S\+AC@{S\+AC}!mlpack\+::rl\+::\+S\+AC@{mlpack\+::rl\+::\+S\+AC}}
\subsubsection{S\+A\+C()}
{\footnotesize\ttfamily \textbf{ S\+AC} (\begin{DoxyParamCaption}\item[{\textbf{ Training\+Config} \&}]{config,  }\item[{Q\+Network\+Type \&}]{learning\+Q1\+Network,  }\item[{Policy\+Network\+Type \&}]{policy\+Network,  }\item[{Replay\+Type \&}]{replay\+Method,  }\item[{Updater\+Type}]{q\+Network\+Updater = {\ttfamily UpdaterType()},  }\item[{Updater\+Type}]{policy\+Network\+Updater = {\ttfamily UpdaterType()},  }\item[{Environment\+Type}]{environment = {\ttfamily EnvironmentType()} }\end{DoxyParamCaption})}



Create the \doxyref{S\+AC}{p.}{classmlpack_1_1rl_1_1SAC} object with given settings. 

The configuration, Q network, policy network, and replay method are taken by reference, so passing an existing object avoids an unnecessary copy.


\begin{DoxyParams}{Parameters}
{\em config} & Hyper-\/parameters for training. \\
\hline
{\em learning\+Q1\+Network} & The network to compute action value. \\
\hline
{\em policy\+Network} & The network to produce an action given a state. \\
\hline
{\em replay\+Method} & Experience replay method. \\
\hline
{\em q\+Network\+Updater} & How to apply gradients to the Q network when training. \\
\hline
{\em policy\+Network\+Updater} & How to apply gradients to the policy network when training. \\
\hline
{\em environment} & Reinforcement learning task. \\
\hline
\end{DoxyParams}
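The self-contained sketch below illustrates the reference-passing pattern with hypothetical \texttt{Config}, \texttt{Network}, and \texttt{Agent} types (not mlpack classes). In this sketch the agent also stores references, so it shares state with the caller's objects, no copy is made, and those objects must outlive the agent.

```cpp
#include <cassert>

// Hypothetical network type that counts how often it is copied.
struct Network
{
  static int copies;
  double weight = 0.0;
  Network() = default;
  Network(const Network& other) : weight(other.weight) { ++copies; }
};
int Network::copies = 0;

// Hypothetical hyper-parameter bundle.
struct Config
{
  double discount = 0.99;
};

// Hypothetical agent that takes its collaborators by reference and stores
// them as references, so the caller's objects are used directly.
class Agent
{
 public:
  Agent(Config& cfg, Network& net) : config(cfg), network(net) {}

  double Discount() const { return config.discount; }
  void Nudge(double delta) { network.weight += delta; }

 private:
  Config& config;
  Network& network;
};
```

Changes made through the agent are visible to the caller and vice versa, which is the trade-off for avoiding the copy.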
\mbox{\label{classmlpack_1_1rl_1_1SAC_abf0c5202e5e579fc47c332f2490fbb8f}} 
\index{mlpack\+::rl\+::\+S\+AC@{mlpack\+::rl\+::\+S\+AC}!````~S\+AC@{$\sim$\+S\+AC}}
\index{````~S\+AC@{$\sim$\+S\+AC}!mlpack\+::rl\+::\+S\+AC@{mlpack\+::rl\+::\+S\+AC}}
\subsubsection{$\sim$\+S\+A\+C()}
{\footnotesize\ttfamily $\sim$\textbf{ S\+AC} (\begin{DoxyParamCaption}{ }\end{DoxyParamCaption})}



Clean memory. 



\subsection{Member Function Documentation}
\mbox{\label{classmlpack_1_1rl_1_1SAC_a0d32caed9517e5d2014238a22f78352d}} 
\index{mlpack\+::rl\+::\+S\+AC@{mlpack\+::rl\+::\+S\+AC}!Action@{Action}}
\index{Action@{Action}!mlpack\+::rl\+::\+S\+AC@{mlpack\+::rl\+::\+S\+AC}}
\subsubsection{Action()}
{\footnotesize\ttfamily const \textbf{ Action\+Type}\& Action (\begin{DoxyParamCaption}{ }\end{DoxyParamCaption}) const\hspace{0.3cm}{\ttfamily [inline]}}



Get the action of the agent. 



Definition at line 136 of file sac.\+hpp.

\mbox{\label{classmlpack_1_1rl_1_1SAC_a42d4ee3da432cff20d3a41b8b1ec801c}} 
\index{mlpack\+::rl\+::\+S\+AC@{mlpack\+::rl\+::\+S\+AC}!Deterministic@{Deterministic}}
\index{Deterministic@{Deterministic}!mlpack\+::rl\+::\+S\+AC@{mlpack\+::rl\+::\+S\+AC}}
\subsubsection{Deterministic()\hspace{0.1cm}{\footnotesize\ttfamily [1/2]}}
{\footnotesize\ttfamily bool\& Deterministic (\begin{DoxyParamCaption}{ }\end{DoxyParamCaption})\hspace{0.3cm}{\ttfamily [inline]}}



Modify the training mode / test mode indicator. 



Definition at line 139 of file sac.\+hpp.

\mbox{\label{classmlpack_1_1rl_1_1SAC_a5d262f7871c5cc8b532971fb644f0abf}} 
\index{mlpack\+::rl\+::\+S\+AC@{mlpack\+::rl\+::\+S\+AC}!Deterministic@{Deterministic}}
\index{Deterministic@{Deterministic}!mlpack\+::rl\+::\+S\+AC@{mlpack\+::rl\+::\+S\+AC}}
\subsubsection{Deterministic()\hspace{0.1cm}{\footnotesize\ttfamily [2/2]}}
{\footnotesize\ttfamily const bool\& Deterministic (\begin{DoxyParamCaption}{ }\end{DoxyParamCaption}) const\hspace{0.3cm}{\ttfamily [inline]}}



Get the indicator of training mode / test mode. 



Definition at line 141 of file sac.\+hpp.
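\texttt{Deterministic()}, \texttt{State()}, and \texttt{TotalSteps()} follow the same accessor convention: a non-const overload returns a mutable reference for modification, and a const overload returns a const reference for reading. A minimal sketch of the convention, using a hypothetical \texttt{ModeHolder} class:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical class illustrating the modifier/getter overload pair.
class ModeHolder
{
 public:
  // Non-const overload: callers may assign through the returned reference.
  bool& Deterministic() { return deterministic; }
  // Const overload: selected for const objects; read-only access.
  const bool& Deterministic() const { return deterministic; }

  std::size_t& TotalSteps() { return totalSteps; }
  const std::size_t& TotalSteps() const { return totalSteps; }

 private:
  bool deterministic = false;
  std::size_t totalSteps = 0;
};
```

With this convention a caller switches modes with an assignment such as \texttt{agent.Deterministic() = true;}, while code holding only a const reference can read the flag but not change it.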

\mbox{\label{classmlpack_1_1rl_1_1SAC_a1fb26736f2d90010f882f9628cd26612}} 
\index{mlpack\+::rl\+::\+S\+AC@{mlpack\+::rl\+::\+S\+AC}!Episode@{Episode}}
\index{Episode@{Episode}!mlpack\+::rl\+::\+S\+AC@{mlpack\+::rl\+::\+S\+AC}}
\subsubsection{Episode()}
{\footnotesize\ttfamily double Episode (\begin{DoxyParamCaption}{ }\end{DoxyParamCaption})}



Execute an episode. 

\begin{DoxyReturn}{Returns}
The return (sum of rewards) of the episode. 
\end{DoxyReturn}
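An episode typically resets the environment, then repeatedly selects an action, steps the environment, and accumulates the reward until a terminal state is reached; the accumulated sum is the return. The sketch below shows such a loop with a hypothetical \texttt{CountdownEnvironment} and \texttt{RunEpisode}. It omits the replay storage and network updates that a training episode would also perform, and it is not mlpack's implementation.

```cpp
#include <cassert>

// Hypothetical environment: the state counts down from 5, each step yields
// reward 1, and the episode terminates when the state reaches 0.
struct CountdownEnvironment
{
  int InitialSample() const { return 5; }
  double Sample(int state, double /* action */, int& nextState) const
  {
    nextState = state - 1;
    return 1.0;
  }
  bool IsTerminal(int state) const { return state <= 0; }
};

// Hypothetical episode loop returning the undiscounted sum of rewards.
double RunEpisode(const CountdownEnvironment& env)
{
  int state = env.InitialSample();
  double totalReturn = 0.0;
  while (!env.IsTerminal(state))
  {
    const double action = 0.0;  // A real agent would query its policy here.
    int nextState = 0;
    totalReturn += env.Sample(state, action, nextState);
    // A training episode would also store (state, action, reward, nextState)
    // in the replay buffer and run an update step at this point.
    state = nextState;
  }
  return totalReturn;
}
```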
\mbox{\label{classmlpack_1_1rl_1_1SAC_abd126acd7f564c8326dc765232624ae4}} 
\index{mlpack\+::rl\+::\+S\+AC@{mlpack\+::rl\+::\+S\+AC}!Select\+Action@{Select\+Action}}
\index{Select\+Action@{Select\+Action}!mlpack\+::rl\+::\+S\+AC@{mlpack\+::rl\+::\+S\+AC}}
\subsubsection{Select\+Action()}
{\footnotesize\ttfamily void Select\+Action (\begin{DoxyParamCaption}{ }\end{DoxyParamCaption})}



Select an action for the current state of the agent. 
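In actor-critic methods of this kind, the policy output is often used unchanged in test mode, while a perturbed action is used for exploration during training. The sketch below shows one such scheme for a one-dimensional action with clipped Gaussian noise. The function \texttt{SelectScalarAction}, the noise scale, and the clipping bound are illustrative assumptions rather than mlpack's actual choices; the original SAC paper instead samples actions from a learned stochastic (squashed Gaussian) policy.

```cpp
#include <algorithm>
#include <cassert>
#include <random>

// Hypothetical action selection for a scalar action. policyOutput stands in
// for the policy network's prediction for the current state.
double SelectScalarAction(double policyOutput, bool deterministic,
                          std::mt19937& rng,
                          double noiseStdDev = 0.1, double noiseClip = 0.25)
{
  assert(noiseStdDev > 0.0 && noiseClip >= 0.0);
  if (deterministic)
    return policyOutput;  // Test mode: use the policy output unchanged.

  // Training mode: add zero-mean Gaussian noise, clipped to [-clip, clip].
  std::normal_distribution<double> gaussian(0.0, noiseStdDev);
  const double noise = std::max(-noiseClip, std::min(noiseClip, gaussian(rng)));
  return policyOutput + noise;
}
```

Clipping the noise keeps exploratory actions close to the policy output, so a single unlucky sample cannot push the action far outside the region the critic has been trained on.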

\mbox{\label{classmlpack_1_1rl_1_1SAC_a0e8b07b1eb04d72c36e95f795340d5c6}} 
\index{mlpack\+::rl\+::\+S\+AC@{mlpack\+::rl\+::\+S\+AC}!Soft\+Update@{Soft\+Update}}
\index{Soft\+Update@{Soft\+Update}!mlpack\+::rl\+::\+S\+AC@{mlpack\+::rl\+::\+S\+AC}}
\subsubsection{Soft\+Update()}
{\footnotesize\ttfamily void Soft\+Update (\begin{DoxyParamCaption}\item[{double}]{rho }\end{DoxyParamCaption})}



Softly update the target Q network parameters toward the learning Q network parameters. 


\begin{DoxyParams}{Parameters}
{\em rho} & How \char`\"{}softly\char`\"{} the parameters should be copied. \\
\hline
\end{DoxyParams}
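A soft (Polyak) update, as commonly defined, moves each target parameter a fraction $\rho$ of the way toward the corresponding learning parameter, $\theta_{\mathrm{target}} \leftarrow (1-\rho)\,\theta_{\mathrm{target}} + \rho\,\theta_{\mathrm{learning}}$, so $\rho = 1$ copies the parameters outright and a small $\rho$ changes the target slowly. The sketch below applies this rule to plain vectors of doubles; the function name \texttt{SoftUpdateParameters} is hypothetical, and the exact weighting used by this class is assumed here rather than confirmed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Polyak averaging of target parameters toward learning parameters:
//   target[i] <- (1 - rho) * target[i] + rho * learning[i]
// rho = 0 leaves the target unchanged; rho = 1 copies the learning values.
void SoftUpdateParameters(std::vector<double>& target,
                          const std::vector<double>& learning, double rho)
{
  assert(target.size() == learning.size());
  assert(rho >= 0.0 && rho <= 1.0);
  for (std::size_t i = 0; i < target.size(); ++i)
    target[i] = (1.0 - rho) * target[i] + rho * learning[i];
}
```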
\mbox{\label{classmlpack_1_1rl_1_1SAC_ad7a595de4a1a67da528603c20f80315f}} 
\index{mlpack\+::rl\+::\+S\+AC@{mlpack\+::rl\+::\+S\+AC}!State@{State}}
\index{State@{State}!mlpack\+::rl\+::\+S\+AC@{mlpack\+::rl\+::\+S\+AC}}
\subsubsection{State()\hspace{0.1cm}{\footnotesize\ttfamily [1/2]}}
{\footnotesize\ttfamily \textbf{ State\+Type}\& State (\begin{DoxyParamCaption}{ }\end{DoxyParamCaption})\hspace{0.3cm}{\ttfamily [inline]}}



Modify the state of the agent. 



Definition at line 131 of file sac.\+hpp.

\mbox{\label{classmlpack_1_1rl_1_1SAC_afa3e388ae5e024c8ec49fd4d1ef725ad}} 
\index{mlpack\+::rl\+::\+S\+AC@{mlpack\+::rl\+::\+S\+AC}!State@{State}}
\index{State@{State}!mlpack\+::rl\+::\+S\+AC@{mlpack\+::rl\+::\+S\+AC}}
\subsubsection{State()\hspace{0.1cm}{\footnotesize\ttfamily [2/2]}}
{\footnotesize\ttfamily const \textbf{ State\+Type}\& State (\begin{DoxyParamCaption}{ }\end{DoxyParamCaption}) const\hspace{0.3cm}{\ttfamily [inline]}}



Get the state of the agent. 



Definition at line 133 of file sac.\+hpp.

\mbox{\label{classmlpack_1_1rl_1_1SAC_abaf0bb243c2e643c57654b8e65058fa0}} 
\index{mlpack\+::rl\+::\+S\+AC@{mlpack\+::rl\+::\+S\+AC}!Total\+Steps@{Total\+Steps}}
\index{Total\+Steps@{Total\+Steps}!mlpack\+::rl\+::\+S\+AC@{mlpack\+::rl\+::\+S\+AC}}
\subsubsection{Total\+Steps()\hspace{0.1cm}{\footnotesize\ttfamily [1/2]}}
{\footnotesize\ttfamily size\+\_\+t\& Total\+Steps (\begin{DoxyParamCaption}{ }\end{DoxyParamCaption})\hspace{0.3cm}{\ttfamily [inline]}}



Modify the total number of steps taken since the beginning. 



Definition at line 126 of file sac.\+hpp.

\mbox{\label{classmlpack_1_1rl_1_1SAC_a689af4e6e564ab01f40e6ec49638bdaf}} 
\index{mlpack\+::rl\+::\+S\+AC@{mlpack\+::rl\+::\+S\+AC}!Total\+Steps@{Total\+Steps}}
\index{Total\+Steps@{Total\+Steps}!mlpack\+::rl\+::\+S\+AC@{mlpack\+::rl\+::\+S\+AC}}
\subsubsection{Total\+Steps()\hspace{0.1cm}{\footnotesize\ttfamily [2/2]}}
{\footnotesize\ttfamily const size\+\_\+t\& Total\+Steps (\begin{DoxyParamCaption}{ }\end{DoxyParamCaption}) const\hspace{0.3cm}{\ttfamily [inline]}}



Get the total number of steps taken since the beginning. 



Definition at line 128 of file sac.\+hpp.

\mbox{\label{classmlpack_1_1rl_1_1SAC_aec0783b5a136e042adcc47bae4fe5291}} 
\index{mlpack\+::rl\+::\+S\+AC@{mlpack\+::rl\+::\+S\+AC}!Update@{Update}}
\index{Update@{Update}!mlpack\+::rl\+::\+S\+AC@{mlpack\+::rl\+::\+S\+AC}}
\subsubsection{Update()}
{\footnotesize\ttfamily void Update (\begin{DoxyParamCaption}{ }\end{DoxyParamCaption})}



Update the Q and policy networks. 
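In SAC-style updates, each learning Q network is typically fitted by minimizing the mean squared difference between its predictions on a sampled minibatch and the corresponding targets, after which the policy is improved and the target networks are softly updated. The sketch below computes only that mean squared error over a minibatch. The function name \texttt{CriticLoss} is hypothetical, and the exact losses used by this class are not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical critic loss: mean squared error between the Q values predicted
// for a sampled minibatch and their Bellman targets.
double CriticLoss(const std::vector<double>& predictedQ,
                  const std::vector<double>& targets)
{
  assert(!predictedQ.empty() && predictedQ.size() == targets.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < predictedQ.size(); ++i)
  {
    const double error = predictedQ[i] - targets[i];
    sum += error * error;
  }
  return sum / static_cast<double>(predictedQ.size());
}
```

The gradient of this loss with respect to the Q network parameters is what an updater such as the \texttt{qNetworkUpdater} constructor argument would then apply.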



The documentation for this class was generated from the following file\+:\begin{DoxyCompactItemize}
\item 
src/mlpack/methods/reinforcement\+\_\+learning/\textbf{ sac.\+hpp}\end{DoxyCompactItemize}
