\section{Introduction}\label{dettutorial_intro_det_tut}
D\+E\+Ts perform the unsupervised task of density estimation using decision trees. Using a trained density estimation tree (D\+ET), the density at any particular point can be estimated very quickly ($O(\log n)$ time for a balanced tree, where $n$ is the number of points the tree is built on).

The details of this work are presented in the following paper\+: 
\begin{DoxyCode}
@inproceedings\{ram2011density,
  title=\{Density estimation trees\},
  author=\{Ram, P. and Gray, A.G.\},
  booktitle=\{Proceedings of the 17th ACM SIGKDD International Conference on
      Knowledge Discovery and Data Mining\},
  pages=\{627--635\},
  year=\{2011\},
  organization=\{ACM\}
\}
\end{DoxyCode}
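In the formulation of the paper cited above, a trained D\+ET is piecewise constant: if a point falls in leaf $t$, its estimate is $|t|/(N V_t)$, where $|t|$ is the number of training points in that leaf, $N$ is the total number of training points, and $V_t$ is the volume of the leaf. The following plain C++ sketch (the {\ttfamily Leaf} struct and both functions are hypothetical, not part of {\bfseries mlpack}) evaluates this formula and checks that the estimate integrates to one over the leaves.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One leaf of a hypothetical trained DET: the number of training points it
// holds and the volume of the axis-aligned box it covers.
struct Leaf {
  std::size_t count;
  double volume;
};

// DET estimate for any point inside `leaf`: |t| / (N * V_t).
double LeafDensity(const Leaf& leaf, std::size_t n) {
  assert(n > 0 && leaf.volume > 0.0);
  return static_cast<double>(leaf.count) /
         (static_cast<double>(n) * leaf.volume);
}

// Integral of the piecewise-constant estimate: sum of density * volume.
// This equals (sum of counts) / N, i.e. 1 when the leaves partition the data.
double TotalMass(const std::vector<Leaf>& leaves, std::size_t n) {
  double mass = 0.0;
  for (const Leaf& leaf : leaves)
    mass += LeafDensity(leaf, n) * leaf.volume;
  return mass;
}
```

For example, with $N = 8$ and leaves holding 6 points (volume 1) and 2 points (volume 3), the two densities are $0.75$ and $1/12$, and the total mass is $0.75 + 0.25 = 1$.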


{\bfseries mlpack} provides\+:


\begin{DoxyItemize}
\item a \doxyref{simple command-\/line executable}{p.}{dettutorial_cli_det_tut} to perform density estimation and related analyses using D\+E\+Ts
\item a \doxyref{generic C++ class (D\+Tree)}{p.}{dettutorial_dtree_det_tut} which provides various functionality for the D\+E\+Ts
\item a set of functions in the namespace \doxyref{mlpack\+:\+:det}{p.}{dettutorial_dtutils_det_tut} to perform cross-\/validation for the task of density estimation with D\+E\+Ts
\end{DoxyItemize}\section{Table of Contents}\label{dettutorial_toc_det_tut}
A list of all the sections this tutorial contains.


\begin{DoxyItemize}
\item \doxyref{Introduction}{p.}{dettutorial_intro_det_tut}
\item \doxyref{Table of Contents}{p.}{dettutorial_toc_det_tut}
\item \doxyref{Command-\/\+Line mlpack\+\_\+det}{p.}{dettutorial_cli_det_tut}
\begin{DoxyItemize}
\item \doxyref{Plain-\/vanilla density estimation}{p.}{dettutorial_cli_ex1_de_tut}
\item \doxyref{Estimation on a test set}{p.}{dettutorial_cli_ex2_de_test_tut}
\item \doxyref{Computing the variable importance}{p.}{dettutorial_cli_ex4_de_vi_tut}
\item \doxyref{Saving trained D\+E\+Ts}{p.}{dettutorial_cli_ex6_de_save}
\item \doxyref{Loading trained D\+E\+Ts}{p.}{dettutorial_cli_ex7_de_load}
\end{DoxyItemize}
\item \doxyref{The \textquotesingle{}D\+Tree\textquotesingle{} class}{p.}{dettutorial_dtree_det_tut}
\begin{DoxyItemize}
\item \doxyref{Public Functions}{p.}{dettutorial_dtree_pub_func_det_tut}
\end{DoxyItemize}
\item \doxyref{\textquotesingle{}namespace mlpack\+::det\textquotesingle{}}{p.}{dettutorial_dtutils_det_tut}
\begin{DoxyItemize}
\item \doxyref{Utility Functions}{p.}{dettutorial_dtutils_util_funcs}
\end{DoxyItemize}
\item \doxyref{Further Documentation}{p.}{dettutorial_further_doc_det_tut}
\end{DoxyItemize}\section{Command-\/\+Line mlpack\+\_\+det}\label{dettutorial_cli_det_tut}
The command line arguments of this program can be viewed using the {\ttfamily -\/h} option\+:


\begin{DoxyCode}
$ mlpack\_det -h
Density Estimation With Density Estimation Trees

  This program performs a number of functions related to Density Estimation
  Trees.  The optimal Density Estimation Tree (DET) can be trained on a set of
  data (specified by --training\_file or -t) using cross-validation (with number
  of folds specified by --folds).  This trained density estimation tree may then
  be saved to a model file with the --output\_model\_file (-M) option.

  The variable importances of each dimension may be saved with the --vi\_file
  (-i) option, and the density estimates on each training point may be saved to
  the file specified with the --training\_set\_estimates\_file (-e) option.

  This program also can provide density estimates for a set of test points,
  specified in the --test\_file (-T) file.  The density estimation tree used for
  this task will be the tree that was trained on the given training points, or a
  tree stored in the file given with the --input\_model\_file (-m) parameter.  The
  density estimates for the test points may be saved into the file specified
  with the --test\_set\_estimates\_file (-E) option.


Options:

  --folds (-f) [int]            The number of folds of cross-validation to
                                perform for the estimation (0 is LOOCV)  Default
                                value 10.
  --help (-h)                   Default help info.
  --info [string]               Get help on a specific module or option.
                                Default value ''.
  --input\_model\_file (-m) [string]
                                File containing already trained density
                                estimation tree.  Default value ''.
  --max\_leaf\_size (-L) [int]    The maximum size of a leaf in the unpruned,
                                fully grown DET.  Default value 10.
  --min\_leaf\_size (-l) [int]    The minimum size of a leaf in the unpruned,
                                fully grown DET.  Default value 5.
  --output\_model\_file (-M) [string]
                                File to save trained density estimation tree to.
                                 Default value ''.
  --test\_file (-T) [string]     A set of test points to estimate the density of.
                                 Default value ''.
  --test\_set\_estimates\_file (-E) [string]
                                The file in which to output the estimates on the
                                test set from the final optimally pruned tree.
                                Default value ''.
  --training\_file (-t) [string]
                                The data set on which to build a density
                                estimation tree.  Default value ''.
  --training\_set\_estimates\_file (-e) [string]
                                The file in which to output the density
                                estimates on the training set from the final
                                optimally pruned tree.  Default value ''.
  --verbose (-v)                Display informational messages and the full list
                                of parameters and timers at the end of
                                execution.
  --version (-V)                Display the version of mlpack.
  --vi\_file (-i) [string]       The file to output the variable importance
                                values for each feature.  Default value ''.

For further information, including relevant papers, citations, and theory,
consult the documentation found at http://www.mlpack.org or included with your
distribution of mlpack.
\end{DoxyCode}
\subsection{Plain-\/vanilla density estimation}\label{dettutorial_cli_ex1_de_tut}
We can just train a D\+ET on the provided data set {\itshape S}. Like all datasets {\bfseries mlpack} uses, the data should be row-\/major ({\bfseries mlpack} transposes data when it is loaded; internally, the data is column-\/major -- see \doxyref{this page}{p.}{matrices} for more information).


\begin{DoxyCode}
$ mlpack\_det -t dataset.csv -v
\end{DoxyCode}
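To make the layout convention concrete: a CSV file stores one point per row, while {\bfseries mlpack} stores one point per column in memory. The standard-library sketch below (not {\bfseries mlpack} code; {\ttfamily ToColumnMajor} is a hypothetical name) packs row-wise points into a flat column-major buffer with one column per point.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Points as read from a CSV file: one inner vector per row (point).
using RowMajor = std::vector<std::vector<double>>;

// Pack the points into a flat column-major buffer with `dims` rows and one
// column per point, so element (d, i) lives at index d + i * dims.
std::vector<double> ToColumnMajor(const RowMajor& points, std::size_t dims) {
  std::vector<double> buffer(points.size() * dims);
  for (std::size_t i = 0; i < points.size(); ++i) {
    assert(points[i].size() == dims);
    for (std::size_t d = 0; d < dims; ++d)
      buffer[d + i * dims] = points[i][d];
  }
  return buffer;
}
```

Note that each point's coordinates end up contiguous, in the same order as the file's rows; what changes is that the matrix is indexed as dimensions $\times$ points rather than points $\times$ dimensions.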


By default, {\ttfamily mlpack\+\_\+det} performs 10-\/fold cross-\/validation (using the $\alpha$-\/pruning regularization for decision trees). To perform L\+O\+O\+CV (leave-\/one-\/out cross-\/validation), which can provide better results but will take longer, use the following command\+:


\begin{DoxyCode}
$ mlpack\_det -t dataset.csv -f 0 -v
\end{DoxyCode}
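The help text states that {\ttfamily --folds 0} selects LOOCV. One way to see the relationship is that LOOCV is k-\/fold cross-\/validation with $k$ equal to the number of points, which is why it takes longer: it fits one tree per point instead of one per fold. The sketch below assigns points to folds round-\/robin; this scheme and the {\ttfamily AssignFolds} name are assumptions for illustration, not necessarily how {\ttfamily mlpack\+\_\+det} partitions the data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return, for each of n points, the index of the fold that holds it out.
// folds == 0 is treated as leave-one-out: every point gets its own fold.
std::vector<std::size_t> AssignFolds(std::size_t n, std::size_t folds) {
  const std::size_t k = (folds == 0) ? n : folds;
  assert(k > 0 && k <= n);
  std::vector<std::size_t> foldOf(n);
  for (std::size_t i = 0; i < n; ++i)
    foldOf[i] = i % k;  // round-robin assignment
  return foldOf;
}
```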


To perform k-\/fold cross-\/validation, use {\ttfamily -\/f} {\ttfamily k} (or {\ttfamily --folds} {\ttfamily k}). There are certain other options available for training. For example, in the construction of the initial tree, you can specify the maximum and minimum leaf sizes. By default, they are 10 and 5 respectively; you can set them using the {\ttfamily -\/L} ({\ttfamily --max\+\_\+leaf\+\_\+size}) and the {\ttfamily -\/l} ({\ttfamily --min\+\_\+leaf\+\_\+size}) options.


\begin{DoxyCode}
$ mlpack\_det -t dataset.csv -L 20 -l 10
\end{DoxyCode}
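As a rough guide to how the two leaf-\/size limits interact, the sketch below encodes a common rule for growing trees: a node is only considered for splitting when it holds more than the maximum leaf size, and a candidate split is only accepted if both children keep at least the minimum leaf size. This rule is an illustrative assumption written as plain C++ with hypothetical function names; it is not copied from {\bfseries mlpack}'s implementation.

```cpp
#include <cassert>
#include <cstddef>

// Should a node with `size` points be split at all?
bool ShouldSplit(std::size_t size, std::size_t maxLeafSize) {
  return size > maxLeafSize;
}

// Is a candidate split that puts `leftSize` of `size` points on the left
// acceptable, i.e. do both children keep at least `minLeafSize` points?
bool SplitIsValid(std::size_t size, std::size_t leftSize,
                  std::size_t minLeafSize) {
  assert(leftSize <= size);
  const std::size_t rightSize = size - leftSize;
  return leftSize >= minLeafSize && rightSize >= minLeafSize;
}
```

With a maximum of 20 and a minimum of 10, a node of 21 points may be split 10/11 but not 12/9.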


To output the density estimates at the points in the training set, use the {\ttfamily -\/e} ({\ttfamily --training\+\_\+set\+\_\+estimates\+\_\+file}) option to specify the file to which the estimates will be saved. In the example below, the first line of density\+\_\+estimates.\+txt corresponds to the density at the first point in the training set. Note that the logarithms of the density estimates are saved, which allows very small estimates to be represented.


\begin{DoxyCode}
$ mlpack\_det -t dataset.csv -e density\_estimates.txt -v
\end{DoxyCode}
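Why logarithms help: a leaf's volume is a product of its side widths, so in many dimensions the estimate $|t|/(N V_t)$ can overflow or underflow a {\ttfamily double}. Working in log space turns the product into a sum. The sketch below (plain C++, hypothetical function name) computes $\log|t| - \log N - \sum_d \log w_d$ for a leaf with side widths $w_d$.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Log of the DET leaf estimate |t| / (N * V_t), where V_t is the product of
// the leaf's side widths. Summing logs avoids forming V_t directly.
double LogLeafDensity(std::size_t leafCount, std::size_t n,
                      const std::vector<double>& widths) {
  assert(leafCount > 0 && n > 0);
  double logVolume = 0.0;
  for (double w : widths) {
    assert(w > 0.0);
    logVolume += std::log(w);
  }
  return std::log(static_cast<double>(leafCount)) -
         std::log(static_cast<double>(n)) - logVolume;
}
```

For instance, a leaf in 400 dimensions with every side of width $10^{-3}$ has volume $10^{-1200}$, which underflows to zero as a {\ttfamily double}, yet its log density is an ordinary finite number.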
\subsection{Estimation on a test set}\label{dettutorial_cli_ex2_de_test_tut}
Often, it is useful to train a density estimation tree on a training set and then obtain density estimates from the learned estimator for a separate set of test points. The {\ttfamily -\/T} ({\ttfamily --test\+\_\+file}) option allows specification of a set of test points, and the {\ttfamily -\/E} ({\ttfamily --test\+\_\+set\+\_\+estimates\+\_\+file}) option allows specification of the file into which the test set estimates are saved. Note that the logarithms of the density estimates are saved; this allows very small values to be represented.


\begin{DoxyCode}
$ mlpack\_det -t dataset.csv -T test\_points.csv -E test\_density\_estimates.txt -v
\end{DoxyCode}
\subsection{Computing the variable importance}\label{dettutorial_cli_ex4_de_vi_tut}
The variable importance (with respect to density estimation) of the different features in the data set can be obtained by using the {\ttfamily -\/i} ({\ttfamily --vi\+\_\+file}) option. This outputs the absolute (as opposed to relative) variable importance of all the features into the specified file.


\begin{DoxyCode}
$ mlpack\_det -t dataset.csv -i variable\_importance.txt -v
\end{DoxyCode}
\subsection{Saving trained D\+E\+Ts}\label{dettutorial_cli_ex6_de_save}
The {\ttfamily mlpack\+\_\+det} program is capable of saving a trained D\+ET to a file for later use. The {\ttfamily --output\+\_\+model\+\_\+file} or {\ttfamily -\/M} option allows specification of the file to save to. In the example below, a D\+ET trained on {\ttfamily dataset.\+csv} is saved to the file {\ttfamily det.\+xml}.


\begin{DoxyCode}
$ mlpack\_det -t dataset.csv -M det.xml -v
\end{DoxyCode}
\subsection{Loading trained D\+E\+Ts}\label{dettutorial_cli_ex7_de_load}
A saved D\+ET can be used to perform any of the functionality in the examples above. A saved D\+ET is loaded with the {\ttfamily --input\+\_\+model\+\_\+file} or {\ttfamily -\/m} option. The example below loads a saved D\+ET from {\ttfamily det.\+xml} and outputs density estimates on the dataset {\ttfamily test\+\_\+dataset.\+csv} into the file {\ttfamily estimates.\+csv}.


\begin{DoxyCode}
$ mlpack\_det -m det.xml -T test\_dataset.csv -E estimates.csv -v
\end{DoxyCode}
\section{The \textquotesingle{}\+D\+Tree\textquotesingle{} class}\label{dettutorial_dtree_det_tut}
This class implements density estimation trees. Below is a simple example which initializes a density estimation tree.


\begin{DoxyCode}
\textcolor{preprocessor}{#include <mlpack/methods/det/dtree.hpp>}

\textcolor{keyword}{using namespace }mlpack::det;

\textcolor{comment}{// The dataset matrix, on which to learn the density estimation tree.}
\textcolor{keyword}{extern} arma::mat data;

\textcolor{comment}{// Initialize the tree.  This function also creates and saves the bounding box}
\textcolor{comment}{// of the data.  Note that it does not actually build the tree.}
DTree<> det(data);
\end{DoxyCode}
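The comment in the example above notes that the constructor creates and saves the bounding box of the data. As an illustration of what that involves, the following standard-\/library sketch (not {\bfseries mlpack}'s implementation; {\ttfamily BoundingBox} and {\ttfamily ComputeBox} are hypothetical names) computes the per-\/dimension minimum and maximum of column-\/major data and the logarithm of the box volume.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct BoundingBox {
  std::vector<double> minVals, maxVals;  // one entry per dimension
  double logVolume = 0.0;                // sum of log(max - min)
};

// `data` is column-major with `dims` rows: point i occupies
// data[i * dims] .. data[i * dims + dims - 1].
BoundingBox ComputeBox(const std::vector<double>& data, std::size_t dims) {
  assert(dims > 0 && !data.empty() && data.size() % dims == 0);
  BoundingBox box;
  box.minVals.assign(data.begin(), data.begin() + dims);  // first point
  box.maxVals = box.minVals;
  for (std::size_t i = dims; i < data.size(); ++i) {
    const std::size_t d = i % dims;
    if (data[i] < box.minVals[d]) box.minVals[d] = data[i];
    if (data[i] > box.maxVals[d]) box.maxVals[d] = data[i];
  }
  for (std::size_t d = 0; d < dims; ++d)
    box.logVolume += std::log(box.maxVals[d] - box.minVals[d]);
  return box;
}
```

A dimension in which all points share the same value would give a zero volume (a log volume of $-\infty$); this sketch does not handle that case.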
\subsection{Public Functions}\label{dettutorial_dtree_pub_func_det_tut}
The function {\ttfamily Grow()} greedily grows the tree by recursively splitting nodes until the leaf-\/size limits are reached. Note that the points in the dataset will be reordered. This should only be run on a tree which has not already been built. In general, it is more useful to use the {\ttfamily \doxyref{Trainer()}{p.}{namespacemlpack_1_1det_a4a84945ed0d2a629c86f8538e6e7090c}} function found in \doxyref{\textquotesingle{}namespace mlpack\+::det\textquotesingle{}}{p.}{dettutorial_dtutils_det_tut}, which also performs cross-\/validation.


\begin{DoxyCode}
\textcolor{comment}{// This keeps track of the data during the shuffle that occurs while growing the}
\textcolor{comment}{// tree.}
arma::Col<size\_t> oldFromNew(data.n\_cols);
\textcolor{keywordflow}{for} (\textcolor{keywordtype}{size\_t} i = 0; i < data.n\_cols; i++)
  oldFromNew[i] = i;

\textcolor{comment}{// Maximum and minimum leaf sizes of the fully grown tree.}
\textcolor{keywordtype}{size\_t} maxLeafSize = 10;
\textcolor{keywordtype}{size\_t} minLeafSize = 5;

\textcolor{comment}{// This function grows the tree down to the leaves. It returns the current}
\textcolor{comment}{// minimum value of the regularization parameter alpha. The third argument}
\textcolor{comment}{// (false) disables the volume regularization.}
\textcolor{keywordtype}{double} alpha = det.Grow(data, oldFromNew, \textcolor{keyword}{false}, maxLeafSize, minLeafSize);
\end{DoxyCode}
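Because {\ttfamily Grow()} reorders the columns of {\ttfamily data}, per-\/point results computed on the reordered matrix must be mapped back to the original order. Judging by its name and the identity initialization above, {\ttfamily oldFromNew[i]} holds the original index of the point now stored in column {\ttfamily i}; that reading is an assumption here. The small sketch below (plain C++, hypothetical helper name, not part of {\bfseries mlpack}) performs the mapping under that assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scatter values computed in the reordered ("new") order back to the
// original ("old") order: result[oldFromNew[i]] = reordered[i].
// Assumes oldFromNew is a permutation of 0 .. n-1.
std::vector<double> ToOriginalOrder(const std::vector<double>& reordered,
                                    const std::vector<std::size_t>& oldFromNew) {
  assert(reordered.size() == oldFromNew.size());
  std::vector<double> result(reordered.size());
  for (std::size_t i = 0; i < reordered.size(); ++i) {
    assert(oldFromNew[i] < result.size());
    result[oldFromNew[i]] = reordered[i];
  }
  return result;
}
```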


Note that the alternate volume regularization should not be used (see ticket \#238).

To estimate the density at a given query point, use the following code. Note that the logarithm of the density is returned.


\begin{DoxyCode}
\textcolor{comment}{// For a given query, you can obtain the density estimate.}
\textcolor{keyword}{extern} arma::vec query;
\textcolor{keyword}{extern} DTree<>* det;
\textcolor{keywordtype}{double} estimate = det->ComputeValue(query);
\end{DoxyCode}
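Conceptually, a query is answered by walking from the root to the leaf whose box contains the point, comparing one coordinate against one split value per level; this is where the dependence on tree depth comes from. The sketch below uses a hypothetical array-\/based tree in plain C++ that returns the stored log density of the leaf. It is not {\bfseries mlpack}'s internal representation, and returning $-\infty$ (that is, $\log 0$) for points outside the root's bounding box is a design choice of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical flat-array DET. Internal nodes have left >= 0; leaves have
// left == right == -1 and store a precomputed log density.
struct FlatNode {
  int left = -1, right = -1;
  std::size_t splitDim = 0;
  double splitValue = 0.0;
  double logDensity = 0.0;  // used only at leaves
};

// Walk from node 0 (the root) to the leaf containing `query`.
double LogDensityAt(const std::vector<FlatNode>& tree,
                    const std::vector<double>& minVals,
                    const std::vector<double>& maxVals,
                    const std::vector<double>& query) {
  assert(!tree.empty() && query.size() == minVals.size() &&
         query.size() == maxVals.size());
  for (std::size_t d = 0; d < query.size(); ++d)
    if (query[d] < minVals[d] || query[d] > maxVals[d])
      return -std::numeric_limits<double>::infinity();  // log(0)
  int node = 0;
  while (tree[node].left >= 0) {
    const FlatNode& cur = tree[node];
    node = (query[cur.splitDim] <= cur.splitValue) ? cur.left : cur.right;
  }
  return tree[node].logDensity;
}
```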


To compute the {\bfseries variable importance} of each feature for a given D\+ET, use {\ttfamily ComputeVariableImportance()}:


\begin{DoxyCode}
\textcolor{comment}{// The data matrix and density estimation tree.}
\textcolor{keyword}{extern} arma::mat data;
\textcolor{keyword}{extern} DTree<>* det;

\textcolor{comment}{// The variable importances will be saved into this vector.}
arma::Col<double> varImps;

\textcolor{comment}{// You can obtain the variable importance from the current tree.}
det->ComputeVariableImportance(varImps);
\end{DoxyCode}
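Following the CART-\/style definition used for decision trees, the importance of a feature can be taken as the total reduction in tree error over all splits made on that feature. The sketch below assumes that each split is summarized by its dimension and its error reduction; it is a plain C++ illustration with hypothetical names, not a reimplementation of {\ttfamily ComputeVariableImportance()}.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Summary of one internal node: the dimension it splits on and how much the
// split reduced the tree's error (parent error minus the children's error).
struct SplitRecord {
  std::size_t dim;
  double gain;
};

// Absolute importance of each of `dims` features: the sum of the gains of
// all splits made on that feature. Features never split on score 0.
std::vector<double> VariableImportance(const std::vector<SplitRecord>& splits,
                                       std::size_t dims) {
  std::vector<double> importance(dims, 0.0);
  for (const SplitRecord& s : splits) {
    assert(s.dim < dims);
    importance[s.dim] += s.gain;
  }
  return importance;
}
```

Dividing each entry by the sum of all entries would give a relative importance instead of the absolute one.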
\section{\textquotesingle{}namespace mlpack\+::det\textquotesingle{}}\label{dettutorial_dtutils_det_tut}
The functions in this namespace allow the user to perform tasks with the \textquotesingle{}D\+Tree\textquotesingle{} class. Most importantly, the {\ttfamily \doxyref{Trainer()}{p.}{namespacemlpack_1_1det_a4a84945ed0d2a629c86f8538e6e7090c}} method allows the full training of a density estimation tree with cross-\/validation. There are also utility functions which allow printing of leaf membership and variable importance.\subsection{Utility Functions}\label{dettutorial_dtutils_util_funcs}
The code below details how to train a density estimation tree with cross-\/validation.


\begin{DoxyCode}
\textcolor{preprocessor}{#include <mlpack/methods/det/dt_utils.hpp>}

\textcolor{keyword}{using namespace }mlpack::det;

\textcolor{comment}{// The dataset matrix, on which to learn the density estimation tree.}
\textcolor{keyword}{extern} arma::mat data;

\textcolor{comment}{// The number of folds for cross-validation.}
\textcolor{keyword}{const} \textcolor{keywordtype}{size\_t} folds = 10; \textcolor{comment}{// Set folds = 0 for LOOCV.}

\textcolor{keyword}{const} \textcolor{keywordtype}{size\_t} maxLeafSize = 10;
\textcolor{keyword}{const} \textcolor{keywordtype}{size\_t} minLeafSize = 5;

\textcolor{comment}{// Train the density estimation tree with cross-validation.}
DTree<>* dtree\_opt = Trainer(data, folds, \textcolor{keyword}{false}, maxLeafSize, minLeafSize);
\end{DoxyCode}
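The cross-\/validated training relies on cost-\/complexity ($\alpha$) pruning. Using the leaf error from the D\+ET paper, $R(t) = -|t|^2/(N^2 V_t)$, the usual CART weakest-\/link rule collapses an internal node $t$ once $\alpha$ reaches $g(t) = (R(t) - R(T_t))/(L_t - 1)$, where $R(T_t)$ is the summed error of the $L_t$ leaves below $t$. The plain C++ sketch below computes both quantities; the function names are hypothetical and this is not the code inside {\ttfamily Trainer()}.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// DET error of a node holding `count` of `n` points in a box of `volume`:
// R(t) = -count^2 / (n^2 * volume). Lower (more negative) is better.
double NodeError(std::size_t count, std::size_t n, double volume) {
  assert(n > 0 && volume > 0.0);
  const double c = static_cast<double>(count);
  const double total = static_cast<double>(n);
  return -(c * c) / (total * total * volume);
}

// Weakest-link value g(t) for an internal node whose subtree has the given
// leaf errors: the alpha at which collapsing the subtree into t becomes
// worthwhile under the cost R(T) + alpha * (number of leaves).
double CriticalAlpha(double nodeError, const std::vector<double>& leafErrors) {
  assert(leafErrors.size() >= 2);
  double subtreeError = 0.0;
  for (double e : leafErrors)
    subtreeError += e;
  return (nodeError - subtreeError) /
         static_cast<double>(leafErrors.size() - 1);
}
```

For a root holding 8 points in a box of volume 4, split into leaves of 6 points (volume 1) and 2 points (volume 3), $R(t) = -0.25$, the leaves sum to $-7/12$, and $g(t) = 1/3$.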


Note that the alternate volume regularization should be set to false because it has known bugs (see \#238).

To print the class membership of leaves in the tree into a file, see the following code.


\begin{DoxyCode}
\textcolor{keyword}{extern} arma::mat data;
\textcolor{keyword}{extern} arma::Mat<size\_t> labels;
\textcolor{keyword}{extern} DTree<>* det;
\textcolor{keyword}{const} \textcolor{keywordtype}{size\_t} numClasses = 3; \textcolor{comment}{// The number of classes must be known.}

\textcolor{keyword}{extern} std::string leafClassMembershipFile;

PrintLeafMembership(det, data, labels, numClasses, leafClassMembershipFile);
\end{DoxyCode}
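The information that {\ttfamily PrintLeafMembership()} reports is, in essence, a table of class counts per leaf. The sketch below builds such a table from a hypothetical per-\/point leaf index and a label vector, deriving the number of classes as the maximum label plus one, as the note below describes. It is a plain C++ illustration; the output format of the {\bfseries mlpack} function is not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// counts[leaf][c] = number of points with label c that fall in `leaf`.
// leafOf[i] is the (hypothetical) index of the leaf containing point i.
std::vector<std::vector<std::size_t>> LeafClassCounts(
    const std::vector<std::size_t>& leafOf,
    const std::vector<std::size_t>& labels, std::size_t numLeaves) {
  assert(leafOf.size() == labels.size() && !labels.empty());
  // Labels are assumed to be 0, 1, ..., so there are max(label) + 1 classes.
  const std::size_t numClasses =
      *std::max_element(labels.begin(), labels.end()) + 1;
  std::vector<std::vector<std::size_t>> counts(
      numLeaves, std::vector<std::size_t>(numClasses, 0));
  for (std::size_t i = 0; i < labels.size(); ++i) {
    assert(leafOf[i] < numLeaves);
    ++counts[leafOf[i]][labels[i]];
  }
  return counts;
}
```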


Note that you can find the number of classes with {\ttfamily max(labels) + 1}. The variable importance can also be printed to a file in a similar manner.


\begin{DoxyCode}
\textcolor{keyword}{extern} arma::mat data;
\textcolor{keyword}{extern} DTree<>* det;

\textcolor{keyword}{extern} std::string variableImportanceFile;
\textcolor{keyword}{const} \textcolor{keywordtype}{size\_t} numFeatures = data.n\_rows;

PrintVariableImportance(det, numFeatures, variableImportanceFile);
\end{DoxyCode}
\section{Further Documentation}\label{dettutorial_further_doc_det_tut}
For further documentation on the D\+Tree class, consult the \doxyref{complete A\+PI documentation}{p.}{classmlpack_1_1det_1_1DTree}. 