\section{Introduction}\label{trees_treeintro}
Trees are an important data structure in mlpack and are used in a number of the machine learning algorithms that mlpack implements. Often, the use of trees can allow significant acceleration of an algorithm; this is generally done by pruning away large parts of the tree during computation.

Most mlpack algorithms that use trees are not tied to a specific tree but instead allow the user to choose a tree via the {\ttfamily Tree\+Type} template parameter. Any tree passed as a {\ttfamily Tree\+Type} template parameter will need to implement a certain set of functions. In addition, a tree may optionally specify some traits about itself with the {\ttfamily Tree\+Traits} trait class.

This document aims to clarify the abstractions underlying mlpack trees, list and describe the required functionality of the {\ttfamily Tree\+Type} policy, and point users towards existing types of trees. A table of contents is below\+:


\begin{DoxyItemize}
\item \doxyref{Introduction}{p.}{trees_treeintro}
\item \doxyref{What is a tree?}{p.}{trees_whatistree}
\item \doxyref{Template parameters required by the Tree\+Type policy}{p.}{trees_treetype_template_params}
\item \doxyref{The Tree\+Type A\+PI}{p.}{trees_treetype_api}
\item \doxyref{Rigorous A\+PI documentation}{p.}{trees_treetype_rigorous}
\begin{DoxyItemize}
\item \doxyref{Template parameters}{p.}{trees_treetype_rigorous_template}
\item \doxyref{Constructors and destructors}{p.}{trees_treetype_rigorous_constructor}
\item \doxyref{Basic tree functionality}{p.}{trees_treetype_rigorous_basic}
\item \doxyref{Complex tree functionality and bounds}{p.}{trees_treetype_rigorous_complex}
\item \doxyref{Serialization}{p.}{trees_treetype_rigorous_serialization}
\end{DoxyItemize}
\item \doxyref{The Tree\+Traits trait class}{p.}{trees_treetype_traits}
\item \doxyref{A list of trees in mlpack and more information}{p.}{trees_treetype_more}
\end{DoxyItemize}

Although this document is long, there may still be errors and unclear areas. If you are having trouble understanding anything, please get in touch on Github or on the mailing list and someone will help you (and possibly update the documentation afterwards).\section{What is a tree?}\label{trees_whatistree}
In mlpack, we assume that we have some sort of data matrix, which might be sparse or dense (that is, it could be of type {\ttfamily arma\+::mat} or {\ttfamily arma\+::sp\+\_\+mat}, or any variant that implements the Armadillo A\+PI). This data matrix corresponds to a collection of points in some space (usually a Euclidean space). A tree is a way of organizing this data matrix in a hierarchical manner---so, points that are nearby should lie in similar nodes.

We can rigorously define what a tree is, using the definition of {\bfseries space tree} introduced in the following paper\+:

R.\+R. Curtin, W.\+B. March, P. Ram, D.\+V. Anderson, A.\+G. Gray, and C.\+L. Isbell Jr., \char`\"{}\+Tree-\/independent dual-\/tree algorithms,\char`\"{} in Proceedings of the 30th International Conference on Machine Learning (I\+C\+ML \textquotesingle{}13), pp. 1435--1443, 2013. 

The definition is\+:

A {\bfseries space tree} on a dataset $ S \in \mathcal{R}^{N \times d} $ is an undirected, connected, acyclic, rooted simple graph with the following properties\+:


\begin{DoxyItemize}
\item Each node (or vertex) holds a number of points (possibly zero) and is connected to one parent node and a number of child nodes (possibly zero).
\item There is one node in every space tree with no parent; this is the root node of the tree.
\item Each point in $S$ is contained in at least one node.
\item Each node corresponds to some subset of $\mathcal{R}^d$ that contains each point in the node and also the subsets that correspond to each child of the node.
\end{DoxyItemize}

This is really a quite straightforward definition\+: a tree is hierarchical, and each node corresponds to some region of the input space. Each node may have some number of children, and may hold some number of points. However, there is an important terminology distinction to make\+: the term {\bfseries points held by a node} has a different meaning than the term {\bfseries descendant points held by a node}. The points held in a node are just that---points held only in the node. The descendant points of a node are the combination of the points held in a node with the points held in the node\textquotesingle{}s children and the points held in the node\textquotesingle{}s children\textquotesingle{}s children (and so forth). For the purposes of clarity in all discussions about trees, care is taken to differentiate the terms \char`\"{}descendant
point\char`\"{} and \char`\"{}point\char`\"{}.

Now, it\textquotesingle{}s also important to note that a point does not {\itshape need} to hold any children, and that a node {\itshape can} hold the same points as its children (or its parent). Some types of trees do this. For instance, each node in the cover tree holds only one point, and may have a child that holds the same point. As another example, the $kd$-\/tree holds its points only in the leaves (at the bottom of the tree). More information on space trees can be found in either the \char`\"{}\+Tree-\/independent dual-\/tree algorithms\char`\"{} paper or any of the related literature.

So there is a huge amount of possible variety in the types of trees that can fall into the class of {\itshape space trees}. Therefore, it\textquotesingle{}s important to treat them abstractly, and the {\ttfamily Tree\+Type} policy allows us to do just that. All we need to remember is that a node in a tree can be represented as the combination of some points held in the node, some child nodes, and some geometric structure that represents the space that all of the descendant points fall into (this is a restatement of the fourth part of the definition).\section{Template parameters required by the Tree\+Type policy}\label{trees_treetype_template_params}
Most everything in mlpack is decomposed into a series of configurable template parameters, and trees are no exception. In order to ease usage of high-\/level mlpack algorithms, each {\ttfamily Tree\+Type} itself must be a template class taking three parameters\+:


\begin{DoxyItemize}
\item {\ttfamily Metric\+Type} -- the underlying metric that the tree will be built on (see \doxyref{the Metric\+Type policy documentation}{p.}{metrics})
\item {\ttfamily Statistic\+Type} -- holds any auxiliary information that individual algorithms may need
\item {\ttfamily Mat\+Type} -- the type of the matrix used to represent the data
\end{DoxyItemize}

The reason that these three template parameters are necessary is so that each {\ttfamily Tree\+Type} can be used as a template template parameter, which can radically simplify the required syntax for instantiating mlpack algorithms. By using template template parameters, a user needs only to write


\begin{DoxyCode}
\textcolor{comment}{// The RangeSearch class takes a MetricType and a TreeType template parameter.}

\textcolor{comment}{// This code instantiates RangeSearch with the ManhattanDistance and a}
\textcolor{comment}{// QuadTree.  Note that the QuadTree itself is a template, and takes a}
\textcolor{comment}{// MetricType, StatisticType, and MatType, just like the policy requires.}

\textcolor{comment}{// This example ignores the constructor parameters, for the sake of simplicity.}
RangeSearch<ManhattanDistance, QuadTree> rs(...);
\end{DoxyCode}


as opposed to the far more complicated alternative, where the user must specify the values of each template parameter of the tree type\+:


\begin{DoxyCode}
\textcolor{comment}{// This is a much worse alternative, where the user must specify the template}
\textcolor{comment}{// arguments of their tree.}
RangeSearch<ManhattanDistance,
            QuadTree<ManhattanDistance, EmptyStatistic, arma::mat>> rs(...);
\end{DoxyCode}


Unfortunately, the price to pay for this user convenience is that {\itshape every} {\ttfamily Tree\+Type} must have three template parameters, and they must be in exactly that order. Fortunately, there is an additional benefit\+: we are guaranteed that the tree is built using the same metric as the method (that is, a user can\textquotesingle{}t specify different metric types to the algorithm and to the tree, which they can without template template parameters).

There are two important notes about this\+:


\begin{DoxyItemize}
\item Not every possible input of Metric\+Type, Statistic\+Type, and/or Mat\+Type necessarily need to be valid or work correctly for each type of tree. For instance, the Quad\+Tree is limited to Euclidean metrics and will not work otherwise. Either compile-\/time static checks or detailed documentation can help keep users from using invalid combinations of template arguments.
\item Some types of trees have more template parameters than just these three. One example is the generalized binary space tree, where the bounding shape of each node is easily made into a fourth template parameter (the {\ttfamily Binary\+Space\+Tree} class calls this the {\ttfamily Bound\+Type} parameter), and the procedure used to split a node is easily made into a fifth template parameter (the {\ttfamily Binary\+Space\+Tree} class calls this the {\ttfamily Split\+Type} parameter). However, the syntax of template template parameters {\itshape requires} that the class only has the correct number of template parameters---no more, no less. Fortunately, C++11 allows template typedefs, which can be used to provide partial specialization of template classes\+:
\end{DoxyItemize}


\begin{DoxyCode}
\textcolor{comment}{// This is the definition of the BinarySpaceTree class, which has five template}
\textcolor{comment}{// parameters.}
\textcolor{keyword}{template}<\textcolor{keyword}{typename} MetricType,
         \textcolor{keyword}{typename} StatisticType,
         \textcolor{keyword}{typename} MatType,
         \textcolor{keyword}{typename} BoundType,
         \textcolor{keyword}{typename} SplitType>
\textcolor{keyword}{class }BinarySpaceTree;

\textcolor{comment}{// The 'using' keyword gives us a template typedef, so we can define the}
\textcolor{comment}{// MeanSplitKDTree template class, which has three parameters and is a valid}
\textcolor{comment}{// TreeType policy class.}
\textcolor{keyword}{template}<\textcolor{keyword}{typename} MetricType, \textcolor{keyword}{typename} StatisticType, \textcolor{keyword}{typename} MatType>
\textcolor{keyword}{using} MeanSplitKDTree = BinarySpaceTree<MetricType,
                                        StatisticType,
                                        MatType,
                                        HRectBound<MetricType>
                                        MeanSplit<BoundType, MetricType>>;
\end{DoxyCode}


Now, the {\ttfamily Mean\+Split\+K\+D\+Tree} class has only three template parameters and can be used as a {\ttfamily Tree\+Type} policy class in various mlpack algorithms. Many types of trees in mlpack have more than three template parameters and rely on template typedefs to provide simplified {\ttfamily Tree\+Type} interfaces.\section{The Tree\+Type A\+PI}\label{trees_treetype_api}
As a result of the definition of {\itshape space tree} in the previous section, a simplified A\+PI presents itself quite easily. However, more complex functionality is often necessary in mlpack, so this leads to more functions being necessary for a class to satisfy the {\ttfamily Tree\+Type} policy. Combining this with the template parameters required for trees given in the previous section gives us the complete A\+PI required for a class implementing the {\ttfamily Tree\+Type} policy. Below is the minimal set of functions required with minor documentation for each function. (More extensive documentation and explanation is given afterwards.)


\begin{DoxyCode}
\textcolor{comment}{// The three template parameters will be supplied by the user, and are detailed}
\textcolor{comment}{// in the previous section.}
\textcolor{keyword}{template}<\textcolor{keyword}{typename} MetricType,
         \textcolor{keyword}{typename} StatisticType,
         \textcolor{keyword}{typename} MatType>
\textcolor{keyword}{class }ExampleTree
\{
 \textcolor{keyword}{public}:

  \textcolor{comment}{// This batch constructor does not modify the dataset, and builds the entire}
  \textcolor{comment}{// tree using a default-constructed MetricType.}
  ExampleTree(\textcolor{keyword}{const} MatType& data);

  \textcolor{comment}{// This batch constructor does not modify the dataset, and builds the entire}
  \textcolor{comment}{// tree using the given MetricType.}
  ExampleTree(\textcolor{keyword}{const} MatType& data, MetricType& metric);

  \textcolor{comment}{// Initialize the tree from a given boost::serialization archive.  SFINAE (the}
  \textcolor{comment}{// second argument) is necessary to ensure that the archive is loading, not}
  \textcolor{comment}{// saving.}
  \textcolor{keyword}{template}<\textcolor{keyword}{typename} Archive>
  ExampleTree(
      Archive& ar,
      \textcolor{keyword}{const} \textcolor{keyword}{typename} boost::enable\_if<typename Archive::is\_loading>::type* = 0);

  \textcolor{comment}{// Release any resources held by the tree.}
  ~ExampleTree();

  \textcolor{comment}{// ///////////////////////// //}
  \textcolor{comment}{// // Basic functionality // //}
  \textcolor{comment}{// ///////////////////////// //}

  \textcolor{comment}{// Get the dataset that the tree is built on.}
  \textcolor{keyword}{const} MatType& Dataset();

  \textcolor{comment}{// Get the metric that the tree is built with.}
  MetricType& Metric();

  \textcolor{comment}{// Get/modify the StatisticType for this node.}
  StatisticType& Stat();

  \textcolor{comment}{// Return the parent of the node, or NULL if this is the root.}
  ExampleTree* Parent();

  \textcolor{comment}{// Return the number of children held by the node.}
  \textcolor{keywordtype}{size\_t} NumChildren();
  \textcolor{comment}{// Return the i'th child held by the node.}
  ExampleTree& Child(\textcolor{keyword}{const} \textcolor{keywordtype}{size\_t} i);

  \textcolor{comment}{// Return the number of points held in the node.}
  \textcolor{keywordtype}{size\_t} NumPoints();
  \textcolor{comment}{// Return the index of the i'th point held in the node.}
  \textcolor{keywordtype}{size\_t} Point(\textcolor{keyword}{const} \textcolor{keywordtype}{size\_t} i);

  \textcolor{comment}{// Return the number of descendant nodes of this node.}
  \textcolor{keywordtype}{size\_t} NumDescendantNodes();
  \textcolor{comment}{// Return the i'th descendant node of this node.}
  ExampleTree& DescendantNode(\textcolor{keyword}{const} \textcolor{keywordtype}{size\_t} i);

  \textcolor{comment}{// Return the number of descendant points of this node.}
  \textcolor{keywordtype}{size\_t} NumDescendants();
  \textcolor{comment}{// Return the index of the i'th descendant point of this node.}
  \textcolor{keywordtype}{size\_t} Descendant(\textcolor{keyword}{const} \textcolor{keywordtype}{size\_t} i);

  \textcolor{comment}{// Store the center of the bounding region of the node in the given vector.}
  \textcolor{keywordtype}{void} Center(arma::vec& center);

  \textcolor{comment}{// ///////////////////////////////////////////////// //}
  \textcolor{comment}{// // More complex distance-related functionality // //}
  \textcolor{comment}{// ///////////////////////////////////////////////// //}

  \textcolor{comment}{// Return the distance between the center of this node and the center of}
  \textcolor{comment}{// its parent.}
  \textcolor{keywordtype}{double} ParentDistance();

  \textcolor{comment}{// Return an upper bound on the furthest possible distance between the}
  \textcolor{comment}{// center of the node and any point held in the node.}
  \textcolor{keywordtype}{double} FurthestPointDistance();

  \textcolor{comment}{// Return an upper bound on the furthest possible distance between the}
  \textcolor{comment}{// center of the node and any descendant point of the node.}
  \textcolor{keywordtype}{double} FurthestDescendantDistance();

  \textcolor{comment}{// Return a lower bound on the minimum distance between the center and any}
  \textcolor{comment}{// edge of the node's bounding shape.}
  \textcolor{keywordtype}{double} MinimumBoundDistance();

  \textcolor{comment}{// Return a lower bound on the minimum distance between the given point and}
  \textcolor{comment}{// the node.}
  \textcolor{keyword}{template}<\textcolor{keyword}{typename} VecType>
  \textcolor{keywordtype}{double} MinDistance(VecType& point);

  \textcolor{comment}{// Return a lower bound on the minimum distance between the given node and}
  \textcolor{comment}{// this node.}
  \textcolor{keywordtype}{double} MinDistance(ExampleTree& otherNode);

  \textcolor{comment}{// Return an upper bound on the maximum distance between the given point and}
  \textcolor{comment}{// the node.}
  \textcolor{keyword}{template}<\textcolor{keyword}{typename} VecType>
  \textcolor{keywordtype}{double} MaxDistance(VecType& point);

  \textcolor{comment}{// Return an upper bound on the maximum distance between the given node and}
  \textcolor{comment}{// this node.}
  \textcolor{keywordtype}{double} MaxDistance(ExampleTree& otherNode);

  \textcolor{comment}{// Return the combined results of MinDistance() and MaxDistance().}
  \textcolor{keyword}{template}<\textcolor{keyword}{typename} VecType>
  math::Range RangeDistance(VecType& point);

  \textcolor{comment}{// Return the combined results of MinDistance() and MaxDistance().}
  math::Range RangeDistance(ExampleTree& otherNode);

  \textcolor{comment}{// //////////////////////////////////// //}
  \textcolor{comment}{// // Serialization (loading/saving) // //}
  \textcolor{comment}{// //////////////////////////////////// //}

  \textcolor{comment}{// Return a string representation of the tree.}
  std::string ToString() \textcolor{keyword}{const};

  \textcolor{comment}{// Serialize the tree (load from the given archive / save to the given}
  \textcolor{comment}{// archive, depending on its type).}
  \textcolor{keyword}{template}<\textcolor{keyword}{typename} Archive>
  \textcolor{keywordtype}{void} Serialize(Archive& ar, \textcolor{keyword}{const} \textcolor{keywordtype}{unsigned} \textcolor{keywordtype}{int} version);

 \textcolor{keyword}{protected}:
  \textcolor{comment}{// A default constructor; only meant to be used by boost::serialization.  This}
  \textcolor{comment}{// must be protected so that boost::serialization will work; it does not need}
  \textcolor{comment}{// to return a valid tree.}
  ExampleTree();

  \textcolor{comment}{// Friend access must be given for the default constructor.}
  \textcolor{keyword}{friend} \textcolor{keyword}{class }boost::serialization::access;
\};
\end{DoxyCode}


Although this is significantly more complex than the four-\/item definition of space tree$\ast$ might suggest, it turns out many of these methods are not difficult to implement for most reasonable tree types. It is also important to realize that this is a {\itshape minimum} A\+PI; you may implement more complex tree types at your leisure (and you may include more template parameters too, though you will have to use template typedefs to provide versions with three parameters; see \doxyref{the previous section}{p.}{trees_treetype_template_params}).

Before diving into the detailed documentation for each function, let us consider a few important points about the implications of this A\+PI\+:


\begin{DoxyItemize}
\item {\bfseries Trees are not default-\/constructible} and should not (in general) provide a default constructor. This helps prevent invalid trees. In general, any instantiated mlpack object should be valid and ready to use---and a tree built on no points is not valid or ready to use.
\item {\bfseries Trees only need to provide batch constructors.} Although many tree types do have algorithms for incremental insertions, in mlpack this is not required because the tree-\/based algorithms that mlpack implements generally assume fully-\/built, non-\/modifiable trees. For this purpose, batch construction is perfectly sufficient. (It\textquotesingle{}s also worth pointing out that for some types of trees, like kd-\/trees, the cost of a handful of insertions often outweighs the cost of completely rebuilding the tree.)
\item {\bfseries Trees must provide a number of distance bounding functions.} The utility of trees generally stems from the ability to place quick bounds on distance-\/related quantities. For instance, if all the descendant points of a node are bounded by a ball of radius $\lambda$ and the center of the node is a point $c$, then the minimum distance between some point $p$ and any descendant point of the node is equal to the distance between $p$ and $c$ minus the radius $\lambda$\+: $d(p, c) - \lambda$. This is a fast calculation, and (usually) provides a decent bound on the minimum distance between $p$ and any descendant point of the node.
\item {\bfseries Trees need to be able to be serialized.} mlpack uses the \doxyref{boost\+::serialization}{p.}{namespaceboost_1_1serialization} library for saving and loading objects. Trees---which can be a part of machine learning models---therefore must have the ability to be saved and loaded. Making this all work requires a protected constructor (part of the A\+PI) and generally makes it impossible to hold references instead of pointers internally, because if a tree is loaded from a file then it must own the dataset it is built on and the metric it uses (this also means that a destructor must exist for freeing these resources).
\end{DoxyItemize}

Now, we can consider each part of the A\+PI more rigorously.\section{Rigorous A\+P\+I documentation}\label{trees_treetype_rigorous}
This section is divided into five parts\+:


\begin{DoxyItemize}
\item \doxyref{Template parameters}{p.}{trees_treetype_rigorous_template}
\item \doxyref{Constructors and destructors}{p.}{trees_treetype_rigorous_constructor}
\item \doxyref{Basic tree functionality}{p.}{trees_treetype_rigorous_basic}
\item \doxyref{Complex tree functionality and bounds}{p.}{trees_treetype_rigorous_complex}
\item \doxyref{Serialization}{p.}{trees_treetype_rigorous_serialization}
\end{DoxyItemize}\subsection{Template parameters}\label{trees_treetype_rigorous_template}
An earlier section discussed the three different template parameters that are required by the {\ttfamily Tree\+Type} policy.

The \doxyref{Metric\+Type policy}{p.}{metrics} provides one method that will be useful for tree building and other operations\+:


\begin{DoxyCode}
\textcolor{comment}{// This function is required by the MetricType policy.}
\textcolor{comment}{// Evaluate the metric between two points (which may be of different types).}
\textcolor{keyword}{template}<\textcolor{keyword}{typename} VecTypeA, \textcolor{keyword}{typename} VecTypeB>
\textcolor{keywordtype}{double} Evaluate(\textcolor{keyword}{const} VecTypeA& a, \textcolor{keyword}{const} VecTypeB& b);
\end{DoxyCode}


Note that this method is not necessarily static, so a {\ttfamily Metric\+Type} object should be held internally and its {\ttfamily Evaluate()} method should be called whenever the distance between two points is required. {\bfseries It is generally a bad idea to hardcode any distance calculation in your tree.} This will make the tree unable to generalize to arbitrary metrics. If your tree must depend on certain assumptions holding about the metric (i.\+e. the metric is a Euclidean metric), then make that clear in the documentation of the tree, so users do not try to use the tree with an inappropriate metric.

The second template parameter, {\ttfamily Statistic\+Type}, is for auxiliary information that is required by certain algorithms. For instance, consider an algorithm which repeatedly uses the variance of the descendant points of a node. It might be tempting to add a {\ttfamily Variance()} method to the required {\ttfamily Tree\+Type} A\+PI, but this quickly leads to code bloat (after all, the A\+PI already has quite enough functions as it is). Instead, it is better to create a {\ttfamily Statistic\+Type} class which provides the {\ttfamily Variance()} method, and then call {\ttfamily Stat()}.Variance() when the variance is required. This also holds true for cached data members.

Each node should have its own instance of a {\ttfamily Statistic\+Type} class. The {\ttfamily Statistic\+Type} must provide the following constructors\+:


\begin{DoxyCode}
\textcolor{comment}{// Default constructor required by the StatisticType policy.}
StatisticType();

\textcolor{comment}{// This constructor is required by the StatisticType policy.}
\textcolor{keyword}{template}<\textcolor{keyword}{typename} TreeType>
StatisticType(TreeType& node);
\end{DoxyCode}


This constructor should be called with {\ttfamily }($\ast$this) after the node is constructed (usually, this ends up being the last line in the constructor of a node).

The last template parameter is the {\ttfamily Mat\+Type} parameter. This is generally {\ttfamily arma\+::mat} or {\ttfamily arma\+::sp\+\_\+mat}, but could be any Armadillo type, including matrices that hold data points of different precisions (such as {\ttfamily float} or even {\ttfamily int}). It generally suffices to write {\ttfamily Mat\+Type} assuming that {\ttfamily arma\+::mat} will be used, since the vast majority of the time this will be what is used.\subsection{Constructors and destructors}\label{trees_treetype_rigorous_constructor}
The {\ttfamily Tree\+Type} A\+PI requires at least three constructors. Technically, it does not {\itshape require} a destructor, but almost certainly your tree class will be doing some memory management internally and should have one (though not always).

The first two constructors are variations of the same idea\+:


\begin{DoxyCode}
\textcolor{comment}{// This batch constructor does not modify the dataset, and builds the entire}
\textcolor{comment}{// tree using a default-constructed MetricType.}
ExampleTree(\textcolor{keyword}{const} MatType& data);

\textcolor{comment}{// This batch constructor does not modify the dataset, and builds the entire}
\textcolor{comment}{// tree using the given MetricType.}
ExampleTree(\textcolor{keyword}{const} MatType& data, MetricType& metric);
\end{DoxyCode}


All that is required here is that a constructor is available that takes a dataset and optionally an instantiated metric. If no metric is provided, then it should be assumed that the {\ttfamily Metric\+Type} class has a default constructor and a default-\/constructed metric should be used. The constructor {\itshape must} return a valid, fully-\/constructed, ready-\/to-\/use tree that satisfies the definition of {\itshape space tree} that was \doxyref{given earlier}{p.}{trees_whatistree}.

It is possible to implement both these constructors as one by using {\ttfamily boost\+::optional}.

The third constructor requires the tree to be initializable from a {\ttfamily \doxyref{boost\+::serialization}{p.}{namespaceboost_1_1serialization}} archive\+:


\begin{DoxyCode}
\textcolor{comment}{// Initialize the tree from a given boost::serialization archive.  SFINAE (the}
\textcolor{comment}{// second argument) is necessary to ensure that the archive is loading, not}
\textcolor{comment}{// saving.}
\textcolor{keyword}{template}<\textcolor{keyword}{typename} Archive>
ExampleTree(
    Archive& ar,
    \textcolor{keyword}{const} \textcolor{keyword}{typename} boost::enable\_if<typename Archive::is\_loading>::type* = 0);
\end{DoxyCode}


This has implications on how the tree must be stored. In this case, the dataset is {\itshape not yet loaded} and therefore the tree {\bfseries may be required to have ownership of the data matrix}. This means that realistically the most reasonable way to represent the data matrix internally in a tree class is not with a reference but instead with a pointer. If this is true, then a destructor will be required\+:


\begin{DoxyCode}
\textcolor{comment}{// Release any resources held by the tree.}
~ExampleTree();
\end{DoxyCode}


and, if the data matrix is represented internally with a pointer, this destructor will need to release the memory for the data matrix (in the case that the tree was created via {\ttfamily \doxyref{boost\+::serialization}{p.}{namespaceboost_1_1serialization}} ).

Note that these constructors are not necessarily the only constructors that a {\ttfamily Tree\+Type} implementation can provide. One important example of when more constructors are useful is when the tree rearranges points internally; this might be desired for the sake of speed or memory optimization. But to do this with the required constructors would necessarily incur a copy of the data matrix, because the user will pass a {\ttfamily \char`\"{}const Mat\+Type\&\char`\"{}}. One alternate solution is to provide a constructor which takes an rvalue reference to a {\ttfamily Mat\+Type\+:} 


\begin{DoxyCode}
\textcolor{keyword}{template}<\textcolor{keyword}{typename} Archive>
ExampleTree(MatType&& data);
\end{DoxyCode}


(and another overload that takes an instantiated metric), and then the user can use {\ttfamily std\+::move()} to build the tree without copying the data matrix, although the data matrix will be modified\+:


\begin{DoxyCode}
ExampleTree exTree(std::move(dataset));
\end{DoxyCode}


It is, of course, possible to add even more constructors if desired.\subsection{Basic tree functionality}\label{trees_treetype_rigorous_basic}
The basic functionality of a class implementing the {\ttfamily Tree\+Type} A\+PI is quite straightforward and intuitive.


\begin{DoxyCode}
\textcolor{comment}{// Get the dataset that the tree is built on.}
\textcolor{keyword}{const} MatType& Dataset();
\end{DoxyCode}


This should return a {\ttfamily const} reference to the dataset the tree is built on. The fact that this function is required essentially means that each node in the tree must store a pointer to the dataset (this is not the only option, but it is the most obvious option).


\begin{DoxyCode}
\textcolor{comment}{// Get the metric that the tree is built with.}
MetricType& Metric();
\end{DoxyCode}


Each node must also store an instantiated metric or a pointer to one (note that this is required even for metrics that have no state and have a {\ttfamily static} {\ttfamily Evaluate()} function).


\begin{DoxyCode}
\textcolor{comment}{// Get/modify the StatisticType for this node.}
StatisticType& Stat();
\end{DoxyCode}


As discussed earlier, each node must hold a {\ttfamily Statistic\+Type}; this is accessible through the {\ttfamily Stat()} function.


\begin{DoxyCode}
\textcolor{comment}{// Return the parent of the node, or NULL if this is the root.}
ExampleTree* Parent();

\textcolor{comment}{// Return the number of children held by the node.}
\textcolor{keywordtype}{size\_t} NumChildren();
\textcolor{comment}{// Return the i'th child held by the node.}
ExampleTree& Child(\textcolor{keyword}{const} \textcolor{keywordtype}{size\_t} i);

\textcolor{comment}{// Return the number of points held in the node.}
\textcolor{keywordtype}{size\_t} NumPoints();
\textcolor{comment}{// Return the index of the i'th point held in the node.}
\textcolor{keywordtype}{size\_t} Point(\textcolor{keyword}{const} \textcolor{keywordtype}{size\_t} i);

\textcolor{comment}{// Return the number of descendant nodes of this node.}
\textcolor{keywordtype}{size\_t} NumDescendantNodes();
\textcolor{comment}{// Return the i'th descendant node of this node.}
ExampleTree& DescendantNode(\textcolor{keyword}{const} \textcolor{keywordtype}{size\_t} i);

\textcolor{comment}{// Return the number of descendant points of this node.}
\textcolor{keywordtype}{size\_t} NumDescendants();
\textcolor{comment}{// Return the index of the i'th descendant point of this node.}
\textcolor{keywordtype}{size\_t} Descendant(\textcolor{keyword}{const} \textcolor{keywordtype}{size\_t} i);
\end{DoxyCode}


These functions are all fairly self-\/explanatory. Most algorithms will use the {\ttfamily Parent()}, {\ttfamily Children()}, {\ttfamily Num\+Children()}, {\ttfamily Point()}, and {\ttfamily Num\+Points()} functions, so care should be taken when implementing those functions to ensure they will be efficient. Note that {\ttfamily Point()} and {\ttfamily Descendant()} should return indices of points, so the actual points can be accessed by calling {\ttfamily \char`\"{}\+Dataset().\+col(\+Point(i))\char`\"{}} for some index {\ttfamily i} (or something similar).

An important note about the {\ttfamily Descendant()} function is that each descendant point should be unique. So if a node holds the point with index 6 and it has one child that holds the points with indices 6 and 7, then {\ttfamily Num\+Descendants()} should return 2, not 3. The ordering in which the descendants are returned can be arbitrary; so, {\ttfamily Descendant(0)} can return 6 {\bfseries or} 7, and {\ttfamily Descendant(1)} should return the other index.


\begin{DoxyCode}
\textcolor{comment}{// Store the center of the bounding region of the node in the given vector.}
\textcolor{keywordtype}{void} Center(arma::vec& center);
\end{DoxyCode}


The last function, {\ttfamily \doxyref{Center()}{p.}{namespacemlpack_1_1math_af20ca29adeac02601e8f4386bda3588e}}, should calculate the center of the bounding shape and store it in the given vector. So, for instance, if the tree is a ball tree, then the center is simply the center of the ball. Algorithm writers would be wise to try and avoid the use of {\ttfamily \doxyref{Center()}{p.}{namespacemlpack_1_1math_af20ca29adeac02601e8f4386bda3588e}} if possible, since it will necessarily cost a copy of a vector.\subsection{Complex tree functionality and bounds}\label{trees_treetype_rigorous_complex}
A node in a tree should also be able to calculate various distance-\/related bounds; these are particularly useful in tree-\/based algorithms. Note that any of these bounds does not necessarily need to be maximally tight; generally it is more important that each bound can be easily calculated.

Details on each bounding function that the {\ttfamily Tree\+Type} A\+PI requires are given below.


\begin{DoxyCode}
\textcolor{comment}{// Return the distance between the center of this node and the center of}
\textcolor{comment}{// its parent.}
\textcolor{keywordtype}{double} ParentDistance();
\end{DoxyCode}


Remember that each node corresponds to some region in the space that the dataset lies in. For most tree types this shape is often something geometrically simple\+: a ball, a cone, a hyperrectangle, a slice, or something similar. The {\ttfamily Parent\+Distance()} function should return the distance between the center of this node\textquotesingle{}s region and the center of the parent node\textquotesingle{}s region.

In practice this bound is often used in dual-\/tree (or single-\/tree) algorithms to place an easy {\ttfamily Min\+Distance()} (or {\ttfamily Max\+Distance()} ) bound for a child node; the parent\textquotesingle{}s {\ttfamily Min\+Distance()} (or {\ttfamily Max\+Distance()} ) function is called and then adjusted with {\ttfamily Parent\+Distance()} to provide a possibly loose but efficient bound on what the result of {\ttfamily Min\+Distance()} (or {\ttfamily Max\+Distance()} ) would be with the child.


\begin{DoxyCode}
\textcolor{comment}{// Return an upper bound on the furthest possible distance between the}
\textcolor{comment}{// center of the node and any point held in the node.}
\textcolor{keywordtype}{double} FurthestPointDistance();

\textcolor{comment}{// Return an upper bound on the furthest possible distance between the}
\textcolor{comment}{// center of the node and any descendant point of the node.}
\textcolor{keywordtype}{double} FurthestDescendantDistance();
\end{DoxyCode}


It is often very useful to be able to bound the radius of a node, which is effectively what {\ttfamily Furthest\+Descendant\+Distance()} does. Often it is easiest to simply calculate and cache the furthest descendant distance at tree construction time. Some trees, such as the cover tree, are able to give guarantees that the points held in the node will necessarily be closer than the descendant points; therefore, the {\ttfamily Furthest\+Point\+Distance()} function is also useful.

It is permissible to simply have {\ttfamily Furthest\+Point\+Distance()} return the result of {\ttfamily Furthest\+Descendant\+Distance()}, and that will still be a valid bound, but depending on the type of tree it may be possible to have {\ttfamily Furthest\+Point\+Distance()} return a tighter bound.


\begin{DoxyCode}
\textcolor{comment}{// Return a lower bound on the minimum distance between the center and any}
\textcolor{comment}{// edge of the node's bounding shape.}
\textcolor{keywordtype}{double} MinimumBoundDistance();
\end{DoxyCode}


This is, admittedly, a somewhat complex and weird quantity. It is one of the less important bounding functions, so it is valid to simply return 0...

The bound is a bound on the minimum distance between the center of the node and any edge of the shape that bounds all of the descendants of the node. So, if the bounding shape is a ball (as in a ball tree or a cover tree), then {\ttfamily Minimum\+Bound\+Distance()} should just return the radius of the ball. If the bounding shape is a hypercube (as in a generalized octree), then {\ttfamily Minimum\+Bound\+Distance()} should return the side length divided by two. If the bounding shape is a hyperrectangle (as in a kd-\/tree or a spill tree), then {\ttfamily Minimum\+Bound\+Distance()} should return half the side length of the hyperrectangle\textquotesingle{}s smallest side.


\begin{DoxyCode}
\textcolor{comment}{// Return a lower bound on the minimum distance between the given point and}
\textcolor{comment}{// the node.}
\textcolor{keyword}{template}<\textcolor{keyword}{typename} VecType>
\textcolor{keywordtype}{double} MinDistance(VecType& point);

\textcolor{comment}{// Return a lower bound on the minimum distance between the given node and}
\textcolor{comment}{// this node.}
\textcolor{keywordtype}{double} MinDistance(ExampleTree& otherNode);

\textcolor{comment}{// Return an upper bound on the maximum distance between the given point and}
\textcolor{comment}{// the node.}
\textcolor{keyword}{template}<\textcolor{keyword}{typename} VecType>
\textcolor{keywordtype}{double} MaxDistance(VecType& point);

\textcolor{comment}{// Return an upper bound on the maximum distance between the given node and}
\textcolor{comment}{// this node.}
\textcolor{keywordtype}{double} MaxDistance(ExampleTree& otherNode);

\textcolor{comment}{// Return the combined results of MinDistance() and MaxDistance().}
\textcolor{keyword}{template}<\textcolor{keyword}{typename} VecType>
math::Range RangeDistance(VecType& point);

\textcolor{comment}{// Return the combined results of MinDistance() and MaxDistance().}
math::Range RangeDistance(ExampleTree& otherNode);
\end{DoxyCode}


These six functions are almost without a doubt the most important functionality of a tree. Therefore, it is preferable that these methods be implemented as efficiently as possible, as they may potentially be called many millions of times in a tree-\/based algorithm. It is also preferable that these bounds be as tight as possible. In tree-\/based algorithms, these are used for pruning away work, and tighter bounds mean that more pruning is possible.

Of these six functions, there are only really two bounds that are desired here\+: the {\itshape minimum distance} between a node and an object, and the {\itshape maximum distance} between a node and an object. The object may be either a vector (usually {\ttfamily arma\+::vec} ) or another tree node.

Consider the first case, where the object is a vector. The result of {\ttfamily Min\+Distance()} needs to be less than or equal to the true minimum distance, which could be calculated as below\+:


\begin{DoxyCode}
\textcolor{comment}{// We assume that we have a vector 'vec', and a tree node 'node'.}
\textcolor{keywordtype}{double} trueMinDist = DBL\_MAX;
\textcolor{keywordflow}{for} (\textcolor{keywordtype}{size\_t} i = 0; i < node.NumDescendants(); ++i)
\{
  \textcolor{keyword}{const} \textcolor{keywordtype}{double} dist = node.Metric().Evaluate(vec,
      node.Dataset().col(node.Descendant(i)));
  \textcolor{keywordflow}{if} (dist < trueMinDist)
    trueMinDist = dist;
\}
\textcolor{comment}{// At the end of the loop, trueMinDist will hold the true minimum distance}
\textcolor{comment}{// between 'vec' and any descendant point of 'node'.}
\end{DoxyCode}


Often the bounding shape of a node will allow a quick calculation that will make a reasonable bound. For instance, if the node\textquotesingle{}s bounding shape is a ball with radius {\ttfamily r} and center {\ttfamily ctr}, the calculation is simply {\ttfamily \char`\"{}(node.\+Metric().\+Evaluate(vec, ctr) -\/ r)\char`\"{}}. Usually a good {\ttfamily Min\+Distance()} or {\ttfamily Max\+Distance()} function will make only one call to the {\ttfamily Evaluate()} function of the metric.

The {\ttfamily Range\+Distance()} function allows a way for both bounds to be calculated at once. It is possible to implement this as a call to {\ttfamily Min\+Distance()} followed by a call to {\ttfamily Max\+Distance()}, but this may incur more metric {\ttfamily Evaluate()} calls than necessary. Often calculating both bounds at once can be more efficient and can be done with fewer {\ttfamily Evaluate()} calls than calling both {\ttfamily Min\+Distance()} and {\ttfamily Max\+Distance()}.\subsection{Serialization}\label{trees_treetype_rigorous_serialization}
The last two public functions that the {\ttfamily Tree\+Type} A\+PI requires are related to serialization and printing.


\begin{DoxyCode}
\textcolor{comment}{// Return a string representation of the tree.}
std::string ToString() \textcolor{keyword}{const};
\end{DoxyCode}


There are few restrictions on the precise way that the {\ttfamily To\+String()} function should operate, but generally it should behave similarly to the {\ttfamily To\+String()} function in other mlpack methods. Generally, a user will call {\ttfamily To\+String()} when they want to inspect the object and see what it looks like. For a tree, printing the entire tree may be way more information than the user was expecting, so it may be a better option to print either only the node itself or the node plus one or two levels of children.


\begin{DoxyCode}
\textcolor{comment}{// Serialize the tree (load from the given archive / save to the given}
\textcolor{comment}{// archive, depending on its type).}
\textcolor{keyword}{template}<\textcolor{keyword}{typename} Archive>
\textcolor{keywordtype}{void} Serialize(Archive& ar, \textcolor{keyword}{const} \textcolor{keywordtype}{unsigned} \textcolor{keywordtype}{int} version);

\textcolor{keyword}{protected}:
\textcolor{comment}{// A default constructor; only meant to be used by boost::serialization.  This}
\textcolor{comment}{// must be protected so that boost::serialization will work; it does not need}
\textcolor{comment}{// to return a valid tree.}
ExampleTree();

\textcolor{comment}{// Friend access must be given for the default constructor.}
\textcolor{keyword}{friend} \textcolor{keyword}{class }boost::serialization::access;
\end{DoxyCode}


On the other hand, the specifics of the functionality required for the {\ttfamily Serialize()} function are somewhat more difficult. The {\ttfamily Serialize()} function will be called either when a tree is being saved to disk or loaded from disk. The {\ttfamily \doxyref{boost\+::serialization}{p.}{namespaceboost_1_1serialization}} documentation is fairly comprehensive, but when writing a {\ttfamily Serialize()} method for mlpack trees you should use {\ttfamily data\+::\+Create\+N\+V\+P()} instead of {\ttfamily B\+O\+O\+S\+T\+\_\+\+S\+E\+R\+I\+A\+L\+I\+Z\+A\+T\+I\+O\+N\+\_\+\+N\+V\+P()}. This is because mlpack classes implement {\ttfamily Serialize()} instead of {\ttfamily serialize()} in order to conform to the mlpack style guidelines, and making this work requires some interesting shim code, which is hidden inside of {\ttfamily data\+::\+Create\+N\+V\+P()}. It may be useful to look at other {\ttfamily Serialize()} methods contained in other mlpack classes as an example.

An important note is that it is very difficult to use references with {\ttfamily \doxyref{boost\+::serialization}{p.}{namespaceboost_1_1serialization}}, because {\ttfamily Serialize()} may be called at any time during the object\textquotesingle{}s lifetime, and references cannot be re-\/seated. In general this will require the use of pointers, which then require manual memory management. Therefore, be careful that {\ttfamily Serialize()} (and the tree\textquotesingle{}s destructor) properly handle memory management!\section{The Tree\+Traits trait class}\label{trees_treetype_traits}
Some tree-\/based algorithms can specialize if the tree fulfills certain conditions. For instance, if the regions represented by two sibling nodes cannot overlap, an algorithm may be able to perform a simpler computation. Based on this reasoning, the {\ttfamily Tree\+Traits} trait class (much like the \doxyref{mlpack\+::kernel\+::\+Kernel\+Traits}{p.}{classmlpack_1_1kernel_1_1KernelTraits} class) exists in order to allow a tree to specify (via a {\ttfamily const} {\ttfamily static} {\ttfamily bool}) when these types of conditions are satisfied. {\bfseries Note that a Tree\+Traits class is not required,} but may be helpful.

The {\ttfamily Tree\+Traits} trait class is a template class that takes a {\ttfamily Tree\+Type} as a parameter, and exposes {\ttfamily const} {\ttfamily static} {\ttfamily bool} values that depend on the tree. Setting these values is achieved by specialization. The code below shows the default {\ttfamily Tree\+Traits} values (these are the values that will be used if no specialization is provided for a given {\ttfamily Tree\+Type}).


\begin{DoxyCode}
\textcolor{keyword}{template}<\textcolor{keyword}{typename} TreeType>
\textcolor{keyword}{class }TreeTraits
\{
 \textcolor{keyword}{public}:
  \textcolor{comment}{// This is true if the subspaces represented by the children of a node can}
  \textcolor{comment}{// overlap.}
  \textcolor{keyword}{static} \textcolor{keyword}{const} \textcolor{keywordtype}{bool} HasOverlappingChildren = \textcolor{keyword}{true};

  \textcolor{comment}{// This is true if Point(0) is the centroid of the node.}
  \textcolor{keyword}{static} \textcolor{keyword}{const} \textcolor{keywordtype}{bool} FirstPointIsCentroid = \textcolor{keyword}{false};

  \textcolor{comment}{// This is true if the points contained in the first child of a node}
  \textcolor{comment}{// (Child(0)) are also contained in that node.}
  \textcolor{keyword}{static} \textcolor{keyword}{const} \textcolor{keywordtype}{bool} HasSelfChildren = \textcolor{keyword}{false};

  \textcolor{comment}{// This is true if the tree rearranges points in the dataset when it is built.}
  \textcolor{keyword}{static} \textcolor{keyword}{const} \textcolor{keywordtype}{bool} RearrangesDataset = \textcolor{keyword}{false};

  \textcolor{comment}{// This is true if the tree always has only two children.}
  \textcolor{keyword}{static} \textcolor{keyword}{const} \textcolor{keywordtype}{bool} BinaryTree = \textcolor{keyword}{false};
\};
\end{DoxyCode}


An example specialization for the \doxyref{mlpack\+::tree\+::\+K\+D\+Tree}{p.}{namespacemlpack_1_1tree_a73c2146f8d1da65d927c7746bfe7e750} class is given below. Note that \doxyref{mlpack\+::tree\+::\+K\+D\+Tree}{p.}{namespacemlpack_1_1tree_a73c2146f8d1da65d927c7746bfe7e750} is itself a template class (like every class satisfying the {\ttfamily Tree\+Type} policy), so we are specializing to a template parameter.


\begin{DoxyCode}
\textcolor{keyword}{template}<\textcolor{keyword}{typename} MetricType,
         \textcolor{keyword}{typename} StatisticType,
         \textcolor{keyword}{typename} MatType>
\textcolor{keyword}{template}<>
\textcolor{keyword}{class }TreeTraits<KDTree<MetricType, StatisticType, MatType>>
\{
 \textcolor{keyword}{public}:
  \textcolor{comment}{// The regions represented by the two children of a node may not overlap.}
  \textcolor{keyword}{static} \textcolor{keyword}{const} \textcolor{keywordtype}{bool} HasOverlappingChildren = \textcolor{keyword}{false};

  \textcolor{comment}{// There is no guarantee that the first point of a node is the centroid.}
  \textcolor{keyword}{static} \textcolor{keyword}{const} \textcolor{keywordtype}{bool} FirstPointIsCentroid = \textcolor{keyword}{false};

  \textcolor{comment}{// Points are not contained at multiple levels (only at the leaves).}
  \textcolor{keyword}{static} \textcolor{keyword}{const} \textcolor{keywordtype}{bool} HasSelfChildren = \textcolor{keyword}{false};

  \textcolor{comment}{// Points are rearranged during the building of the tree.}
  \textcolor{keyword}{static} \textcolor{keyword}{const} \textcolor{keywordtype}{bool} RearrangesDataset = \textcolor{keyword}{true};

  \textcolor{comment}{// The tree is always binary.}
  \textcolor{keyword}{static} \textcolor{keyword}{const} \textcolor{keywordtype}{bool} BinaryTree = \textcolor{keyword}{true};
\};
\end{DoxyCode}


Currently, the traits available are each of the five detailed above. For more information, see the \doxyref{mlpack\+::tree\+::\+Tree\+Traits}{p.}{classmlpack_1_1tree_1_1TreeTraits} documentation.\section{A list of trees in mlpack and more information}\label{trees_treetype_more}
mlpack contains several ready-\/to-\/use implementations of trees that satisfy the Tree\+Type policy A\+PI\+:


\begin{DoxyItemize}
\item \doxyref{mlpack\+::tree\+::\+K\+D\+Tree}{p.}{namespacemlpack_1_1tree_a73c2146f8d1da65d927c7746bfe7e750}
\item \doxyref{mlpack\+::tree\+::\+Mean\+Split\+K\+D\+Tree}{p.}{namespacemlpack_1_1tree_a1028e6acf1fc61997237d3677cae0947}
\item \doxyref{mlpack\+::tree\+::\+Ball\+Tree}{p.}{namespacemlpack_1_1tree_a9d4905444011bbd045122cc985638b32}
\item \doxyref{mlpack\+::tree\+::\+Mean\+Split\+Ball\+Tree}{p.}{namespacemlpack_1_1tree_a530d041f3f210c6097891301478e10bd}
\item \doxyref{mlpack\+::tree\+::\+R\+Tree}{p.}{namespacemlpack_1_1tree_ae4af35641769744ba680cc934e1c1f0e}
\item \doxyref{mlpack\+::tree\+::\+R\+Star\+Tree}{p.}{namespacemlpack_1_1tree_a879db9c5c88d62f13f4a1667bc5adf5c}
\item \doxyref{mlpack\+::tree\+::\+Standard\+Cover\+Tree}{p.}{namespacemlpack_1_1tree_a6ed9d585969e7837af0d41e0c3975602}
\end{DoxyItemize}

Often, these are template typedefs of more flexible tree classes\+:


\begin{DoxyItemize}
\item \doxyref{mlpack\+::tree\+::\+Binary\+Space\+Tree}{p.}{classmlpack_1_1tree_1_1BinarySpaceTree} -- binary trees, such as the K\+D-\/tree and ball tree
\item \doxyref{mlpack\+::tree\+::\+Rectangle\+Tree}{p.}{classmlpack_1_1tree_1_1RectangleTree} -- the R tree and variants
\item \doxyref{mlpack\+::tree\+::\+Cover\+Tree}{p.}{classmlpack_1_1tree_1_1CoverTree} -- the cover tree and variants 
\end{DoxyItemize}