\section{Introduction}\label{fmkstutorial_intro_fmkstut}
The Fast\+M\+KS algorithm (fast exact max-\/kernel search) is a recent algorithm proposed in the following papers\+:


\begin{DoxyCode}
@inproceedings\{curtin2013fast,
  title=\{Fast Exact Max-Kernel Search\},
  author=\{Curtin, Ryan R. and Ram, Parikshit and Gray, Alexander G.\},
  booktitle=\{Proceedings of the 2013 SIAM International Conference on Data
      Mining (SDM \textcolor{stringliteral}{'13)\},}
\textcolor{stringliteral}{  year=\{2013\},}
\textcolor{stringliteral}{  pages=\{1--9\}}
\textcolor{stringliteral}{\}}
\textcolor{stringliteral}{}
\textcolor{stringliteral}{@article\{curtin2014dual,}
\textcolor{stringliteral}{  author = \{Curtin, Ryan R. and Ram, Parikshit\},}
\textcolor{stringliteral}{  title = \{Dual-tree fast exact max-kernel search\},}
\textcolor{stringliteral}{  journal = \{Statistical Analysis and Data Mining\},}
\textcolor{stringliteral}{  volume = \{7\},}
\textcolor{stringliteral}{  number = \{4\},}
\textcolor{stringliteral}{  publisher = \{Wiley Subscription Services, Inc., A Wiley Company\},}
\textcolor{stringliteral}{  issn = \{1932-1872\},}
\textcolor{stringliteral}{  url = \{http://dx.doi.org/10.1002/sam.11218\},}
\textcolor{stringliteral}{  doi = \{10.1002/sam.11218\},}
\textcolor{stringliteral}{  pages = \{229--253\},}
\textcolor{stringliteral}{  year = \{2014\},}
\textcolor{stringliteral}{\}}
\end{DoxyCode}


Given a set of query points $Q$ and a set of reference points $R$, the Fast\+M\+KS algorithm is a fast dual-\/tree (or single-\/tree) algorithm which finds

\[ \arg\max_{p_r \in R} K(p_q, p_r) \]

for all points $p_q \in Q$ and for some Mercer kernel $K(\cdot, \cdot)$. A Mercer kernel is a kernel that is positive semidefinite; these are the classes of kernels that can be used with the kernel trick. In short, the positive semidefiniteness of a Mercer kernel means that any kernel matrix (or Gram matrix) created on a dataset must be positive semidefinite.

The Fast\+M\+KS algorithm builds trees on the datasets $Q$ and $R$ in such a way that explicit representation of the points in the kernel space is unnecessary, by using cover trees (\doxyref{mlpack\+::tree\+::\+Cover\+Tree}{p.}{classmlpack_1_1tree_1_1CoverTree}). This allows the algorithm to be run, for instance, on string kernels, where there is no sensible explicit representation. The {\bfseries mlpack} implementation allows any type of tree that does not require an explicit representation to be used. For more details, see the paper.

At the time of this writing there is no other fast algorithm for exact max-\/kernel search. {\bfseries mlpack} implements both single-\/tree and dual-\/tree fast max-\/kernel search.

{\bfseries mlpack} provides\+:


\begin{DoxyItemize}
\item a \doxyref{simple command-\/line executable}{p.}{fmkstutorial_cli_fmkstut} to run Fast\+M\+KS
\item a \doxyref{C++ interface}{p.}{fmkstutorial_fastmks_fmkstut} to run Fast\+M\+KS
\end{DoxyItemize}\section{Table of Contents}\label{fmkstutorial_toc_fmkstut}
A list of all the sections this tutorial contains.


\begin{DoxyItemize}
\item \doxyref{Introduction}{p.}{fmkstutorial_intro_fmkstut}
\item \doxyref{Table of Contents}{p.}{fmkstutorial_toc_fmkstut}
\item \doxyref{Command-\/line Fast\+M\+KS (mlpack\+\_\+fastmks)}{p.}{fmkstutorial_cli_fmkstut}
\begin{DoxyItemize}
\item \doxyref{Fast\+M\+KS with a linear kernel on one dataset}{p.}{fmkstutorial_cli_ex1_fmkstut}
\item \doxyref{Fast\+M\+KS on a reference and query dataset}{p.}{fmkstutorial_cli_ex2_fmkstut}
\item \doxyref{Fast\+M\+KS with a different kernel}{p.}{fmkstutorial_cli_ex3_fmkstut}
\item \doxyref{Using single-\/tree search or naive search}{p.}{fmkstutorial_cli_ex4_fmkstut}
\item \doxyref{Parameters for alternate kernels}{p.}{fmkstutorial_cli_ex5_fmkstut}
\item \doxyref{Saving a Fast\+M\+KS model/tree}{p.}{fmkstutorial_cli_ex6_fmkstut}
\item \doxyref{Loading a Fast\+M\+KS model for further searches}{p.}{fmkstutorial_cli_ex7_fmkstut}
\end{DoxyItemize}
\item \doxyref{The \textquotesingle{}Fast\+M\+KS\textquotesingle{} class}{p.}{fmkstutorial_fastmks_fmkstut}
\begin{DoxyItemize}
\item \doxyref{Fast\+M\+KS on one dataset}{p.}{fmkstutorial_fastmks_ex1_fmkstut}
\item \doxyref{Fast\+M\+KS with a query and reference dataset}{p.}{fmkstutorial_fastmks_ex2_fmkstut}
\item \doxyref{Fast\+M\+KS with an initialized kernel}{p.}{fmkstutorial_fastmks_ex3_fmkstut}
\item \doxyref{Fast\+M\+KS with an already-\/created tree}{p.}{fmkstutorial_fastmks_ex4_fmkstut}
\end{DoxyItemize}
\item \doxyref{Writing a custom kernel for Fast\+M\+KS}{p.}{fmkstutorial_writing_kernel_fmkstut}
\item \doxyref{Using other tree types for Fast\+M\+KS}{p.}{fmkstutorial_custom_tree_fmkstut}
\item \doxyref{Running Fast\+M\+KS on objects}{p.}{fmkstutorial_objects_fmkstut}
\item \doxyref{Further documentation}{p.}{fmkstutorial_further_doc_fmkstut}
\end{DoxyItemize}\section{Command-\/line Fast\+M\+K\+S (mlpack\+\_\+fastmks)}\label{fmkstutorial_cli_fmkstut}
{\bfseries mlpack} provides a command-\/line program, {\ttfamily mlpack\+\_\+fastmks}, which is used to perform Fast\+M\+KS on a given query and reference dataset. It supports numerous different types of kernels\+:


\begin{DoxyItemize}
\item \doxyref{linear kernel}{p.}{classmlpack_1_1kernel_1_1LinearKernel}
\item \doxyref{polynomial kernel}{p.}{classmlpack_1_1kernel_1_1PolynomialKernel}
\item \doxyref{cosine distance}{p.}{classmlpack_1_1kernel_1_1CosineDistance}
\item \doxyref{Gaussian kernel}{p.}{classmlpack_1_1kernel_1_1GaussianKernel}
\item \doxyref{Epanechnikov kernel}{p.}{classmlpack_1_1kernel_1_1EpanechnikovKernel}
\item \doxyref{triangular kernel}{p.}{classmlpack_1_1kernel_1_1TriangularKernel}
\item \doxyref{hyperbolic tangent kernel}{p.}{classmlpack_1_1kernel_1_1HyperbolicTangentKernel}
\end{DoxyItemize}

Note that when a shift-\/invariant kernel is used, the results will be the same as nearest neighbor search, so \doxyref{K\+NN}{p.}{nstutorial} may be a better option. A shift-\/invariant kernel is a kernel that depends only on the distance between the two input points. The \doxyref{Gaussian kernel}{p.}{classmlpack_1_1kernel_1_1GaussianKernel}, \doxyref{Epanechnikov kernel}{p.}{classmlpack_1_1kernel_1_1EpanechnikovKernel}, and \doxyref{triangular kernel}{p.}{classmlpack_1_1kernel_1_1TriangularKernel} are instances of shift-\/invariant kernels. The paper contains more details on this situation. The {\ttfamily mlpack\+\_\+fastmks} executable still provides these kernels as options, though.

The following examples detail usage of the {\ttfamily mlpack\+\_\+fastmks} program. Note that you can get documentation on all the possible parameters by typing\+:


\begin{DoxyCode}
$ mlpack\_fastmks --help
\end{DoxyCode}
\subsection{Fast\+M\+K\+S with a linear kernel on one dataset}\label{fmkstutorial_cli_ex1_fmkstut}
If only one dataset is specified (with {\ttfamily -\/r} or {\ttfamily --reference\+\_\+file}), the reference dataset is taken to be both the query and reference datasets. The example below finds the 4 maximum kernels of each point in dataset.\+csv, using the default linear kernel.


\begin{DoxyCode}
$ mlpack\_fastmks -r dataset.csv -k 4 -v -p products.csv -i indices.csv
\end{DoxyCode}


When the operation completes, the values of the kernels are saved in products.\+csv and the indices of the points which give the maximum kernels are saved in indices.\+csv.


\begin{DoxyCode}
$ head indices.csv
762,910,863,890
762,910,426,568
910,762,863,426
762,910,863,426
863,910,614,762
762,863,910,614
762,910,488,568
762,910,863,426
910,762,863,426
863,762,910,614
\end{DoxyCode}



\begin{DoxyCode}
$ head products.csv
1.6221652894e+00,1.5998743443e+00,1.5898890769e+00,1.5406789753e+00
1.3387953449e+00,1.3317349486e+00,1.2966613184e+00,1.2774493620e+00
1.6386110476e+00,1.6332029753e+00,1.5952629124e+00,1.5887195330e+00
1.0917545803e+00,1.0820878726e+00,1.0668992636e+00,1.0419838050e+00
1.2272441028e+00,1.2169643942e+00,1.2104597963e+00,1.2067780154e+00
1.5720962456e+00,1.5618504956e+00,1.5609069923e+00,1.5235605095e+00
1.3655478674e+00,1.3548593212e+00,1.3311547298e+00,1.3250728881e+00
2.0119149744e+00,2.0043668067e+00,1.9847289214e+00,1.9298280046e+00
1.1586923205e+00,1.1494586097e+00,1.1274872962e+00,1.1248172766e+00
4.4789820372e-01,4.4618539778e-01,4.4200024852e-01,4.3989721792e-01
\end{DoxyCode}


We can see in this example that for point 0, the point with maximum kernel value is point 762, with a kernel value of 1.\+622165. For point 3, the point with third largest kernel value is point 863, with a kernel value of 1.\+0669.\subsection{Fast\+M\+K\+S on a reference and query dataset}\label{fmkstutorial_cli_ex2_fmkstut}
The query points may be different than the reference points. To specify a different query set, the {\ttfamily -\/q} (or {\ttfamily --query\+\_\+file}) option is used, as in the example below.


\begin{DoxyCode}
$ mlpack\_fastmks -q query\_set.csv -r reference\_set.csv -k 5 -i indices.csv \(\backslash\)
> -p products.csv
\end{DoxyCode}
\subsection{Fast\+M\+K\+S with a different kernel}\label{fmkstutorial_cli_ex3_fmkstut}
The {\ttfamily mlpack\+\_\+fastmks} program offers more than just the linear kernel. Valid options are {\ttfamily \textquotesingle{}linear\textquotesingle{}}, {\ttfamily \textquotesingle{}polynomial\textquotesingle{}}, {\ttfamily \textquotesingle{}cosine\textquotesingle{}}, {\ttfamily \textquotesingle{}gaussian\textquotesingle{}}, {\ttfamily \textquotesingle{}epanechnikov\textquotesingle{}}, {\ttfamily \textquotesingle{}triangular\textquotesingle{}} and {\ttfamily \textquotesingle{}hyptan\textquotesingle{}} (the hyperbolic tangent kernel). Note that the hyperbolic tangent kernel is provably not a Mercer kernel but is positive semidefinite on most datasets and is commonly used as a kernel. Note also that the Gaussian kernel and other shift-\/invariant kernels give the same results as nearest neighbor search (see \doxyref{Neighbor\+Search tutorial (k-\/nearest-\/neighbors)}{p.}{nstutorial}).

The kernel to use is specified with the {\ttfamily -\/K} (or {\ttfamily --kernel}) option. The example below uses the cosine similarity as a kernel.


\begin{DoxyCode}
$ mlpack\_fastmks -r dataset.csv -k 5 -K cosine -i indices.csv -p products.csv -v
\end{DoxyCode}
\subsection{Using single-\/tree search or naive search}\label{fmkstutorial_cli_ex4_fmkstut}
In some cases, it may be useful to not use the dual-\/tree Fast\+M\+KS algorithm. Instead you can specify the {\ttfamily --single} option, indicating that a tree should be built only on the reference set, and then the queries should be processed in a linear scan (instead of in a tree). Alternately, the {\ttfamily -\/N} (or {\ttfamily --naive}) option makes the program not build trees at all and instead use brute-\/force search to find the solutions.

The example below uses single-\/tree search on two datasets with the linear kernel.


\begin{DoxyCode}
$ mlpack\_fastmks -q query\_set.csv -r reference\_set.csv --single -k 5 \(\backslash\)
> -p products.csv -i indices.csv -K linear
\end{DoxyCode}


The example below uses naive search on one dataset.


\begin{DoxyCode}
$ mlpack\_fastmks -r reference\_set.csv -k 5 -N -p products.csv -i indices.csv
\end{DoxyCode}
\subsection{Parameters for alternate kernels}\label{fmkstutorial_cli_ex5_fmkstut}
Many of the alternate kernel choices have parameters which can be chosen; these are detailed in this section.


\begin{DoxyItemize}
\item {\bfseries {\ttfamily -\/w} }({\ttfamily --bandwidth})\+: this sets the bandwidth of the kernel, and is applicable to the {\ttfamily \textquotesingle{}gaussian\textquotesingle{}}, {\ttfamily \textquotesingle{}epanechnikov\textquotesingle{}}, and {\ttfamily \textquotesingle{}triangular\textquotesingle{}} kernels. This is the \char`\"{}spread\char`\"{} of the kernel.
\item {\bfseries {\ttfamily -\/d} }({\ttfamily --degree})\+: this sets the degree of the polynomial kernel (the power to which the result is raised). It is only applicable to the {\ttfamily \textquotesingle{}polynomial\textquotesingle{}} kernel.
\item {\bfseries {\ttfamily -\/o} }({\ttfamily --offset})\+: this sets the offset of the kernel, for the {\ttfamily \textquotesingle{}polynomial\textquotesingle{}} and {\ttfamily \textquotesingle{}hyptan\textquotesingle{}} kernel. See \doxyref{the polynomial kernel documentation}{p.}{classmlpack_1_1kernel_1_1PolynomialKernel} and \doxyref{the hyperbolic tangent kernel documentation}{p.}{classmlpack_1_1kernel_1_1HyperbolicTangentKernel} for more information.
\item {\bfseries {\ttfamily -\/s} }({\ttfamily --scale})\+: this sets the scale of the kernel, and is only applicable to the {\ttfamily \textquotesingle{}hyptan\textquotesingle{}} kernel. See \doxyref{the hyperbolic tangent kernel documentation}{p.}{classmlpack_1_1kernel_1_1HyperbolicTangentKernel} for more information.
\end{DoxyItemize}\subsection{Saving a Fast\+M\+K\+S model/tree}\label{fmkstutorial_cli_ex6_fmkstut}
The {\ttfamily mlpack\+\_\+fastmks} program also supports saving a model built on a reference dataset (this model includes the tree, the kernel, and the search parameters). The {\ttfamily --output\+\_\+model\+\_\+file} or {\ttfamily -\/M} option allows one to save these parameters to disk for later usage. An example is below\+:


\begin{DoxyCode}
$ mlpack\_fastmks -r reference\_set.csv -K cosine -M fastmks\_model.xml
\end{DoxyCode}


This example builds a tree on the dataset in {\ttfamily reference\+\_\+set.\+csv} using the cosine similarity kernel, and saves the resulting model to {\ttfamily fastmks\+\_\+model.\+xml}. This model may then be used in later calls to the {\ttfamily mlpack\+\_\+fastmks} program.\subsection{Loading a Fast\+M\+K\+S model for further searches}\label{fmkstutorial_cli_ex7_fmkstut}
Supposing that a Fast\+M\+KS model has been saved with the {\ttfamily --output\+\_\+model\+\_\+file} or {\ttfamily -\/M} parameter, that model can then be later loaded in subsequent calls to the {\ttfamily mlpack\+\_\+fastmks} program, using the {\ttfamily --input\+\_\+model\+\_\+file} or {\ttfamily -\/m} option. For instance, with a model saved in {\ttfamily fastmks\+\_\+model.\+xml} and a query set in {\ttfamily query\+\_\+set.\+csv}, we can find 3 max-\/kernel candidates, saving to {\ttfamily indices.\+csv} and {\ttfamily kernels.\+csv\+:} 


\begin{DoxyCode}
$ mlpack\_fastmks -m fastmks\_model.xml -k 3 -i indices.csv -p kernels.csv
\end{DoxyCode}


Loading a model as opposed to building a model is advantageous because the reference tree is already built. So, among other situations, this could be useful in the setting where many different query sets (or many different values of k) will be used.

Note that the kernel cannot be changed in a saved model without rebuilding the model entirely.\section{The \textquotesingle{}\+Fast\+M\+K\+S\textquotesingle{} class}\label{fmkstutorial_fastmks_fmkstut}
The {\ttfamily Fast\+M\+K\+S$<$$>$} class offers a simple A\+PI for use within C++ applications, and allows further flexibility in kernel choice and tree type choice. However, {\ttfamily Fast\+M\+K\+S$<$$>$} has no default template parameter for the kernel type -- that must be manually specified. Choices that {\bfseries mlpack} provides include\+:


\begin{DoxyItemize}
\item \doxyref{mlpack\+::kernel\+::\+Linear\+Kernel}{p.}{classmlpack_1_1kernel_1_1LinearKernel}
\item \doxyref{mlpack\+::kernel\+::\+Polynomial\+Kernel}{p.}{classmlpack_1_1kernel_1_1PolynomialKernel}
\item \doxyref{mlpack\+::kernel\+::\+Cosine\+Distance}{p.}{classmlpack_1_1kernel_1_1CosineDistance}
\item \doxyref{mlpack\+::kernel\+::\+Gaussian\+Kernel}{p.}{classmlpack_1_1kernel_1_1GaussianKernel}
\item \doxyref{mlpack\+::kernel\+::\+Epanechnikov\+Kernel}{p.}{classmlpack_1_1kernel_1_1EpanechnikovKernel}
\item \doxyref{mlpack\+::kernel\+::\+Triangular\+Kernel}{p.}{classmlpack_1_1kernel_1_1TriangularKernel}
\item \doxyref{mlpack\+::kernel\+::\+Hyperbolic\+Tangent\+Kernel}{p.}{classmlpack_1_1kernel_1_1HyperbolicTangentKernel}
\item \doxyref{mlpack\+::kernel\+::\+Laplacian\+Kernel}{p.}{classmlpack_1_1kernel_1_1LaplacianKernel}
\item \doxyref{mlpack\+::kernel\+::\+P\+Spectrum\+String\+Kernel}{p.}{classmlpack_1_1kernel_1_1PSpectrumStringKernel}
\end{DoxyItemize}

The following examples use kernels from that list. Writing your own kernel is detailed in \doxyref{the next section}{p.}{fmkstutorial_writing_kernel_fmkstut}. Remember that when you are using the C++ interface, the data matrices must be column-\/major. See \doxyref{Matrices in mlpack}{p.}{matrices} for more information.\subsection{Fast\+M\+K\+S on one dataset}\label{fmkstutorial_fastmks_ex1_fmkstut}
Given only a reference dataset, the following code will run Fast\+M\+KS with k set to 5.


\begin{DoxyCode}
\textcolor{preprocessor}{#include <mlpack/methods/fastmks/fastmks.hpp>}
\textcolor{preprocessor}{#include <mlpack/core/kernels/linear_kernel.hpp>}

\textcolor{keyword}{using namespace }mlpack::fastmks;

\textcolor{comment}{// The reference dataset, which is column-major.}
\textcolor{keyword}{extern} arma::mat data;

\textcolor{comment}{// This will initialize the FastMKS object with the linear kernel with default}
\textcolor{comment}{// options: K(x, y) = x^T y.  The tree is built in the constructor.}
FastMKS<LinearKernel> f(data);

\textcolor{comment}{// The results will be stored in these matrices.}
arma::Mat<size\_t> indices;
arma::mat products;

\textcolor{comment}{// Run FastMKS.}
f.Search(5, indices, products);
\end{DoxyCode}
\subsection{Fast\+M\+K\+S with a query and reference dataset}\label{fmkstutorial_fastmks_ex2_fmkstut}
In this setting we have both a query and reference dataset. We search for 10 maximum kernels.


\begin{DoxyCode}
\textcolor{preprocessor}{#include <mlpack/methods/fastmks/fastmks.hpp>}
\textcolor{preprocessor}{#include <mlpack/core/kernels/triangular_kernel.hpp>}

\textcolor{keyword}{using namespace }mlpack::fastmks;
\textcolor{keyword}{using namespace }mlpack::kernel;

\textcolor{comment}{// The reference and query datasets, which are column-major.}
\textcolor{keyword}{extern} arma::mat referenceData;
\textcolor{keyword}{extern} arma::mat queryData;

\textcolor{comment}{// This will initialize the FastMKS object with the triangular kernel with}
\textcolor{comment}{// default options (bandwidth of 1).  The reference tree is built in the}
\textcolor{comment}{// constructor.}
FastMKS<TriangularKernel> f(referenceData);

\textcolor{comment}{// The results will be stored in these matrices.}
arma::Mat<size\_t> indices;
arma::mat products;

\textcolor{comment}{// Run FastMKS.  The query tree is built during the call to Search().}
f.Search(queryData, 10, indices, products);
\end{DoxyCode}
\subsection{Fast\+M\+K\+S with an initialized kernel}\label{fmkstutorial_fastmks_ex3_fmkstut}
Often, kernels have parameters which need to be specified. {\ttfamily Fast\+M\+K\+S$<$$>$} has constructors which take initialized kernels. Note that temporary kernels cannot be passed as an argument. The example below initializes a {\ttfamily Polynomial\+Kernel} object and then runs Fast\+M\+KS with a query and reference dataset.


\begin{DoxyCode}
\textcolor{preprocessor}{#include <mlpack/methods/fastmks/fastmks.hpp>}
\textcolor{preprocessor}{#include <mlpack/core/kernels/polynomial_kernel.hpp>}

\textcolor{keyword}{using namespace }mlpack::fastmks;
\textcolor{keyword}{using namespace }mlpack::kernel;

\textcolor{comment}{// The reference and query datasets, which are column-major.}
\textcolor{keyword}{extern} arma::mat referenceData;
\textcolor{keyword}{extern} arma::mat queryData;

\textcolor{comment}{// Initialize the polynomial kernel with degree of 3 and offset of 2.5.}
PolynomialKernel pk(3.0, 2.5);

\textcolor{comment}{// Create the FastMKS object with the initialized kernel.}
FastMKS<PolynomialKernel> f(referenceData, pk);

\textcolor{comment}{// The results will be stored in these matrices.}
arma::Mat<size\_t> indices;
arma::mat products;

\textcolor{comment}{// Run FastMKS.}
f.Search(queryData, 10, indices, products);
\end{DoxyCode}


The syntax for running Fast\+M\+KS with one dataset and an initialized kernel is very similar\+:


\begin{DoxyCode}
f.Search(10, indices, products);
\end{DoxyCode}
\subsection{Fast\+M\+K\+S with an already-\/created tree}\label{fmkstutorial_fastmks_ex4_fmkstut}
By default, {\ttfamily Fast\+M\+K\+S$<$$>$} uses the cover tree datastructure (see \doxyref{mlpack\+::tree\+::\+Cover\+Tree}{p.}{classmlpack_1_1tree_1_1CoverTree}). Sometimes, it is useful to modify the parameters of the cover tree. In this scenario, a tree must be built outside of the constructor, and then passed to the appropriate {\ttfamily Fast\+M\+K\+S$<$$>$} constructor. An example on just a reference dataset is shown below, where the base of the cover tree is modified.

We also use an instantiated kernel, but because we are building our own tree, we must use \doxyref{I\+P\+Metric}{p.}{classmlpack_1_1metric_1_1IPMetric} so that our tree is built on the metric induced by our kernel function.


\begin{DoxyCode}
\textcolor{preprocessor}{#include <mlpack/methods/fastmks/fastmks.hpp>}
\textcolor{preprocessor}{#include <mlpack/core/kernels/polynomial_kernel.hpp>}

\textcolor{comment}{// The reference dataset, which is column-major.}
\textcolor{keyword}{extern} arma::mat data;

\textcolor{comment}{// Initialize the polynomial kernel with a degree of 4 and offset of 2.0.}
PolynomialKernel pk(4.0, 2.0);

\textcolor{comment}{// Create the metric induced by this kernel (because a kernel is not a metric}
\textcolor{comment}{// and we can't build a tree on a kernel alone).}
IPMetric<PolynomialKernel> metric(pk);

\textcolor{comment}{// Now build a tree on the reference dataset using the instantiated metric and}
\textcolor{comment}{// the custom base of 1.5 (default is 1.3).  We have to be sure to use the right}
\textcolor{comment}{// type here -- FastMKS needs the FastMKSStat object as the tree's}
\textcolor{comment}{// StatisticType.}
\textcolor{keyword}{typedef} tree::CoverTree<IPMetric<PolynomialKernel>, tree::FirstPointIsRoot,
    FastMKSStat> TreeType; \textcolor{comment}{// Convenience typedef.}
TreeType* tree = \textcolor{keyword}{new} TreeType(data, metric, 1.5);

\textcolor{comment}{// Now initialize FastMKS with that statistic.  We don't need to specify the}
\textcolor{comment}{// TreeType template parameter since we are still using the default.  We don't}
\textcolor{comment}{// need to pass the kernel because that is contained in the tree.}
FastMKS<PolynomialKernel> f(tree);

\textcolor{comment}{// The results will be stored in these matrices.}
arma::Mat<size\_t> indices;
arma::mat products;

\textcolor{comment}{// Run FastMKS.}
f.Search(10, indices, products);
\end{DoxyCode}


The syntax is similar for the case where different query and reference datasets are given; but trees for both need to be built in the manner specified above. Be sure to build both trees using the same metric (or at least a metric with the exact same parameters).


\begin{DoxyCode}
f.Search(queryTree, 10, indices, products);
\end{DoxyCode}
\section{Writing a custom kernel for Fast\+M\+KS}\label{fmkstutorial_writing_kernel_fmkstut}
While {\bfseries mlpack} provides some number of kernels in the \doxyref{mlpack\+::kernel}{p.}{namespacemlpack_1_1kernel} namespace, it is easy to create a custom kernel. To satisfy the Kernel\+Type policy, a class must implement the following methods\+:


\begin{DoxyCode}
\textcolor{comment}{// Empty constructor is required.}
KernelType();

\textcolor{comment}{// Evaluate the kernel between two points.}
\textcolor{keyword}{template}<\textcolor{keyword}{typename} VecType>
\textcolor{keywordtype}{double} Evaluate(\textcolor{keyword}{const} VecType& a, \textcolor{keyword}{const} VecType& b);
\end{DoxyCode}


The template parameter {\ttfamily Vec\+Type} is helpful (but not necessary) so that the kernel can be used with both sparse and dense matrices ({\ttfamily arma\+::sp\+\_\+mat} and {\ttfamily arma\+::mat}).\section{Using other tree types for Fast\+M\+KS}\label{fmkstutorial_custom_tree_fmkstut}
The use of the cover tree (see \doxyref{Cover\+Tree}{p.}{classmlpack_1_1tree_1_1CoverTree}) is not necessary for Fast\+M\+KS, although it is the default tree type. A different type of tree can be specified with the Tree\+Type template parameter. However, the tree type is required to have \doxyref{Fast\+M\+K\+S\+Stat}{p.}{classmlpack_1_1fastmks_1_1FastMKSStat} as the Statistic\+Type, and for Fast\+M\+KS to work, the tree must be built only on kernel evaluations (or distance evaluations in the kernel space via \doxyref{I\+P\+Metric\+:\+:Evaluate()}{p.}{classmlpack_1_1metric_1_1IPMetric}).

Below is an example where a custom tree class, {\ttfamily Custom\+Tree}, is used as the tree type for Fast\+M\+KS. In this example Fast\+M\+KS is only run on one dataset.


\begin{DoxyCode}
\textcolor{preprocessor}{#include <mlpack/methods/fastmks/fastmks.hpp>}
\textcolor{preprocessor}{#include "custom\_tree.hpp"}

\textcolor{keyword}{using namespace }mlpack::fastmks;
\textcolor{keyword}{using namespace }mlpack::tree;

\textcolor{comment}{// The dataset that FastMKS will be run on.}
\textcolor{keyword}{extern} arma::mat data;

\textcolor{comment}{// The custom tree type.  We'll assume that the first template parameter is the}
\textcolor{comment}{// statistic type.}
\textcolor{keyword}{typedef} CustomTree<FastMKSStat> TreeType;

\textcolor{comment}{// The FastMKS constructor will create the tree.}
FastMKS<LinearKernel, arma::mat, TreeType> f(data);

\textcolor{comment}{// These will hold the results.}
arma::Mat<size\_t> indices;
arma::mat products;

\textcolor{comment}{// Run FastMKS.}
f.Search(5, indices, products);
\end{DoxyCode}
\section{Running Fast\+M\+K\+S on objects}\label{fmkstutorial_objects_fmkstut}
Fast\+M\+KS has a lot of utility on objects which are not representable in some sort of metric space. These objects might be strings, graphs, models, or other objects. For these types of objects, questions based on distance don\textquotesingle{}t really make sense. One good example is with strings. The question \char`\"{}how far is \textquotesingle{}dog\textquotesingle{}
from \textquotesingle{}\+Taki Inoue\textquotesingle{}?\char`\"{} simply doesn\textquotesingle{}t make sense. We can\textquotesingle{}t have a centroid of the terms \textquotesingle{}Fritz\textquotesingle{}, \textquotesingle{}E28\textquotesingle{}, and \textquotesingle{}popsicle\textquotesingle{}.

However, what we can do is define some sort of kernel on these objects. These kernels generally correspond to some similarity measure, with one example being the p-\/spectrum string kernel (see \doxyref{mlpack\+::kernel\+::\+P\+Spectrum\+String\+Kernel}{p.}{classmlpack_1_1kernel_1_1PSpectrumStringKernel}). Using that, we can say \char`\"{}how similar is \textquotesingle{}dog\textquotesingle{} to \textquotesingle{}\+Taki Inoue\textquotesingle{}?\char`\"{} and get an actual numerical result by evaluating K(\textquotesingle{}dog\textquotesingle{}, \textquotesingle{}Taki Inoue\textquotesingle{}) (where K is our p-\/spectrum string kernel).

The only requirement on these kernels is that they are positive definite kernels (or Mercer kernels). For more information on those details, refer to the Fast\+M\+KS paper.

Remember that Fast\+M\+KS is a tree-\/based method. But trees like the binary space tree require centroids -- and as we said earlier, centroids often don\textquotesingle{}t make sense with these types of objects. Therefore, we need a type of tree which is built {\bfseries exclusively} on points in the dataset -- those are points which we can evaluate our kernel function on. The cover tree is one example of a type of tree satisfying this condition; its construction will only call the kernel function on two points that are in the dataset.

But, we have one more problem. The {\ttfamily Cover\+Tree} class is built on {\ttfamily arma\+::mat} objects (dense matrices). Our objects, however, are not necessarily representable in a column of a matrix. To use the example we have been using, strings cannot be represented easily in a matrix because they may all have different lengths.

The way to work around this problem is to create a \char`\"{}fake\char`\"{} data matrix which simply holds indices to objects. A good example of how to do this is detailed in the documentation for the \doxyref{P\+Spectrum\+String\+Kernel}{p.}{classmlpack_1_1kernel_1_1PSpectrumStringKernel}.

In short, the trick is to make each data matrix one-\/dimensional and containing linear indices\+:


\begin{DoxyCode}
arma::mat data = \textcolor{stringliteral}{"0 1 2 3 4 5 6 7 8"};
\end{DoxyCode}


Then, when {\ttfamily Evaluate()} is called on the kernel function, the parameters will be two one-\/dimensional vectors that simply contain indices to objects. The example below details the process a little better\+:


\begin{DoxyCode}
\textcolor{comment}{// This function evaluates the kernel on two Objects (in this example, its}
\textcolor{comment}{// implementation is not important; the only important thing is that the}
\textcolor{comment}{// function exists).}
\textcolor{keywordtype}{double} ObjectKernel::Evaluate(\textcolor{keyword}{const} Object& a, \textcolor{keyword}{const} Object& b) \textcolor{keyword}{const};

\textcolor{keyword}{template}<\textcolor{keyword}{typename} VecType>
\textcolor{keywordtype}{double} ObjectKernel::Evaluate(\textcolor{keyword}{const} VecType& a, \textcolor{keyword}{const} VecType& b)\textcolor{keyword}{ const}
\textcolor{keyword}{}\{
  \textcolor{comment}{// Extract the indices from the vectors.}
  \textcolor{keyword}{const} \textcolor{keywordtype}{size\_t} indexA = size\_t(a[0]);
  \textcolor{keyword}{const} \textcolor{keywordtype}{size\_t} indexB = size\_t(b[0]);

  \textcolor{comment}{// Assume that 'objects' is an array (or std::vector or other container)}
  \textcolor{comment}{// holding Objects.}
  \textcolor{keyword}{const} Object& objectA = objects[indexA];
  \textcolor{keyword}{const} Object& objectB = objects[indexB];

  \textcolor{comment}{// Now call the function that does the actual evaluation on the objects and}
  \textcolor{comment}{// return its result.}
  \textcolor{keywordflow}{return} Evaluate(objectA, objectB);
\}
\end{DoxyCode}


As written earlier, the documentation for \doxyref{P\+Spectrum\+String\+Kernel}{p.}{classmlpack_1_1kernel_1_1PSpectrumStringKernel} is a good place to consult for further reference on this. That kernel uses two dimensional indices; one dimension represents the index of the string, and the other represents whether it is referring to the query set or the reference set. If your kernel is meant to work on separate query and reference sets, that strategy should be considered.\section{Further documentation}\label{fmkstutorial_further_doc_fmkstut}
For further documentation on the Fast\+M\+KS class, consult the \doxyref{complete A\+PI documentation}{p.}{classmlpack_1_1fastmks_1_1FastMKS}. 