\section{Introduction}\label{emst_tutorial_intro_emsttut}
The Euclidean Minimum Spanning Tree problem is widely used in machine learning and data mining applications. Given a set $S$ of points in $\mathbf{R}^d$, our task is to compute lowest weight spanning tree in the complete graph on $S$ with edge weights given by the Euclidean distance between points.

Among other applications, the E\+M\+ST can be used to compute hierarchical clusterings of data. A {\itshape single-\/linkage clustering} can be obtained from the E\+M\+ST by deleting all edges longer than a given cluster length. This technique is also referred to as a {\itshape Friends-\/of-\/\+Friends} clustering in the astronomy literature.

mlpack includes an implementation of {\bfseries Dual-\/\+Tree Boruvka} which uses $kd$-\/trees by default; this is the empirically and theoretically fastest E\+M\+ST algorithm. In addition, the implementation supports the use of different trees via templates. For more details, see the following paper\+:


\begin{DoxyCode}
@inproceedings\{march2010fast,
  title=\{Fast \{E\}uclidean minimum spanning tree: algorithm, analysis, and
applications\},
  author=\{March, William B. and Ram, Parikshit and Gray, Alexander G.\},
  booktitle=\{Proceedings of the 16th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining (KDD \textcolor{stringliteral}{'10)\},}
\textcolor{stringliteral}{  pages=\{603--612\},}
\textcolor{stringliteral}{  year=\{2010\},}
\textcolor{stringliteral}{  organization=\{ACM\}}
\textcolor{stringliteral}{\}}
\end{DoxyCode}


{\bfseries mlpack} provides\+:


\begin{DoxyItemize}
\item a \doxyref{simple command-\/line executable}{p.}{emst_tutorial_cli_emsttut} to compute the E\+M\+ST of a given data set
\item a \doxyref{simple C++ interface}{p.}{emst_tutorial_dtb_emsttut} to compute the E\+M\+ST
\end{DoxyItemize}\section{Table of Contents}\label{emst_tutorial_toc_emsttut}
A list of all the sections this tutorial contains.


\begin{DoxyItemize}
\item \doxyref{Introduction}{p.}{emst_tutorial_intro_emsttut}
\item \doxyref{Table of Contents}{p.}{emst_tutorial_toc_emsttut}
\item \doxyref{Command-\/\+Line \textquotesingle{}E\+M\+ST\textquotesingle{}}{p.}{emst_tutorial_cli_emsttut}
\item \doxyref{The \textquotesingle{}Dual\+Tree\+Boruvka\textquotesingle{} class}{p.}{emst_tutorial_dtb_emsttut}
\item \doxyref{Further documentation}{p.}{emst_tutorial_further_doc_emsttut}
\end{DoxyItemize}\section{Command-\/\+Line \textquotesingle{}\+E\+M\+S\+T\textquotesingle{}}\label{emst_tutorial_cli_emsttut}
The {\ttfamily mlpack\+\_\+emst} executable in {\bfseries mlpack} will compute the E\+M\+ST of a given set of points and store the resulting edge list to a file.

The output file contains an edge list representation of the M\+ST in an $n-1 \times 3 $ matrix, where the first and second columns are labels of points and the third column is the edge weight. The edges are sorted in order of increasing weight.

Below are several examples of simple usage (and the resultant output). The {\ttfamily -\/v} option is used so that verbose output is given. Further documentation on each individual option can be found by typing


\begin{DoxyCode}
$ mlpack\_emst --help
\end{DoxyCode}



\begin{DoxyCode}
$ mlpack\_emst --input\_file=dataset.csv --output\_file=edge\_list.csv -v
[INFO ] Reading in data.
[INFO ] Loading \textcolor{stringliteral}{'dataset.csv'} as CSV data.
[INFO ] Data read, building tree.
[INFO ] Tree built, running algorithm.
[INFO ] 4 edges found so far.
[INFO ] 5 edges found so far.
[INFO ] Total spanning tree length: 1002.45
[INFO ] Saving CSV data to \textcolor{stringliteral}{'edge\_list.csv'}.
[INFO ]
[INFO ] Execution parameters:
[INFO ]   help: \textcolor{keyword}{false}
[INFO ]   info: \textcolor{stringliteral}{""}
[INFO ]   input\_file: dataset.csv
[INFO ]   leaf\_size: 1
[INFO ]   naive: \textcolor{keyword}{false}
[INFO ]   output\_file: edge\_list.csv
[INFO ]   verbose: \textcolor{keyword}{true}
[INFO ]
[INFO ] Program timers:
[INFO ]   emst/mst\_computation: 0.000179s
[INFO ]   emst/tree\_building: 0.000061s
[INFO ]   total\_time: 0.052641s
\end{DoxyCode}


The code performs at most $\log N$ iterations for $N$ data points. It will print an update on the number of M\+ST edges found after each iteration. Convenient program timers are given for different parts of the calculation at the bottom of the output, as well as the parameters the simulation was run with.


\begin{DoxyCode}
$ cat dataset.csv
0, 0
1, 1
3, 3
0.5, 0
1000, 0
1001, 0

$ cat edge\_list.csv
0.0000000000e+00,3.0000000000e+00,5.0000000000e-01
4.0000000000e+00,5.0000000000e+00,1.0000000000e+00
1.0000000000e+00,3.0000000000e+00,1.1180339887e+00
1.0000000000e+00,2.0000000000e+00,2.8284271247e+00
2.0000000000e+00,4.0000000000e+00,9.9700451353e+02
\end{DoxyCode}


The input points are labeled 0-\/5. The output tells us that the M\+ST connects point 0 to point 3, point 4 to point 5, point 1 to point 3, point 1 to point 2, and point 2 to point 4, with the corresponding edge weights given in the third column. The total length of the M\+ST is also given in the verbose output.

Note that it is also possible to compute the E\+M\+ST using a naive ( $O(N^2)$) algorithm for timing and comparison purposes, using the {\ttfamily --naive} option.\section{The \textquotesingle{}\+Dual\+Tree\+Boruvka\textquotesingle{} class}\label{emst_tutorial_dtb_emsttut}
The \textquotesingle{}Dual\+Tree\+Boruvka\textquotesingle{} class contains our implementation of the Dual-\/\+Tree Boruvka algorithm.

The class has two constructors\+: the first takes the data set, constructs the tree (where the type of tree constructed is the Tree\+Type template parameter), and computes the M\+ST. The second takes data set and an already constructed tree.

The class provides one method that performs the M\+ST computation\+: 
\begin{DoxyCode}
\textcolor{keywordtype}{void} ComputeMST(\textcolor{keyword}{const} arma::mat& results);
\end{DoxyCode}


This method stores the computed M\+ST in the matrix results in the format given above.\section{Further documentation}\label{emst_tutorial_further_doc_emsttut}
For further documentation on the Dual\+Tree\+Boruvka class, consult the \doxyref{complete A\+PI documentation}{p.}{classmlpack_1_1emst_1_1DualTreeBoruvka}. 