This page describes how to get started quickly with mlpack from the command line, gives a few examples of usage, and points to deeper documentation.
This quickstart guide is also available for Python.
Installing mlpack is straightforward and can be done with your system's package manager.
For instance, for Ubuntu or Debian the command is simply
On Fedora or Red Hat:
If you use a different distribution, mlpack may be packaged under a different name. If it is not packaged at all, you can use a Docker image from Docker Hub:
This Docker image has mlpack already built and installed.
If you prefer to build mlpack from scratch, see Building mlpack From Source.
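Whichever installation route you take, it can help to confirm which mlpack programs actually ended up on your PATH. The sketch below is a plain POSIX shell helper (the function name list_mlpack_programs is hypothetical, not part of mlpack) that looks in every PATH directory for executables whose names start with mlpack_:

```shell
# Hypothetical helper (not part of mlpack): list the mlpack command-line
# programs visible on PATH, one name per line, each name printed once.
# The function body is a subshell, so changing IFS here does not leak out.
list_mlpack_programs() (
  IFS=:
  for dir in $PATH; do
    [ -n "$dir" ] || dir=.   # an empty PATH entry means the current directory
    for f in "$dir"/mlpack_*; do
      if [ -f "$f" ] && [ -x "$f" ]; then basename "$f"; fi
    done
  done | LC_ALL=C sort -u
)

list_mlpack_programs
```

An empty result means no mlpack command-line programs were found on PATH, which usually points back at the installation step.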
As a simple example of using mlpack from the command line, let's do some classification on a subset of the standard covertype machine learning dataset. We'll first split the dataset into a training set and a test set, then train an mlpack random forest on the training data, and finally print the random forest's accuracy on the test set.
You can copy-paste this code directly into your shell to run it.
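The split step has to keep every point paired with its label. As an illustration only, here is a plain-shell sketch of a shuffled train/test split; the helper split_dataset and the toy.* files are hypothetical, and this is a stand-in for, not a description of, how mlpack_preprocess_split works:

```shell
# Hypothetical helper (not part of mlpack): shuffle a dataset and its labels
# together, then write train/test files. Arguments: data file, labels file,
# test ratio (e.g. 0.3), output file prefix. Both input files must have one
# line per point, in the same order.
split_dataset() {
  data=$1; labels=$2; ratio=$3; prefix=$4
  n=$(wc -l < "$data" | tr -d ' ')
  # Number of test points, rounded to the nearest integer.
  n_test=$(awk -v n="$n" -v r="$ratio" 'BEGIN { printf "%d", n * r + 0.5 }')
  n_train=$((n - n_test))
  # Glue row i of the data to row i of the labels with a tab, so shuffling
  # can never separate a point from its label; shuffle by sorting on a
  # random key that is dropped again afterwards.
  paste "$data" "$labels" \
    | awk 'BEGIN { srand() } { printf "%.8f\t%s\n", rand(), $0 }' \
    | LC_ALL=C sort -n | cut -f 2- > "$prefix.shuffled"
  # Split the joined rows, then separate data and labels again.
  head -n "$n_train" "$prefix.shuffled" | cut -f 1 > "$prefix.train.csv"
  head -n "$n_train" "$prefix.shuffled" | cut -f 2 > "$prefix.train.labels.csv"
  tail -n "$n_test" "$prefix.shuffled" | cut -f 1 > "$prefix.test.csv"
  tail -n "$n_test" "$prefix.shuffled" | cut -f 2 > "$prefix.test.labels.csv"
  rm -f "$prefix.shuffled"
}

# Tiny made-up dataset: 10 points with two features each; the label is the
# first feature modulo 2.
for i in 1 2 3 4 5 6 7 8 9 10; do echo "$i,$((i * 2))"; done > toy.data.csv
for i in 1 2 3 4 5 6 7 8 9 10; do echo "$((i % 2))"; done > toy.labels.csv
split_dataset toy.data.csv toy.labels.csv 0.3 toy
```

The 0.3 test ratio is an arbitrary choice for the demo. Joining data and labels into one line before shuffling is the part that matters: shuffling the two files separately would scramble which label belongs to which point.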
The output shows that we achieve reasonably good accuracy on the test dataset (80%+). The file predictions.csv can also be used by other tools; for instance, we can easily calculate the number of points that were predicted incorrectly:
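One way to do this with standard tools is to pair each prediction with its true label and count the pairs that differ. A minimal sketch, assuming both files hold one label per line for the same points in the same order (count_errors and the example.* files are hypothetical):

```shell
# Hypothetical helper (not part of mlpack): compare a predictions file with a
# true-labels file line by line and report the errors and accuracy.
count_errors() {
  # Pair prediction i with true label i, then count the mismatched pairs.
  paste -d, "$1" "$2" | awk -F, '
    { total++; if ($1 != $2) wrong++ }
    END {
      acc = total ? 100 * (total - wrong) / total : 0
      printf "%d misclassified out of %d (accuracy %.2f%%)\n", wrong, total, acc
    }'
}

# Made-up example: 5 points, 2 of which are predicted incorrectly.
printf '1\n2\n2\n3\n1\n' > example.predictions.csv
printf '1\n2\n3\n3\n2\n' > example.labels.csv
count_errors example.predictions.csv example.labels.csv
# prints: 2 misclassified out of 5 (accuracy 60.00%)
```

For the walkthrough above, the first argument would be predictions.csv and the second the test-label file written by the split step.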
It's easy to modify the code above to do more complex things, use different mlpack learners, or interface with other machine learning toolkits.
The example above shows only a small part of mlpack's functionality; many other programs are available. Below is a list of all the mlpack functionality offered through the command line, grouped into categories.
mlpack_adaboost, mlpack_decision_stump, mlpack_decision_tree, mlpack_hmm_train, mlpack_hmm_generate, mlpack_hmm_loglik, mlpack_hmm_viterbi, mlpack_hoeffding_tree, mlpack_logistic_regression, mlpack_nbc, mlpack_perceptron, mlpack_random_forest, mlpack_softmax_regression, mlpack_cf
mlpack_approx_kfn, mlpack_emst, mlpack_fastmks, mlpack_kfn, mlpack_knn, mlpack_krann, mlpack_lsh, mlpack_det, mlpack_range_search
mlpack_kmeans, mlpack_mean_shift, mlpack_gmm_train, mlpack_gmm_generate, mlpack_gmm_probability, mlpack_dbscan
mlpack_pca, mlpack_radical, mlpack_local_coordinate_coding, mlpack_sparse_coding, mlpack_nca, mlpack_kernel_pca
mlpack_linear_regression, mlpack_lars
mlpack_preprocess_binarize, mlpack_preprocess_split, mlpack_preprocess_describe, mlpack_preprocess_imputer, mlpack_nmf
For more information on what mlpack does, see http://www.mlpack.org/about.html.
Next, let's go through another example: providing movie recommendations with mlpack.
In this example, we'll train a collaborative filtering model using mlpack's mlpack_cf program. We'll train it on the MovieLens dataset from https://grouplens.org/datasets/movielens/, and then use the trained model to give recommendations.
You can copy-paste this code directly into the command line to run it.
Here is some example output, showing that user 1 seems to have good taste in movies:
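Recommendations come back as movie IDs, so turning them into readable titles needs a lookup against the MovieLens movies.csv file, whose columns are movieId, title, and genres. A minimal awk sketch, assuming (not confirmed here) that the recommendations file holds one comma-separated row of IDs matching the movieId column; check this against your own mlpack_cf output, since the IDs it reports depend on how the ratings file was prepared. The helper titles_for and the made-up toy.* rows are hypothetical:

```shell
# Hypothetical helper (not part of mlpack): print the title of each movie ID on
# the first line of a comma-separated recommendations file, in order, using a
# MovieLens-style movies.csv with the header movieId,title,genres.
titles_for() {
  awk '
    # First file: remember the recommended IDs in order.
    NR == FNR {
      if (FNR == 1) n = split($0, order, ",")
      next
    }
    # Second file: skip the header, then record the title of each row.
    FNR == 1 { next }
    {
      comma = index($0, ",")
      id = substr($0, 1, comma - 1)
      title = substr($0, comma + 1)
      sub(/,[^,]*$/, "", title)  # drop the genres column (genres hold no commas)
      gsub(/"/, "", title)       # drop CSV quotes around titles containing commas
      name[id] = title
    }
    END {
      for (i = 1; i <= n; i++) {
        id = order[i]
        if (id in name) print name[id]; else print "unknown movie " id
      }
    }' "$1" "$2"
}

# Made-up rows in the MovieLens movies.csv format.
printf 'movieId,title,genres\n1,Example Movie (1995),Comedy\n2,Another Example (1996),Drama\n3,"Example, The (1997)",Comedy|Drama\n' > toy.movies.csv
printf '3,1\n' > toy.recommendations.csv
titles_for toy.recommendations.csv toy.movies.csv
```

Stripping the last comma-separated field rather than splitting on every comma is what lets quoted titles such as "Example, The (1997)" survive intact.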
Now that you have done some simple work with mlpack, you have seen how easily it can plug into a command-line data science workflow. A good next step is to read more of the documentation for the mlpack command-line programs:
mlpack is also much more flexible from C++, where far more functionality is available, so more complicated tasks are possible if you are willing to write C++. To get started learning about mlpack in C++, the following resources might be helpful: