Functions to load and save matrices and models. More...
Classes

class CustomImputation
    A simple custom imputation class.
class DatasetMapper
    Auxiliary information for a dataset, including mappings to/from strings (or other types) and the datatype of each dimension.
struct HasSerialize
struct HasSerializeFunction
class ImageInfo
class Imputer
    Given a dataset of a particular datatype, replace user-specified missing values with a value dependent on the StrategyType and MapperType.
class IncrementPolicy
    IncrementPolicy is used as a helper class for DatasetMapper.
class ListwiseDeletion
    A complete-case analysis to remove data points containing mappedValue.
class LoadCSV
    Load a CSV file. This class uses boost::spirit to implement the parser; see http://theboostcpplibraries.com/boost.spirit for a quick review.
class MaxAbsScaler
    A simple MaxAbs Scaler class.
class MeanImputation
    A simple mean imputation class.
class MeanNormalization
    A simple Mean Normalization class.
class MedianImputation
    An implementation of simple median imputation.
class MinMaxScaler
    A simple MinMax Scaler class.
class MissingPolicy
    MissingPolicy is used as a helper class for DatasetMapper.
class PCAWhitening
    A simple PCAWhitening class.
class ScalingModel
    The model to save to disk.
class StandardScaler
    A simple Standard Scaler class.
class ZCAWhitening
    A simple ZCAWhitening class.
Typedefs

using DatasetInfo = DatasetMapper<data::IncrementPolicy>

Enumerations

enum Datatype : bool { numeric = 0, categorical = 1 }
    The Datatype enum specifies the types of data mlpack algorithms can use.
enum format { autodetect, text, xml, binary }
    Define the formats we can read through boost::serialization.
Functions

template<typename T>
void Binarize(const arma::Mat<T>& input, arma::Mat<T>& output, const double threshold)
    Given an input dataset and threshold, set values greater than the threshold to 1 and values less than or equal to the threshold to 0.
template<typename T>
void Binarize(const arma::Mat<T>& input, arma::Mat<T>& output, const double threshold, const size_t dimension)
    Given an input dataset and threshold, set values greater than the threshold to 1 and values less than or equal to the threshold to 0 in the given dimension.
template<typename eT>
void ConfusionMatrix(const arma::Row<size_t> predictors, const arma::Row<size_t> responses, arma::Mat<eT>& output, const size_t numClasses)
    A confusion matrix is a summary of prediction results on a classification problem.
std::string Extension(const std::string& filename)
HAS_EXACT_METHOD_FORM(serialize, HasSerializeCheck)
template<typename T>
bool IsNaNInf(T& val, const std::string& token)
    See if the token is a NaN or an Inf; if so, set the value accordingly and return a boolean indicating whether it is.
template<typename eT>
bool Load(const std::string& filename, arma::Mat<eT>& matrix, const bool fatal = false, const bool transpose = true)
    Loads a matrix from file, guessing the filetype from the extension.
template<typename eT>
bool Load(const std::string& filename, arma::Col<eT>& vec, const bool fatal = false)
    Load a column vector from a file, guessing the filetype from the extension.
template<typename eT>
bool Load(const std::string& filename, arma::Row<eT>& rowvec, const bool fatal = false)
    Load a row vector from a file, guessing the filetype from the extension.
template<typename eT, typename PolicyType>
bool Load(const std::string& filename, arma::Mat<eT>& matrix, DatasetMapper<PolicyType>& info, const bool fatal = false, const bool transpose = true)
    Loads a matrix from a file, guessing the filetype from the extension and mapping categorical features with a DatasetMapper object.
template<typename T>
bool Load(const std::string& filename, const std::string& name, T& t, const bool fatal = false, format f = format::autodetect)
    Load a model from a file, guessing the filetype from the extension, or, optionally, loading from the specified format.
template<typename eT>
void LoadARFF(const std::string& filename, arma::Mat<eT>& matrix)
    A utility function to load an ARFF dataset as numeric features (that is, as an Armadillo matrix without any modification).
template<typename eT, typename PolicyType>
void LoadARFF(const std::string& filename, arma::Mat<eT>& matrix, DatasetMapper<PolicyType>& info)
    A utility function to load an ARFF dataset as numeric and categorical features, using the DatasetInfo structure for mapping.
template<typename eT, typename RowType>
void NormalizeLabels(const RowType& labelsIn, arma::Row<size_t>& labels, arma::Col<eT>& mapping)
    Given a set of labels of a particular datatype, convert them to unsigned labels in the range [0, n) where n is the number of different labels.
template<typename eT, typename RowType>
void OneHotEncoding(const RowType& labelsIn, arma::Mat<eT>& output)
    Given a set of labels of a particular datatype, convert them to binary vectors.
template<typename eT>
void RevertLabels(const arma::Row<size_t>& labels, const arma::Col<eT>& mapping, arma::Row<eT>& labelsOut)
    Given a set of labels that have been mapped to the range [0, n), map them back to the original labels given by the 'mapping' vector.
template<typename eT>
bool Save(const std::string& filename, const arma::Mat<eT>& matrix, const bool fatal = false, bool transpose = true)
    Saves a matrix to file, guessing the filetype from the extension.
template<typename T>
bool Save(const std::string& filename, const std::string& name, T& t, const bool fatal = false, format f = format::autodetect)
    Saves a model to file, guessing the filetype from the extension, or, optionally, saving in the specified format.
template<typename T, typename U>
void Split(const arma::Mat<T>& input, const arma::Row<U>& inputLabel, arma::Mat<T>& trainData, arma::Mat<T>& testData, arma::Row<U>& trainLabel, arma::Row<U>& testLabel, const double testRatio)
    Given an input dataset and labels, split into a training set and test set.
template<typename T>
void Split(const arma::Mat<T>& input, arma::Mat<T>& trainData, arma::Mat<T>& testData, const double testRatio)
    Given an input dataset, split into a training set and test set.
template<typename T, typename U>
std::tuple<arma::Mat<T>, arma::Mat<T>, arma::Row<U>, arma::Row<U>> Split(const arma::Mat<T>& input, const arma::Row<U>& inputLabel, const double testRatio)
    Given an input dataset and labels, split into a training set and test set.
template<typename T>
std::tuple<arma::Mat<T>, arma::Mat<T>> Split(const arma::Mat<T>& input, const double testRatio)
    Given an input dataset, split into a training set and test set.
Functions to load and save matrices and models.
typedef DatasetMapper<IncrementPolicy, std::string> DatasetInfo
Definition at line 196 of file dataset_mapper.hpp.
enum Datatype : bool
The Datatype enum specifies the types of data mlpack algorithms can use.
The vast majority of mlpack algorithms can only use numeric data (i.e. float/double/etc.), but some algorithms can use categorical data, specified via this Datatype enum and the DatasetMapper class.
Enumerator:
    numeric
    categorical
Definition at line 24 of file datatype.hpp.
enum format
Define the formats we can read through boost::serialization.
Enumerator:
    autodetect
    text
    xml
    binary
Definition at line 20 of file format.hpp.
void mlpack::data::Binarize(const arma::Mat<T>& input, arma::Mat<T>& output, const double threshold)

Given an input dataset and threshold, set values greater than the threshold to 1 and values less than or equal to the threshold to 0.

This overload applies the changes to all dimensions.

Parameters:
    input      Input matrix to binarize.
    output     Matrix to store the binarized data in.
    threshold  The threshold; it can be any number.

Definition at line 41 of file binarize.hpp.

References omp_size_t.
void mlpack::data::Binarize(const arma::Mat<T>& input, arma::Mat<T>& output, const double threshold, const size_t dimension)

Given an input dataset and threshold, set values greater than the threshold to 1 and values less than or equal to the threshold to 0.

This overload takes a dimension and applies the changes only to the given dimension.

Parameters:
    input      Input matrix to binarize.
    output     Matrix to store the binarized data in.
    threshold  The threshold; it can be any number.
    dimension  The dimension (feature) to apply Binarize to.

Definition at line 77 of file binarize.hpp.

References omp_size_t.
void mlpack::data::ConfusionMatrix(const arma::Row<size_t> predictors, const arma::Row<size_t> responses, arma::Mat<eT>& output, const size_t numClasses)

A confusion matrix is a summary of prediction results on a classification problem.

The number of correct and incorrect predictions is summarized by count and broken down by class. For example, with 2 classes the output matrix will be of size 2 x 2, holding the counts of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN).

When generalizing to 2 or more classes, the row index of the confusion matrix represents the predicted class and the column index represents the actual class.

Parameters:
    predictors  Vector of predicted classes, one per data point.
    responses   The measured (actual) class for each point.
    output      Matrix in which the confusion matrix is stored.
    numClasses  Number of classes.
std::string mlpack::data::Extension(const std::string& filename) [inline]
Definition at line 21 of file extension.hpp.
mlpack::data::HAS_EXACT_METHOD_FORM(serialize, HasSerializeCheck)

bool mlpack::data::IsNaNInf(T& val, const std::string& token) [inline]
See if the token is a NaN or an Inf, and if so, set the value accordingly and return a boolean representing whether or not it is.
Definition at line 27 of file is_naninf.hpp.
bool mlpack::data::Load(const std::string& filename, arma::Mat<eT>& matrix, const bool fatal = false, const bool transpose = true)

Loads a matrix from file, guessing the filetype from the extension.

This will transpose the matrix at load time (unless the transpose parameter is set to false). If the filetype cannot be determined, an error will be given.

The supported file types are the same as those found in Armadillo. If the file extension is not a supported type, an error will be given. This is preferable to Armadillo's default behavior of loading an unknown filetype as raw_binary, which can have very confusing effects.

If the parameter 'fatal' is set to true, a std::runtime_error exception will be thrown if the matrix does not load successfully. The parameter 'transpose' controls whether or not the matrix is transposed after loading. In most cases, because data is generally stored in a row-major format and mlpack requires column-major matrices, this should be left at its default value of 'true'.

Parameters:
    filename   Name of file to load.
    matrix     Matrix to load contents of file into.
    fatal      If an error should be reported as fatal (default false).
    transpose  If true, transpose the matrix after loading.
Referenced by mlpack::bindings::cli::GetParam().
bool mlpack::data::Load(const std::string& filename, arma::Col<eT>& vec, const bool fatal = false)

Load a column vector from a file, guessing the filetype from the extension.

The supported file types are the same as those found in Armadillo. If the file extension is not a supported type, an error will be given. This is preferable to Armadillo's default behavior of loading an unknown filetype as raw_binary, which can have very confusing effects.

If the parameter 'fatal' is set to true, a std::runtime_error exception will be thrown if the vector does not load successfully.

Parameters:
    filename  Name of file to load.
    vec       Column vector to load contents of file into.
    fatal     If an error should be reported as fatal (default false).
bool mlpack::data::Load(const std::string& filename, arma::Row<eT>& rowvec, const bool fatal = false)

Load a row vector from a file, guessing the filetype from the extension.

The supported file types are the same as those found in Armadillo. If the file extension is not a supported type, an error will be given. This is preferable to Armadillo's default behavior of loading an unknown filetype as raw_binary, which can have very confusing effects.

If the parameter 'fatal' is set to true, a std::runtime_error exception will be thrown if the vector does not load successfully.

Parameters:
    filename  Name of file to load.
    rowvec    Row vector to load contents of file into.
    fatal     If an error should be reported as fatal (default false).
bool mlpack::data::Load(const std::string& filename, arma::Mat<eT>& matrix, DatasetMapper<PolicyType>& info, const bool fatal = false, const bool transpose = true)

Loads a matrix from a file, guessing the filetype from the extension and mapping categorical features with a DatasetMapper object.

This will transpose the matrix (unless the transpose parameter is set to false). This particular overload of Load() can only load text-based formats. If the file extension is not a supported type, an error will be given. This is preferable to Armadillo's default behavior of loading an unknown filetype as raw_binary, which can have very confusing effects.

If the parameter 'fatal' is set to true, a std::runtime_error exception will be thrown if the matrix does not load successfully. The parameter 'transpose' controls whether or not the matrix is transposed after loading. In most cases, because data is generally stored in a row-major format and mlpack requires column-major matrices, this should be left at its default value of 'true'.

The DatasetMapper object passed to this function will be re-created, so any mappings from previous loads will be lost.

Parameters:
    filename   Name of file to load.
    matrix     Matrix to load contents of file into.
    info       DatasetMapper object to populate with mappings and data types.
    fatal      If an error should be reported as fatal (default false).
    transpose  If true, transpose the matrix after loading.
bool mlpack::data::Load(const std::string& filename, const std::string& name, T& t, const bool fatal = false, format f = format::autodetect)

Load a model from a file, guessing the filetype from the extension, or, optionally, loading from the specified format. If automatic extension detection is used and the filetype cannot be determined, an error will be given.

The supported file types are those supported by the boost::serialization library.

The format parameter can take any of the values in the 'format' enum: 'format::autodetect', 'format::text', 'format::xml', and 'format::binary'. The autodetect functionality operates on the file extension (so, "file.txt" would be autodetected as text).
The name parameter should be specified to indicate the name of the structure to be loaded. This should be the same as the name that was used to save the structure (otherwise, the loading procedure will fail).
If the parameter 'fatal' is set to true, then an exception will be thrown in the event of load failure. Otherwise, the method will return false and the relevant error information will be printed to Log::Warn.
void mlpack::data::LoadARFF(const std::string& filename, arma::Mat<eT>& matrix)
A utility function to load an ARFF dataset as numeric features (that is, as an Armadillo matrix without any modification).
An exception will be thrown if any features are non-numeric.
void mlpack::data::LoadARFF(const std::string& filename, arma::Mat<eT>& matrix, DatasetMapper<PolicyType>& info)
A utility function to load an ARFF dataset as numeric and categorical features, using the DatasetInfo structure for mapping.
An exception will be thrown upon failure.
A pre-existing DatasetInfo object can be passed in, but if the dimensionality of the given DatasetInfo object (info.Dimensionality()) does not match the dimensionality of the data, a std::invalid_argument exception will be thrown. If an empty DatasetInfo object is given (constructed with the default constructor or otherwise, so that info.Dimensionality() is 0), it will be set to the right dimensionality.
This ability to pass in pre-existing DatasetInfo objects is very necessary when, e.g., loading a test set after training. If the same DatasetInfo from loading the training set is not used, then the test set may be loaded with different mappings—which can cause horrible problems!
Parameters:
    filename  Name of ARFF file to load.
    matrix    Matrix to load data into.
    info      DatasetInfo object; can be default-constructed or pre-existing from another call to LoadARFF().
void mlpack::data::NormalizeLabels(const RowType& labelsIn, arma::Row<size_t>& labels, arma::Col<eT>& mapping)

Given a set of labels of a particular datatype, convert them to unsigned labels in the range [0, n) where n is the number of different labels.

Also, a reverse mapping from the new label to the old value is stored in the 'mapping' vector.

Parameters:
    labelsIn  Input labels of arbitrary datatype.
    labels    Vector that unsigned labels will be stored in.
    mapping   Reverse mapping to convert new labels back to old labels.
void mlpack::data::OneHotEncoding(const RowType& labelsIn, arma::Mat<eT>& output)

Given a set of labels of a particular datatype, convert them to binary vectors.

The categorical values are mapped to integer values. Then, each integer value is represented as a binary vector that is all zeros except for the index of the integer, which is marked with a 1.

Parameters:
    labelsIn  Input labels of arbitrary datatype.
    output    Binary matrix.
void mlpack::data::RevertLabels(const arma::Row<size_t>& labels, const arma::Col<eT>& mapping, arma::Row<eT>& labelsOut)

Given a set of labels that have been mapped to the range [0, n), map them back to the original labels given by the 'mapping' vector.

Parameters:
    labels     Set of normalized labels to convert.
    mapping    Mapping to use to convert labels.
    labelsOut  Vector to store new labels in.
bool mlpack::data::Save(const std::string& filename, const arma::Mat<eT>& matrix, const bool fatal = false, bool transpose = true)

Saves a matrix to file, guessing the filetype from the extension.

This will transpose the matrix at save time. If the filetype cannot be determined, an error will be given.

The supported file types are the same as those found in Armadillo. If the file extension is not a supported type, an error will be given. If the 'fatal' parameter is set to true, a std::runtime_error exception will be thrown upon failure. If the 'transpose' parameter is set to true, the matrix will be transposed before saving. Generally, because mlpack stores matrices in a column-major format and most datasets are stored on disk as row-major, this parameter should be left at its default value of 'true'.

Parameters:
    filename   Name of file to save to.
    matrix     Matrix to save into file.
    fatal      If an error should be reported as fatal (default false).
    transpose  If true, transpose the matrix before saving.
bool mlpack::data::Save(const std::string& filename, const std::string& name, T& t, const bool fatal = false, format f = format::autodetect)

Saves a model to file, guessing the filetype from the extension, or, optionally, saving in the specified format.

If automatic extension detection is used and the filetype cannot be determined, an error will be given.

The supported file types are those supported by the boost::serialization library.
The format parameter can take any of the values in the 'format' enum: 'format::autodetect', 'format::text', 'format::xml', and 'format::binary'. The autodetect functionality operates on the file extension (so, "file.txt" would be autodetected as text).
The name parameter should be specified to indicate the name of the structure to be saved. If Load() is later called on the generated file, the name used to load should be the same as the name used for this call to Save().
If the parameter 'fatal' is set to true, then an exception will be thrown in the event of a save failure. Otherwise, the method will return false and the relevant error information will be printed to Log::Warn.
void mlpack::data::Split(const arma::Mat<T>& input, const arma::Row<U>& inputLabel, arma::Mat<T>& trainData, arma::Mat<T>& testData, arma::Row<U>& trainLabel, arma::Row<U>& testLabel, const double testRatio)

Given an input dataset and labels, split into a training set and test set.

This overload places the split dataset into the four output parameters given (trainData, testData, trainLabel, and testLabel).

Parameters:
    input       Input dataset to split.
    inputLabel  Input labels to split.
    trainData   Matrix to store training data into.
    testData    Matrix to store test data into.
    trainLabel  Vector to store training labels into.
    testLabel   Vector to store test labels into.
    testRatio   Fraction of the dataset to use for the test set (between 0 and 1).

Definition at line 49 of file split_data.hpp.
Referenced by Split().
void mlpack::data::Split(const arma::Mat<T>& input, arma::Mat<T>& trainData, arma::Mat<T>& testData, const double testRatio)

Given an input dataset, split into a training set and test set.

This overload places the split dataset into the two output parameters given (trainData, testData).

Parameters:
    input      Input dataset to split.
    trainData  Matrix to store training data into.
    testData   Matrix to store test data into.
    testRatio  Fraction of the dataset to use for the test set (between 0 and 1).
Definition at line 103 of file split_data.hpp.
std::tuple<arma::Mat<T>, arma::Mat<T>, arma::Row<U>, arma::Row<U>> mlpack::data::Split(const arma::Mat<T>& input, const arma::Row<U>& inputLabel, const double testRatio)

Given an input dataset and labels, split into a training set and test set.

This overload returns the split dataset as a std::tuple with four elements: an arma::Mat<T> containing the training data, an arma::Mat<T> containing the test data, an arma::Row<U> containing the training labels, and an arma::Row<U> containing the test labels.

Parameters:
    input       Input dataset to split.
    inputLabel  Input labels to split.
    testRatio   Fraction of the dataset to use for the test set (between 0 and 1).
Definition at line 148 of file split_data.hpp.
References Split().
std::tuple<arma::Mat<T>, arma::Mat<T>> mlpack::data::Split(const arma::Mat<T>& input, const double testRatio)

Given an input dataset, split into a training set and test set.

This overload returns the split dataset as a std::tuple with two elements: an arma::Mat<T> containing the training data and an arma::Mat<T> containing the test data.

Parameters:
    input      Input dataset to split.
    testRatio  Fraction of the dataset to use for the test set (between 0 and 1).
Definition at line 184 of file split_data.hpp.
References Split().