Training the model
The `neonrvm_train` function requires a pair of training parameter structures: one for getting rid of mostly useless basis functions, and another one for polishing the training results. In the first one you usually want to keep the majority of basis functions, while in the last one you want to reduce the number of basis functions as much as possible in order to achieve a more general and sparse model.
By choosing a batch size smaller than the total input basis function count, the model will be trained incrementally using the first training parameter structure, and then polished using the second training parameter structure at the end.
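A minimal C sketch of how the two parameter structures and the batch size fit together is shown below. The header name and the way `cache`, the parameter structures, `phi`, and `index` get created are assumptions here; only the `neonrvm_train` call itself matches the reference further down.

```c
#include <stddef.h>
#include "neonrvm.h" /* assumed header name */

/* Sketch only: cache, param_rough, param_polish, phi, and index are assumed
 * to have been prepared elsewhere. param_rough keeps most basis functions,
 * param_polish prunes aggressively at the end. */
int train_incrementally(neonrvm_cache* cache,
                        neonrvm_param* param_rough, neonrvm_param* param_polish,
                        double* phi, size_t* index, size_t count)
{
    /* A batch size smaller than `count` triggers incremental training with
     * param_rough, followed by a final polishing pass with param_polish. */
    size_t batch_size_max = count / 4 + 1;

    return neonrvm_train(cache, param_rough, param_polish,
                         phi, index, count, batch_size_max);
}
```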
neonrvm has been carefully designed to handle multiple training scenarios. Different scenarios are discussed below:
A) Optimizing kernel parameters
During this stage one needs to rapidly try different kernel parameters, either using an optimization algorithm (Hyperopt to the rescue) or a brute-force search, to find optimal values. Make sure that the optimization algorithm can deal with possible training failures.
In this case users need to use small batch sizes, and training parameters with more relaxed convergence conditions. In other words, a highly polished and sparse model isn't required.
B) Finalized kernel parameters and model
When you are finished tuning kernel parameters and trying different model creation ideas, you need access to the best basis functions and the finely tuned weights associated with them in order to make accurate predictions.
In this case you need to throw bigger training batch sizes at neonrvm, and use training parameters with a low basis function percentage and a high iteration count for the polishing step.
🍔) Big data sets
Memory and storage requirements quickly skyrocket when dealing with large data sets. You don't necessarily need to feed the whole design matrix to neonrvm all at once. It can also be fed in smaller chunks by loading different parts of the design matrix from disk, or simply generating them on the fly.
neonrvm allows users to split the design matrix and perform the training process incrementally at a higher level through the caching mechanism provided. You just need to make multiple `neonrvm_train` calls, and neonrvm will store the useful basis functions in the given `neonrvm_cache` as it goes.
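As a rough sketch of that flow (the chunk-loading helpers below are hypothetical, and only `neonrvm_train` itself comes from the reference in the next part):

```c
#include <stddef.h>
#include "neonrvm.h" /* assumed header name */

/* Hypothetical helpers that produce the next column-major slice of the
 * design matrix and its basis function indices; assumed to exist elsewhere. */
extern double* load_phi_chunk(size_t chunk, size_t* count_out);
extern size_t* load_index_chunk(size_t chunk, size_t count);

/* Sketch: feed the design matrix chunk by chunk; useful basis functions
 * accumulate inside the cache across calls. */
int train_chunked(neonrvm_cache* cache,
                  neonrvm_param* param_rough, neonrvm_param* param_polish,
                  size_t chunk_count)
{
    for (size_t i = 0; i < chunk_count; i++) {
        size_t count = 0;
        double* phi = load_phi_chunk(i, &count);
        size_t* index = load_index_chunk(i, count);

        int status = neonrvm_train(cache, param_rough, param_polish,
                                   phi, index, count, count);
        if (status != NEONRVM_SUCCESS) {
            return status; /* handle or report training failure */
        }
    }
    return NEONRVM_SUCCESS;
}
```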
It is a good idea to group similar data together using clustering algorithms, and feed neonrvm a mixture of them incrementally.
🚀 C/C++
Alright, now that we have covered the different use cases, it's time to get familiar with the `neonrvm_train` function:
```c
int neonrvm_train(neonrvm_cache* cache, neonrvm_param* param1, neonrvm_param* param2,
                  double* phi, size_t* index, size_t count, size_t batch_size_max)
```
➡️ Parameters
- ⬇️ `[in] cache`: Stores intermediate variables and training results.
- ⬇️ `[in] param1`: Incremental training parameters.
- ⬇️ `[in] param2`: Final polish parameters.
- ⬇️ `[in] phi`: Column-major design matrix, with row count equal to the total sample count and column count equal to the given basis function count. A copy of the useful basis functions will be kept inside the training cache.
- ⬇️ `[in] index`: Basis function indices, a vector with element count equal to the given basis function count. A copy of the useful basis function indices will be kept inside the training cache. Must be a vector of positive numbers, and shouldn't contain any value equal to `SIZE_MAX`, which is used internally to identify the bias index.
- ⬇️ `[in] count`: Number of basis functions given. Must be a positive non-zero number.
- ⬇️ `[in] batch_size_max`: Maximum number of basis functions in every incremental training session. Must be a positive non-zero value.
⬅️ Returns
- `NEONRVM_SUCCESS`: After successful execution.
- `NEONRVM_INVALID_Px`: When facing erroneous parameters.
- `NEONRVM_LAPACK_ERROR`: When LAPACK fails to perform factorization or solve equations.
- `NEONRVM_MATH_ERROR`: When `NaN` or `∞` values show up in the calculations.
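To make the `phi` and `index` layout concrete, here is a tiny hand-written example with 3 samples and 2 basis functions; creation of the cache and parameter structures is assumed to have happened already, and the header name is an assumption.

```c
#include <stddef.h>
#include "neonrvm.h" /* assumed header name */

/* Sketch: 3 samples, 2 basis functions. phi is stored column-major, i.e.
 * phi[column * sample_count + row]. */
int train_tiny(neonrvm_cache* cache, neonrvm_param* param1, neonrvm_param* param2)
{
    double phi[3 * 2] = {
        0.90, 0.40, 0.10, /* basis function "1" evaluated at samples 0..2 */
        0.20, 0.80, 0.30  /* basis function "2" evaluated at samples 0..2 */
    };
    size_t index[2] = {1, 2}; /* one entry per column; SIZE_MAX is reserved */

    /* count = 2 basis functions, trained as a single batch of 2. */
    return neonrvm_train(cache, param1, param2, phi, index, 2, 2);
}
```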
🐍 Python
```python
def train(cache: Cache, param1: Param, param2: Param,
          phi: numpy.ndarray, index: numpy.ndarray, batch_size_max: int)
```
➡️ Parameters
- ⬇️ `[in] cache`: Stores intermediate variables and training results.
- ⬇️ `[in] param1`: Incremental training parameters.
- ⬇️ `[in] param2`: Final polish parameters.
- ⬇️ `[in] phi`: Column-major design matrix, with row count equal to the total sample count and column count equal to the given basis function count. A copy of the useful basis functions will be kept inside the training cache.
- ⬇️ `[in] index`: Basis function indices, a vector with element count equal to the given basis function count. A copy of the useful basis function indices will be kept inside the training cache. Must be a vector of positive numbers, and shouldn't contain any value equal to `SIZE_MAX`, which is used internally to identify the bias index.
- ⬇️ `[in] batch_size_max`: Maximum number of basis functions in every incremental training session. Must be a positive non-zero value.
⬅️ Returns
- Nothing that I'm aware of.
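A minimal Python sketch follows; the module layout, the Fortran-ordered `phi`, and the element dtypes are assumptions made for illustration, and only the `train()` call itself matches the signature above.

```python
import numpy as np
from neonrvm import Cache, Param, train  # assumed module layout


def run_training(cache: Cache, param_rough: Param, param_polish: Param,
                 phi: np.ndarray, index: np.ndarray) -> None:
    # phi: design matrix with rows = samples, columns = basis functions;
    # converted to column-major (Fortran) order to match the description above.
    phi = np.asfortranarray(phi, dtype=np.float64)

    # index: one entry per column of phi; SIZE_MAX is reserved for the bias.
    index = np.ascontiguousarray(index, dtype=np.uintp)

    # Train on batches of up to 50 basis functions, then polish with param_polish.
    train(cache, param_rough, param_polish, phi, index, 50)
```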