11 Predict
The Predict module allows training of cross-validated and optionally tuned regression and classification models.
11.1 Resampling
The “Outer Resampling” settings control the training-test splits. They set the arguments passed to the rtemis resample() function.
- Resampling method
  - Stratified subsampling
  - Stratified bootstrap
  - K-fold
  - Bootstrap
  - Leave-one-out
- Seed: set a seed here for reproducibility. With a seed, all train-test resamples are identical between runs, which allows direct comparison of models trained with different algorithms.
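The effect of the seed can be illustrated with a minimal Python sketch of stratified subsampling. This is not the rtemis resample() implementation; the function name `stratified_subsample` and its parameters (`n_resamples`, `train_p`, `seed`) are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict

def stratified_subsample(labels, n_resamples=3, train_p=0.75, seed=None):
    """Return a list of (train_idx, test_idx) splits, stratified by label.

    Hypothetical helper for illustration only; not the rtemis API.
    """
    rng = random.Random(seed)  # a fixed seed makes the splits repeatable
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    splits = []
    for _ in range(n_resamples):
        train = []
        # Sample the same proportion from every class so class balance is kept
        for y in sorted(by_class):
            idx = by_class[y]
            n_train = round(train_p * len(idx))
            train.extend(rng.sample(idx, n_train))
        train_set = set(train)
        test = [i for i in range(len(labels)) if i not in train_set]
        splits.append((sorted(train), test))
    return splits

labels = ["a"] * 8 + ["b"] * 4
run1 = stratified_subsample(labels, seed=2024)
run2 = stratified_subsample(labels, seed=2024)
print(run1 == run2)  # same seed, same splits
```

Because both runs receive the same splits, any difference in test performance between two algorithms comes from the algorithms rather than from the resampling.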
11.2 Algorithm
Available algorithms:
- GLMNET: Elastic Net Regularization
- SVM: Support Vector Machine
- CART: Classification and Regression Trees
- RF: Random Forest
- GBM: Gradient Boosting
- XGBoost (a gradient boosting implementation)
Algorithm-specific options appear once an Algorithm has been selected. Tooltips explain each hyperparameter.
11.3 Hyperparameter tuning
The Predict module uses the rtemis elevate() function to perform automatic nested resampling, which means:
- Splitting the full sample into multiple training and testing subsets
- Splitting each training sample into training and validation subsets to perform hyperparameter tuning (model selection)
A musical note in front of an input box means the hyperparameter is tunable. Automatic hyperparameter tuning will be performed if more than one value is entered.
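As a rough Python sketch of that rule, the snippet below parses comma-separated inputs and expands them into a grid of candidate combinations. The helper names `parse_values` and `expand_grid` and the `learning_rate` field are hypothetical; the actual parsing done by the application is not documented here.

```python
from itertools import product

def parse_values(text):
    """Turn a comma-separated input such as "2, 3" into a list of numbers."""
    parts = (s.strip() for s in text.split(","))
    return [float(v) if "." in v else int(v) for v in parts if v]

def expand_grid(inputs):
    """Return (grid, needs_tuning) for a dict of hyperparameter input strings.

    Hypothetical helper for illustration only.
    """
    parsed = {name: parse_values(text) for name, text in inputs.items()}
    # Tuning is only needed when at least one field holds more than one value
    needs_tuning = any(len(values) > 1 for values in parsed.values())
    names = list(parsed)
    grid = [dict(zip(names, combo))
            for combo in product(*(parsed[n] for n in names))]
    return grid, needs_tuning

grid, needs_tuning = expand_grid({"max_depth": "2, 3", "learning_rate": "0.01, 0.1"})
for candidate in grid:
    print(candidate)
print(needs_tuning)
```

Two values in each of two fields give four candidate combinations; a single value in every field gives one combination and no tuning.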
For example, if you have selected Gradient Boosting as the learning algorithm, you can enter “2, 3” in “Max depth”. Each training set then undergoes internal 5-fold cross-validation, the hyperparameter combination with the best overall performance is chosen, and a model is retrained on the full training set using that combination.
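The nested procedure can be sketched in Python as follows. This is an illustration of the general technique, not the elevate() implementation: to keep it self-contained, Gradient Boosting is swapped for a toy slope-only ridge regression whose penalty `lam` plays the role of the tunable hyperparameter, and all function names are hypothetical.

```python
import random
from statistics import mean

def kfold(indices, k, rng):
    """Shuffle indices and yield (train, test) index lists for k folds."""
    idx = list(indices)
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def fit(x, y, idx, lam):
    """Toy learner: slope-only ridge regression, w = sum(xy) / (sum(xx) + lam)."""
    return sum(x[i] * y[i] for i in idx) / (sum(x[i] ** 2 for i in idx) + lam)

def mse(w, x, y, idx):
    """Mean squared error of slope w on the cases in idx."""
    return mean((y[i] - w * x[i]) ** 2 for i in idx)

def nested_cv(x, y, grid, outer_k=4, inner_k=5, seed=1):
    rng = random.Random(seed)
    results = []
    for train, test in kfold(range(len(x)), outer_k, rng):
        # Inner loop: validation folds are cut from this training set only
        inner_folds = list(kfold(train, inner_k, rng))

        def cv_error(lam):
            return mean(mse(fit(x, y, tr, lam), x, y, va) for tr, va in inner_folds)

        best = min(grid, key=cv_error)
        # Refit on the full training set with the winning value,
        # then score once on the held-out test set
        w = fit(x, y, train, best)
        results.append({"best_lam": best, "test_mse": mse(w, x, y, test)})
    return results

# Synthetic data (made up for this sketch): y is roughly 2x plus noise
data_rng = random.Random(0)
x = [i / 10 for i in range(1, 41)]
y = [2.0 * xi + data_rng.gauss(0, 0.5) for xi in x]

results = nested_cv(x, y, grid=[0.0, 1.0, 100.0])
for r in results:
    print(r)
```

The test set of each outer resample is never seen during tuning, so the reported test error is not biased by the hyperparameter search.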