2 rtemis in 60 seconds
2.1 Load rtemis
library(rtemis)
2.2 Regression
For regression, the outcome must be continuous. When a single data frame is passed, as below, rtemis treats its last column as the outcome.
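# Simulate 500 cases x 50 features and a linear outcome with Gaussian noise
x <- rnormmat(500, 50, seed = 2019)
w <- rnorm(50)               # true coefficients
y <- x %*% w + rnorm(500)    # continuous outcome
dat <- data.frame(x, y)      # outcome is the last column
res <- resample(dat)         # 10 stratified subsamples by default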
06-30-24 10:56:52 Input contains more than one columns; will stratify on last [resample]
.:Resampling Parameters
n.resamples: 10
resampler: strat.sub
stratify.var: y
train.p: 0.75
strat.n.bins: 4
06-30-24 10:56:52 Created 10 stratified subsamples [resample]
dat.train <- dat[res$Subsample_1, ]
dat.test <- dat[-res$Subsample_1, ]
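resample() reports using the strat.sub resampler with train.p = 0.75 and strat.n.bins = 4. To illustrate what stratified subsampling of a continuous outcome involves, the base-R sketch below bins the outcome into quantile bins and draws 75% of the cases from each bin. strat_subsample() is a hypothetical helper, not the rtemis implementation; its binning and rounding choices are assumptions.

```r
# Hypothetical sketch of stratified subsampling for a continuous outcome.
# Not the rtemis implementation: bin edges and rounding are assumptions.
strat_subsample <- function(y, n.resamples = 10, train.p = 0.75,
                            n.bins = 4, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  # Assign each case to a quantile bin of the outcome
  breaks <- unique(quantile(y, probs = seq(0, 1, length.out = n.bins + 1)))
  bins <- cut(y, breaks = breaks, include.lowest = TRUE)
  idx.by.bin <- split(seq_along(y), bins)
  # For each resample, draw train.p of the cases within every bin
  res <- lapply(seq_len(n.resamples), function(i) {
    train <- unlist(lapply(idx.by.bin, function(idx) {
      idx[sample.int(length(idx), size = round(train.p * length(idx)))]
    }), use.names = FALSE)
    sort(train)
  })
  names(res) <- paste0("Subsample_", seq_len(n.resamples))
  res
}
```

The result is a list of training-set row indices named like the elements of res, so strat_subsample(dat$y)$Subsample_1 could be used to index dat in the same way as res$Subsample_1.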
2.2.1 Check Data
check_data(x)
x: A data.table with 500 rows and 50 columns
Data types
* 50 numeric features
* 0 integer features
* 0 factors
* 0 character features
* 0 date features
Issues
* 0 constant features
* 0 duplicate cases
* 0 missing values
Recommendations
* Everything looks good
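check_data() reports counts of constant features, duplicate cases and missing values. The base-R sketch below computes the same three counts for a matrix or data frame. count_issues() is a hypothetical helper, and rtemis may define these checks somewhat differently (for example, in how it treats a column that is entirely NA).

```r
# Hypothetical helper: counts the three "Issues" reported by check_data().
# Definitions here are assumptions, not the rtemis source.
count_issues <- function(x) {
  x <- as.data.frame(x)
  # A feature is constant if it has at most one distinct non-missing value
  n.constant <- sum(vapply(x, function(col) length(unique(na.omit(col))) <= 1,
                           logical(1)))
  list(
    constant.features = n.constant,
    duplicate.cases = sum(duplicated(x)),  # rows identical to an earlier row
    missing.values = sum(is.na(x))         # total number of NA cells
  )
}
```

For the simulated x above, all three counts should be zero, in line with the check_data() output.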
2.2.2 Single Model
mod <- s_GLM(dat.train, dat.test)
06-30-24 10:56:52 Hello, egenn [s_GLM]
.:Regression Input Summary
Training features: 374 x 50
Training outcome: 374 x 1
Testing features: 126 x 50
Testing outcome: 126 x 1
06-30-24 10:56:52 Training GLM... [s_GLM]
.:GLM Regression Training Summary
MSE = 1.02 (97.81%)
RMSE = 1.01 (85.18%)
MAE = 0.81 (84.62%)
r = 0.99 (p = 1.3e-310)
R sq = 0.98
.:GLM Regression Testing Summary
MSE = 0.98 (97.85%)
RMSE = 0.99 (85.35%)
MAE = 0.76 (85.57%)
r = 0.99 (p = 2.7e-105)
R sq = 0.98
06-30-24 10:56:52 Completed in 5e-04 minutes (Real: 0.03; User: 0.02; System: 3e-03) [s_GLM]
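The training and testing summaries report MSE, RMSE, MAE, Pearson r and R-squared. The percentages in parentheses appear to be reductions relative to a baseline that always predicts the mean outcome. That reading is an assumption, but it is consistent with the output: 1 - sqrt(1 - 0.9781) ≈ 0.852, the reported RMSE reduction, and R sq matches the MSE reduction. The sketch below computes these quantities in base R, with lm() as a stand-in for s_GLM() on freshly simulated data. regression_metrics() is hypothetical, and using the mean of the evaluated outcome as the baseline is an assumption.

```r
# Hypothetical sketch of the regression error summary, in base R.
# Assumption: percentages are reductions vs. always predicting mean(y).
regression_metrics <- function(y, yhat) {
  err <- y - yhat
  mse <- mean(err^2)
  mae <- mean(abs(err))
  mse.null <- mean((y - mean(y))^2)   # baseline: always predict the mean
  mae.null <- mean(abs(y - mean(y)))
  list(
    MSE = mse,
    MSE.red = 1 - mse / mse.null,
    RMSE = sqrt(mse),
    RMSE.red = 1 - sqrt(mse) / sqrt(mse.null),
    MAE = mae,
    MAE.red = 1 - mae / mae.null,
    r = cor(y, yhat),
    # Identical to MSE.red under this definition
    Rsq = 1 - sum(err^2) / sum((y - mean(y))^2)
  )
}

# Simulated data in the same spirit as above, using base R only
set.seed(2019)
x <- matrix(rnorm(500 * 50), nrow = 500)
w <- rnorm(50)
y <- drop(x %*% w + rnorm(500))
dat <- data.frame(x, y)
train <- sample.int(500, 375)
fit <- lm(y ~ ., data = dat[train, ])          # stand-in for s_GLM()
yhat <- predict(fit, newdata = dat[-train, ])
m <- regression_metrics(dat$y[-train], yhat)
```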
2.2.3 Cross-validated Model
mod <- train_cv(dat)  # default algorithm: Ranger random forest
06-30-24 10:56:52 Hello, egenn [train_cv]
.:Regression Input Summary
Training features: 500 x 50
Training outcome: 500 x 1
06-30-24 10:56:52 Training Ranger Random Forest on 10 stratified subsamples... [train_cv]
06-30-24 10:56:52 Outer resampling plan set to sequential [resLearn]
.:Cross-validated Ranger
Mean MSE of 10 stratified subsamples: 27.48
Mean MSE reduction: 44.11%
06-30-24 10:56:55 Completed in 0.04 minutes (Real: 2.59; User: 12.95; System: 0.32) [train_cv]
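train_cv() repeats training and testing over the 10 stratified subsamples and averages the test-set error. A minimal base-R sketch of that loop follows. It swaps in lm() for the Ranger random forest used above and draws plain random subsamples rather than stratified ones; cv_mse() is a hypothetical helper, not part of rtemis, and it assumes the outcome column is named y.

```r
# Hypothetical sketch of repeated-subsample cross-validation.
# Assumptions: lm() as the learner, simple (unstratified) random subsamples,
# and the outcome in a column named "y".
cv_mse <- function(dat, n.resamples = 10, train.p = 0.75, seed = 2019) {
  set.seed(seed)
  n <- nrow(dat)
  out <- t(vapply(seq_len(n.resamples), function(i) {
    train <- sample.int(n, size = round(train.p * n))
    test <- dat[-train, ]
    fit <- lm(y ~ ., data = dat[train, ])
    mse <- mean((test$y - predict(fit, newdata = test))^2)
    mse.null <- mean((test$y - mean(test$y))^2)   # predict-the-mean baseline
    c(MSE = mse, MSE.red = 1 - mse / mse.null)
  }, c(MSE = 0, MSE.red = 0)))
  list(per.resample = out,
       mean.MSE = mean(out[, "MSE"]),
       mean.MSE.red = mean(out[, "MSE.red"]))
}

# Usage on base-R simulated data with a linear outcome
set.seed(1)
x <- matrix(rnorm(500 * 50), nrow = 500)
y <- drop(x %*% rnorm(50) + rnorm(500))
cv <- cv_mse(data.frame(x, y))
```

Because the simulated outcome is linear, a linear learner would be expected to show a much larger mean MSE reduction than the 44.11% reported for Ranger above.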
Use the describe() method to get a plain-English summary:
mod$describe()
Regression was performed using Ranger Random Forest. Model generalizability was assessed using 10 stratified subsamples. The mean R-squared across all testing set resamples was 0.44.
mod$plot()
2.3 Classification
For classification, the outcome must be a factor. In the case of binary classification, the first level should be the “positive” class.
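In base R, the level order of a factor can be inspected with levels() and changed with relevel() or factor(..., levels = ...). A small example with hypothetical labels:

```r
# Outcome whose "positive" class is not listed first
y <- factor(c("neg", "pos", "pos", "neg", "pos"))
levels(y)               # "neg" "pos": alphabetical by default

# Make "pos" the first level (the positive class) without changing the values
y <- relevel(y, ref = "pos")
levels(y)               # "pos" "neg"
```

In Sonar, the outcome Class already has M as its first level, which is why the summaries below report Positive Class: M.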
2.3.1 Check Data
data(Sonar, package = 'mlbench')
check_data(Sonar)
Sonar: A data.table with 208 rows and 61 columns
Data types
* 60 numeric features
* 0 integer features
* 1 factor, which is not ordered
* 0 character features
* 0 date features
Issues
* 0 constant features
* 0 duplicate cases
* 0 missing values
Recommendations
* Everything looks good
res <- resample(Sonar)
06-30-24 10:56:55 Input contains more than one columns; will stratify on last [resample]
.:Resampling Parameters
n.resamples: 10
resampler: strat.sub
stratify.var: y
train.p: 0.75
strat.n.bins: 4
06-30-24 10:56:55 Using max n bins possible = 2 [strat.sub]
06-30-24 10:56:55 Created 10 stratified subsamples [resample]
sonar.train <- Sonar[res$Subsample_1, ]
sonar.test <- Sonar[-res$Subsample_1, ]
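When fitting the model in the next section, s_Ranger() logs "Imbalanced classes: using Inverse Frequency Weighting". One common form of inverse frequency weighting gives each case a weight inversely proportional to the proportion of its class, so that every class contributes the same total weight. The sketch below illustrates that general technique; inverse_freq_weights() is hypothetical, and the scaling step (largest weight = 1) is an assumption, since the normalization rtemis uses is not shown in this output.

```r
# Sketch of inverse frequency weighting (scaling is an assumption).
inverse_freq_weights <- function(y) {
  y <- as.factor(y)
  freq <- table(y) / length(y)                  # class proportions
  # Weight each case by 1 / proportion of its own class
  w <- 1 / as.numeric(freq[as.character(y)])
  w / max(w)                                    # scale so the largest weight is 1
}
```

For example, inverse_freq_weights(sonar.train$Class) would return one weight per training case, with the less frequent class weighted more heavily.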
2.3.2 Single Model
mod <- s_Ranger(sonar.train, sonar.test)
06-30-24 10:56:55 Hello, egenn [s_Ranger]
06-30-24 10:56:55 Imbalanced classes: using Inverse Frequency Weighting [prepare_data]
.:Classification Input Summary
Training features: 155 x 60
Training outcome: 155 x 1
Testing features: 53 x 60
Testing outcome: 53 x 1
.:Parameters
n.trees: 1000
mtry: NULL
06-30-24 10:56:55 Training Random Forest (ranger) Classification with 1000 trees... [s_Ranger]
.:Ranger Classification Training Summary
Reference
Estimated M R
M 83 0
R 0 72
Overall
Sensitivity 1.0000
Specificity 1.0000
Balanced Accuracy 1.0000
PPV 1.0000
NPV 1.0000
F1 1.0000
Accuracy 1.0000
AUC 1.0000
Brier Score 0.0176
Positive Class: M
.:Ranger Classification Testing Summary
Reference
Estimated M R
M 25 11
R 3 14
Overall
Sensitivity 0.8929
Specificity 0.5600
Balanced Accuracy 0.7264
PPV 0.6944
NPV 0.8235
F1 0.7812
Accuracy 0.7358
AUC 0.8643
Brier Score 0.1652
Positive Class: M
06-30-24 10:56:55 Completed in 1.8e-03 minutes (Real: 0.11; User: 0.19; System: 0.02) [s_Ranger]
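The threshold-based metrics in the testing summary follow directly from its confusion matrix, with M as the positive class (rows are estimated classes, columns are the reference). The base-R sketch below recomputes them from the four counts; confusion_metrics() is a hypothetical helper. AUC and Brier score depend on the predicted probabilities and cannot be recovered from the counts alone.

```r
# Recompute threshold-based metrics from a 2 x 2 confusion matrix.
# tp, fp, fn, tn are counts with respect to the positive class.
confusion_metrics <- function(tp, fp, fn, tn) {
  sens <- tp / (tp + fn)   # sensitivity (recall, true positive rate)
  spec <- tn / (tn + fp)   # specificity (true negative rate)
  ppv <- tp / (tp + fp)    # positive predictive value (precision)
  npv <- tn / (tn + fn)    # negative predictive value
  c(Sensitivity = sens,
    Specificity = spec,
    `Balanced Accuracy` = (sens + spec) / 2,
    PPV = ppv,
    NPV = npv,
    F1 = 2 * ppv * sens / (ppv + sens),
    Accuracy = (tp + tn) / (tp + fp + fn + tn))
}

# Testing counts: est. M / ref. M = 25, est. M / ref. R = 11,
# est. R / ref. M = 3, est. R / ref. R = 14
m <- confusion_metrics(tp = 25, fp = 11, fn = 3, tn = 14)
```

The results agree with the testing summary: for example, Sensitivity = 25/28 ≈ 0.8929, Specificity = 14/25 = 0.56 and F1 = 50/64 = 0.78125.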
2.3.3 Cross-validated Model
mod <- train_cv(Sonar)
06-30-24 10:56:55 Hello, egenn [train_cv]
.:Classification Input Summary
Training features: 208 x 60
Training outcome: 208 x 1
06-30-24 10:56:55 Training Ranger Random Forest on 10 stratified subsamples... [train_cv]
06-30-24 10:56:55 Outer resampling plan set to sequential [resLearn]
.:Cross-validated Ranger
Mean Balanced Accuracy of 10 stratified subsamples: 0.83
06-30-24 10:56:56 Completed in 0.01 minutes (Real: 0.78; User: 1.86; System: 0.30) [train_cv]
mod$describe()
Classification was performed using Ranger Random Forest. Model generalizability was assessed using 10 stratified subsamples. The mean Balanced Accuracy across all testing set resamples was 0.83.
mod$plot()
mod$plotROC()
mod$plotPR()
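mod$plotROC() draws the ROC curve, and the AUC reported in the summaries is the area under it. The AUC also equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, which allows computing it from ranks via the Mann-Whitney U statistic. auc_rank() is a hypothetical helper for illustration; rtemis may compute AUC with a different routine.

```r
# AUC via the rank-sum (Mann-Whitney) formulation.
# score: predicted probability of the positive class; positive: logical truth.
auc_rank <- function(score, positive) {
  r <- rank(score)              # tied scores get average ranks (count as 1/2)
  n.pos <- sum(positive)
  n.neg <- sum(!positive)
  (sum(r[positive]) - n.pos * (n.pos + 1) / 2) / (n.pos * n.neg)
}

score <- c(0.9, 0.8, 0.4, 0.7, 0.3)
positive <- c(TRUE, TRUE, TRUE, FALSE, FALSE)
auc_rank(score, positive)   # 5/6: 5 of the 6 positive-negative pairs are ordered correctly
```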