7  Supervised

import rtemis as rt
▄▄▄  ▄▄▄▄▄▄▄▄ .• ▌ ▄ ·. ▪  .▄▄ ·
▀▄ █·•██  ▀▄.▀··██ ▐███▪██ ▐█ ▀.
▐▀▀▄  ▐█.▪▐▀▀▪▄▐█ ▌▐▌▐█·▐█·▄▀▀▀█▄
▐█•█▌ ▐█▌·▐█▄▄▌██ ██▌▐█▌▐█▌▐█▄▪▐█
.▀  ▀ ▀▀▀  ▀▀▀ ▀▀  █▪▀▀▀▀▀▀ ▀▀▀▀ py
.:rtemispy v.0.2.0 🏝 macOS-13.4-arm64-arm-64bit

7.1 Read Data

Load the Sonar data set, downloaded locally from the UCI Machine Learning Repository:

dat = rt.read("~/Data/Sonar.csv")
06-20-23 17:47:04 ▶ Reading Sonar.csv... [read]
06-20-23 17:47:04 Got 208 rows & 61 columns [read]
06-20-23 17:47:04 Read in 0.0133 seconds [read]

7.2 Check Data

rt.check_data(dat)
DataFrame with 208 rows x 61 columns

Data types
  60 float columns.
  0 integer columns.
  1 character column.
  0 categorical columns.

Issues
  0 constant columns.
  0 duplicated rows.
  0 missing values total.

Recommendations
  Everything looks good.
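
The issue counts reported above can be approximated in plain pandas. This is a minimal sketch, not how rtemispy implements check_data: the helper name quick_check and the toy data are hypothetical, and rtemispy's exact definitions (for example, what counts as a duplicated row) may differ.

```python
import pandas as pd


def quick_check(df):
    """Count constant columns, duplicated rows, and missing values.

    Hypothetical helper for illustration; not part of rtemispy.
    """
    # A column is treated as constant if it holds at most one distinct value
    # (missing values count as a value here).
    n_unique = df.nunique(dropna=False)
    return {
        "constant_columns": int((n_unique <= 1).sum()),
        # duplicated() flags every repeat of an earlier row, not the first copy
        "duplicated_rows": int(df.duplicated().sum()),
        "missing_values": int(df.isna().sum().sum()),
    }


# Toy data, made up for this example
toy = pd.DataFrame(
    {
        "a": [1.0, 2.0, 2.0, None],
        "b": ["x", "y", "y", "z"],
        "c": [0, 0, 0, 0],
    }
)
toy_report = quick_check(toy)
```

On the toy data frame, column c is constant, the third row repeats the second, and column a has one missing value.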

The data contain 60 continuous (float) features and 1 character column, the outcome Class. We want to convert the character column to a categorical. We can either re-load the data with the argument string2cat=True or call the preprocess function with the same argument.

7.3 Preprocess Data

dat = rt.preprocess(dat, string2cat=True)
06-20-23 17:47:04 Converting string columns to categorical [preprocess]
rt.check_data(dat)
DataFrame with 208 rows x 61 columns

Data types
  60 float columns.
  0 integer columns.
  0 character columns.
  1 categorical column.

Issues
  0 constant columns.
  0 duplicated rows.
  0 missing values total.

Recommendations
  Everything looks good.
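
For readers more familiar with pandas, the conversion above is roughly equivalent to casting each string column to the category dtype. The sketch below assumes the data is a pandas DataFrame; the helper name strings_to_categorical and the toy values are hypothetical, and this is not necessarily what preprocess does internally.

```python
import pandas as pd
from pandas.api.types import is_string_dtype


def strings_to_categorical(df):
    """Return a copy of df with every string column cast to 'category'.

    Hypothetical helper for illustration; not part of rtemispy.
    """
    out = df.copy()
    for col in out.columns:
        if is_string_dtype(out[col]):
            out[col] = out[col].astype("category")
    return out


# Toy data, made up for this example: one float feature and a string outcome
toy = pd.DataFrame({"V1": [0.02, 0.05, 0.03], "Class": ["R", "M", "R"]})
toy_cat = strings_to_categorical(toy)
```

The original data frame is left unchanged; only the returned copy carries the categorical column.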

7.4 Resample Data

Create resamples using resample(). With only a seed supplied, it creates 10 stratified subsamples:

res = rt.resample(dat, seed=2023)
06-20-23 17:47:04 Created 10 stratified subsamples [resample]

Split the data into training and testing sets using split_train_test(), here with the first resample:

dat_train, dat_test = rt.split_train_test(dat, res[0])
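
To make the idea of a stratified subsample concrete, here is a minimal standard-library sketch. It is not rtemispy's implementation: the function names, the train_p parameter, and its value of 0.75 are assumptions (the fraction is only roughly consistent with the 155/53 split reported in the training log), and the labels are toy data. Each resample draws the same fraction of cases from every class, so class proportions are preserved in the training set.

```python
import random
from collections import defaultdict


def stratified_subsamples(labels, n_resamples=10, train_p=0.75, seed=None):
    """Return n_resamples lists of training indices, sampled within each class.

    Hypothetical helper for illustration; not part of rtemispy.
    """
    rng = random.Random(seed)
    # Group case indices by class label
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    resamples = []
    for _ in range(n_resamples):
        train = []
        for idx in by_class.values():
            # Draw the same fraction of cases, without replacement, per class
            k = round(train_p * len(idx))
            train.extend(rng.sample(idx, k))
        resamples.append(sorted(train))
    return resamples


def split_by_index(rows, train_idx):
    """Split rows into (train, test) given the training indices."""
    train_set = set(train_idx)
    train = [r for i, r in enumerate(rows) if i in train_set]
    test = [r for i, r in enumerate(rows) if i not in train_set]
    return train, test


# Toy labels, made up for this example: 12 cases of one class, 8 of another
toy_labels = ["M"] * 12 + ["R"] * 8
toy_res = stratified_subsamples(toy_labels, n_resamples=10, seed=2023)
toy_train, toy_test = split_by_index(toy_labels, toy_res[0])
```

With these toy labels, every training set holds 9 "M" and 6 "R" cases, and the remaining 5 cases form the test set.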

7.5 Gradient Boosting with LightGBM

sonar_lgbm = rt.s_LightGBM(dat_train, dat_test)
06-20-23 17:47:04 Welcome, egenn 🌉 [s_LightGBM]
Input data summary:
│  Training: 155 x 61
└─  Testing: 53 x 61
Outcome: Class
06-20-23 17:47:04 Tuning LightGBM by grid search... [gridsearch]
06-20-23 17:47:04 Created 5 bootstraps [resample]
06-20-23 17:47:04 Grid search: Running 5 combinations [gridsearch]
06-20-23 17:47:05 Completed in 0.925 seconds [gridsearch]
06-20-23 17:47:05 Best LightGBM hyperparameters: [s_LightGBM]
{'max_nrounds': 231, 'num_leaves': 16, 'learning_rate': 0.01, 'lambda_l1': 0.0, 'lambda_l2': 0.0}
06-20-23 17:47:05 Training LightGBM with tuned hyperparameters [s_LightGBM]
[100]   training's binary_logloss: 0.39129
[200]   training's binary_logloss: 0.247466
Classification was performed using LightGBM.
│  Training Balanced Accuracy was 0.99.
└─  Testing Balanced Accuracy was 0.85.
06-20-23 17:47:05 Training complete. in 1.11 seconds [s_LightGBM]
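
The summary above reports balanced accuracy. Under its standard definition, balanced accuracy is the mean of the per-class recall, which makes it less sensitive to class imbalance than plain accuracy. The following standard-library sketch illustrates that definition on toy labels; the function name is hypothetical and this is not necessarily how rtemispy computes the metric.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (standard definition of balanced accuracy)."""
    total = defaultdict(int)    # number of true cases per class
    correct = defaultdict(int)  # number of correctly predicted cases per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy labels, made up for this example
y_true = ["M", "M", "M", "M", "R", "R"]
y_pred = ["M", "M", "M", "R", "R", "R"]
toy_bacc = balanced_accuracy(y_true, y_pred)  # (3/4 + 2/2) / 2 = 0.875
```

Here plain accuracy is 5/6, while balanced accuracy weights the two classes equally regardless of how many cases each has.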