# 4 Static Graphics

Visualization is a central part of any data analysis pipeline. It is hard to overemphasize the importance of visualizing your data. Ideally, you want to visualize data before and after most operations. Depending on the kind and amount of data you are working with, this can range from straightforward to quite challenging. Here, we introduce some data visualization functions built on base **R** graphics. Some advantages of using base graphics are:

- They are easy to extend if you are familiar with base graphics, and their output can be combined with that of other functions that use base graphics.
- They are very fast to draw. This becomes particularly important when monitoring learning algorithms live or building Shiny applications.

High-dimensional data can sometimes be visualized indirectly after dimensionality reduction.

## 4.1 Density and Histograms

`mplot3_x(iris$Sepal.Length, type = "density")`

`mplot3_x(iris$Sepal.Length, type = "hist")`

We can also plot grouped data directly by passing a list. Note that partial matching allows us to use just `"d"` for `type`:

```
set.seed(2019)
xl <- list(A = rnorm(500, mean = 0, sd = 1),
           B = rnorm(200, mean = 3, sd = 1.5))
```

`mplot3_x(xl, "d")`

`mplot3_x(xl, "hist", hist.breaks = 24)`

`mplot3_x(split(iris$Sepal.Length, iris$Species), "d")`
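Grouped input is simply a named list of numeric vectors, and `split()` builds one from a vector and a factor. As a rough base-R analogue (a sketch, not how `mplot3_x` is implemented), you can estimate one kernel density per group with `stats::density()` and overlay the curves. The colors below are arbitrary choices.

```r
# split() turns a vector plus a grouping factor into a named list
groups <- split(iris$Sepal.Length, iris$Species)
# One kernel density estimate per group (stats::density defaults)
dens <- lapply(groups, density)

# Shared axis limits so every curve fits on the same plot
xlim <- range(unlist(lapply(dens, `[[`, "x")))
ylim <- c(0, max(unlist(lapply(dens, `[[`, "y"))))
cols <- c("#18A3AC", "#F48024", "#7F7F7F")

plot(NULL, xlim = xlim, ylim = ylim, xlab = "Sepal.Length", ylab = "Density")
for (i in seq_along(dens)) lines(dens[[i]], col = cols[i], lwd = 2)
legend("topright", legend = names(dens), col = cols, lwd = 2, bty = "n")
```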

`mplot3_x(iris)`

## 4.2 Scatter plots

Here we look at the static `mplot3_xy()` and `mplot3_xym()`, and the interactive `dplot3_xy()`.

Some synthetic data:

```
set.seed(2019)
x <- rnorm(200)
y <- x^3 + rnorm(200, 3, 1.5)
```

We plot the data using `mplot3_xy()`. We can ask for any supervised learner to fit the data. For linear relationships, that would be `glm`; for nonlinear fits there are many options, but `gam` is a great one.

### 4.2.1 mplot3_xy

`mplot3_xy(x, y, fit = 'gam', se.fit = TRUE)`
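To see what a GAM fit with a standard-error band involves, here is a base-R sketch using **mgcv**, a recommended package that ships with R. It is an assumption that this resembles what `fit = 'gam'` does internally, and the ±2 SE band width is our choice for illustration.

```r
library(mgcv)  # recommended package distributed with R

set.seed(2019)
x <- rnorm(200)
y <- x^3 + rnorm(200, 3, 1.5)

# Penalized regression spline smoother of y on x
fit <- gam(y ~ s(x))

# Predict on an evenly spaced grid, requesting standard errors
grid <- data.frame(x = seq(min(x), max(x), length.out = 100))
pred <- predict(fit, newdata = grid, se.fit = TRUE)
upper <- pred$fit + 2 * pred$se.fit
lower <- pred$fit - 2 * pred$se.fit

plot(x, y, type = "n")
# Shaded band of +/- 2 standard errors around the fitted curve
polygon(c(grid$x, rev(grid$x)), c(upper, rev(lower)),
        col = adjustcolor("gray50", 0.3), border = NA)
points(x, y, pch = 16, col = adjustcolor("#18A3AC", 0.6))
lines(grid$x, pred$fit, lwd = 2)
```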

`mplot3_xy()` allows you to easily group data in a few different ways.

You can pass `x`, `y`, or both as a list of vectors:

```
set.seed(2019)
x <- rnorm(200)
y1 <- x^2 + rnorm(200)
y2 <- -x^2 + 10 + rnorm(200)/4
mplot3_xy(x, y = list(y1 = y1, y2 = y2), fit = 'gam')
```

Or you can use the `group` argument, which accepts either a variable name, if `data` is defined, or a factor vector:

```
x <- rnorm(400)
id <- sample(400, 200)
y1 <- x[id]^2 + rnorm(200)
y2 <- x[-id]^3 + rnorm(200)
group <- rep(1, 400)
group[-id] <- 2
y <- rep(0, length(x))
y[id] <- y1
y[-id] <- y2
dat <- data.frame(x, y, group)
mplot3_xy(x, y, data = dat, group = group, fit = "gam")
```
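In base graphics, grouping amounts to mapping each factor level to a color and fitting each group separately. The sketch below generates data the same way (with an added seed) and, for brevity, swaps in `stats::loess()` as the per-group smoother. It is not how `mplot3_xy` implements grouping.

```r
set.seed(2019)  # seed added for this sketch
x <- rnorm(400)
id <- sample(400, 200)
# Group "1" follows x^2, group "2" follows x^3
group <- factor(ifelse(seq_along(x) %in% id, "1", "2"))
y <- ifelse(group == "1", x^2, x^3) + rnorm(400)

cols <- c("1" = "#18A3AC", "2" = "#F48024")
plot(x, y, pch = 16, col = adjustcolor(cols[as.character(group)], 0.6))

# Fit and draw one smoother per group, in that group's color
for (g in levels(group)) {
  sel <- group == g
  fit <- loess(y ~ x, data = data.frame(x = x[sel], y = y[sel]))
  o <- order(x[sel])  # lines() must be drawn in increasing x
  lines(x[sel][o], fitted(fit)[o], col = cols[g], lwd = 2)
}
legend("topleft", legend = levels(group), col = cols, pch = 16, bty = "n")
```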

### 4.2.2 `mplot3_xym()`

This extension of `mplot3_xy()` adds marginal density / histogram plots to a scatter plot:

```
set.seed(2019)
x <- rnorm(200)
y <- x^3 + 12 + rnorm(200)
mplot3_xym(x, y)
```
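A similar arrangement can be built by hand with `graphics::layout()`: a large scatter panel plus narrow top and right panels for the marginal densities. This is only a sketch of the idea. `mplot3_xym()` may arrange, size, and style its panels differently.

```r
set.seed(2019)
x <- rnorm(200)
y <- x^3 + 12 + rnorm(200)
dx <- density(x)
dy <- density(y)

op <- par(no.readonly = TRUE)
# Panel 1: scatter (bottom-left); 2: x margin (top); 3: y margin (right); 0: empty
layout(matrix(c(2, 0,
                1, 3), nrow = 2, byrow = TRUE),
       widths = c(4, 1), heights = c(1, 4))

par(mar = c(4, 4, 0.5, 0.5))
plot(x, y, pch = 16, col = adjustcolor("#18A3AC", 0.6),
     xlim = range(x), ylim = range(y))

# Top panel shares the scatter's x range and left/right margins
par(mar = c(0, 4, 0.5, 0.5))
plot(dx$x, dx$y, type = "l", xlim = range(x), axes = FALSE, xlab = "", ylab = "")

# Right panel: density rotated 90 degrees, sharing the scatter's y range
par(mar = c(4, 0, 0.5, 0.5))
plot(dy$y, dy$x, type = "l", ylim = range(y), axes = FALSE, xlab = "", ylab = "")

layout(1)  # back to a single panel
par(op)
```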

### 4.2.3 Fit custom functions

`mplot3_xy` includes a **formula** argument as an alternative to **fit**. This allows the user to define the formula of the fitting function, if it is known. As an example, let’s look at power curves. Power curves can help us model a number of important relationships that occur in nature. Let’s see how we can plot these in **rtemis**.

#### 4.2.3.1 y = b * m ^ x

First, we create some synthetic data:

```
set.seed(8102)
x <- rnorm(200)
y.true <- .8 * 2.7 ^ x
y <- y.true + .9 * rnorm(200)
```

Let’s plot the data:

`mplot3_xy(x, y)`

Now, let’s add a fit line. There are two ways to add a fit line in `mplot3_xy`:

- The `fit` argument, e.g. `fit = 'glm'`
- The `formula` argument, e.g. `formula = y ~ a * x + b`

In this case, a linear model (both `'lm'` and `'glm'` work) is not a good idea:

`mplot3_xy(x, y, fit = 'glm')`

A generalized additive model (GAM) is our best bet if we know nothing about the relationship between `x` and `y`. (`fit` is the third argument to `mplot3_xy`, so we can skip naming it.)

`mplot3_xy(x, y, 'gam')`

Even better, if we *do* know the type of relationship between `x` and `y`, we can provide a formula. This will be solved using the Nonlinear Least Squares learner (`s_NLS`):

`mplot3_xy(x, y, formula = y ~ b * m ^ x)`
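To see what a formula-based fit involves, base R's `stats::nls()` can solve the same least-squares problem directly. This sketch regenerates the data from the example above, seeding with `set.seed(8102)`. The starting values `b = 1, m = 2` are our assumption (base `nls()` uses Gauss-Newton by default and can fail to converge from poor starts), and `s_NLS` may use a different solver and initialization.

```r
set.seed(8102)
x <- rnorm(200)
y.true <- .8 * 2.7 ^ x
y <- y.true + .9 * rnorm(200)

# Nonlinear least squares: minimize sum((y - b * m^x)^2) over b and m
fit.nls <- nls(y ~ b * m ^ x, start = list(b = 1, m = 2))
cf <- coef(fit.nls)
print(cf)  # estimates should land near the true values b = 0.8, m = 2.7

# Fitted values at the observed x, ready to plot against y.true
y.hat <- fitted(fit.nls)
```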

We can plot the true function along with the fit.

`fitted <- s_NLS(x, y, formula = y ~ b * m ^ x)$fitted`

```
01-07-24 00:23:27 Hello, egenn [s_NLS]
.:Regression Input Summary
Training features: 200 x 1
Training outcome: 200 x 1
Testing features: Not available
Testing outcome: Not available
01-07-24 00:23:27 Initializing all parameters as 0.1 [s_NLS]
01-07-24 00:23:27 Training NLS model... [s_NLS]
.:NLS Regression Training Summary
MSE = 0.68 (89.24%)
RMSE = 0.82 (67.19%)
MAE = 0.65 (54.06%)
r = 0.94 (p = 7.9e-98)
R sq = 0.89
01-07-24 00:23:27 Completed in 1.5e-04 minutes (Real: 0.01; User: 0.01; System: 1e-03) [s_NLS]
```

```
mplot3_xy(x, y = list(Observed = y, True = y.true, Fitted = fitted),
          type = c('p', 'l', 'l'), marker.alpha = .85)
```

### 4.2.4 Scatterplot + Cluster

We already saw that we can use any learner to draw a fit line in a scatter plot. Similarly, you can use any clustering algorithm to cluster the data and color the points by cluster membership. Let’s use HOPACH (Van der Laan and Pollard 2003) to cluster the famous *iris* dataset. Learn more about [Clustering].

```
mplot3_xy(iris$Sepal.Length, iris$Petal.Length,
cluster = "hopach")
```
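HOPACH comes from the Bioconductor **hopach** package. As a stand-in that needs only base R, the sketch below swaps in `stats::kmeans()` with k = 3 (our choice, matching the three iris species) and colors points by cluster membership. Its clusters will generally differ from HOPACH's.

```r
set.seed(2019)
xy <- iris[, c("Sepal.Length", "Petal.Length")]

# k-means with 3 centers; several random starts make the result more stable
km <- kmeans(xy, centers = 3, nstart = 20)

cols <- c("#18A3AC", "#F48024", "#7F7F7F")
plot(xy, pch = 16, col = cols[km$cluster])
points(km$centers, pch = 4, cex = 2, lwd = 2)  # cluster centroids

# Cross-tabulate clusters against the known species labels
table(cluster = km$cluster, species = iris$Species)
```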

## 4.3 Heatmaps

```
x <- rnormmat(20, 20, seed = 2018)
x.cor <- cor(x)
```

`mplot3_heatmap(x.cor)`

Notice how `mplot3_heatmap`’s colorbar defaults to 10 overlapping discs on either side of zero, each representing a 10% change from the one before it.

Turn off hierarchical clustering and the dendrograms:

`mplot3_heatmap(x.cor, Colv = NA, Rowv = NA)`
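A comparable figure can be drawn with base R's `stats::heatmap()`, which, like `mplot3_heatmap`, takes `Rowv = NA, Colv = NA` to skip reordering and dendrograms. In this sketch, a plain matrix of standard normal draws stands in for `rnormmat()` (an assumption about what it returns), and the color scale uses 20 bins of width 0.1, ten on each side of zero, to mimic the stepped colorbar described above.

```r
set.seed(2018)
x <- matrix(rnorm(20 * 20), nrow = 20)
x.cor <- cor(x)

# Symmetric diverging scale: 10 bins of width 0.1 on each side of zero
brks <- seq(-1, 1, by = 0.1)
pal <- colorRampPalette(c("#18A3AC", "white", "#F48024"))(length(brks) - 1)

# scale = "none" keeps raw correlations; Rowv/Colv = NA disable clustering.
# breaks is passed through to image()
heatmap(x.cor, Rowv = NA, Colv = NA, scale = "none", col = pal, breaks = brks)
```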

## 4.4 Barplots

```
mplot3_bar(VADeaths,
col = colorRampPalette(c("#82afd3", "#000f3a"))(nrow(VADeaths)),
group.names = rownames(VADeaths),
group.legend = TRUE)
```
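Base R's `barplot()` draws a similar grouped layout from a matrix: with `beside = TRUE`, each column becomes a cluster of bars, one bar per row. The call below is a rough equivalent of the one above, assuming that mapping. It has not been checked against `mplot3_bar`'s internals.

```r
# VADeaths (datasets package): death rates per 1000, 5 age groups x 4 populations
cols <- colorRampPalette(c("#82afd3", "#000f3a"))(nrow(VADeaths))
mids <- barplot(VADeaths, beside = TRUE, col = cols,
                legend.text = rownames(VADeaths),
                args.legend = list(x = "topleft", bty = "n"),
                ylab = "Deaths per 1000")
# mids holds the x position of each bar, in the same shape as VADeaths
```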

## 4.5 Boxplots

Some synthetic data:

```
x <- rnormmat(200, 4, return.df = TRUE, seed = 2019)
colnames(x) <- c("mango", "banana", "tangerine", "sugar")
```

`mplot3_box(x)`
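Each box summarizes five statistics per column. Base R exposes them through `boxplot(..., plot = FALSE)`. This sketch generates similar data with base `rnorm()` instead of **rtemis**' `rnormmat()` (assuming standard normal columns), so the values will not match the plot above exactly.

```r
set.seed(2019)
# Stand-in for rnormmat(200, 4, return.df = TRUE): assumed N(0, 1) columns
x <- as.data.frame(matrix(rnorm(200 * 4), ncol = 4))
colnames(x) <- c("mango", "banana", "tangerine", "sugar")

# Rows of b$stats: lower whisker, lower hinge, median, upper hinge, upper whisker
b <- boxplot(x, plot = FALSE)
b$stats

boxplot(x, col = "#18A3AC")
```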

## 4.6 Mosaic Plots

Mosaic plots are a great way to visualize count data, e.g. from a contingency table.

Some example data from R’s documentation:

```
party <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
dimnames(party) <- list(gender = c("F", "M"),
                        party = c("Democrat", "Independent", "Republican"))
```

`mplot3_mosaic(party)`
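Base R's `graphics::mosaicplot()` draws the same kind of plot from the table. The sketch also prints the share of each party within each gender, the conditional proportions a mosaic lets you compare by eye. Colors are arbitrary, and this is an illustration rather than a description of how `mplot3_mosaic` is implemented.

```r
party <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
dimnames(party) <- list(gender = c("F", "M"),
                        party = c("Democrat", "Independent", "Republican"))

# Share of each party within each gender (each row sums to 1)
round(prop.table(party, margin = 1), 3)

mosaicplot(party, color = c("#18A3AC", "#7F7F7F", "#F48024"), main = "")
```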

## 4.7 Decision Boundaries

The goal of a classifier is to establish a decision boundary in feature space separating the different outcome classes. While most feature spaces are high-dimensional and cannot be visualized directly, it can still be helpful to look at decision boundaries in low-dimensional problems. We can compare different algorithms, or the effects of hyperparameter tuning for a given algorithm.

### 4.7.1 2D synthetic data

Let’s create some 2D synthetic data using the **mlbench** package and plot them, coloring by group, using `mplot3_xy`.

```
set.seed(2018)
data2D <- mlbench::mlbench.2dnormals(200)
dat <- data.frame(data2D$x, y = data2D$classes)
mplot3_xy(dat$X1, dat$X2, group = dat$y, marker.col = c("#18A3AC", "#F48024"))
```

### 4.7.2 Logistic Regression

`mod.glm <- s_GLM(dat, verbose = FALSE, print.plot = FALSE)`

`Warning in eval(family$initialize): non-integer #successes in a binomial glm!`

`mplot3_decision(mod.glm, dat)`
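Decision boundary plots are typically made by evaluating the trained model on a dense grid covering the feature space and coloring each grid point by its predicted class. Below is a base-R sketch of that idea for logistic regression, on two Gaussian classes simulated with `rnorm()` in place of **mlbench**. The grid resolution and the 0.5 threshold are our choices; `mplot3_decision` may differ.

```r
set.seed(2018)
n <- 100
# Two Gaussian classes: "a" centered at (-1, -1), "b" at (1, 1), unit variance
dat <- data.frame(X1 = c(rnorm(n, -1), rnorm(n, 1)),
                  X2 = c(rnorm(n, -1), rnorm(n, 1)),
                  y = factor(rep(c("a", "b"), each = n)))

# Logistic regression; with a factor outcome, glm models P(y == "b")
mod <- glm(y ~ X1 + X2, data = dat, family = binomial)

# Evaluate the model on a regular grid covering the feature space
res <- 150
grid <- expand.grid(X1 = seq(min(dat$X1), max(dat$X1), length.out = res),
                    X2 = seq(min(dat$X2), max(dat$X2), length.out = res))
grid$p <- predict(mod, newdata = grid, type = "response")
grid$pred <- ifelse(grid$p > 0.5, "b", "a")

cols <- c(a = "#18A3AC", b = "#F48024")
# Faint background: predicted class at each grid point
plot(grid$X1, grid$X2, pch = 15, cex = 0.4,
     col = adjustcolor(cols[grid$pred], 0.2), xlab = "X1", ylab = "X2")
points(dat$X1, dat$X2, pch = 16, col = cols[as.character(dat$y)])
# Boundary = the p = 0.5 contour; expand.grid varies X1 fastest, so X1 indexes rows
contour(unique(grid$X1), unique(grid$X2), matrix(grid$p, nrow = res),
        levels = 0.5, drawlabels = FALSE, add = TRUE, lwd = 2)

# Training accuracy of the fitted classifier
acc <- mean(ifelse(fitted(mod) > 0.5, "b", "a") == dat$y)
```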

### 4.7.3 CART

```
mod.cart <- s_CART(dat, verbose = FALSE, print.plot = FALSE)
mplot3_decision(mod.cart, dat)
```

### 4.7.4 RF

```
mod.rf <- s_Ranger(dat, verbose = FALSE, print.plot = FALSE)
mplot3_decision(mod.rf, dat)
```

## 4.8 Multiplots with **mplot3**

**rtemis** provides a convenience function to plot multiple graphs together, `rtlayout`. It’s based on the `graphics::layout` function and integrates behind the scenes with all `mplot3` functions. You specify the number of rows and columns. Optional arguments allow you to arrange plots by row or by column and automatically create labels for each plot. As with most visualization functions in **rtemis**, there is an option to save to PDF. This means you can create a publication-quality multipanel plot in a few lines of code.

Start by calling `rtlayout()` with the number of rows and columns, draw your plots using `mplot3` functions, and close with `rtlayout()`:

```
set.seed(2019)
x <- runif(200, min = -20, max = 20)
z <- rnorm(200, mean = 0, sd = 4)
y <- .8 * x^2 + .6 * z^3 + rnorm(200)
rtlayout(2, 2, byrow = TRUE, autolabel = TRUE)
mplot3_x(x, 'd')
mplot3_x(z, 'd')
mplot3_xy(x, y, fit = 'gam')
mplot3_xy(z, y, fit = 'gam')
```

`rtlayout()`
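Since `rtlayout` is based on `graphics::layout`, a bare-bones version of the same 2 × 2 figure can be made with base R alone. Here the panel labels are added with `mtext()` as a stand-in for `autolabel`, and `layout(1)` plays the role of closing the layout; both are our assumptions about equivalent behavior, not a description of `rtlayout` itself.

```r
set.seed(2019)
x <- runif(200, min = -20, max = 20)
z <- rnorm(200, mean = 0, sd = 4)
y <- .8 * x^2 + .6 * z^3 + rnorm(200)

# 2 x 2 grid, filled row by row
layout(matrix(1:4, nrow = 2, byrow = TRUE))
plot(density(x), main = ""); mtext("A", side = 3, adj = 0, font = 2)
plot(density(z), main = ""); mtext("B", side = 3, adj = 0, font = 2)
plot(x, y, pch = 16);        mtext("C", side = 3, adj = 0, font = 2)
plot(z, y, pch = 16);        mtext("D", side = 3, adj = 0, font = 2)
layout(1)  # return to a single-panel device
```

To save the figure to PDF instead, open a device with `pdf("multiplot.pdf", width = 8, height = 8)` before calling `layout()`, and call `dev.off()` at the end.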