Graphics
Visualization is central to all data analysis, whether in (bio)statistics, epidemiology, machine learning, data science, etc.
It is hard to overstate the importance of good visualization.
Some of the goals of data visualization include:
- Data quality check: identify miscoding, outliers, etc.
- Check data transformations, preprocessing in general
- Exploratory data analysis
- Plot data used for hypothesis testing
- Visualization of clustering, decomposition results in unsupervised learning
- Performance plots in supervised learning
- Communication of study results
R has powerful graphical capabilities built in the core language. It contains two largely separate graphics systems:
-
base graphics in the
graphics
package, inherited from the S language -
grid graphics in the
grid
package: a “rewrite of the graphics layout capabilities”
There is limited support for interaction between the two. In practice, for a given application, choose one or the other. Note that there are no high-level (user-friendly) functions for the grid graphics system built into the base R distribution, but a few very popular packages have been built on top of it, the most commonly used one today being ggplot2.
Both base and grid graphics systems can produce beautiful, layered, high-quality graphics. It is possible to build custom functions (See 21 Writing Functions) using either system to produce most, if not all, types of plots.