iris_f <- iris[iris$Sepal.Length > 7.5, ]
5 Filtering Tabular Data
Active Learning Demo
5.1 Introduction
Filtering a dataset is the process of selecting a subset of cases, i.e. rows.
5.2 Comprehension check
Run the following code and, based on the output, answer the question below.
How many rows does mtcars_6 have?
5.3 Example
For example, to filter the iris dataset to only include rows where the Species column is equal to setosa, we can use the following code:
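A minimal sketch of such a filter in base R, assuming the built-in iris dataset. The object name iris_setosa is chosen here for illustration only:

```r
# Keep only the rows of the built-in iris data whose Species is "setosa".
# `iris_setosa` is an illustrative name, not one taken from the original text.
iris_setosa <- iris[iris$Species == "setosa", ]

# Inspect the result: every remaining row belongs to the setosa species.
nrow(iris_setosa)
unique(as.character(iris_setosa$Species))
```

The comma after the condition matters: the condition sits in the row position, and the empty column position keeps every column.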
5.4 Practice
Your turn: Complete the following code to filter the iris dataset so that it only includes rows where Sepal.Length is greater than 7.5.
Now, run the following block to check your answer:
Solution:
In base R, you can filter any tabular dataset (e.g. data.frame or matrix) using regular indexing. The syntax is data[condition, ], where condition is a logical vector that specifies which rows to keep. In this case, we filtered the iris dataset to only include rows where the Sepal.Length column is greater than 7.5.
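The indexing described above can be broken into its two steps. The following is a sketch under stated assumptions rather than part of the original exercise: the names keep, iris_big_virginica, m and m_f are hypothetical, and the combined condition and matrix examples are added only to illustrate the same syntax.

```r
# Step 1: build the condition, a logical vector with one element per row.
keep <- iris$Sepal.Length > 7.5
is.logical(keep)
length(keep) == nrow(iris)

# Step 2: use it in the row position; the empty column position keeps all columns.
iris_f <- iris[keep, ]
nrow(iris_f) == sum(keep)  # rows kept equals the number of TRUE values

# Conditions can be combined with & (and) and | (or).
iris_big_virginica <- iris[iris$Species == "virginica" & iris$Sepal.Length > 7.5, ]

# The same indexing works on a matrix; drop = FALSE keeps the result a matrix
# even when only a single row matches.
m <- as.matrix(iris[, 1:4])
m_f <- m[m[, "Sepal.Length"] > 7.5, , drop = FALSE]
```

Storing the condition in its own variable first makes it easy to check how many rows will be kept, with sum(keep), before the subset is created.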
For more information, see ?Extract.