5  Filtering Tabular Data

Active Learning Demo

Author

E.D. Gennatas

Modified

July 1, 2024

5.1 Introduction

Filtering a dataset is the process of selecting a subset of cases, i.e. rows.

5.2 Comprehension check

To “filter” a dataset means selecting a subset of its ______.

Run the following code and, based on the output, answer the question below.
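A minimal sketch of such code, assuming that mtcars_6 is the subset of the built-in mtcars dataset whose cyl column equals 6. The name suggests this, but the definition is an assumption.

```r
# Assumption: mtcars_6 holds the six-cylinder cars from the built-in mtcars data
mtcars_6 <- mtcars[mtcars$cyl == 6, ]

# Number of rows (cases) that passed the filter
nrow(mtcars_6)
```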

How many rows does mtcars_6 have?

5.3 Example

For example, to filter the iris dataset to only include rows where the Species column is equal to setosa, we can use the following code:
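A minimal sketch of that filter in base R, using the built-in iris dataset. The name iris_setosa is chosen here for illustration.

```r
# Keep only the rows whose Species is "setosa"; the empty slot after
# the comma keeps all columns
iris_setosa <- iris[iris$Species == "setosa", ]

# Inspect the result: dimensions and counts per species
dim(iris_setosa)
table(iris_setosa$Species)
```

Because Species is a factor, the filtered data frame still carries all three levels, so table() reports the other two species with a count of zero. If that matters, droplevels(iris_setosa) removes the unused levels.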

5.4 Practice

Your turn: Complete the following code to filter the iris dataset so that it only includes rows where Sepal.Length is greater than 7.5.

Now, run the following block to check your answer:

Solution:

iris_f <- iris[iris$Sepal.Length > 7.5, ]

In base R, you can filter any tabular dataset (e.g. a data.frame or a matrix) using regular indexing. The syntax is data[condition, ], where condition is a logical vector with one element per row: rows where it is TRUE are kept, and the empty position after the comma selects all columns. In this case, we filtered the iris dataset to only include rows where the Sepal.Length column is greater than 7.5.
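To make the mechanics concrete, the sketch below builds the condition as a separate logical vector and then applies the same pattern to a matrix. The object names keep, m, and m_f are illustrative and do not come from the exercise.

```r
# The condition is an ordinary logical vector, one element per row of iris
keep <- iris$Sepal.Length > 7.5
length(keep) == nrow(iris)   # one TRUE/FALSE per row
sum(keep)                    # number of rows that will be kept

# Indexing with the logical vector keeps the rows where it is TRUE
iris_f <- iris[keep, ]

# The same pattern works on a matrix; a matrix has no $ accessor,
# so the column is selected by name (or position) inside the brackets
m <- as.matrix(iris[, 1:4])
m_f <- m[m[, "Sepal.Length"] > 7.5, , drop = FALSE]
dim(m_f)
```

For the matrix, drop = FALSE keeps the result a matrix even when only one row matches. Without it, a single matching row would be simplified to a plain vector.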

For more information, see ?Extract.