22  Function Scoping

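Functions run in their own environment. Variables created within a function, including its arguments, are local to that function call and are not available outside of it. Assigning to x inside a function therefore does not change an x defined outside. This is part of R's scoping rules, known as “lexical scoping” because they depend on where a function is defined.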

x <- 3
y <- 4
fn <- function(x, y) {
  x <- 10*x
  y <- 20*y
  cat("Inside the function, x = ", x, " and y = ", y, "\n")
}

fn(x, y)
Inside the function, x =  30  and y =  80 
cat("Outside the function, x = ", x, " and y = ", y, "\n")
Outside the function, x =  3  and y =  4 
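The example above shows that reassigning arguments inside a function leaves the outer x and y untouched. The sketch below shows the other half of the claim: a variable created inside a function body does not exist after the call returns. The names make_local, z_local and result are hypothetical, and the sketch assumes no object called z_local already exists in your workspace.

```r
# Hypothetical function that creates a variable inside its body
make_local <- function() {
  z_local <- 99   # exists only in this call's environment
  z_local * 2     # the last evaluated value is returned
}

result <- make_local()
result
# [1] 198

# z_local was never created in the global environment
exists("z_local")
# [1] FALSE
```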

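However, if a variable is referenced within a function but has no local definition, R looks for it in the function's enclosing environment. This is the environment in which the function was defined: the global environment for functions created at the top level. If the variable is not found there, R continues up the chain of parent environments. Because the result then depends on objects outside the function, it is best to specify every object a function needs as an argument and pass it explicitly when the function is called.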

In the following example, x is only defined outside the function definition, but referenced within it.

x <- 21

itfn <- function(y, lr = 1) {
  x + lr * y
}

itfn(3)
[1] 24
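The sketch below illustrates what "lexical" means in practice. Here caller() and itfn2() are hypothetical helpers added for illustration. Although caller() defines its own x, itfn() was defined at the top level, so it still finds the global x. The free variable is looked up when itfn() runs, so changing the global x silently changes the result. Passing x as an argument, as recommended above, avoids both surprises.

```r
x <- 21
itfn <- function(y, lr = 1) {
  x + lr * y
}

# caller() has a local x, but itfn() looks up x where itfn() was defined
caller <- function() {
  x <- 1000
  itfn(3)
}
caller()
# [1] 24

# x is looked up each time itfn() runs, so a new global x changes the result
x <- 100
itfn(3)
# [1] 103

# Safer: make x an explicit argument (itfn2 is a hypothetical name)
itfn2 <- function(x, y, lr = 1) {
  x + lr * y
}
itfn2(x = 21, y = 3)
# [1] 24
```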

22.1 Function vs. for-loop

Let’s z-score the built-in mtcars dataset once with a for loop and once with a custom function. This links back to the example seen earlier in the for loop section. In practice, this would be performed with the scale() function.
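For reference, here is a minimal sketch of the scale() route. It checks that scale() gives the same values as z-scoring each column by hand. The object names mtcars_scaled and mtcars_manual are hypothetical. scale() returns a matrix, so the result is converted back to a data frame. The comparison ignores attributes such as row names, which differ between the two objects.

```r
# scale() centers each column on its mean and divides by its standard deviation;
# it returns a matrix, so convert it back to a data frame
mtcars_scaled <- as.data.frame(scale(mtcars))

# The same z-scores computed by hand, one column at a time
mtcars_manual <- as.data.frame(lapply(mtcars, function(col) (col - mean(col)) / sd(col)))

# Compare the values only (row names differ between the two objects)
all.equal(mtcars_scaled, mtcars_manual, check.attributes = FALSE)
# [1] TRUE
```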

Within the for loop, we assign columns directly to the object initialized before the loop. In the following example, we use print(environment()) to print the environment outside and inside the loop, to show that it is the same. This is purely for demonstration:

# initialize new object 'mtcars_z'
mtcars_z <- mtcars
{
  cat("environment outside for loop is: ")
  print(environment())
}
environment outside for loop is: <environment: R_GlobalEnv>

Note: the curly brackets in the above code block are used to force Quarto to print both lines together. You don’t need to do this in a regular R script.

# z-score one column at a time in a for loop
for (i in seq_len(ncol(mtcars))) {
  mtcars_z[, i] <- (mtcars[, i] - mean(mtcars[, i])) / sd(mtcars[, i])
  cat("environment inside for loop also is: ")
  print(environment())
}
environment inside for loop also is: <environment: R_GlobalEnv>
environment inside for loop also is: <environment: R_GlobalEnv>
environment inside for loop also is: <environment: R_GlobalEnv>
environment inside for loop also is: <environment: R_GlobalEnv>
environment inside for loop also is: <environment: R_GlobalEnv>
environment inside for loop also is: <environment: R_GlobalEnv>
environment inside for loop also is: <environment: R_GlobalEnv>
environment inside for loop also is: <environment: R_GlobalEnv>
environment inside for loop also is: <environment: R_GlobalEnv>
environment inside for loop also is: <environment: R_GlobalEnv>
environment inside for loop also is: <environment: R_GlobalEnv>
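A related consequence is that a for loop does not create a new scope. Both the loop variable and any objects assigned in the loop body are created in the current environment, here the global one, and persist after the loop ends. The names j and loop_value below are hypothetical. After the loop above, i is likewise left at its last value.

```r
# Objects assigned inside a for loop live in the surrounding environment
for (j in 1:3) {
  loop_value <- j * 2
}

j           # the loop variable keeps its last value
# [1] 3
loop_value  # the last assignment made inside the loop
# [1] 6
```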

In contrast, a function body is evaluated in its own environment, so all operations remain local to the function and the result must be returned:

ztransform <- function(x) {
  cat("environment inside function body is: ")
  print(environment())
  # z-score each column of the argument x (not the global mtcars)
  z <- as.data.frame(sapply(x, function(col) (col - mean(col)) / sd(col)))
  rownames(z) <- rownames(x)
  z
}
mtcars_z2 <- ztransform(mtcars)
environment inside function body is: <environment: 0x107d0d808>
cat("environment outside function body is: ")
environment outside function body is: 
print(environment())
<environment: R_GlobalEnv>
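To check that the function environment is not just different but new for every call, the sketch below uses a hypothetical helper, show_env(), that simply returns the environment its body runs in. Two calls yield two distinct environments. The parent of each is the function's enclosing environment, which is the global environment when the function is defined at the top level.

```r
# Hypothetical helper: return the environment the function body runs in
show_env <- function() environment()

e1 <- show_env()
e2 <- show_env()

# Each call creates a new execution environment
identical(e1, e2)
# [1] FALSE

# Its parent is the environment where show_env() was defined
identical(parent.env(e1), environment(show_env))
# [1] TRUE
```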

Notice that the environment outside and inside the for loop is the same, namely the global environment (R_GlobalEnv), but the function body is evaluated in a different environment, which R creates anew for each call. That is why any objects created or changed within a function must be returned, and assigned, if we want them to be available after the function exits.
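A minimal sketch of that last point follows, using a hypothetical data frame dat and function add_double(). Modifying an argument inside the function changes only the function's local copy. The original object is updated only when the returned value is assigned back to it.

```r
# Hypothetical data frame and function for illustration
dat <- data.frame(a = 1:3)

add_double <- function(d) {
  d$b <- d$a * 2   # modifies the function's local copy of d
  d                # return the modified copy
}

add_double(dat)    # the result is printed but not stored
ncol(dat)          # the original is unchanged
# [1] 1

dat <- add_double(dat)   # assign the returned value to keep the change
ncol(dat)
# [1] 2
```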