36  Reference Semantics

This chapter introduces the concept of reference semantics, which is used by the data.table package.

When you create an object in R, it is stored at some location in memory. The address() function in the data.table package returns the memory address of an object.

library(data.table)
options(datatable.print.class = TRUE)
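
As a minimal sketch of what address() reports, the example below binds two names to the same numeric vector and then modifies one of them. The variable names (x, y, same_before, same_after) are hypothetical and used only for illustration.

```r
library(data.table)

x <- c(1, 2, 3)
y <- x  # a second name for the same vector; no copy is made yet

# Both names currently refer to the same location in memory
same_before <- identical(address(x), address(y))
same_before  # TRUE

# Modifying y makes R copy the vector first, so y now lives elsewhere
y[1] <- 10
same_after <- identical(address(x), address(y))
same_after   # FALSE

x  # still 1 2 3
```

The addresses themselves differ from session to session, which is why the sketch compares them with identical() instead of printing them.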

36.1 Add column to data.frame

Let’s create a simple data.frame, df1:

df1 <- data.frame(
  ID = c(8001, 8002),
  GCS = c(15, 13)
)

…and print its address:

address(df1)
[1] "0x10c3a6688"

Right now, we don’t care what the actual address is, but we want to keep track of when it changes.

Let’s add a new column to df1:

df1$HR <- c(80, 90)

…and print its address:

address(df1)
[1] "0x142876758"

The address has changed, even though we’re still working on the “same” df1 object: R built a new data.frame and rebound the name df1 to it.
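
Because printed addresses vary between sessions, it can be easier to record the address before and after the change and compare the two. The sketch below does this on a fresh data.frame; the names df, before and after are hypothetical.

```r
library(data.table)  # only needed here for address()

df <- data.frame(
  ID = c(8001, 8002),
  GCS = c(15, 13)
)

before <- address(df)
df$HR <- c(80, 90)  # standard data.frame column assignment
after <- address(df)

# FALSE: the name df is now bound to a different object
identical(before, after)
```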

36.2 Add column to data.table

Let’s create a simple data.table, dt1:

dt1 <- data.table(
  ID = c(8001, 8002),
  GCS = c(15, 13)
)

…and print its address:

address(dt1)
[1] "0x120e0ac00"

Let’s add a new column to dt1 in-place:

dt1[, HR := c(80, 90)]

…and print its address:

address(dt1)
[1] "0x120e0ac00"

The address remains the same.

What if we had used the data.frame syntax (which still works on a data.table) instead?

dt1$HR <- c(80, 90)
address(dt1)
[1] "0x143228400"

The address indeed changes, just like with data.frames.
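
The two behaviours can be checked side by side. The following sketch, using a hypothetical table dt and a hypothetical extra column RR, records the address after each kind of assignment.

```r
library(data.table)

dt <- data.table(
  ID = c(8001, 8002),
  GCS = c(15, 13)
)

addr_start <- address(dt)

# Add a column by reference with :=
dt[, HR := c(80, 90)]
addr_after_ref <- address(dt)

# Add another column with data.frame-style assignment
dt$RR <- c(16, 18)
addr_after_dollar <- address(dt)

identical(addr_start, addr_after_ref)         # TRUE: same object
identical(addr_after_ref, addr_after_dollar)  # FALSE: a new object
```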

Important

Making copies of large objects can be time-consuming and memory-intensive. As we have seen, modifying a data.table by reference with := changes the object in place and does not create a new copy.

36.3 Caution with reference semantics

So far so good: we are starting to see one reason why data.table is efficient. One very important thing to keep in mind is that when you do want an independent copy of a data.table, e.g. to create a different version of it, you must use data.table’s copy().

Let’s see why.

36.3.1 Copying a data.frame

Let’s remind ourselves of the contents and address of df1:

df1
    ID GCS HR
1 8001  15 80
2 8002  13 90
address(df1)
[1] "0x142876758"

To make a copy of df1, we can simply assign it to a new object:

df2 <- df1
df2
    ID GCS HR
1 8001  15 80
2 8002  13 90
address(df2)
[1] "0x142876758"

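The address of df2 is the same as that of df1, which means both names are pointing to the same object in memory. R does not copy an object on assignment; it defers the copy until one of the names is modified.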

As we’ve already seen, if we edit df2, its address will change:

df2[1, 3] <- 75
df2
    ID GCS HR
1 8001  15 75
2 8002  13 90
address(df2)
[1] "0x120bd4788"

The contents and address of df2 have changed, but df1 remains the same, as you might expect:

df1
    ID GCS HR
1 8001  15 80
2 8002  13 90
address(df1)
[1] "0x142876758"

36.3.2 Copying a data.table

Let’s remind ourselves of the contents and address of dt1:

dt1
      ID   GCS    HR
   <num> <num> <num>
1:  8001    15    80
2:  8002    13    90
address(dt1)
[1] "0x143228400"

Let’s see what happens if we assign dt1 to a new object:

dt2 <- dt1
dt2
      ID   GCS    HR
   <num> <num> <num>
1:  8001    15    80
2:  8002    13    90
address(dt2)
[1] "0x143228400"

So far it’s the same as with data.frames.

Let’s see what happens if we edit dt2 by reference:

dt2[1, HR := 75]
dt2
      ID   GCS    HR
   <num> <num> <num>
1:  8001    15    75
2:  8002    13    90
address(dt2)
[1] "0x143228400"

…and let’s recheck dt1:

dt1
      ID   GCS    HR
   <num> <num> <num>
1:  8001    15    75
2:  8002    13    90

dt1 has changed as well, because dt1 and dt2 are still pointing to the same object in memory!

This is crucial to remember to avoid errors and confusion.
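
The same effect appears when a data.table is passed to a function, since the argument refers to the caller’s object. The sketch below is illustrative only: flag_low_gcs() and flag_low_gcs_copy() are hypothetical helpers, and the cutoff of 14 is an arbitrary value chosen for the example.

```r
library(data.table)

# Hypothetical helper: adds a column by reference,
# so the caller's table is modified as a side effect
flag_low_gcs <- function(dt) {
  dt[, low_gcs := GCS < 14]
  invisible(dt)
}

# Hypothetical safer variant: works on a private copy
flag_low_gcs_copy <- function(dt) {
  out <- copy(dt)
  out[, low_gcs := GCS < 14]
  out[]
}

patients <- data.table(
  ID = c(8001, 8002),
  GCS = c(15, 13)
)

flagged <- flag_low_gcs_copy(patients)
names_after_copy <- names(patients)
names_after_copy     # "ID" "GCS": the original is untouched

flag_low_gcs(patients)
names_after_inplace <- names(patients)
names_after_inplace  # "ID" "GCS" "low_gcs": the original was changed
```

Whether a function should modify its input in place or return a modified copy is a design choice; the point of the sketch is only that := inside a function does not protect the caller’s table.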

Important

When you want to make a copy of a data.table, you must use the copy() function.

Let’s see what happens if we use copy():

dt3 <- copy(dt1)
dt3
      ID   GCS    HR
   <num> <num> <num>
1:  8001    15    75
2:  8002    13    90
address(dt1)
[1] "0x143228400"
address(dt3)
[1] "0x1432dc400"

dt3 and dt1 are pointing to different objects in memory, so editing one does not affect the other.

dt3[1, HR := 100]
dt3
      ID   GCS    HR
   <num> <num> <num>
1:  8001    15   100
2:  8002    13    90
dt1
      ID   GCS    HR
   <num> <num> <num>
1:  8001    15    75
2:  8002    13    90
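
The difference between plain assignment and copy() can also be verified programmatically. In this sketch, original, alias and deep are hypothetical names; it compares the addresses of the tables and of one of their columns.

```r
library(data.table)

original <- data.table(
  ID = c(8001, 8002),
  HR = c(80, 90)
)

alias <- original       # second name for the same table
deep <- copy(original)  # independent copy

alias_same_table <- identical(address(alias), address(original))
deep_same_table <- identical(address(deep), address(original))
deep_same_column <- identical(address(deep$HR), address(original$HR))

alias_same_table  # TRUE
deep_same_table   # FALSE
deep_same_column  # FALSE: copy() duplicated the column too

# Editing the copy by reference leaves the original untouched
deep[1, HR := 100]
original$HR  # 80 90
deep$HR      # 100 90
```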

36.4 Resources