10 Conditional actions and loops

10.1 Conditional actions

Conditional actions in R are written with the usual if, else if and else statements:

x <- 1; y <- 2
if(x>y){
    print("x is larger than y")
}else if(x<y){
    print("x is smaller than y")
}else{
    print("x is equal to y")
}

#> [1] "x is smaller than y"

Sometimes it’s useful to apply such a test to every element of a vector in a single call, using the vectorized ifelse(test, yes, no):

x <- 3:7
ifelse(x>5, "larger than 5", "5 or lower")

#> [1] "5 or lower"    "5 or lower"    "5 or lower"    "larger than 5"
#> [5] "larger than 5"
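
Because ifelse() is vectorized, calls can be nested to handle more than two outcomes. As a minimal sketch, the three-way comparison from the start of this section can be applied to two whole vectors at once (the values of x and y here are made up for illustration):

```r
# Illustrative vectors (not taken from the example above)
x <- c(1, 5, 3)
y <- c(2, 4, 3)

# The inner ifelse() handles the elements for which x > y is FALSE
res <- ifelse(x > y, "x is larger",
       ifelse(x < y, "x is smaller", "x is equal"))
res
#> [1] "x is smaller" "x is larger"  "x is equal"
```

By contrast, if() expects a single TRUE or FALSE; since R 4.2.0, passing it a condition of length greater than one is an error.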

10.2 Loops…

Loops in R are provided through the usual for and while keywords:

# For loop
for(i in 1:100){
    # pass to next index directly
    if(i %in% c(3,8,5)) next 
    # break loop
    if(i==10) break
    print(i)
}

#> [1] 1
#> [1] 2
#> [1] 4
#> [1] 6
#> [1] 7
#> [1] 9

phrase <- c("hello", "world")
for(word in phrase){
    print(word)
}

#> [1] "hello"
#> [1] "world"

# While loop
i <- 1
while(i<8){
    print(i)
    i <- i+2
}

#> [1] 1
#> [1] 3
#> [1] 5
#> [1] 7
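
When a loop needs the index rather than the element itself, seq_along() is a safer way to build the index sequence than 1:length(x). The sketch below, using a made-up empty vector, shows why: 1:0 counts down from 1 to 0, so the loop body runs even though there is nothing to iterate over.

```r
empty <- character(0)   # illustrative empty vector

# 1:length(empty) is 1:0, i.e. c(1, 0), so the body runs twice
n_bad <- 0
for(i in 1:length(empty)){
    n_bad <- n_bad + 1
}

# seq_along(empty) is integer(0), so the body never runs
n_good <- 0
for(i in seq_along(empty)){
    n_good <- n_good + 1
}

# n_bad is now 2, n_good is still 0
```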

10.3 … and how to avoid them

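However, R is a vectorized language: most operations act on whole vectors at once. An explicit loop over the elements of a vector is usually much slower than the equivalent vectorized call, so loops are to be avoided when a vectorized alternative exists: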

forloop <- function(x){
    for(i in seq_along(x)){
        x[i] <- 2*x[i]
    }
    x
}
noforloop <- function(x){
    2*x
}
x <- runif(1e7)
microbenchmark::microbenchmark(
    forloop   = forloop(x),
    noforloop = noforloop(x),
    times = 10L
)

#> Unit: milliseconds
#>       expr        min         lq       mean     median         uq      max
#>    forloop 233.602748 236.442080 237.930712 237.776076 239.920315 241.8123
#>  noforloop   1.488997   5.497936   7.536677   6.752413   7.624114  21.0469
#>  neval cld
#>     10  a 
#>     10   b
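
Before comparing timings, it is worth checking that both versions return the same result. The following sketch redefines the two functions from above, runs them on a smaller vector, and uses base R's system.time() as a rough alternative for readers who do not have microbenchmark installed. The vector size and the seed are arbitrary choices, and timings differ between machines, so none are shown.

```r
forloop <- function(x){
    for(i in seq_along(x)){
        x[i] <- 2*x[i]
    }
    x
}
noforloop <- function(x){
    2*x
}

set.seed(42)       # arbitrary seed, for reproducibility only
x <- runif(1e5)    # smaller than in the benchmark above

# Both approaches must agree before their speed is compared
same <- identical(forloop(x), noforloop(x))
same
#> [1] TRUE

# Rough one-off timings in seconds (machine dependent)
t_loop <- system.time(forloop(x))
t_vec  <- system.time(noforloop(x))
```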

10.3.1 The `apply` family

Explicit loops should therefore be avoided when possible. Base R helps here with the functions apply(), sapply() and lapply(). Operations in the tidyverse are also a very good way of avoiding loops.

Take a look at the help pages of these functions. In short, apply(X, MARGIN, FUN) applies the function FUN along the chosen margin (1 for rows, 2 for columns) of a matrix or array; a data.frame is first converted to a matrix. Example:

library(tibble)
dt <- tibble(x=1:5, y=x^2, z=x^3);dt

#> # A tibble: 5 × 3
#>       x     y     z
#>   <int> <dbl> <dbl>
#> 1     1     1     1
#> 2     2     4     8
#> 3     3     9    27
#> 4     4    16    64
#> 5     5    25   125

apply(dt, 1, mean) # mean of the rows

#> [1]  1.000000  4.666667 13.000000 28.000000 51.666667

apply(dt, 2, mean) # mean of the columns

#>  x  y  z 
#>  3 11 45

# row/column means
rowMeans(dt)

#> [1]  1.000000  4.666667 13.000000 28.000000 51.666667

colMeans(dt)

#>  x  y  z 
#>  3 11 45
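
The function passed to apply() does not have to be a built-in one. As a small sketch on a made-up matrix, an anonymous function can compute the spread (maximum minus minimum) of each column. The second part illustrates a caveat: since apply() works on a matrix, a data.frame that mixes numeric and character columns is converted to a matrix of character strings.

```r
m <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns, filled column by column

row_sums   <- apply(m, 1, sum)
col_spread <- apply(m, 2, function(col) max(col) - min(col))
row_sums
#> [1]  9 12
col_spread
#> [1] 1 1 1

# Illustrative data.frame with mixed column types
df <- data.frame(n = 1:2, s = c("a", "b"))
is.character(as.matrix(df))  # the numbers have become strings
#> [1] TRUE
```

For plain row or column sums and means, the dedicated functions such as rowMeans() and colMeans() shown above remain the more direct choice.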

lapply() applies a function to each element of a list and returns a list. sapply() does the same, but simplifies the result to a vector or a matrix when possible:

my_list <- list(dt/3, dt/5);my_list

#> [[1]]
#>           x         y          z
#> 1 0.3333333 0.3333333  0.3333333
#> 2 0.6666667 1.3333333  2.6666667
#> 3 1.0000000 3.0000000  9.0000000
#> 4 1.3333333 5.3333333 21.3333333
#> 5 1.6666667 8.3333333 41.6666667
#> 
#> [[2]]
#>     x   y    z
#> 1 0.2 0.2  0.2
#> 2 0.4 0.8  1.6
#> 3 0.6 1.8  5.4
#> 4 0.8 3.2 12.8
#> 5 1.0 5.0 25.0

lapply(my_list, "[", 1, )  # extract the first row of each element

#> [[1]]
#>           x         y         z
#> 1 0.3333333 0.3333333 0.3333333
#> 
#> [[2]]
#>     x   y   z
#> 1 0.2 0.2 0.2

sapply(my_list, rowSums)   # sum on rows

#>           [,1] [,2]
#> [1,]  1.000000  0.6
#> [2,]  4.666667  2.8
#> [3,] 13.000000  7.8
#> [4,] 28.000000 16.8
#> [5,] 51.666667 31.0

lapply(my_list, round, 1)  # round to first decimal

#> [[1]]
#>     x   y    z
#> 1 0.3 0.3  0.3
#> 2 0.7 1.3  2.7
#> 3 1.0 3.0  9.0
#> 4 1.3 5.3 21.3
#> 5 1.7 8.3 41.7
#> 
#> [[2]]
#>     x   y    z
#> 1 0.2 0.2  0.2
#> 2 0.4 0.8  1.6
#> 3 0.6 1.8  5.4
#> 4 0.8 3.2 12.8
#> 5 1.0 5.0 25.0

# For more complex operations, pass an anonymous function over the row indices:
sapply(seq_len(nrow(dt)), function(i){
    dt$x[i] + dt$y[(i+2)%%nrow(dt)+1] - dt$z[(i+4)%%nrow(dt)+1]
})

#> [1]   16   19  -23  -56 -111
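
Because sapply() guesses the shape of its result, the type it returns can change with the input. When the function is known to return a single number per element, vapply() from base R lets you state that expectation explicitly. A minimal sketch, with a made-up list:

```r
values <- list(a = 1:3, b = 4:6)   # illustrative data

means_s <- sapply(values, mean)               # simplified to a named vector
means_v <- vapply(values, mean, numeric(1))   # same result, with a type check
identical(means_s, means_v)
#> [1] TRUE

# With empty input the two differ
empty_s <- sapply(list(), mean)               # an empty list
empty_v <- vapply(list(), mean, numeric(1))   # an empty numeric vector
```

vapply() stops with an error if any result does not match the declared template, which makes it a reasonable choice inside functions where the return type matters.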

10.3.2 The `tidyverse` way

The tidyverse packages offer numerous ways to avoid explicit for loops. To see how, refer to the section on operations in the tidyverse.

10.4 Exercises

Exercise 1

Given x <- runif(1e3, min=-1, max=1), create a tibble like this one:

#> # A tibble: 1,000 × 2
#>          x y    
#>      <dbl> <chr>
#>  1  0.0580 x>0  
#>  2 -0.282  x<=0 
#>  3  0.411  x>0  
#>  4 -0.784  x<=0 
#>  5  0.532  x>0  
#>  6 -0.569  x<=0 
#>  7 -0.612  x<=0 
#>  8 -0.395  x<=0 
#>  9  0.583  x>0  
#> 10 -0.208  x<=0 
#> # ℹ 990 more rows

Exercise 2

Given:

LL <- list(A = runif(1e2),
           B = rnorm(1e3),
           C = data.frame(x=runif(1e2), y=runif(1e2))
           )

Print the sum of each element of LL, first as a list, then as a vector.

Solution

LL <- list(A = runif(1e2),
           B = rnorm(1e3),
           C = data.frame(x=runif(1e2), y=runif(1e2))
           )
lapply(LL, sum)

#> $A
#> [1] 52.99975
#> 
#> $B
#> [1] 22.74111
#> 
#> $C
#> [1] 96.49105

unlist(lapply(LL, sum)); sapply(LL, sum)

#>        A        B        C 
#> 52.99975 22.74111 96.49105

#>        A        B        C 
#> 52.99975 22.74111 96.49105

Exercise 3

Download population.csv and load it into a data.frame.
What is the total population over the years?
What is the mean population for each city?