Nested Futures Use More Memory Than They Should #709

@jestover

I've been running code with nested loops that keeps running into memory issues, and I have been trying to come up with a small example that shows the problem. In the example I take a random square matrix and create a list of its columns. Obviously you wouldn't use a double loop to do this in R, but it is hopefully a simple and clear example: with purrr the double loop doesn't increase memory usage, while with furrr and future.apply the memory usage explodes.

library(bench)
library(furrr)
library(future.apply)
library(purrr)

# purrr
single_loop <- function(x, n) {
  map(1:n, ~ x[, .x])
}

# future.apply
single_loop_a <- function(x, n) {
  future_lapply(1:n, FUN = function(i) x[, i])
}

# furrr
single_loop_f <- function(x, n) {
  future_map(1:n, ~ x[, .x])
}

# purrr
inner_loop <- function(i, n, x = x) {
  map_dbl(1:n, ~ x[.x, i])
}

outer_loop <- function(x, n) {
  map(1:n, ~ inner_loop(.x, n, x = x))
}

# future.apply
inner_loop_a <- function(i, n, x = x) {
  future_sapply(1:n, FUN = function(j) x[j, i])
}

outer_loop_a <- function(x, n) {
  future_lapply(1:n, FUN = function(i) inner_loop_a(i, n, x))
}

# furrr
inner_loop_f <- function(i, n, x = x) {
  future_map_dbl(1:n, ~ x[.x, i])
}

outer_loop_f <- function(x, n) {
  future_map(1:n, ~ inner_loop_f(.x, n, x = x))
}

n <- 100
x <- matrix(rnorm(n * n), nrow = n)

identical(single_loop(x, n), single_loop_f(x, n))
identical(single_loop(x, n), single_loop_a(x, n))
identical(single_loop(x, n), outer_loop(x, n))
identical(single_loop(x, n), outer_loop_a(x, n))
identical(single_loop(x, n), outer_loop_f(x, n))
# All return TRUE

plan(sequential)

# With a single loop memory usage is similar
bench::mark(single_loop(x, n))$mem_alloc
# 127KB
bench::mark(single_loop_a(x, n))$mem_alloc
# 243KB
bench::mark(single_loop_f(x, n))$mem_alloc
# 340KB

# With a double loop memory usage remains similar for purrr, but explodes 
# on the other two
bench::mark(outer_loop(x, n))$mem_alloc
# 83.6KB
bench::mark(outer_loop_a(x, n))$mem_alloc
# 11.8MB
bench::mark(outer_loop_f(x, n))$mem_alloc
# 21.1MB

# Try again with a larger matrix
n <- 5000
x <- matrix(rnorm(n * n), nrow = n)

bench::mark(single_loop(x, n))$mem_alloc
# 287MB
bench::mark(single_loop_a(x, n))$mem_alloc
# 287MB
bench::mark(single_loop_f(x, n))$mem_alloc
# 287MB

bench::mark(outer_loop(x, n))$mem_alloc
# 191MB
bench::mark(outer_loop_a(x, n))$mem_alloc
# 2.88GB
bench::mark(outer_loop_f(x, n))$mem_alloc
# 1.57GB
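
For scale, here is a rough back-of-the-envelope check of the n = 5000 figures above. It assumes 8 bytes per double and that bench reports sizes in 1024-based units; the variable names are only for illustration.

```r
# Rough size of the n x n result, assuming 8 bytes per double
n <- 5000
payload_mib <- n * n * 8 / 1024^2  # about 190.7, in line with the 191MB purrr figure

# Reported nested-loop allocations, converted to the same unit
# (assumption: bench's GB means 1024 MB)
reported_mib <- c(future.apply = 2.88 * 1024, furrr = 1.57 * 1024)

# Multiples of the payload size: roughly 15x for future.apply, 8x for furrr
ratios <- reported_mib / payload_mib
round(ratios, 1)
```

Under those assumptions, the nested future versions allocate many times the size of the data itself, while the nested purrr version allocates about one copy of it.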

As you can see, using the double loop actually decreases memory usage for purrr, though it stays in the same range (287MB to 191MB at n = 5000), but it causes memory usage to explode for furrr and future.apply (1.57GB and 2.88GB respectively). I ran this example on a 2023 MacBook, but the actual code that I am trying to fix has been running on a Linux cluster. I ran this example using both furrr and future.apply because yesterday I logged a bug report about nested loops using future.callr, and @HenrikBengtsson pointed out that it was only an issue with furrr. Please let me know if there is any additional information I can provide or any help I can give in solving this issue, and thanks for the wonderful collection of packages!
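
One variation that might help narrow this down is to keep only the outer loop as a future and run the inner loop with plain purrr. This is a minimal sketch, not a confirmed fix: `inner_loop_plain` and `outer_loop_mixed` are hypothetical names that are not part of the example above, and whether this avoids the memory growth has not been measured here.

```r
library(furrr)
library(purrr)

# Inner loop without futures (hypothetical helper, not in the original example)
inner_loop_plain <- function(i, n, x) {
  map_dbl(seq_len(n), ~ x[.x, i])
}

# Only the outer loop goes through furrr (hypothetical helper)
outer_loop_mixed <- function(x, n) {
  future_map(seq_len(n), ~ inner_loop_plain(.x, n, x = x))
}

plan(sequential)

set.seed(1)
n <- 100
x <- matrix(rnorm(n * n), nrow = n)

res <- outer_loop_mixed(x, n)

# Same result as the single purrr loop over columns
identical(res, map(seq_len(n), ~ x[, .x]))
```

Comparing `bench::mark(outer_loop_mixed(x, n))$mem_alloc` against the `outer_loop_f` figure would show whether the extra allocation comes from nesting futures or from the outer future call alone.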
