Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/03 15:23:21 UTC
[GitHub] [arrow-nanoarrow] paleolimbot commented on pull request #65: [R] Complete ptype inferences and array conversions
paleolimbot commented on PR #65:
URL: https://github.com/apache/arrow-nanoarrow/pull/65#issuecomment-1302277227
Still needs testing with more datasets, but these conversions are potentially much faster than the current arrow R package's:
``` r
# remotes::install_github("apache/arrow-nanoarrow/r#65")
library(nanoarrow)
# latest master (i.e., with latest ALTREP improvement PR)
library(arrow, warn.conflicts = FALSE)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
library(nycflights13)
flights <- nycflights13::flights
# until nanoarrow converts datetimes
flights$time_hour <- NULL
flights_table <- as_arrow_table(flights)
flights_array <- as_nanoarrow_array(flights_table)
n <- nrow(flights)
# with altrep, arrow is faster
bench::mark(
  arrow_altrep = as.data.frame(as.data.frame(flights_table)),
  nanoarrow = as.data.frame(as.data.frame(flights_array))
)
#> # A tibble: 2 × 6
#>   expression        min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>   <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 arrow_altrep    323µs 333.66µs     2923.     847KB     30.7
#> 2 nanoarrow      2.12ms   2.56ms      384.    25.8MB     370.
# with materialization nanoarrow is much faster?
bench::mark(
  arrow_altrep = as.data.frame(as.data.frame(flights_table))[n:1, ],
  nanoarrow = as.data.frame(as.data.frame(flights_array))[n:1, ],
  min_iterations = 5
)
#> # A tibble: 2 × 6
#>   expression        min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>   <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 arrow_altrep   80.3ms   80.3ms      12.5    42.5MB     74.7
#> 2 nanoarrow      42.7ms     43ms      23.2    68.2MB     151.
# without altrep, nanoarrow is much much much faster?
withr::with_options(list(arrow.use_altrep = FALSE), {
  bench::mark(
    arrow_no_altrep = as.data.frame(as.data.frame(flights_table)),
    nanoarrow = as.data.frame(as.data.frame(flights_array)),
    min_iterations = 5
  )
})
#> # A tibble: 2 × 6
#>   expression           min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>      <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 arrow_no_altrep  20.95ms  21.56ms      46.6      36MB     25.1
#> 2 nanoarrow         2.06ms   2.56ms      384.    25.7MB     136.
```
<sup>Created on 2022-11-03 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>
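For context on why the second benchmark subsets with `[n:1, ]`: an ALTREP vector can defer writing out its values until something needs them, so a conversion that returns ALTREP columns can look very cheap until the result is actually used. The sketch below is only an analogy using base R's own compact integer sequences (R >= 3.5.0); it does not touch arrow or nanoarrow, and the variable names are made up for illustration.

``` r
# Base R only; needs R >= 3.5.0, where 1:n is stored as a compact ALTREP sequence.
n <- 1000000L

# Creating the sequence does not allocate n integers up front
lazy <- 1:n

# Reversing produces an ordinary integer vector with all n values written out,
# analogous to the `[n:1, ]` step used to force materialization in the benchmark
materialized <- lazy[n:1]

# Timings vary by machine, so none are shown here
print(system.time(for (i in 1:100) 1:n))
print(system.time(for (i in 1:100) lazy[n:1]))
```

The arrow package's ALTREP classes are a different implementation, so this only illustrates the general deferred-materialization effect, not arrow's actual behaviour.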