Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/03 15:23:21 UTC

[GitHub] [arrow-nanoarrow] paleolimbot commented on pull request #65: [R] Complete ptype inferences and array conversions

paleolimbot commented on PR #65:
URL: https://github.com/apache/arrow-nanoarrow/pull/65#issuecomment-1302277227

   Still needs testing with more datasets, but these conversions are potentially much faster than the current arrow R package's:
   
   ``` r
   # remotes::install_github("apache/arrow-nanoarrow/r#65")
   library(nanoarrow)
   # latest master (i.e., with latest ALTREP improvement PR)
   library(arrow, warn.conflicts = FALSE)
   #> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
   library(nycflights13)
   
   flights <- nycflights13::flights
   # until nanoarrow converts datetimes
   flights$time_hour <- NULL
   
   
   flights_table <- as_arrow_table(flights)
   flights_array <- as_nanoarrow_array(flights_table)
   n <- nrow(flights)
   
   # with altrep, arrow is faster
   bench::mark(
     arrow_altrep = as.data.frame(as.data.frame(flights_table)),
     nanoarrow = as.data.frame(as.data.frame(flights_array))
   )
   #> # A tibble: 2 × 6
   #>   expression        min   median `itr/sec` mem_alloc `gc/sec`
   #>   <bch:expr>   <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
   #> 1 arrow_altrep    323µs 333.66µs     2923.     847KB     30.7
   #> 2 nanoarrow      2.12ms   2.56ms      384.    25.8MB    370.
   
   # with materialization nanoarrow is much faster?
   bench::mark(
     arrow_altrep = as.data.frame(as.data.frame(flights_table))[n:1, ],
     nanoarrow = as.data.frame(as.data.frame(flights_array))[n:1, ],
     min_iterations = 5
   )
   #> # A tibble: 2 × 6
   #>   expression        min   median `itr/sec` mem_alloc `gc/sec`
   #>   <bch:expr>   <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
   #> 1 arrow_altrep   80.3ms   80.3ms      12.5    42.5MB     74.7
   #> 2 nanoarrow      42.7ms     43ms      23.2    68.2MB    151.
   
   # without altrep, nanoarrow is much much much faster?
   withr::with_options(list(arrow.use_altrep = FALSE), {
     bench::mark(
       arrow_no_altrep = as.data.frame(as.data.frame(flights_table)),
       nanoarrow = as.data.frame(as.data.frame(flights_array)),
       min_iterations = 5
     )
   })
   #> # A tibble: 2 × 6
   #>   expression           min   median `itr/sec` mem_alloc `gc/sec`
   #>   <bch:expr>      <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
   #> 1 arrow_no_altrep  20.95ms  21.56ms      46.6      36MB     25.1
   #> 2 nanoarrow         2.06ms   2.56ms     384.     25.7MB    136.
   ```
   
   <sup>Created on 2022-11-03 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>
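   
   To put the question-marked comments above into numbers, the median timings from the three `bench::mark()` tables can be turned into ratios. The snippet below is a minimal base-R sketch: the values are copied by hand from the output above (converted to milliseconds), and the names `median_ms` and `arrow_over_nanoarrow` are hypothetical, chosen only for illustration.
   
   ``` r
   # Median timings in milliseconds, copied by hand from the bench::mark() output above
   median_ms <- data.frame(
     scenario = c("ALTREP, lazy", "ALTREP, materialized", "no ALTREP"),
     arrow = c(0.33366, 80.3, 21.56),
     nanoarrow = c(2.56, 43, 2.56)
   )
   
   # Ratio > 1 means nanoarrow had the lower median time in that scenario
   median_ms$arrow_over_nanoarrow <- median_ms$arrow / median_ms$nanoarrow
   
   print(median_ms, digits = 3)
   ```
   
   The ratios come out at roughly 0.13 (lazy ALTREP favours arrow), 1.87 (materialized), and 8.4 (ALTREP disabled), for this one dataset on one machine.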

