Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/10 19:48:17 UTC

[GitHub] [arrow] jonkeane commented on pull request #10445: ARROW-9140: [R] Zero-copy Arrow to R where possible

jonkeane commented on pull request #10445:
URL: https://github.com/apache/arrow/pull/10445#issuecomment-858983224


   Ok, I've run some benchmarks on this branch and I'm seeing a huge speed up for floats + integers with `as.vector(array)`. 🎉 
   
   It might be out of scope for this PR, but chunked arrays don't see a similar speed up (which makes sense given they call `ArrayVector__as_vector` directly rather than routing through `Array__as_vector`, so they aren't using ALTREP). I can't quite tell from the cpp whether `Table__to_dataframe` would _just work_ with ALTREP as well once ChunkedArrays supported it, or if we would need to do more to facilitate that.
   
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   
   x <- 1:1e3 + 1L
   v <- Array$create(x)
   x1 <- v$as_vector()  
   .Internal(inspect(x1))
   #> @7f9077f5a1a8 13 INTSXP g0c0 [REF(65535)] std::shared_ptr<arrow::Array, int32, NONULL> (len=1000, ptr=0x7f90975a9a08)
   
   
   v_chunked <- ChunkedArray$create(x)
   x2 <- v_chunked$as_vector()  
   .Internal(inspect(x2))
   #> @7f908312c000 13 INTSXP g0c7 [REF(2)] (len=1000, tl=0) 2,3,4,5,6,...
   ```
   
   <sup>Created on 2021-06-10 by the [reprex package](https://reprex.tidyverse.org) (v2.0.0)</sup>
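   
   For a quick local check without arrowbench, something like the following rough sketch could be used to compare the two conversion paths. The input size and the use of `system.time()` are my own choices for illustration, not what the arrowbench benchmarks do, and whether the `Array` path is actually backed by ALTREP depends on running this branch.
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   
   # Larger input than the reprex above so a timing difference is visible;
   # the size is an arbitrary choice for illustration.
   x <- seq_len(1e7)
   
   v <- Array$create(x)
   v_chunked <- ChunkedArray$create(x)
   
   # Array path: expected to be ALTREP-backed on this branch (assumption).
   t_array <- system.time(x1 <- v$as_vector())
   
   # ChunkedArray path: expected to materialize a regular R vector.
   t_chunked <- system.time(x2 <- v_chunked$as_vector())
   
   # Both conversions should produce the same values.
   stopifnot(identical(x1, x2))
   
   # Elapsed seconds for each path.
   print(c(array = t_array[["elapsed"]], chunked = t_chunked[["elapsed"]]))
   ```
   
   A single `system.time()` call is noisy, so this is only meant as a sanity check; the arrowbench results below are the numbers to trust.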
   
   arrowbench results (using the new benchmarks in https://github.com/ursacomputing/arrowbench/pull/28): 
   [zero-copy-data-conversion.html.zip](https://github.com/apache/arrow/files/6633992/zero-copy-data-conversion.html.zip)

