Posted to github@arrow.apache.org by "paleolimbot (via GitHub)" <gi...@apache.org> on 2023/06/26 13:05:39 UTC

[GitHub] [arrow] paleolimbot commented on issue #36274: Expose a std::shared_ptr to R SEXP

paleolimbot commented on issue #36274:
URL: https://github.com/apache/arrow/issues/36274#issuecomment-1607431346

   The `std::shared_ptr<arrow::Table>` is a C++ pointer tied to the exact version of Arrow C++ it was built against and to the compiler flags used for that build. Pointers like this are usually not exposed to other scripts or packages in R because it is difficult to guarantee ABI stability across builds. When you say "expose to R scripts", do you mean that you have some Arrow C++ code linked to R using something like Rcpp?
   
   I think what you may be looking for is the C data interface. Arrow C++ can export a table as an ABI-stable stream of record batches. This is not *quite* the same as a table but will allow you to export the Table from the arrow R package and import it using C++ from elsewhere.
   
   ``` r
   # These are specific to my system (homebrew on MacOS M1)
   arrow_include <- "-I/opt/homebrew/Cellar/apache-arrow/12.0.0_1/include"
   arrow_libs <- "-L/opt/homebrew/Cellar/apache-arrow/12.0.0_1/lib -larrow"
   Sys.setenv("PKG_CXXFLAGS" = arrow_include)
   Sys.setenv("PKG_LIBS" = arrow_libs)
   
   cpp11::cpp_source(code = '
   #include <arrow/table.h>
   #include <arrow/c/bridge.h>
   #include <cpp11.hpp>
   
   using namespace arrow;
   
   // Version that returns a Result<> so we can use Arrow C++-style error handling
   // macros
   Result<int> count_rows_internal(SEXP array_stream_xptr) {
     auto array_stream = reinterpret_cast<struct ArrowArrayStream*>(
       R_ExternalPtrAddr(array_stream_xptr));
     
     ARROW_ASSIGN_OR_RAISE(auto reader, ImportRecordBatchReader(array_stream));
     
     std::shared_ptr<Table> table;
     ARROW_RETURN_NOT_OK(reader->ReadAll(&table));
     
     return table->num_rows();
   }
   
   // Version that uses cpp11 error handling
   [[cpp11::register]]
   int count_rows(SEXP array_stream_xptr) {
     Result<int> num_rows = count_rows_internal(array_stream_xptr);
     if (num_rows.ok()) {
       return *num_rows;
     } else {
       cpp11::stop("Arrow C++ error: %s", num_rows.status().ToString().c_str());
     }
   }
   
   ', cxx_std = "CXX17")
   
   
   library(arrow, warn.conflicts = FALSE)
   #> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
   library(nanoarrow)
   
   tab <- arrow_table(x = 1:10)
   (array_stream <- as_nanoarrow_array_stream(tab))
   #> <nanoarrow_array_stream struct<x: int32>>
   #>  $ get_schema:function ()  
   #>  $ get_next  :function (schema = x$get_schema(), validate = TRUE)  
   #>  $ release   :function ()
   count_rows(array_stream)
   #> [1] 10
   ```
   
   <sup>Created on 2023-06-26 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>
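
   To illustrate why the stream is ABI-stable: the consumer only needs the plain C struct layouts from the Arrow C data and C stream interface specification, so in principle it does not have to link against Arrow C++ at all. The following is a minimal sketch under that assumption, not what the arrow R package does internally. `count_rows_c_stream()` and the `toy_*` producer are hypothetical names made up for this example, and the toy producer fills in only `length` and `release`, so its batches are not valid Arrow data.

   ```cpp
   #include <cerrno>
   #include <cstdint>

   // Struct layouts as given in the Arrow C data / C stream interface specification.
   extern "C" {
   struct ArrowSchema {
     const char* format;
     const char* name;
     const char* metadata;
     int64_t flags;
     int64_t n_children;
     struct ArrowSchema** children;
     struct ArrowSchema* dictionary;
     void (*release)(struct ArrowSchema*);
     void* private_data;
   };

   struct ArrowArray {
     int64_t length;
     int64_t null_count;
     int64_t offset;
     int64_t n_buffers;
     int64_t n_children;
     const void** buffers;
     struct ArrowArray** children;
     struct ArrowArray* dictionary;
     void (*release)(struct ArrowArray*);
     void* private_data;
   };

   struct ArrowArrayStream {
     int (*get_schema)(struct ArrowArrayStream*, struct ArrowSchema* out);
     int (*get_next)(struct ArrowArrayStream*, struct ArrowArray* out);
     const char* (*get_last_error)(struct ArrowArrayStream*);
     void (*release)(struct ArrowArrayStream*);
     void* private_data;
   };
   }

   // Hypothetical consumer: drains the stream, sums batch lengths, then
   // releases the stream. Returns -1 if the producer reports an error.
   int64_t count_rows_c_stream(ArrowArrayStream* stream) {
     int64_t total = 0;
     while (true) {
       ArrowArray batch{};
       if (stream->get_next(stream, &batch) != 0) {
         total = -1;
         break;
       }
       if (batch.release == nullptr) break;  // a released array marks end of stream
       total += batch.length;
       batch.release(&batch);  // the consumer owns each batch it receives
     }
     stream->release(stream);
     return total;
   }

   // Toy producer used only to exercise the consumer above.
   struct ToyState {
     int64_t remaining;
     int64_t rows_per_batch;
   };

   void toy_release_array(ArrowArray* array) { array->release = nullptr; }

   // The toy producer has no schema to offer, so it reports an error here.
   int toy_get_schema(ArrowArrayStream*, ArrowSchema*) { return ENOTSUP; }

   int toy_get_next(ArrowArrayStream* stream, ArrowArray* out) {
     auto* state = static_cast<ToyState*>(stream->private_data);
     *out = ArrowArray{};  // zeroed, so release == nullptr unless set below
     if (state->remaining > 0) {
       state->remaining--;
       out->length = state->rows_per_batch;
       out->release = &toy_release_array;
     }
     return 0;
   }

   const char* toy_get_last_error(ArrowArrayStream*) { return nullptr; }

   void toy_release_stream(ArrowArrayStream* stream) {
     delete static_cast<ToyState*>(stream->private_data);
     stream->release = nullptr;
   }

   ArrowArrayStream make_toy_stream(int64_t n_batches, int64_t rows_per_batch) {
     ArrowArrayStream stream;
     stream.get_schema = &toy_get_schema;
     stream.get_next = &toy_get_next;
     stream.get_last_error = &toy_get_last_error;
     stream.release = &toy_release_stream;
     stream.private_data = new ToyState{n_batches, rows_per_batch};
     return stream;
   }
   ```

   With the toy producer, `make_toy_stream(3, 4)` yields three batches of four rows each, so `count_rows_c_stream()` returns 12. In the reprex above, the pointer obtained from `R_ExternalPtrAddr()` would be the `ArrowArrayStream*` to pass in. Using Arrow C++'s `ImportRecordBatchReader()` as shown earlier is still the more convenient route whenever you need the data itself rather than just a row count.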

