Posted to user@arrow.apache.org by Colin McLean <cm...@staffmail.ed.ac.uk> on 2020/10/12 08:49:41 UTC

[R and C++] passing arrow::Arrow from R to C++ for reading and writing?

Dear Arrow users,

  I was wondering if anyone can help me understand how I can create an
arrow::Array object in R, then pass it into C++ (using the Rcpp
library) for both reading and writing, similar to what is done with
the R bigmemory
(https://privefl.github.io/blog/Tip-Optimize-your-Rcpp-loops/) or
bigstatsr packages?

Kindest Regards,
Colin Mclean.

R script:
library(Rcpp)
library(arrow)

## compile c++ code
Sys.setenv("PKG_CXXFLAGS" = "-larrow")
sourceCpp("utils.cpp")

N = 10
X = arrow::Array$create(rep(0,N*N))

test( X$pointer() )


utils.cpp:
// define headers
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::depends(arrow)]]

#include <arrow/api.h>
#include <arrow/array.h>
#include <arrow/array/array_base.h>
#include <Rcpp.h>
#include <stdio.h>
#include <iostream>
#include <string>


using namespace Rcpp;
using namespace std;

// [[Rcpp::export]]
void test( XPtr<arrow::Array> aAPtr ){

   cout << "read & write arrow::Array in test " << endl;
   cout << aAPtr << endl;

}

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Re: [R and C++] passing arrow::Arrow from R to C++ for reading and writing?

Posted by Colin McLean <cm...@staffmail.ed.ac.uk>.
Hi Neal,

  Thanks for your reply. Below is how I installed the arrow R package
on my system: first downloading and installing the C/C++ code, then
installing the R package contained in this [arrow.git] download.

  refs/points:
1) https://github.com/apache/arrow/tree/master/r
2) https://arrow.apache.org/docs/developers/cpp/building.html
3) requires R version 3.6 or higher.

[1]:
git clone https://github.com/apache/arrow.git
cd arrow/cpp
mkdir release
cd release

[2]:
sudo cmake ../ -DARROW_COMPUTE=ON -DARROW_CSV=ON -DARROW_DATASET=ON \
  -DARROW_FILESYSTEM=ON -DARROW_JEMALLOC=ON -DARROW_JSON=ON \
  -DARROW_PARQUET=ON -DARROW_BUILD_TYPE=release -DARROW_WITH_BROTLI=ON \
  -DARROW_WITH_BZ2=ON -DARROW_WITH_LZ4=ON -DARROW_WITH_ZLIB=ON \
  -DARROW_WITH_ZSTD=ON -DARROW_EXTRA_ERROR_CONTEXT=ON -DARROW_PLASMA=ON

[3]:
sudo make install

[4]:
#then run this to pick-up the new libs in /usr/local/lib
sudo ldconfig

[5]:
# and make sure LD_LIBRARY_PATH and R_LD_LIBRARY_PATH contain /usr/local/lib
echo $LD_LIBRARY_PATH
echo $R_LD_LIBRARY_PATH

[6]:
#then try installing the R arrow package
cd ../../r
R -e 'install.packages(c("devtools", "roxygen2", "pkgdown", "covr"));  
devtools::install_dev_deps()'
R CMD INSTALL .

#------------------------

  I believe the code I wrote works, in the sense that the
value/address of the pointer passed is printed to screen. I also
tested this by modifying the code to include the libraries suggested
in your last email:

  ## compile c++ code
Sys.setenv("PKG_CXXFLAGS" = "-larrow_bundled_dependencies -larrow_dataset -lparquet -larrow")


  Running the R code, 'arrowTest.R', then generates:

  > source('arrowTest.R')

Attaching package: ‘arrow’

The following object is masked from ‘package:utils’:

     timestamp

in test
0x5652319be670
>


I guess my question is: is what I'm doing correct? By this I mean, from
the code I have written, is it possible (and if so, how) to print the
values of the arrow::Array, called 'aAPtr', that I've passed to the
function 'test', and similarly to write values to aAPtr?

Thanks for taking the time to read my email, and for your reply.

All the Best,
Colin Mclean.


Quoting Neal Richardson <ne...@gmail.com> on Mon, 12 Oct 2020 08:09:44 -0700:

> Hi Colin,
> Does the code you shared run? If not, how does it fail?
>
> One guess is that you're probably getting undefined symbols errors because
> you need more than just -larrow. See
> https://github.com/apache/arrow/blob/master/r/configure#L35 for others you
> need, and depending on how you installed arrow, you likely also need
> -larrow_bundled_dependencies.
>
> Neal
>



Re: [R and C++] passing arrow::Arrow from R to C++ for reading and writing?

Posted by Neal Richardson <ne...@gmail.com>.
Hi Colin,
Does the code you shared run? If not, how does it fail?

One guess is that you're probably getting undefined symbols errors because
you need more than just -larrow. See
https://github.com/apache/arrow/blob/master/r/configure#L35 for others you
need, and depending on how you installed arrow, you likely also need
-larrow_bundled_dependencies.

Neal
