You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2021/02/26 17:58:00 UTC

[jira] [Commented] (ARROW-11735) [R] Allow parquet to be an optional component like S3

    [ https://issues.apache.org/jira/browse/ARROW-11735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291844#comment-17291844 ] 

Neal Richardson commented on ARROW-11735:
-----------------------------------------

See r/data-raw/codegen.R for how this comes together (docs in comments at the top of the file), and look at the bottom of r/src/filesystem.cpp for an example of how this is working for s3. Changes to make:

* Make sure that everything that touches things in the parquet C++ namespace (C++ classes that are {{parquet::something}}, header {{#include}} statements of parquet paths) are inside {{#if defined(ARROW_R_WITH_PARQUET)}}
* Change the {{// [[arrow::export]]}} annotations on those functions to be {{// [[parquet::export]]}}
* Add "parquet" to the features list at the top of data-raw/codegen.R
* This is the tricky part: in configure and configure.win, do a check similar to how we look for whether {{ARROW_S3}} was enabled in the C++ build, and if so add {{-DARROW_R_WITH_PARQUET}} to PKG_CFLAGS. This will be trickier on Windows because I don't think the rwinlib bundle includes ArrowOptions.cmake (but it will always have Parquet enabled). An alternative (perhaps more robust approach) would be to follow the TEST_CMD approach and test for parquet/s3 headers (current L179 of configure).
* In configure.win, you may also need to massage the PKG_LIBS since it has hard-coded that parquet and thrift are present (as they are in the rwinlib bundles). This is perhaps a separate issue from this (since this issue is particularly about Solaris and that's about doing different C++ dev builds on Windows.
* Once you've done this, the build should succeed without Parquet, but tests that involve Parquet will error. Revise skip_if_not_available() in r/tests/testthat/helper-skip.R to check {{arrow_with_parquet()}} if feature == "parquet" (this function is generated by codegen.R), then add {{skip_if_not_available("parquet")}} to all relevant tests.
* Search for {{arrow_with_s3()}} in the code and see where else we should do the same with arrow_with_parquet(). Among the uses:
* Add parquet to arrow_info() alongside where arrow_with_s3() is checked
* Wrap any parquet doc examples {{if (arrow_with_parquet())}}

> [R] Allow parquet to be an optional component like S3
> -----------------------------------------------------
>
>                 Key: ARROW-11735
>                 URL: https://issues.apache.org/jira/browse/ARROW-11735
>             Project: Apache Arrow
>          Issue Type: Sub-task
>          Components: R
>            Reporter: Neal Richardson
>            Priority: Major
>             Fix For: 4.0.0
>
>
> Parquet requires thrift and it seems that thrift (at least as of version 0.12) does not compile on Solaris:
> {code}
> /export/home/X1svPYR/Rtemp/RtmptF1MlN/file75097d284891/thrift_ep-prefix/src/thrift_ep/lib/cpp/src/thrift/transport/THttpServer.cpp: In member function virtual void apache::thrift::transport::THttpServer::parseHeader(char*):
> /export/home/X1svPYR/Rtemp/RtmptF1MlN/file75097d284891/thrift_ep-prefix/src/thrift_ep/lib/cpp/src/thrift/transport/THttpServer.cpp:50:74: error: strcasestr was not declared in this scope
>    #define THRIFT_strcasestr(haystack, needle) strcasestr(haystack, needle)
>                                                                           ^
> /export/home/X1svPYR/Rtemp/RtmptF1MlN/file75097d284891/thrift_ep-prefix/src/thrift_ep/lib/cpp/src/thrift/transport/THttpServer.cpp:62:9: note: in expansion of macro THRIFT_strcasestr
>      if (THRIFT_strcasestr(value, "chunked") != NULL) {
> {code}
> (along with some boost endian header deprecation warnings)
> We could debug/patch that, or we could also make Parquet an optional feature in the R bindings. That might have some value anyway so that one could build a lighter/minimal R package, if that were helpful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)