Posted to issues@arrow.apache.org by "westonpace (via GitHub)" <gi...@apache.org> on 2023/04/04 15:40:43 UTC

[GitHub] [arrow] westonpace opened a new issue, #34887: [R] What should compute look like in an R minimal build?

westonpace opened a new issue, #34887:
URL: https://github.com/apache/arrow/issues/34887

   ### Describe the enhancement requested
   
   It is now possible to build Arrow without Acero.  An R minimal build will (once #34844 merges, assuming no changes) disable Acero.  This means that a lot of the functionality & tests will no longer work.
   
   Should we keep Acero out of the minimal build?  It's not super large (libarrow_acero is ~10% of the size of libarrow if ARROW_COMPUTE=ON)
   
   Currently, in the PR, I am skipping all dplyr tests.  This is mainly for simplicity so I can get the build working.  Some dplyr tests can still pass (even without Acero, compute functions still work).
   
   Do we want to try to more narrowly skip dplyr tests?  Do we want to say "if acero is disabled then don't try to do compute" or do we want to try and explain some nuance of "you can do these compute tasks without acero but these other compute tasks require acero"? 
   
   On the other hand, it is now possible to set `ARROW_COMPUTE=OFF` and actually see a meaningful difference (all non-essential kernels are removed, which cuts the size of the arrow binary almost in half).  Should we disable compute entirely in an R minimal build?
   
   ### Component(s)
   
   R
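
As a rough illustration of the options raised above, the sketch below models the candidate definitions of a minimal build as sets of flags and switches prerequisites back on where needed. The dependency table (Acero requiring compute, datasets requiring Acero), the helper name `resolve`, and the candidate labels are assumptions made for this example; they are not copied from the Arrow build files.

```python
# Assumed dependency edges between build features (illustrative only):
# a feature can only be ON if everything it requires is also ON.
REQUIRES = {
    "ARROW_ACERO": ["ARROW_COMPUTE"],
    "ARROW_DATASET": ["ARROW_ACERO"],
}

# Hypothetical labels for the three readings of "minimal" in this issue.
CANDIDATES = {
    "keep_acero": {"ARROW_COMPUTE": "ON", "ARROW_ACERO": "ON"},
    "kernels_only": {"ARROW_COMPUTE": "ON", "ARROW_ACERO": "OFF"},
    "no_compute": {"ARROW_COMPUTE": "OFF", "ARROW_ACERO": "OFF"},
}


def resolve(requested):
    """Return the effective flags after turning on missing prerequisites."""
    flags = dict(requested)
    changed = True
    while changed:  # repeat until no prerequisite needs switching on
        changed = False
        for feature, deps in REQUIRES.items():
            if flags.get(feature) != "ON":
                continue
            for dep in deps:
                if flags.get(dep) != "ON":
                    flags[dep] = "ON"
                    changed = True
    return flags
```

Under these assumed dependencies, `resolve({"ARROW_DATASET": "ON", "ARROW_ACERO": "OFF"})` switches both Acero and compute back on, which is one way a minimal build that keeps datasets would end up keeping Acero as well.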


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] nealrichardson commented on issue #34887: [R] What should compute look like in an R minimal build?

Posted by "nealrichardson (via GitHub)" <gi...@apache.org>.
nealrichardson commented on issue #34887:
URL: https://github.com/apache/arrow/issues/34887#issuecomment-1497532802

   As I understand the question, https://github.com/apache/arrow/pull/34844 allows you to build with the new `ARROW_ACERO=OFF`, and this issue is essentially: should it be possible to also build the R package with `ARROW_COMPUTE=OFF`, which currently is not possible?
   
   I would say: probably so, for the sake of completeness, but it doesn't seem high priority, not unless/until we get folks asking for it with actual use cases. 
   
   Our historical interest in the "minimal" build was to help with staying on CRAN, particularly on Solaris, which is no longer a requirement. It did not arise from popular demand. There was demand for very minimal arrow bindings, but as I recall `nanoarrow` was the solution we went with.
   
   All that said, if we could ship an ABI-stable core arrow C++ library and ship the fun compute functionality in a separate package, the "minimal" arrow R/C++ package would become a lot more interesting, IMO.




[GitHub] [arrow] paleolimbot commented on issue #34887: [R] What should compute look like in an R minimal build?

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on issue #34887:
URL: https://github.com/apache/arrow/issues/34887#issuecomment-1497451144

   I haven't been following the refactor as closely as I should have; however, from a high level the "minimal" build is not all that important because it's fairly difficult for a user to actually get one (requires setting environment variables or linking to an existing minimal C++ build, which I have only ever managed to do accidentally). From that perspective, turning compute off on the "minimal" build is probably a good idea: anybody spending effort to get a smaller arrow R package build almost certainly does not need non-essential compute kernels. (At least one more opinion would be helpful here...I haven't been following issues for long enough to know what people are trying to do with minimal R builds).
   
   > Currently, in the PR, I am skipping all dplyr tests.
   
   I think that skipping dplyr tests if Acero is turned off is reasonable (or at least that attempting a finer-grained skipping strategy is unlikely to be worth our time).
   
   It sounds like the action item here might be to (1) make sure CMD check passes when ARROW_COMPUTE=OFF and (2) make ARROW_COMPUTE=OFF part of what you get with ARROW_R_MINIMAL.
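
The coarse skipping strategy discussed above can be sketched in a language-neutral way as a capability gate on whole groups of tests. This is only an illustration in Python's `unittest`, not the R package's test helpers: the `CAPABILITIES` dictionary, the `requires` decorator, and the test class names are all hypothetical.

```python
import unittest

# Hypothetical capability flags for the build under test. A real package
# would report these itself; they are hard-coded here for illustration.
CAPABILITIES = {"compute": True, "acero": False}


def requires(feature):
    """Skip a test class or test method when a build feature is missing."""
    return unittest.skipUnless(
        CAPABILITIES.get(feature, False), f"build lacks {feature}"
    )


@requires("acero")
class QueryEngineTests(unittest.TestCase):
    # Coarse gate: every test in this class is skipped without the engine.
    def test_group_by(self):
        self.fail("would need the query engine")


class KernelTests(unittest.TestCase):
    # Standalone kernels only need the compute feature.
    @requires("compute")
    def test_add(self):
        self.assertEqual(1 + 2, 3)


def run_suite():
    """Run both classes; return (skipped count, failure-or-error count)."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite(
        [loader.loadTestsFromTestCase(c) for c in (QueryEngineTests, KernelTests)]
    )
    result = unittest.TestResult()
    suite.run(result)
    return len(result.skipped), len(result.failures) + len(result.errors)
```

Gating at the class level keeps the skip logic in one place, at the cost of also skipping tests that might have passed with compute kernels alone; that is the trade-off weighed in the comment above.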




[GitHub] [arrow] westonpace commented on issue #34887: [R] What should compute look like in an R minimal build?

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #34887:
URL: https://github.com/apache/arrow/issues/34887#issuecomment-1496980134

   > Question here. If R minimal build used to be with "ARROW_COMPUTE=ON", then with the Acero refactor it should be with "ARROW_COMPUTE=ON ARROW_ACERO=ON"? (so that the same functionality is maintained?)
   
   @icexelloss 
   
   You could argue either way.  You could say "a minimal build was R + compute" or you could argue "a minimal build is every feature turned off".  In the former you'd keep ARROW_ACERO=ON.  In the latter you would not.  That's one of the questions I have.




[GitHub] [arrow] icexelloss commented on issue #34887: [R] What should compute look like in an R minimal build?

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on issue #34887:
URL: https://github.com/apache/arrow/issues/34887#issuecomment-1497581622

   > > Question here. If R minimal build used to be with "ARROW_COMPUTE=ON", then with the Acero refactor it should be with "ARROW_COMPUTE=ON ARROW_ACERO=ON"? (so that the same functionality is maintained?)
   > 
   > @icexelloss
   > 
   > You could argue either way. You could say "a minimal build was R + compute" or you could argue "a minimal build is every feature turned off". In the former you'd keep ARROW_ACERO=ON. In the latter you would not. That's one of the questions I have.
   
   Got it - I understand what you are trying to solve now. Thanks for clarifying.




[GitHub] [arrow] westonpace commented on issue #34887: [R] What should compute look like in an R minimal build?

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #34887:
URL: https://github.com/apache/arrow/issues/34887#issuecomment-1502348376

   In https://github.com/apache/arrow/pull/34844 we ended up defaulting minimal to mean ACERO=ON (since minimal already enabled datasets).  Given this, and @nealrichardson's feedback, I don't think this is urgent at all.  If we find a use case in the future, we can finish it.




[GitHub] [arrow] icexelloss commented on issue #34887: [R] What should compute look like in an R minimal build?

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on issue #34887:
URL: https://github.com/apache/arrow/issues/34887#issuecomment-1496424303

   > An R minimal build will (once https://github.com/apache/arrow/pull/34844 merges, assuming no changes) disable Acero.
   
   Question here. If R minimal build used to be with "ARROW_COMPUTE=ON", then with the Acero refactor it should be with "ARROW_COMPUTE=ON ARROW_ACERO=ON"? (so that the same functionality is maintained?)

