Posted to issues@arrow.apache.org by GitBox <gi...@apache.org> on 2023/01/04 13:24:06 UTC

[GitHub] [arrow] paleolimbot opened a new issue, #15188: `head()` before `collect()` is not generating the expected ExecPlan

paleolimbot opened a new issue, #15188:
URL: https://github.com/apache/arrow/issues/15188

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   The helpful ursabot conbench integration noted some regressions after #14518. That was a rather small change affecting usage of `head()`, which I had thought was rare. It appears, however, that `head()` is used frequently, which surprised me: I had assumed that sink nodes with options (rather than a record batch reader) were supposed to be used.
   
   Some quick inspection suggests that we're generating a nested exec plan for every `head()` here:
   
   https://github.com/apache/arrow/blob/ec9a8a322b9486584e51c174b63774fd496783b0/r/R/dplyr.R#L230
   
   ...which means that all of these will go through a record batch reader instead of the appropriate sink node:
   
   https://github.com/apache/arrow/blob/ec9a8a322b9486584e51c174b63774fd496783b0/r/R/query-engine.R#L79-L86
   
   Reprex:
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   #> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
   
   as_arrow_table(mtcars) |> 
     arrow:::as_adq() |> 
     head(10) |> 
     show_exec_plan()
   #> Warning: The `ExecPlan` cannot be printed for a nested query.
   ```
   
   (This query should probably not be nested; should it instead be collected with the appropriate sink node options?)
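
   For context, a minimal sketch of the user-facing pattern that reaches this code path: a dplyr query on an Arrow table, truncated with `head()` and then materialized with `collect()`. It assumes the dplyr package is installed alongside arrow; the `filter()` condition and the row count are arbitrary choices for illustration and are not taken from the benchmarks.

   ``` r
   library(arrow, warn.conflicts = FALSE)
   library(dplyr, warn.conflicts = FALSE)

   # Build a lazy query: filter() on a Table returns an arrow_dplyr_query,
   # and head() on that query stays lazy until collect() is called.
   query <- arrow_table(mtcars) |>
     filter(mpg > 20) |>
     head(5)

   # collect() executes the plan and returns at most 5 rows as a data frame.
   result <- collect(query)
   nrow(result)
   ```

   How `collect()` executes this (nested plan read through a record batch reader, or a single plan ending in a sink node) is exactly what is in question here; the returned rows should be the same either way.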
   
   ### Component(s)
   
   R


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on issue #15188: `head()` before `collect()` is not generating the expected ExecPlan

Posted by "thisisnic (via GitHub)" <gi...@apache.org>.
thisisnic commented on issue #15188:
URL: https://github.com/apache/arrow/issues/15188#issuecomment-1568554437

   Closing this as I think this has been fixed in another ticket; when I run the repro on a relatively recent version of the package, I get this:
   
   ```
   > library(arrow, warn.conflicts = FALSE)
   > as_arrow_table(mtcars) |> 
   +   arrow:::as_adq() |> 
   +   head(10) |> 
   +   show_exec_plan()
   ExecPlan with 3 nodes:
   2:SinkNode{}
     1:FetchNode{offset=0 count=10}
       0:TableSourceNode{}
   ```
   Feel free to reopen if this is still not working as expected.




[GitHub] [arrow] thisisnic closed issue #15188: `head()` before `collect()` is not generating the expected ExecPlan

Posted by "thisisnic (via GitHub)" <gi...@apache.org>.
thisisnic closed issue #15188: `head()` before `collect()` is not generating the expected ExecPlan
URL: https://github.com/apache/arrow/issues/15188

