Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/03 21:14:25 UTC

[GitHub] [arrow] westonpace commented on a change in pull request #10191: [R] [WIP] Use InMemoryDataset for Table/RecordBatch in dplyr code

westonpace commented on a change in pull request #10191:
URL: https://github.com/apache/arrow/pull/10191#discussion_r625372419



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -344,20 +344,21 @@ test_that("print a mutated table", {
       select(int) %>%
       mutate(twice = int * 2) %>%
       print(),
-'Table (query)
+'InMemoryDataset (query)
 int: int32
 twice: expr
 
 See $.data for the source Arrow object',
   fixed = TRUE)
 
   # Handling non-expressions/edge cases
+  skip("InMemoryDataset$Project() doesn't accept array (or could it?)")
   expect_output(
     Table$create(tbl) %>%
       select(int) %>%
       mutate(again = 1:10) %>%

Review comment:
       None of the examples on https://dplyr.tidyverse.org/reference/mutate.html actually use this form, and I have a hard time understanding why someone would want to do this.
   
   Furthermore, this question shows the kind of length-related confusion you run into with something like this: https://stackoverflow.com/questions/60582562/another-length-error-using-dplyr-mutate-and-if-else
   
   From an SQL perspective, the proper way to add a new column would be a join. This is effectively a "join without a common key", which raises a few eyebrows in this question: https://stackoverflow.com/questions/1198124/combine-two-tables-that-have-no-common-fields
   
   Also, would the vector need to be the same length as a single batch, or as the entire table? If it's the entire table, then the table has to be processed in order, which is undesirable as well.
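
To make the ordering concern concrete, here is a minimal sketch in base R. It is only an illustration under stated assumptions: the two-way batch split, the `offset` bookkeeping, and all variable names are hypothetical, and this is not how Arrow actually processes batches.

```r
# Hypothetical illustration in base R; not Arrow's implementation.
tbl <- data.frame(int = 1:10)
again <- 101:110  # a vector as long as the whole table

# Pretend the table arrives as two batches of 5 rows each.
batches <- split(tbl, rep(1:2, each = 5))

# To attach the right slice of `again`, each batch needs the row offset
# left behind by every batch before it, so batches must be visited in order.
offset <- 0
out <- vector("list", length(batches))
for (i in seq_along(batches)) {
  b <- batches[[i]]
  rows <- offset + seq_len(nrow(b))  # positions of this batch in the full table
  b$again <- again[rows]
  out[[i]] <- b
  offset <- offset + nrow(b)
}
combined <- do.call(rbind, out)
```

If the batches were visited in a different order, or in parallel, `offset` would no longer identify the correct slice of `again`; an explicit key column and a join would avoid that dependency.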
   
   I think I'd want to see a valid use case before investing effort in supporting this.



