You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/12 15:28:58 UTC

[GitHub] [arrow] pachamaltese opened a new pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

pachamaltese opened a new pull request #9999:
URL: https://github.com/apache/arrow/pull/9999


   1st version


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r626081429



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -15,6 +15,25 @@
 # specific language governing permissions and limitations
 # under the License.
 
+# developer note:
+# when adapting tests from the dplyr package, note that expect_dplyr_equal()
+# works differently from expect_equal(). the first argument is a dplyr
+# expression containing the placeholder `input`, and the second argument
+# is the R data frame to pass in as that input. So for example, adapt a test
+# from dplyr as follows:
+#
+# dplyr:
+# expect_equal(
+#   mutate(data.frame(x = 1, y = 1), z = 1, x = NULL, y = NULL),
+#   data.frame(z = 1)
+# )
+#
+# arrow:
+# expect_dplyr_equal(
+#   input %>% mutate(z = 1, x = NULL, y = NULL) %>% collect(),
+#   data.frame(x = 1, y = 1)
+# )

Review comment:
       Should this go here or go in https://github.com/apache/arrow/blob/ed4f79c77e5d34d5996fa287df0934dd2f93a12a/r/tests/testthat/helper-expectation.R#L73 where we define `expect_dplyr_equal()`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r619291340



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -415,3 +406,160 @@ test_that("mutate and write_dataset", {
       summarize(mean = mean(integer))
   )
 })
+
+# PACHA ADDITIONS ----
+# READ THIS CAREFULLY PLEASE, IT'S MY 1ST DAY WRITING THIS KIND OF SENSITIVE TESTS
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10
+# the rest of that test belongs in L55-62 here
+test_that("empty mutate returns input", {
+  # dbl2 = 5, so I'm grouping by a constant
+  gtbl <- group_by(tbl, dbl2)
+
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), tbl)
+
+  expect_dplyr_equal(input %>% mutate() %>% collect(), gtbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), gtbl)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L12-L6
+test_that("rownames preserved", {
+  skip("Row names are not preserved")
+  df <- data.frame(x = c(1, 2), row.names = c("a", "b"))
+  expect_dplyr_equal(input %>% mutate(y = 2) %>% collect(), df)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L18-L29
+test_that("mutations applied progressively", {
+  df <- tibble(x = 1)
+
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(x = x + 1, x = x + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+
+  df <- data.frame(x = 1, y = 2)
+  expect_dplyr_equal(
+    input %>% mutate(x2 = x, x3 = x2 + 1) %>% collect(),
+    df
+  )
+})
+
+# similar to # similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L31-L35
+# SEE LINE 175 IN THIS SCRIPT!
+# THIS WON'T WORK
+# https://issues.apache.org/jira/browse/ARROW-11705
+#  Error in UseMethod("mutate") :
+# no applicable method for 'mutate' applied to an object of class "c('double', 'numeric')"
+# test_that("length-1 vectors are recycled (#152)", {
+#   df <- tibble(x = 1:4)
+#
+#   expect_dplyr_equal(
+#     collect(mutate(input, y = 1))$y,
+#     rep(1, 4)
+#   )
+# })

Review comment:
       Remove this since there's already a test for scalar value recycling around line 175 as you noted here
   ```suggestion
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pachamaltese commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
pachamaltese commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r624249023



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -32,59 +47,23 @@ test_that("mutate() is lazy", {
   )
 })
 
-test_that("basic mutate", {
-  expect_dplyr_equal(
-    input %>%
-      select(int, chr) %>%
-      filter(int > 5) %>%
-      mutate(int = int + 6L) %>%
-      collect(),
-    tbl
-  )
-})
-
-test_that("mutate() with NULL inputs", {
-  expect_dplyr_equal(
-    input %>%
-      mutate(int = NULL) %>%
-      collect(),
-    tbl
-  )
-})
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10

Review comment:
       100% right, last week those matched




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r640647148



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -32,59 +51,23 @@ test_that("mutate() is lazy", {
   )
 })
 
-test_that("basic mutate", {
-  expect_dplyr_equal(
-    input %>%
-      select(int, chr) %>%
-      filter(int > 5) %>%
-      mutate(int = int + 6L) %>%
-      collect(),
-    tbl
-  )
-})
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10
+test_that("empty mutate returns input", {
+  # dbl2 = 5, so I'm grouping by a constant
+  gtbl <- group_by(tbl, dbl2)
 
-test_that("mutate() with NULL inputs", {
-  expect_dplyr_equal(
-    input %>%
-      mutate(int = NULL) %>%
-      collect(),
-    tbl
-  )
+  expect_dplyr_equal(input %>% mutate() %>% collect(), tbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), tbl)
+  expect_dplyr_equal(input %>% mutate() %>% collect(), gtbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), gtbl)

Review comment:
       The PR for ARROW-11769 is merged now, so disregard my above comment; there is no need to rewrite these tests. But please do add a comment explaining that the purpose of the `gtbl` tests is to check that `mutate()` returns its input regardless of whether its input is grouped.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r620579012



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -415,3 +412,154 @@ test_that("mutate and write_dataset", {
       summarize(mean = mean(integer))
   )
 })
+
+# PACHA ADDITIONS ----
+# READ THIS CAREFULLY PLEASE, IT'S MY 1ST DAY WRITING THIS KIND OF SENSITIVE TESTS
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10
+# the rest of that test belongs in L55-62 here
+test_that("empty mutate returns input", {
+  # dbl2 = 5, so I'm grouping by a constant
+  gtbl <- group_by(tbl, dbl2)
+
+  expect_dplyr_equal(input %>% mutate() %>% collect(), tbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), tbl)
+  expect_dplyr_equal(input %>% mutate() %>% collect(), gtbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), gtbl)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L12-L6
+test_that("rownames preserved", {
+  skip("Row names are not preserved")
+  df <- data.frame(x = c(1, 2), row.names = c("a", "b"))
+  expect_dplyr_equal(input %>% mutate(y = c(3, 4)) %>% collect() %>% rownames(), df)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L18-L29
+test_that("mutations applied progressively", {
+  df <- tibble(x = 1)
+
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(x = x + 1, x = x + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+
+  df <- data.frame(x = 1, y = 2)
+  expect_equal(
+    df %>% Table$create() %>% mutate(x2 = x, x3 = x2 + 1) %>% collect(),
+    df %>% Table$create() %>% mutate(x2 = x + 0, x3 = x2 + 1) %>% collect()
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L37-L54
+test_that("can remove variables with NULL (dplyr #462)", {
+  df <- tibble(x = 1:3, y = 1:3)
+  gf <- group_by(df, x)
+
+  expect_dplyr_equal(input %>% mutate(y = NULL) %>% collect(), df)
+  expect_dplyr_equal(input %>% mutate(y = NULL) %>% collect(), gf)
+
+  # even if it doesn't exist
+  expect_dplyr_equal(input %>% mutate(z = NULL) %>% collect(), df)
+  # or was just created
+  expect_dplyr_equal(input %>% mutate(z = rep(1, nrow(input)), z = NULL) %>% collect(), df)
+
+  # regression test for https://github.com/tidyverse/dplyr/issues/4974
+  expect_dplyr_equal(
+    input %>% mutate(z = 1, x = NULL, y = NULL) %>% collect(),
+    data.frame(x = 1, y = 1)
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L71-L75
+# test_that("assignments don't overwrite variables (dplyr #315)", {
+#   expect_dplyr_equal(
+#     tibble(x = 1, y = 2) %>% mutate(z = {x <- 10; x}) %>% collect(),
+#     tibble(x = 1, y = 2, z = 10)
+#   )
+# })
+# NOT SURE ABOUT THIS!
+test_that("assignments don't overwrite variables (dplyr #315)", {
+  expect_dplyr_equal(
+    input %>% mutate(z = {x <- 10; x}) %>% collect(),
+    tibble(x = 1, y = 2, z = 10)
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L77-L81
+# NOT SURE ABOUT THIS!
+# test_that("can mutate a data frame with zero columns and `NULL` column names", {
+#   df <- vctrs::new_data_frame(n = 2L)
+#   colnames(df) <- NULL
+#   expect_dplyr_equal(
+#     input %>% mutate(x = 1) %>% collect(),
+#     df
+#   )
+# })
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L95-L100
+# glue is a dependency of tidyselect
+test_that("glue() is supported", {
+  expect_dplyr_equal(
+    input %>% mutate(y = glue::glue("")) %>% collect(),
+    tibble(x = 1, y = glue::glue(""))
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L102-L106
+test_that("mutate disambiguates NA and NaN (#1448)", {
+  expect_dplyr_equal(
+    input %>% mutate(y = x * 1) %>% select(y) %>% collect(),
+    tibble(x = c(1, NA, NaN))
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L102-L106
+# this is somewhat "contained" in the previous test
+# test_that("mutate handles data frame columns", {
+#   expect_dplyr_equal(
+#     input %>% mutate(new_col = data.frame(x = 1:3)) %>% select(new_col) %>% collect(),
+#     data.frame(x = 1:3)
+#   )
+#
+#   # mutate() on grouped data not supported in Arrow; this will be pulling data into R
+#   # expect_dplyr_equal(
+#   #   input %>%
+#   #     group_by(x) %>%
+#   #     mutate(new_col = x) %>%
+#   #     ungroup() %>%
+#   #     select(new_col) %>%
+#   #     collect(),
+#   #   data.frame(x = 1:3)
+#   # )
+#
+#   # ROWWISE IS NOT IMPLEMENTED
+#   # expect_dplyr_equal(
+#   #   input %>%
+#   #     rowwise(x) %>%
+#   #     mutate(new_col = x) %>%
+#   #     ungroup() %>%
+#   #     select(new_col) %>%
+#   #     collect(),
+#   #   data.frame(x = 1:3)
+#   # )
+# })
+
+# QUESTIONS SO FAR ----
+
+# https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L56-L59
+# https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L108-L115
+# does it make sense to create expect_dplyr_named() to mimic the behaviour from dply tests?

Review comment:
       I don't think we need a function `expect_dplyr_named()`. For the few cases where we want to explicitly test the names, I think we can just use `dplyr::expect_named()`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r619431106



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -415,3 +406,160 @@ test_that("mutate and write_dataset", {
       summarize(mean = mean(integer))
   )
 })
+
+# PACHA ADDITIONS ----
+# READ THIS CAREFULLY PLEASE, IT'S MY 1ST DAY WRITING THIS KIND OF SENSITIVE TESTS
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10
+# the rest of that test belongs in L55-62 here
+test_that("empty mutate returns input", {
+  # dbl2 = 5, so I'm grouping by a constant
+  gtbl <- group_by(tbl, dbl2)
+
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), tbl)
+
+  expect_dplyr_equal(input %>% mutate() %>% collect(), gtbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), gtbl)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L12-L6
+test_that("rownames preserved", {
+  skip("Row names are not preserved")
+  df <- data.frame(x = c(1, 2), row.names = c("a", "b"))
+  expect_dplyr_equal(input %>% mutate(y = 2) %>% collect(), df)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L18-L29
+test_that("mutations applied progressively", {
+  df <- tibble(x = 1)
+
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(x = x + 1, x = x + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+
+  df <- data.frame(x = 1, y = 2)
+  expect_dplyr_equal(
+    input %>% mutate(x2 = x, x3 = x2 + 1) %>% collect(),
+    df
+  )

Review comment:
       For this one, you can't use `expect_dplyr_equal()` if you want to test what the test at https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L24-L28 tests. Better to just use `expect_equal()` for this particular one:
   ```suggestion
     df <- data.frame(x = 1, y = 2)
     expect_equal(
       df %>% Table$create() %>% mutate(x2 = x, x3 = x2 + 1) %>% collect(),
       df %>% Table$create() %>% mutate(x2 = x + 0, x3 = x2 + 1) %>% collect()
     )
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r619319730



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -415,3 +406,160 @@ test_that("mutate and write_dataset", {
       summarize(mean = mean(integer))
   )
 })
+
+# PACHA ADDITIONS ----
+# READ THIS CAREFULLY PLEASE, IT'S MY 1ST DAY WRITING THIS KIND OF SENSITIVE TESTS
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10
+# the rest of that test belongs in L55-62 here
+test_that("empty mutate returns input", {
+  # dbl2 = 5, so I'm grouping by a constant
+  gtbl <- group_by(tbl, dbl2)
+
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), tbl)
+
+  expect_dplyr_equal(input %>% mutate() %>% collect(), gtbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), gtbl)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L12-L6
+test_that("rownames preserved", {
+  skip("Row names are not preserved")
+  df <- data.frame(x = c(1, 2), row.names = c("a", "b"))
+  expect_dplyr_equal(input %>% mutate(y = 2) %>% collect(), df)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L18-L29
+test_that("mutations applied progressively", {
+  df <- tibble(x = 1)
+
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(x = x + 1, x = x + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+
+  df <- data.frame(x = 1, y = 2)
+  expect_dplyr_equal(
+    input %>% mutate(x2 = x, x3 = x2 + 1) %>% collect(),
+    df
+  )
+})
+
+# similar to # similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L31-L35
+# SEE LINE 175 IN THIS SCRIPT!
+# THIS WON'T WORK
+# https://issues.apache.org/jira/browse/ARROW-11705
+#  Error in UseMethod("mutate") :
+# no applicable method for 'mutate' applied to an object of class "c('double', 'numeric')"
+# test_that("length-1 vectors are recycled (#152)", {
+#   df <- tibble(x = 1:4)
+#
+#   expect_dplyr_equal(
+#     collect(mutate(input, y = 1))$y,
+#     rep(1, 4)
+#   )
+# })
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L37-L54
+test_that("can remove variables with NULL (dplyr #462)", {
+  df <- tibble(x = 1:3, y = 1:3)
+  gf <- group_by(df, x)
+
+  expect_dplyr_equal(input %>% mutate(y = NULL) %>% collect(), df[1])
+  expect_dplyr_equal(input %>% mutate(y = NULL) %>% collect(), gf[1])

Review comment:
       Need to remove the `[1]` after `df` and `gf`:
   ```suggestion
     expect_dplyr_equal(input %>% mutate(y = NULL) %>% collect(), df)
     expect_dplyr_equal(input %>% mutate(y = NULL) %>% collect(), gf)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r619324040



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -415,3 +406,160 @@ test_that("mutate and write_dataset", {
       summarize(mean = mean(integer))
   )
 })
+
+# PACHA ADDITIONS ----
+# READ THIS CAREFULLY PLEASE, IT'S MY 1ST DAY WRITING THIS KIND OF SENSITIVE TESTS
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10
+# the rest of that test belongs in L55-62 here
+test_that("empty mutate returns input", {
+  # dbl2 = 5, so I'm grouping by a constant
+  gtbl <- group_by(tbl, dbl2)
+
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), tbl)
+
+  expect_dplyr_equal(input %>% mutate() %>% collect(), gtbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), gtbl)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L12-L6
+test_that("rownames preserved", {
+  skip("Row names are not preserved")
+  df <- data.frame(x = c(1, 2), row.names = c("a", "b"))
+  expect_dplyr_equal(input %>% mutate(y = 2) %>% collect(), df)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L18-L29
+test_that("mutations applied progressively", {
+  df <- tibble(x = 1)
+
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(x = x + 1, x = x + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+
+  df <- data.frame(x = 1, y = 2)
+  expect_dplyr_equal(
+    input %>% mutate(x2 = x, x3 = x2 + 1) %>% collect(),
+    df
+  )
+})
+
+# similar to # similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L31-L35
+# SEE LINE 175 IN THIS SCRIPT!
+# THIS WON'T WORK
+# https://issues.apache.org/jira/browse/ARROW-11705
+#  Error in UseMethod("mutate") :
+# no applicable method for 'mutate' applied to an object of class "c('double', 'numeric')"
+# test_that("length-1 vectors are recycled (#152)", {
+#   df <- tibble(x = 1:4)
+#
+#   expect_dplyr_equal(
+#     collect(mutate(input, y = 1))$y,
+#     rep(1, 4)
+#   )
+# })
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L37-L54
+test_that("can remove variables with NULL (dplyr #462)", {
+  df <- tibble(x = 1:3, y = 1:3)
+  gf <- group_by(df, x)
+
+  expect_dplyr_equal(input %>% mutate(y = NULL) %>% collect(), df[1])
+  expect_dplyr_equal(input %>% mutate(y = NULL) %>% collect(), gf[1])
+
+  # even if it doesn't exist
+  expect_dplyr_equal(input %>% mutate(z = NULL) %>% collect(), df)
+  # or was just created
+  expect_dplyr_equal(input %>% mutate(z = 1, z = NULL) %>% collect(), df)

Review comment:
       This would trigger the scalar recycling issue if `z` weren't removed. Fix it like this: 
   ```suggestion
     expect_dplyr_equal(input %>% mutate(z = rep(1, nrow(input)), z = NULL) %>% collect(), df)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r626085188



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -32,59 +51,23 @@ test_that("mutate() is lazy", {
   )
 })
 
-test_that("basic mutate", {
-  expect_dplyr_equal(
-    input %>%
-      select(int, chr) %>%
-      filter(int > 5) %>%
-      mutate(int = int + 6L) %>%
-      collect(),
-    tbl
-  )
-})
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10
+test_that("empty mutate returns input", {
+  # dbl2 = 5, so I'm grouping by a constant
+  gtbl <- group_by(tbl, dbl2)
 
-test_that("mutate() with NULL inputs", {
-  expect_dplyr_equal(
-    input %>%
-      mutate(int = NULL) %>%
-      collect(),
-    tbl
-  )
+  expect_dplyr_equal(input %>% mutate() %>% collect(), tbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), tbl)
+  expect_dplyr_equal(input %>% mutate() %>% collect(), gtbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), gtbl)

Review comment:
       This might be in the comments elsewhere, but could you describe a little more about the grouped table you have here and what it's testing?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pachamaltese commented on pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
pachamaltese commented on pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#issuecomment-825373762


   pending: why it fails on windows


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r620557471



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -415,3 +412,154 @@ test_that("mutate and write_dataset", {
       summarize(mean = mean(integer))
   )
 })
+
+# PACHA ADDITIONS ----
+# READ THIS CAREFULLY PLEASE, IT'S MY 1ST DAY WRITING THIS KIND OF SENSITIVE TESTS
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10
+# the rest of that test belongs in L55-62 here
+test_that("empty mutate returns input", {
+  # dbl2 = 5, so I'm grouping by a constant
+  gtbl <- group_by(tbl, dbl2)
+
+  expect_dplyr_equal(input %>% mutate() %>% collect(), tbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), tbl)
+  expect_dplyr_equal(input %>% mutate() %>% collect(), gtbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), gtbl)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L12-L6
+test_that("rownames preserved", {
+  skip("Row names are not preserved")
+  df <- data.frame(x = c(1, 2), row.names = c("a", "b"))
+  expect_dplyr_equal(input %>% mutate(y = c(3, 4)) %>% collect() %>% rownames(), df)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L18-L29
+test_that("mutations applied progressively", {
+  df <- tibble(x = 1)
+
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(x = x + 1, x = x + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+
+  df <- data.frame(x = 1, y = 2)
+  expect_equal(
+    df %>% Table$create() %>% mutate(x2 = x, x3 = x2 + 1) %>% collect(),
+    df %>% Table$create() %>% mutate(x2 = x + 0, x3 = x2 + 1) %>% collect()
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L37-L54
+test_that("can remove variables with NULL (dplyr #462)", {
+  df <- tibble(x = 1:3, y = 1:3)
+  gf <- group_by(df, x)
+
+  expect_dplyr_equal(input %>% mutate(y = NULL) %>% collect(), df)
+  expect_dplyr_equal(input %>% mutate(y = NULL) %>% collect(), gf)
+
+  # even if it doesn't exist
+  expect_dplyr_equal(input %>% mutate(z = NULL) %>% collect(), df)
+  # or was just created
+  expect_dplyr_equal(input %>% mutate(z = rep(1, nrow(input)), z = NULL) %>% collect(), df)
+
+  # regression test for https://github.com/tidyverse/dplyr/issues/4974
+  expect_dplyr_equal(
+    input %>% mutate(z = 1, x = NULL, y = NULL) %>% collect(),
+    data.frame(x = 1, y = 1)
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L71-L75
+# test_that("assignments don't overwrite variables (dplyr #315)", {
+#   expect_dplyr_equal(
+#     tibble(x = 1, y = 2) %>% mutate(z = {x <- 10; x}) %>% collect(),
+#     tibble(x = 1, y = 2, z = 10)
+#   )
+# })
+# NOT SURE ABOUT THIS!
+test_that("assignments don't overwrite variables (dplyr #315)", {
+  expect_dplyr_equal(
+    input %>% mutate(z = {x <- 10; x}) %>% collect(),
+    tibble(x = 1, y = 2, z = 10)

Review comment:
       ```suggestion
       tibble(x = 1, y = 2)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r620576267



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -415,3 +412,154 @@ test_that("mutate and write_dataset", {
       summarize(mean = mean(integer))
   )
 })
+
+# PACHA ADDITIONS ----
+# READ THIS CAREFULLY PLEASE, IT'S MY 1ST DAY WRITING THIS KIND OF SENSITIVE TESTS
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10
+# the rest of that test belongs in L55-62 here
+test_that("empty mutate returns input", {
+  # dbl2 = 5, so I'm grouping by a constant
+  gtbl <- group_by(tbl, dbl2)
+
+  expect_dplyr_equal(input %>% mutate() %>% collect(), tbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), tbl)
+  expect_dplyr_equal(input %>% mutate() %>% collect(), gtbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), gtbl)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L12-L6
+test_that("rownames preserved", {
+  skip("Row names are not preserved")
+  df <- data.frame(x = c(1, 2), row.names = c("a", "b"))
+  expect_dplyr_equal(input %>% mutate(y = c(3, 4)) %>% collect() %>% rownames(), df)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L18-L29
+test_that("mutations applied progressively", {
+  df <- tibble(x = 1)
+
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(x = x + 1, x = x + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+
+  df <- data.frame(x = 1, y = 2)
+  expect_equal(
+    df %>% Table$create() %>% mutate(x2 = x, x3 = x2 + 1) %>% collect(),
+    df %>% Table$create() %>% mutate(x2 = x + 0, x3 = x2 + 1) %>% collect()
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L37-L54
+test_that("can remove variables with NULL (dplyr #462)", {
+  df <- tibble(x = 1:3, y = 1:3)
+  gf <- group_by(df, x)
+
+  expect_dplyr_equal(input %>% mutate(y = NULL) %>% collect(), df)
+  expect_dplyr_equal(input %>% mutate(y = NULL) %>% collect(), gf)
+
+  # even if it doesn't exist
+  expect_dplyr_equal(input %>% mutate(z = NULL) %>% collect(), df)
+  # or was just created
+  expect_dplyr_equal(input %>% mutate(z = rep(1, nrow(input)), z = NULL) %>% collect(), df)
+
+  # regression test for https://github.com/tidyverse/dplyr/issues/4974
+  expect_dplyr_equal(
+    input %>% mutate(z = 1, x = NULL, y = NULL) %>% collect(),
+    data.frame(x = 1, y = 1)
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L71-L75
+# test_that("assignments don't overwrite variables (dplyr #315)", {
+#   expect_dplyr_equal(
+#     tibble(x = 1, y = 2) %>% mutate(z = {x <- 10; x}) %>% collect(),
+#     tibble(x = 1, y = 2, z = 10)
+#   )
+# })
+# NOT SURE ABOUT THIS!
+test_that("assignments don't overwrite variables (dplyr #315)", {
+  expect_dplyr_equal(
+    input %>% mutate(z = {x <- 10; x}) %>% collect(),
+    tibble(x = 1, y = 2, z = 10)
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L77-L81
+# NOT SURE ABOUT THIS!
+# test_that("can mutate a data frame with zero columns and `NULL` column names", {
+#   df <- vctrs::new_data_frame(n = 2L)
+#   colnames(df) <- NULL
+#   expect_dplyr_equal(
+#     input %>% mutate(x = 1) %>% collect(),
+#     df
+#   )
+# })
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L95-L100
+# glue is a dependency of tidyselect
+test_that("glue() is supported", {
+  expect_dplyr_equal(
+    input %>% mutate(y = glue::glue("")) %>% collect(),
+    tibble(x = 1, y = glue::glue(""))
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L102-L106
+test_that("mutate disambiguates NA and NaN (#1448)", {
+  expect_dplyr_equal(
+    input %>% mutate(y = x * 1) %>% select(y) %>% collect(),
+    tibble(x = c(1, NA, NaN))
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L102-L106
+# this is somewhat "contained" in the previous test
+# test_that("mutate handles data frame columns", {
+#   expect_dplyr_equal(
+#     input %>% mutate(new_col = data.frame(x = 1:3)) %>% select(new_col) %>% collect(),
+#     data.frame(x = 1:3)
+#   )
+#
+#   # mutate() on grouped data not supported in Arrow; this will be pulling data into R
+#   # expect_dplyr_equal(
+#   #   input %>%
+#   #     group_by(x) %>%
+#   #     mutate(new_col = x) %>%
+#   #     ungroup() %>%
+#   #     select(new_col) %>%
+#   #     collect(),
+#   #   data.frame(x = 1:3)
+#   # )
+#
+#   # ROWWISE IS NOT IMPLEMENTED
+#   # expect_dplyr_equal(
+#   #   input %>%
+#   #     rowwise(x) %>%
+#   #     mutate(new_col = x) %>%
+#   #     ungroup() %>%
+#   #     select(new_col) %>%
+#   #     collect(),
+#   #   data.frame(x = 1:3)
+#   # )

Review comment:
       Instead of trying to adapt these two expectations to succeed in the arrow package, I would recommend writing them in a way that current fails, and adding a `skip()` before them.
   
   The first expectation in this test succeeds so you can keep that as is.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pachamaltese commented on pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
pachamaltese commented on pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#issuecomment-827009715


   a few questions
   
   should we skip tests on grouped data (not supported in Arrow) or pass it to R with warning?
   https://github.com/pachamaltese/arrow/blob/arrow11755v2/r/tests/testthat/test-dplyr-mutate.R#L557
   
   I'm 0 sure about tests with 0 rows, how would you sketch that?
   https://github.com/pachamaltese/arrow/blob/arrow11755v2/r/tests/testthat/test-dplyr-mutate.R#L522
   
   the same about not overwriting variables, does this look ok? 
   https://github.com/pachamaltese/arrow/blob/arrow11755v2/r/tests/testthat/test-dplyr-mutate.R#L514


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r626893022



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -15,6 +15,25 @@
 # specific language governing permissions and limitations
 # under the License.
 
+# developer note:
+# when adapting tests from the dplyr package, note that expect_dplyr_equal()
+# works differently from expect_equal(). the first argument is a dplyr
+# expression containing the placeholder `input`, and the second argument
+# is the R data frame to pass in as that input. So for example, adapt a test
+# from dplyr as follows:
+#
+# dplyr:
+# expect_equal(
+#   mutate(data.frame(x = 1, y = 1), z = 1, x = NULL, y = NULL),
+#   data.frame(z = 1)
+# )
+#
+# arrow:
+# expect_dplyr_equal(
+#   input %>% mutate(z = 1, x = NULL, y = NULL) %>% collect(),
+#   data.frame(x = 1, y = 1)
+# )

Review comment:
       Good idea; that would probably be a better place to put this "developer note" comment




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r619285308



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -415,3 +406,160 @@ test_that("mutate and write_dataset", {
       summarize(mean = mean(integer))
   )
 })
+
+# PACHA ADDITIONS ----
+# READ THIS CAREFULLY PLEASE, IT'S MY 1ST DAY WRITING THIS KIND OF SENSITIVE TESTS
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10
+# the rest of that test belongs in L55-62 here
+test_that("empty mutate returns input", {
+  # dbl2 = 5, so I'm grouping by a constant
+  gtbl <- group_by(tbl, dbl2)
+
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), tbl)
+

Review comment:
       Add a test equivalent to the one you deleted at L55-62:
   ```suggestion
     expect_dplyr_equal(input %>% mutate() %>% collect(), tbl)
     expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), tbl)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pachadotdev commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
pachadotdev commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r642601213



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -32,59 +51,23 @@ test_that("mutate() is lazy", {
   )
 })
 
-test_that("basic mutate", {
-  expect_dplyr_equal(
-    input %>%
-      select(int, chr) %>%
-      filter(int > 5) %>%
-      mutate(int = int + 6L) %>%
-      collect(),
-    tbl
-  )
-})
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10
+test_that("empty mutate returns input", {
+  # dbl2 = 5, so I'm grouping by a constant
+  gtbl <- group_by(tbl, dbl2)
 
-test_that("mutate() with NULL inputs", {
-  expect_dplyr_equal(
-    input %>%
-      mutate(int = NULL) %>%
-      collect(),
-    tbl
-  )
+  expect_dplyr_equal(input %>% mutate() %>% collect(), tbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), tbl)
+  expect_dplyr_equal(input %>% mutate() %>% collect(), gtbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), gtbl)

Review comment:
       thanks for letting me know this!
   I shall edit accordingly




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r614237603



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -415,3 +415,160 @@ test_that("mutate and write_dataset", {
       summarize(mean = mean(integer))
   )
 })
+
+# PACHA ADDITIONS ----
+# READ THIS CAREFULLY PLEASE, IT'S MY 1ST DAY WRITING THIS KIND OF SENSITIVE TESTS
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10
+# the rest of that test belongs in L55-62 here
+test_that("empty mutate returns input", {
+  # dbl2 = 5, so I'm grouping by a constant
+  gtbl <- group_by(tbl, dbl2)
+
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), tbl)
+
+  expect_dplyr_equal(input %>% mutate() %>% collect(), gtbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), gtbl)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L12-L6
+# THIS WON'T WORK
+# test_that("rownames preserved", {
+#   df <- data.frame(x = c(1, 2), row.names = c("a", "b"))
+#   expect_dplyr_equal(input %>% mutate(y = 2) %>% collect(), df)
+# })
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L18-L29
+test_that("mutations applied progressively", {
+  df <- tibble(x = 1)

Review comment:
       This tibble is assigned but not used




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r614239998



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -415,3 +415,160 @@ test_that("mutate and write_dataset", {
       summarize(mean = mean(integer))
   )
 })
+
+# PACHA ADDITIONS ----
+# READ THIS CAREFULLY PLEASE, IT'S MY 1ST DAY WRITING THIS KIND OF SENSITIVE TESTS
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10
+# the rest of that test belongs in L55-62 here
+test_that("empty mutate returns input", {
+  # dbl2 = 5, so I'm grouping by a constant
+  gtbl <- group_by(tbl, dbl2)
+
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), tbl)
+
+  expect_dplyr_equal(input %>% mutate() %>% collect(), gtbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), gtbl)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L12-L6
+# THIS WON'T WORK
+# test_that("rownames preserved", {
+#   df <- data.frame(x = c(1, 2), row.names = c("a", "b"))
+#   expect_dplyr_equal(input %>% mutate(y = 2) %>% collect(), df)
+# })
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L18-L29
+test_that("mutations applied progressively", {

Review comment:
       In these examples, `expect_dplyr_equal()` is not used correctly.
   
   The second argument should be the _input_, not the expected output.
   
   The `expect_dplyr_equal()` function does not take expected output as one of its arguments. Instead, it just checks for consistent results when the dplyr functions are applied to an Arrow `Table`, an Arrow `RecordBatch`, and an R `data.frame`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r619326033



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -415,3 +406,160 @@ test_that("mutate and write_dataset", {
       summarize(mean = mean(integer))
   )
 })
+
+# PACHA ADDITIONS ----
+# READ THIS CAREFULLY PLEASE, IT'S MY 1ST DAY WRITING THIS KIND OF SENSITIVE TESTS
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10
+# the rest of that test belongs in L55-62 here
+test_that("empty mutate returns input", {
+  # dbl2 = 5, so I'm grouping by a constant
+  gtbl <- group_by(tbl, dbl2)
+
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), tbl)
+
+  expect_dplyr_equal(input %>% mutate() %>% collect(), gtbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), gtbl)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L12-L6
+test_that("rownames preserved", {
+  skip("Row names are not preserved")
+  df <- data.frame(x = c(1, 2), row.names = c("a", "b"))
+  expect_dplyr_equal(input %>% mutate(y = 2) %>% collect(), df)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L18-L29
+test_that("mutations applied progressively", {
+  df <- tibble(x = 1)
+
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(x = x + 1, x = x + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+
+  df <- data.frame(x = 1, y = 2)
+  expect_dplyr_equal(
+    input %>% mutate(x2 = x, x3 = x2 + 1) %>% collect(),
+    df
+  )
+})
+
+# similar to # similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L31-L35
+# SEE LINE 175 IN THIS SCRIPT!
+# THIS WON'T WORK
+# https://issues.apache.org/jira/browse/ARROW-11705
+#  Error in UseMethod("mutate") :
+# no applicable method for 'mutate' applied to an object of class "c('double', 'numeric')"
+# test_that("length-1 vectors are recycled (#152)", {
+#   df <- tibble(x = 1:4)
+#
+#   expect_dplyr_equal(
+#     collect(mutate(input, y = 1))$y,
+#     rep(1, 4)
+#   )
+# })
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L37-L54
+test_that("can remove variables with NULL (dplyr #462)", {
+  df <- tibble(x = 1:3, y = 1:3)
+  gf <- group_by(df, x)
+
+  expect_dplyr_equal(input %>% mutate(y = NULL) %>% collect(), df[1])
+  expect_dplyr_equal(input %>% mutate(y = NULL) %>% collect(), gf[1])
+
+  # even if it doesn't exist
+  expect_dplyr_equal(input %>% mutate(z = NULL) %>% collect(), df)
+  # or was just created
+  expect_dplyr_equal(input %>% mutate(z = 1, z = NULL) %>% collect(), df)
+
+  # regression test for https://github.com/tidyverse/dplyr/issues/4974
+  expect_dplyr_equal(
+    data.frame(x = 1, y = 1) %>% mutate(z = 1, x = NULL, y = NULL) %>% collect(),
+    data.frame(z = 1)
+  )

Review comment:
       Looks like you missed a few of these; they are still using `expect_dplyr_equal()` the wrong way.
   ```suggestion
     expect_dplyr_equal(
       input %>% mutate(z = 1, x = NULL, y = NULL) %>% collect(),
       data.frame(x = 1, y = 1)
     )
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r619432125



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -415,3 +406,160 @@ test_that("mutate and write_dataset", {
       summarize(mean = mean(integer))
   )
 })
+
+# PACHA ADDITIONS ----
+# READ THIS CAREFULLY PLEASE, IT'S MY 1ST DAY WRITING THIS KIND OF SENSITIVE TESTS
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10
+# the rest of that test belongs in L55-62 here
+test_that("empty mutate returns input", {
+  # dbl2 = 5, so I'm grouping by a constant
+  gtbl <- group_by(tbl, dbl2)
+
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), tbl)
+
+  expect_dplyr_equal(input %>% mutate() %>% collect(), gtbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), gtbl)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L12-L6
+test_that("rownames preserved", {
+  skip("Row names are not preserved")
+  df <- data.frame(x = c(1, 2), row.names = c("a", "b"))
+  expect_dplyr_equal(input %>% mutate(y = 2) %>% collect(), df)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L18-L29
+test_that("mutations applied progressively", {
+  df <- tibble(x = 1)
+
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(x = x + 1, x = x + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+
+  df <- data.frame(x = 1, y = 2)
+  expect_dplyr_equal(
+    input %>% mutate(x2 = x, x3 = x2 + 1) %>% collect(),
+    df
+  )
+})
+
+# similar to # similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L31-L35
+# SEE LINE 175 IN THIS SCRIPT!
+# THIS WON'T WORK
+# https://issues.apache.org/jira/browse/ARROW-11705
+#  Error in UseMethod("mutate") :
+# no applicable method for 'mutate' applied to an object of class "c('double', 'numeric')"
+# test_that("length-1 vectors are recycled (#152)", {
+#   df <- tibble(x = 1:4)
+#
+#   expect_dplyr_equal(
+#     collect(mutate(input, y = 1))$y,
+#     rep(1, 4)
+#   )
+# })
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L37-L54
+test_that("can remove variables with NULL (dplyr #462)", {
+  df <- tibble(x = 1:3, y = 1:3)
+  gf <- group_by(df, x)
+
+  expect_dplyr_equal(input %>% mutate(y = NULL) %>% collect(), df[1])
+  expect_dplyr_equal(input %>% mutate(y = NULL) %>% collect(), gf[1])
+
+  # even if it doesn't exist
+  expect_dplyr_equal(input %>% mutate(z = NULL) %>% collect(), df)
+  # or was just created
+  expect_dplyr_equal(input %>% mutate(z = 1, z = NULL) %>% collect(), df)
+
+  # regression test for https://github.com/tidyverse/dplyr/issues/4974
+  expect_dplyr_equal(
+    data.frame(x = 1, y = 1) %>% mutate(z = 1, x = NULL, y = NULL) %>% collect(),
+    data.frame(z = 1)
+  )

Review comment:
       The ones below here also need to be fixed to use `expect_dplyr_equal()` properly




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r620576267



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -415,3 +412,154 @@ test_that("mutate and write_dataset", {
       summarize(mean = mean(integer))
   )
 })
+
+# PACHA ADDITIONS ----
+# READ THIS CAREFULLY PLEASE, IT'S MY 1ST DAY WRITING THIS KIND OF SENSITIVE TESTS
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10
+# the rest of that test belongs in L55-62 here
+test_that("empty mutate returns input", {
+  # dbl2 = 5, so I'm grouping by a constant
+  gtbl <- group_by(tbl, dbl2)
+
+  expect_dplyr_equal(input %>% mutate() %>% collect(), tbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), tbl)
+  expect_dplyr_equal(input %>% mutate() %>% collect(), gtbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), gtbl)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L12-L6
+test_that("rownames preserved", {
+  skip("Row names are not preserved")
+  df <- data.frame(x = c(1, 2), row.names = c("a", "b"))
+  expect_dplyr_equal(input %>% mutate(y = c(3, 4)) %>% collect() %>% rownames(), df)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L18-L29
+test_that("mutations applied progressively", {
+  df <- tibble(x = 1)
+
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(x = x + 1, x = x + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+
+  df <- data.frame(x = 1, y = 2)
+  expect_equal(
+    df %>% Table$create() %>% mutate(x2 = x, x3 = x2 + 1) %>% collect(),
+    df %>% Table$create() %>% mutate(x2 = x + 0, x3 = x2 + 1) %>% collect()
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L37-L54
+test_that("can remove variables with NULL (dplyr #462)", {
+  df <- tibble(x = 1:3, y = 1:3)
+  gf <- group_by(df, x)
+
+  expect_dplyr_equal(input %>% mutate(y = NULL) %>% collect(), df)
+  expect_dplyr_equal(input %>% mutate(y = NULL) %>% collect(), gf)
+
+  # even if it doesn't exist
+  expect_dplyr_equal(input %>% mutate(z = NULL) %>% collect(), df)
+  # or was just created
+  expect_dplyr_equal(input %>% mutate(z = rep(1, nrow(input)), z = NULL) %>% collect(), df)
+
+  # regression test for https://github.com/tidyverse/dplyr/issues/4974
+  expect_dplyr_equal(
+    input %>% mutate(z = 1, x = NULL, y = NULL) %>% collect(),
+    data.frame(x = 1, y = 1)
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L71-L75
+# test_that("assignments don't overwrite variables (dplyr #315)", {
+#   expect_dplyr_equal(
+#     tibble(x = 1, y = 2) %>% mutate(z = {x <- 10; x}) %>% collect(),
+#     tibble(x = 1, y = 2, z = 10)
+#   )
+# })
+# NOT SURE ABOUT THIS!
+test_that("assignments don't overwrite variables (dplyr #315)", {
+  expect_dplyr_equal(
+    input %>% mutate(z = {x <- 10; x}) %>% collect(),
+    tibble(x = 1, y = 2, z = 10)
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L77-L81
+# NOT SURE ABOUT THIS!
+# test_that("can mutate a data frame with zero columns and `NULL` column names", {
+#   df <- vctrs::new_data_frame(n = 2L)
+#   colnames(df) <- NULL
+#   expect_dplyr_equal(
+#     input %>% mutate(x = 1) %>% collect(),
+#     df
+#   )
+# })
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L95-L100
+# glue is a dependency of tidyselect
+test_that("glue() is supported", {
+  expect_dplyr_equal(
+    input %>% mutate(y = glue::glue("")) %>% collect(),
+    tibble(x = 1, y = glue::glue(""))
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L102-L106
+test_that("mutate disambiguates NA and NaN (#1448)", {
+  expect_dplyr_equal(
+    input %>% mutate(y = x * 1) %>% select(y) %>% collect(),
+    tibble(x = c(1, NA, NaN))
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L102-L106
+# this is somewhat "contained" in the previous test
+# test_that("mutate handles data frame columns", {
+#   expect_dplyr_equal(
+#     input %>% mutate(new_col = data.frame(x = 1:3)) %>% select(new_col) %>% collect(),
+#     data.frame(x = 1:3)
+#   )
+#
+#   # mutate() on grouped data not supported in Arrow; this will be pulling data into R
+#   # expect_dplyr_equal(
+#   #   input %>%
+#   #     group_by(x) %>%
+#   #     mutate(new_col = x) %>%
+#   #     ungroup() %>%
+#   #     select(new_col) %>%
+#   #     collect(),
+#   #   data.frame(x = 1:3)
+#   # )
+#
+#   # ROWWISE IS NOT IMPLEMENTED
+#   # expect_dplyr_equal(
+#   #   input %>%
+#   #     rowwise(x) %>%
+#   #     mutate(new_col = x) %>%
+#   #     ungroup() %>%
+#   #     select(new_col) %>%
+#   #     collect(),
+#   #   data.frame(x = 1:3)
+#   # )

Review comment:
       Instead of trying to adapt these two expectations to succeed in the arrow package, I would recommend writing them in a way that current fails, and adding a `skip()` before them. The first expectation succeeds so you can keep that as is.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r614340728



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -415,3 +415,160 @@ test_that("mutate and write_dataset", {
       summarize(mean = mean(integer))
   )
 })
+
+# PACHA ADDITIONS ----
+# READ THIS CAREFULLY PLEASE, IT'S MY 1ST DAY WRITING THIS KIND OF SENSITIVE TESTS
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10
+# the rest of that test belongs in L55-62 here
+test_that("empty mutate returns input", {
+  # dbl2 = 5, so I'm grouping by a constant
+  gtbl <- group_by(tbl, dbl2)
+
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), tbl)
+
+  expect_dplyr_equal(input %>% mutate() %>% collect(), gtbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), gtbl)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L12-L6
+# THIS WON'T WORK
+# test_that("rownames preserved", {
+#   df <- data.frame(x = c(1, 2), row.names = c("a", "b"))
+#   expect_dplyr_equal(input %>% mutate(y = 2) %>% collect(), df)
+# })
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L18-L29
+test_that("mutations applied progressively", {
+  df <- tibble(x = 1)
+
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    tibble(x = 1, y = 2, z = 3)

Review comment:
       ```suggestion
       df
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r624245151



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -15,6 +15,21 @@
 # specific language governing permissions and limitations
 # under the License.
 
+# because expect_dplyr_equal involves no assertion (wrong initial thought)
+# we need to "translate" dplyr tests as:

Review comment:
       ```suggestion
   # developer note:
   # when adapting tests from the dplyr package, note that expect_dplyr_equal()
   # works differently from expect_equal(). the first argument is a dplyr
   # expression containing the placeholder `input`, and the second argument
   # is the R data frame to pass in as that input. So for example, adapt a test
   # from dplyr as follows:
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r624227933



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -185,6 +164,238 @@ test_that("mutate with single value for recycling", {
   )
 })
 
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L12-L6
+test_that("rownames are preserved", {
+  skip("Row names are not preserved")
+  df <- data.frame(x = c(1, 2), row.names = c("a", "b"))
+  expect_dplyr_equal(input %>% mutate(y = c(3, 4)) %>% collect() %>% rownames(), df)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L18-L29
+test_that("mutations are applied progressively", {
+  df <- tibble(x = 1)
+
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(x = x + 1, x = x + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+
+  df <- data.frame(x = 1, y = 2)
+  expect_equal(
+    df %>% Table$create() %>% mutate(x2 = x, x3 = x2 + 1) %>% collect(),
+    df %>% Table$create() %>% mutate(x2 = x + 0, x3 = x2 + 1) %>% collect()
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L37-L54
+test_that("can remove variables with NULL (dplyr #462)", {
+  df <- tibble(x = 1:3, y = 1:3)
+  gf <- group_by(df, x)
+
+  expect_dplyr_equal(input %>% mutate(y = NULL) %>% collect(), df)
+  expect_dplyr_equal(input %>% mutate(y = NULL) %>% collect(), gf)
+
+  # even if it doesn't exist
+  expect_dplyr_equal(input %>% mutate(z = NULL) %>% collect(), df)
+  # or was just created
+  expect_dplyr_equal(input %>% mutate(z = rep(1, nrow(input)), z = NULL) %>% collect(), df)
+
+  # regression test for https://github.com/tidyverse/dplyr/issues/4974
+  expect_dplyr_equal(
+    input %>% mutate(z = 1, x = NULL, y = NULL) %>% collect(),
+    data.frame(x = 1, y = 1)
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L71-L75
+test_that("assignments don't overwrite variables (dplyr #315)", {
+  expect_dplyr_equal(
+    input %>% mutate(z = {x <- 10; x}) %>% collect(),
+    tibble(x = 1, y = 2)
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L77-L81
+test_that("can mutate a data frame with zero columns and `NULL` column names", {
+  df <- vctrs::new_data_frame(n = 2L)
+  colnames(df) <- NULL
+  expect_dplyr_equal(
+    input %>% mutate(x = c(1,2)) %>% collect(),
+    df
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L102-L106
+test_that("mutate disambiguates NA and NaN (#1448)", {
+  expect_dplyr_equal(
+    input %>% mutate(y = x * 1) %>% select(y) %>% collect(),
+    tibble(x = c(1, NA, NaN))
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L117-L127
+test_that("mutate handles data frame columns", {
+  expect_dplyr_equal(
+    input %>% mutate(new_col = data.frame(x = 1:3)) %>% select(new_col) %>% collect(),
+    data.frame(x = 1:3)
+  )
+
+  # mutate() on grouped data not supported in Arrow; this will be pulling data back into R
+  expect_warning(expect_dplyr_equal(
+    input %>%
+      group_by(x) %>%
+      mutate(new_col = x) %>%
+      ungroup() %>%
+      select(new_col) %>%
+      collect(),
+    data.frame(x = 1:3)
+  ), "not supported in Arrow")
+
+  skip("rowwise() is not (yet) implemented in Arrow")
+  expect_dplyr_equal(
+    input %>%
+      rowwise(x) %>%
+      mutate(new_col = x) %>%
+      ungroup() %>%
+      select(new_col) %>%
+      collect(),
+    data.frame(x = 1:3)
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L129-L142
+test_that("unnamed data frames are automatically unspliced", {
+  # this works in arrow
+  expect_dplyr_equal(
+    input %>% mutate(tibble(b = 2)) %>% collect(),
+    tibble(a = 1)
+  )
+
+  # not this
+  expect_error(expect_dplyr_equal(
+    input %>% mutate(tibble(b = 2), tibble(b = 3)) %>% collect(),
+    tibble(a = 1)
+  ))
+
+  # nor this
+  expect_error(expect_dplyr_equal(
+    input %>% mutate(tibble(b = 2), c = b) %>% collect(),
+    tibble(a = 1)
+  ))
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L144-L148
+test_that("named data frames are packed", {
+  expect_warning(expect_dplyr_equal(
+    input %>% mutate(y = tibble(x = a)) %>% collect(),
+    tibble(a = 1)
+  ), "not supported in Arrow")
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L150-L158
+test_that("ts class columns in arrow", {

Review comment:
       The tests that this is based on ("ts class columns in arrow") are not applicable to arrow, so I think this can be removed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r620564586



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -415,3 +412,154 @@ test_that("mutate and write_dataset", {
       summarize(mean = mean(integer))
   )
 })
+
+# PACHA ADDITIONS ----
+# READ THIS CAREFULLY PLEASE, IT'S MY 1ST DAY WRITING THIS KIND OF SENSITIVE TESTS
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10
+# the rest of that test belongs in L55-62 here
+test_that("empty mutate returns input", {
+  # dbl2 = 5, so I'm grouping by a constant
+  gtbl <- group_by(tbl, dbl2)
+
+  expect_dplyr_equal(input %>% mutate() %>% collect(), tbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), tbl)
+  expect_dplyr_equal(input %>% mutate() %>% collect(), gtbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), gtbl)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L12-L6
+test_that("rownames preserved", {
+  skip("Row names are not preserved")
+  df <- data.frame(x = c(1, 2), row.names = c("a", "b"))
+  expect_dplyr_equal(input %>% mutate(y = c(3, 4)) %>% collect() %>% rownames(), df)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L18-L29
+test_that("mutations applied progressively", {
+  df <- tibble(x = 1)
+
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(x = x + 1, x = x + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+
+  df <- data.frame(x = 1, y = 2)
+  expect_equal(
+    df %>% Table$create() %>% mutate(x2 = x, x3 = x2 + 1) %>% collect(),
+    df %>% Table$create() %>% mutate(x2 = x + 0, x3 = x2 + 1) %>% collect()
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L37-L54
+test_that("can remove variables with NULL (dplyr #462)", {
+  df <- tibble(x = 1:3, y = 1:3)
+  gf <- group_by(df, x)
+
+  expect_dplyr_equal(input %>% mutate(y = NULL) %>% collect(), df)
+  expect_dplyr_equal(input %>% mutate(y = NULL) %>% collect(), gf)
+
+  # even if it doesn't exist
+  expect_dplyr_equal(input %>% mutate(z = NULL) %>% collect(), df)
+  # or was just created
+  expect_dplyr_equal(input %>% mutate(z = rep(1, nrow(input)), z = NULL) %>% collect(), df)
+
+  # regression test for https://github.com/tidyverse/dplyr/issues/4974
+  expect_dplyr_equal(
+    input %>% mutate(z = 1, x = NULL, y = NULL) %>% collect(),
+    data.frame(x = 1, y = 1)
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L71-L75
+# test_that("assignments don't overwrite variables (dplyr #315)", {
+#   expect_dplyr_equal(
+#     tibble(x = 1, y = 2) %>% mutate(z = {x <- 10; x}) %>% collect(),
+#     tibble(x = 1, y = 2, z = 10)
+#   )
+# })
+# NOT SURE ABOUT THIS!
+test_that("assignments don't overwrite variables (dplyr #315)", {
+  expect_dplyr_equal(
+    input %>% mutate(z = {x <- 10; x}) %>% collect(),
+    tibble(x = 1, y = 2, z = 10)
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L77-L81
+# NOT SURE ABOUT THIS!
+# test_that("can mutate a data frame with zero columns and `NULL` column names", {
+#   df <- vctrs::new_data_frame(n = 2L)
+#   colnames(df) <- NULL
+#   expect_dplyr_equal(
+#     input %>% mutate(x = 1) %>% collect(),

Review comment:
       I think this is a good test to include. But please change `mutate(x = 1)` to `mutate(x = c(1, 2))` or the scalar recycling problem will cause it to fail.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r614349300



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -415,3 +415,160 @@ test_that("mutate and write_dataset", {
       summarize(mean = mean(integer))
   )
 })
+
+# PACHA ADDITIONS ----
+# READ THIS CAREFULLY PLEASE, IT'S MY 1ST DAY WRITING THIS KIND OF SENSITIVE TESTS
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10
+# the rest of that test belongs in L55-62 here
+test_that("empty mutate returns input", {
+  # dbl2 = 5, so I'm grouping by a constant
+  gtbl <- group_by(tbl, dbl2)
+
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), tbl)
+
+  expect_dplyr_equal(input %>% mutate() %>% collect(), gtbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), gtbl)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L12-L6
+# THIS WON'T WORK
+# test_that("rownames preserved", {
+#   df <- data.frame(x = c(1, 2), row.names = c("a", "b"))
+#   expect_dplyr_equal(input %>% mutate(y = 2) %>% collect(), df)
+# })
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L18-L29
+test_that("mutations applied progressively", {
+  df <- tibble(x = 1)
+
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    tibble(x = 1, y = 2, z = 3)

Review comment:
       Please change all the other tests to work in this way (like we discussed). Thanks!

##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -415,3 +415,160 @@ test_that("mutate and write_dataset", {
       summarize(mean = mean(integer))
   )
 })
+
+# PACHA ADDITIONS ----
+# READ THIS CAREFULLY PLEASE, IT'S MY 1ST DAY WRITING THIS KIND OF SENSITIVE TESTS
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10
+# the rest of that test belongs in L55-62 here
+test_that("empty mutate returns input", {
+  # dbl2 = 5, so I'm grouping by a constant
+  gtbl <- group_by(tbl, dbl2)
+
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), tbl)
+
+  expect_dplyr_equal(input %>% mutate() %>% collect(), gtbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), gtbl)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L12-L6
+# THIS WON'T WORK
+# test_that("rownames preserved", {
+#   df <- data.frame(x = c(1, 2), row.names = c("a", "b"))
+#   expect_dplyr_equal(input %>% mutate(y = 2) %>% collect(), df)
+# })
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L18-L29
+test_that("mutations applied progressively", {
+  df <- tibble(x = 1)
+
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    tibble(x = 1, y = 2, z = 3)

Review comment:
       Please change all the other `expect_dplyr_equal()` tests in this PR to work in this way (like we discussed). Thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#issuecomment-818159524


   https://issues.apache.org/jira/browse/ARROW-11755


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r619416026



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -415,3 +406,160 @@ test_that("mutate and write_dataset", {
       summarize(mean = mean(integer))
   )
 })
+
+# PACHA ADDITIONS ----
+# READ THIS CAREFULLY PLEASE, IT'S MY 1ST DAY WRITING THIS KIND OF SENSITIVE TESTS
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10
+# the rest of that test belongs in L55-62 here
+test_that("empty mutate returns input", {
+  # dbl2 = 5, so I'm grouping by a constant
+  gtbl <- group_by(tbl, dbl2)
+
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), tbl)
+
+  expect_dplyr_equal(input %>% mutate() %>% collect(), gtbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), gtbl)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L12-L6
+test_that("rownames preserved", {
+  skip("Row names are not preserved")
+  df <- data.frame(x = c(1, 2), row.names = c("a", "b"))
+  expect_dplyr_equal(input %>% mutate(y = 2) %>% collect(), df)

Review comment:
       When you add a skipped test like this, take care to ensure that it only fails/errors for the stated reason. This one would also error because of the scalar recycling issue. Fix that like this:
   ```suggestion
     expect_dplyr_equal(input %>% mutate(y = c(3, 4)) %>% collect(), df)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r619294602



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -415,3 +406,160 @@ test_that("mutate and write_dataset", {
       summarize(mean = mean(integer))
   )
 })
+
+# PACHA ADDITIONS ----
+# READ THIS CAREFULLY PLEASE, IT'S MY 1ST DAY WRITING THIS KIND OF SENSITIVE TESTS
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10
+# the rest of that test belongs in L55-62 here
+test_that("empty mutate returns input", {
+  # dbl2 = 5, so I'm grouping by a constant
+  gtbl <- group_by(tbl, dbl2)
+
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), tbl)
+
+  expect_dplyr_equal(input %>% mutate() %>% collect(), gtbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), gtbl)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L12-L6
+test_that("rownames preserved", {
+  skip("Row names are not preserved")
+  df <- data.frame(x = c(1, 2), row.names = c("a", "b"))
+  expect_dplyr_equal(input %>% mutate(y = 2) %>% collect(), df)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L18-L29
+test_that("mutations applied progressively", {
+  df <- tibble(x = 1)
+
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(x = x + 1, x = x + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+
+  df <- data.frame(x = 1, y = 2)
+  expect_dplyr_equal(
+    input %>% mutate(x2 = x, x3 = x2 + 1) %>% collect(),
+    df
+  )
+})
+
+# similar to # similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L31-L35
+# SEE LINE 175 IN THIS SCRIPT!
+# THIS WON'T WORK
+# https://issues.apache.org/jira/browse/ARROW-11705
+#  Error in UseMethod("mutate") :
+# no applicable method for 'mutate' applied to an object of class "c('double', 'numeric')"
+# test_that("length-1 vectors are recycled (#152)", {
+#   df <- tibble(x = 1:4)
+#
+#   expect_dplyr_equal(
+#     collect(mutate(input, y = 1))$y,
+#     rep(1, 4)
+#   )
+# })
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L37-L54

Review comment:
       Looks good; please remove the existing test at line 46 (`"mutate() with NULL inputs"`) because this one is better and duplicates it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r614183416



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -415,3 +415,160 @@ test_that("mutate and write_dataset", {
       summarize(mean = mean(integer))
   )
 })
+
+# PACHA ADDITIONS ----
+# READ THIS CAREFULLY PLEASE, IT'S MY 1ST DAY WRITING THIS KIND OF SENSITIVE TESTS
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10
+# the rest of that test belongs in L55-62 here

Review comment:
       Please delete the existing test at L55-62 and add another line in this test to check empty `%>% mutate()` on `tbl`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r614183416



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -415,3 +415,160 @@ test_that("mutate and write_dataset", {
       summarize(mean = mean(integer))
   )
 })
+
+# PACHA ADDITIONS ----
+# READ THIS CAREFULLY PLEASE, IT'S MY 1ST DAY WRITING THIS KIND OF SENSITIVE TESTS
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10
+# the rest of that test belongs in L55-62 here

Review comment:
       Please delete the existing test at L55-62 and another line in this test to check empty `%>% mutate()` on `tbl`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r626890421



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -32,59 +47,23 @@ test_that("mutate() is lazy", {
   )
 })
 
-test_that("basic mutate", {
-  expect_dplyr_equal(
-    input %>%
-      select(int, chr) %>%
-      filter(int > 5) %>%
-      mutate(int = int + 6L) %>%
-      collect(),
-    tbl
-  )
-})
-
-test_that("mutate() with NULL inputs", {
-  expect_dplyr_equal(
-    input %>%
-      mutate(int = NULL) %>%
-      collect(),
-    tbl
-  )
-})
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10

Review comment:
       @pachamaltese It looks like you fixed all the other URLs except this particular one




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r620557471



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -415,3 +412,154 @@ test_that("mutate and write_dataset", {
       summarize(mean = mean(integer))
   )
 })
+
+# PACHA ADDITIONS ----
+# READ THIS CAREFULLY PLEASE, IT'S MY 1ST DAY WRITING THIS KIND OF SENSITIVE TESTS
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10
+# the rest of that test belongs in L55-62 here
+test_that("empty mutate returns input", {
+  # dbl2 = 5, so I'm grouping by a constant
+  gtbl <- group_by(tbl, dbl2)
+
+  expect_dplyr_equal(input %>% mutate() %>% collect(), tbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), tbl)
+  expect_dplyr_equal(input %>% mutate() %>% collect(), gtbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), gtbl)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L12-L6
+test_that("rownames preserved", {
+  skip("Row names are not preserved")
+  df <- data.frame(x = c(1, 2), row.names = c("a", "b"))
+  expect_dplyr_equal(input %>% mutate(y = c(3, 4)) %>% collect() %>% rownames(), df)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L18-L29
+test_that("mutations applied progressively", {
+  df <- tibble(x = 1)
+
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(x = x + 1, x = x + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+
+  df <- data.frame(x = 1, y = 2)
+  expect_equal(
+    df %>% Table$create() %>% mutate(x2 = x, x3 = x2 + 1) %>% collect(),
+    df %>% Table$create() %>% mutate(x2 = x + 0, x3 = x2 + 1) %>% collect()
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L37-L54
+test_that("can remove variables with NULL (dplyr #462)", {
+  df <- tibble(x = 1:3, y = 1:3)
+  gf <- group_by(df, x)
+
+  expect_dplyr_equal(input %>% mutate(y = NULL) %>% collect(), df)
+  expect_dplyr_equal(input %>% mutate(y = NULL) %>% collect(), gf)
+
+  # even if it doesn't exist
+  expect_dplyr_equal(input %>% mutate(z = NULL) %>% collect(), df)
+  # or was just created
+  expect_dplyr_equal(input %>% mutate(z = rep(1, nrow(input)), z = NULL) %>% collect(), df)
+
+  # regression test for https://github.com/tidyverse/dplyr/issues/4974
+  expect_dplyr_equal(
+    input %>% mutate(z = 1, x = NULL, y = NULL) %>% collect(),
+    data.frame(x = 1, y = 1)
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L71-L75
+# test_that("assignments don't overwrite variables (dplyr #315)", {
+#   expect_dplyr_equal(
+#     tibble(x = 1, y = 2) %>% mutate(z = {x <- 10; x}) %>% collect(),
+#     tibble(x = 1, y = 2, z = 10)
+#   )
+# })
+# NOT SURE ABOUT THIS!
+test_that("assignments don't overwrite variables (dplyr #315)", {
+  expect_dplyr_equal(
+    input %>% mutate(z = {x <- 10; x}) %>% collect(),
+    tibble(x = 1, y = 2, z = 10)

Review comment:
       This looks good, but use the input as the second argument.
   ```suggestion
       tibble(x = 1, y = 2)
   ```
   Please also delete the comments above




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pachamaltese commented on pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
pachamaltese commented on pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#issuecomment-830421130


   nice! so this took shape finally


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pachamaltese commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
pachamaltese commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r620710383



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -415,3 +412,154 @@ test_that("mutate and write_dataset", {
       summarize(mean = mean(integer))
   )
 })
+
+# PACHA ADDITIONS ----
+# READ THIS CAREFULLY PLEASE, IT'S MY 1ST DAY WRITING THIS KIND OF SENSITIVE TESTS
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10
+# the rest of that test belongs in L55-62 here
+test_that("empty mutate returns input", {
+  # dbl2 = 5, so I'm grouping by a constant
+  gtbl <- group_by(tbl, dbl2)
+
+  expect_dplyr_equal(input %>% mutate() %>% collect(), tbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), tbl)
+  expect_dplyr_equal(input %>% mutate() %>% collect(), gtbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), gtbl)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L12-L6
+test_that("rownames preserved", {
+  skip("Row names are not preserved")
+  df <- data.frame(x = c(1, 2), row.names = c("a", "b"))
+  expect_dplyr_equal(input %>% mutate(y = c(3, 4)) %>% collect() %>% rownames(), df)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L18-L29
+test_that("mutations applied progressively", {
+  df <- tibble(x = 1)
+
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(x = x + 1, x = x + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+
+  df <- data.frame(x = 1, y = 2)
+  expect_equal(
+    df %>% Table$create() %>% mutate(x2 = x, x3 = x2 + 1) %>% collect(),
+    df %>% Table$create() %>% mutate(x2 = x + 0, x3 = x2 + 1) %>% collect()
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L37-L54
+test_that("can remove variables with NULL (dplyr #462)", {
+  df <- tibble(x = 1:3, y = 1:3)
+  gf <- group_by(df, x)
+
+  expect_dplyr_equal(input %>% mutate(y = NULL) %>% collect(), df)
+  expect_dplyr_equal(input %>% mutate(y = NULL) %>% collect(), gf)
+
+  # even if it doesn't exist
+  expect_dplyr_equal(input %>% mutate(z = NULL) %>% collect(), df)
+  # or was just created
+  expect_dplyr_equal(input %>% mutate(z = rep(1, nrow(input)), z = NULL) %>% collect(), df)
+
+  # regression test for https://github.com/tidyverse/dplyr/issues/4974
+  expect_dplyr_equal(
+    input %>% mutate(z = 1, x = NULL, y = NULL) %>% collect(),
+    data.frame(x = 1, y = 1)
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L71-L75
+# test_that("assignments don't overwrite variables (dplyr #315)", {
+#   expect_dplyr_equal(
+#     tibble(x = 1, y = 2) %>% mutate(z = {x <- 10; x}) %>% collect(),
+#     tibble(x = 1, y = 2, z = 10)
+#   )
+# })
+# NOT SURE ABOUT THIS!
+test_that("assignments don't overwrite variables (dplyr #315)", {
+  expect_dplyr_equal(
+    input %>% mutate(z = {x <- 10; x}) %>% collect(),
+    tibble(x = 1, y = 2, z = 10)
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L77-L81
+# NOT SURE ABOUT THIS!
+# test_that("can mutate a data frame with zero columns and `NULL` column names", {
+#   df <- vctrs::new_data_frame(n = 2L)
+#   colnames(df) <- NULL
+#   expect_dplyr_equal(
+#     input %>% mutate(x = 1) %>% collect(),

Review comment:
       toda la razón !!!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r626901825



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -32,59 +51,23 @@ test_that("mutate() is lazy", {
   )
 })
 
-test_that("basic mutate", {
-  expect_dplyr_equal(
-    input %>%
-      select(int, chr) %>%
-      filter(int > 5) %>%
-      mutate(int = int + 6L) %>%
-      collect(),
-    tbl
-  )
-})
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10
+test_that("empty mutate returns input", {
+  # dbl2 = 5, so I'm grouping by a constant
+  gtbl <- group_by(tbl, dbl2)
 
-test_that("mutate() with NULL inputs", {
-  expect_dplyr_equal(
-    input %>%
-      mutate(int = NULL) %>%
-      collect(),
-    tbl
-  )
+  expect_dplyr_equal(input %>% mutate() %>% collect(), tbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), tbl)
+  expect_dplyr_equal(input %>% mutate() %>% collect(), gtbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), gtbl)

Review comment:
       The intent is to check that `mutate()` returns its input regardless of whether its input is grouped. @pachamaltese could you please add a concise comment saying that?
   
   Also: because of ARROW-11769, the grouping added to `gtbl` by `group_by()` is actually lost when it's converted to a `Table` or `RecordBatch`. The tests pass regardless of this because `expect_dplyr_equal()` calls `expect_equivalent()` which ignores grouping. But we should fix it regardless. To do this, remove the `gtbl` assignment above and rewrite the two tests involving `gtbl` like this:
   ```r
      expect_dplyr_equal(input %>% group_by(dbl2) %>% mutate() %>% collect())
      expect_dplyr_equal(input %>% group_by(dbl2) %>% mutate(!!!list()) %>% collect())
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r619416026



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -415,3 +406,160 @@ test_that("mutate and write_dataset", {
       summarize(mean = mean(integer))
   )
 })
+
+# PACHA ADDITIONS ----
+# READ THIS CAREFULLY PLEASE, IT'S MY 1ST DAY WRITING THIS KIND OF SENSITIVE TESTS
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10
+# the rest of that test belongs in L55-62 here
+test_that("empty mutate returns input", {
+  # dbl2 = 5, so I'm grouping by a constant
+  gtbl <- group_by(tbl, dbl2)
+
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), tbl)
+
+  expect_dplyr_equal(input %>% mutate() %>% collect(), gtbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), gtbl)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L12-L6
+test_that("rownames preserved", {
+  skip("Row names are not preserved")
+  df <- data.frame(x = c(1, 2), row.names = c("a", "b"))
+  expect_dplyr_equal(input %>% mutate(y = 2) %>% collect(), df)

Review comment:
       When you add a skipped test like this, take care to ensure that it only fails/errors for the stated reason. This one would also error because of the scalar recycling issue. This fixes that.
   
   Also, this test actually passes, but for the wrong reason: `expect_dplyr_equal()` calls `testthat::expect_equivalent()` which ignores differences in rownames. The simplest way to resolve this is to explicitly check the rownames:
   ```suggestion
     expect_dplyr_equal(input %>% mutate(y = c(3, 4)) %>% collect() %>% rownames(), df)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pachamaltese closed pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
pachamaltese closed pull request #9999:
URL: https://github.com/apache/arrow/pull/9999


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane closed pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
jonkeane closed pull request #9999:
URL: https://github.com/apache/arrow/pull/9999


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r624247956



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -32,59 +47,23 @@ test_that("mutate() is lazy", {
   )
 })
 
-test_that("basic mutate", {
-  expect_dplyr_equal(
-    input %>%
-      select(int, chr) %>%
-      filter(int > 5) %>%
-      mutate(int = int + 6L) %>%
-      collect(),
-    tbl
-  )
-})
-
-test_that("mutate() with NULL inputs", {
-  expect_dplyr_equal(
-    input %>%
-      mutate(int = NULL) %>%
-      collect(),
-    tbl
-  )
-})
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10

Review comment:
       These "similar to" comments are helpful, and I think it's worth keeping them in, but:
   - At least one of them points to the wrong line numbers, so please check them and fix
   - They will no longer point to the correct line numbers if `test-mutate.r` in dplyr changes, so please change the URLs to use a commit hash in place of `master`. For example, this URL could change to:
   `https://github.com/tidyverse/dplyr/blob/04454209ea069939d3335c43846c85c725547a89/tests/testthat/test-mutate.r#L1-L10`. That URL will always point to the correct test even if `test-mutate.r` changes.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r624226610



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -185,6 +164,238 @@ test_that("mutate with single value for recycling", {
   )
 })
 
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L12-L6
+test_that("rownames are preserved", {
+  skip("Row names are not preserved")
+  df <- data.frame(x = c(1, 2), row.names = c("a", "b"))
+  expect_dplyr_equal(input %>% mutate(y = c(3, 4)) %>% collect() %>% rownames(), df)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L18-L29
+test_that("mutations are applied progressively", {
+  df <- tibble(x = 1)
+
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(x = x + 1, x = x + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+
+  df <- data.frame(x = 1, y = 2)
+  expect_equal(
+    df %>% Table$create() %>% mutate(x2 = x, x3 = x2 + 1) %>% collect(),
+    df %>% Table$create() %>% mutate(x2 = x + 0, x3 = x2 + 1) %>% collect()
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L37-L54
+test_that("can remove variables with NULL (dplyr #462)", {
+  df <- tibble(x = 1:3, y = 1:3)
+  gf <- group_by(df, x)
+
+  expect_dplyr_equal(input %>% mutate(y = NULL) %>% collect(), df)
+  expect_dplyr_equal(input %>% mutate(y = NULL) %>% collect(), gf)
+
+  # even if it doesn't exist
+  expect_dplyr_equal(input %>% mutate(z = NULL) %>% collect(), df)
+  # or was just created
+  expect_dplyr_equal(input %>% mutate(z = rep(1, nrow(input)), z = NULL) %>% collect(), df)
+
+  # regression test for https://github.com/tidyverse/dplyr/issues/4974
+  expect_dplyr_equal(
+    input %>% mutate(z = 1, x = NULL, y = NULL) %>% collect(),
+    data.frame(x = 1, y = 1)
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L71-L75
+test_that("assignments don't overwrite variables (dplyr #315)", {
+  expect_dplyr_equal(
+    input %>% mutate(z = {x <- 10; x}) %>% collect(),
+    tibble(x = 1, y = 2)
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L77-L81
+test_that("can mutate a data frame with zero columns and `NULL` column names", {
+  df <- vctrs::new_data_frame(n = 2L)
+  colnames(df) <- NULL
+  expect_dplyr_equal(
+    input %>% mutate(x = c(1,2)) %>% collect(),
+    df
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L102-L106
+test_that("mutate disambiguates NA and NaN (#1448)", {
+  expect_dplyr_equal(
+    input %>% mutate(y = x * 1) %>% select(y) %>% collect(),
+    tibble(x = c(1, NA, NaN))
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L117-L127
+test_that("mutate handles data frame columns", {
+  expect_dplyr_equal(
+    input %>% mutate(new_col = data.frame(x = 1:3)) %>% select(new_col) %>% collect(),
+    data.frame(x = 1:3)
+  )
+
+  # mutate() on grouped data not supported in Arrow; this will be pulling data back into R
+  expect_warning(expect_dplyr_equal(
+    input %>%
+      group_by(x) %>%
+      mutate(new_col = x) %>%
+      ungroup() %>%
+      select(new_col) %>%
+      collect(),
+    data.frame(x = 1:3)
+  ), "not supported in Arrow")
+
+  skip("rowwise() is not (yet) implemented in Arrow")
+  expect_dplyr_equal(
+    input %>%
+      rowwise(x) %>%
+      mutate(new_col = x) %>%
+      ungroup() %>%
+      select(new_col) %>%
+      collect(),
+    data.frame(x = 1:3)
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L129-L142
+test_that("unnamed data frames are automatically unspliced", {
+  # this works in arrow
+  expect_dplyr_equal(
+    input %>% mutate(tibble(b = 2)) %>% collect(),
+    tibble(a = 1)
+  )
+
+  # not this
+  expect_error(expect_dplyr_equal(
+    input %>% mutate(tibble(b = 2), tibble(b = 3)) %>% collect(),
+    tibble(a = 1)
+  ))
+
+  # nor this
+  expect_error(expect_dplyr_equal(
+    input %>% mutate(tibble(b = 2), c = b) %>% collect(),
+    tibble(a = 1)
+  ))
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L144-L148
+test_that("named data frames are packed", {

Review comment:
       I think this test ("named data frames are packed") should be removed; it's not testing anything about arrow besides that the data is pulled into an R data frame and the dplyr operation is performed on that




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r614190740



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -415,3 +415,160 @@ test_that("mutate and write_dataset", {
       summarize(mean = mean(integer))
   )
 })
+
+# PACHA ADDITIONS ----
+# READ THIS CAREFULLY PLEASE, IT'S MY 1ST DAY WRITING THIS KIND OF SENSITIVE TESTS
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10
+# the rest of that test belongs in L55-62 here
+test_that("empty mutate returns input", {
+  # dbl2 = 5, so I'm grouping by a constant
+  gtbl <- group_by(tbl, dbl2)
+
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), tbl)
+
+  expect_dplyr_equal(input %>% mutate() %>% collect(), gtbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), gtbl)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L12-L6
+# THIS WON'T WORK

Review comment:
       Could you please uncomment this test and add `skip("Row names are not preserved")` as the first line?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r620571842



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -415,3 +412,154 @@ test_that("mutate and write_dataset", {
       summarize(mean = mean(integer))
   )
 })
+
+# PACHA ADDITIONS ----
+# READ THIS CAREFULLY PLEASE, IT'S MY 1ST DAY WRITING THIS KIND OF SENSITIVE TESTS
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10
+# the rest of that test belongs in L55-62 here
+test_that("empty mutate returns input", {
+  # dbl2 = 5, so I'm grouping by a constant
+  gtbl <- group_by(tbl, dbl2)
+
+  expect_dplyr_equal(input %>% mutate() %>% collect(), tbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), tbl)
+  expect_dplyr_equal(input %>% mutate() %>% collect(), gtbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), gtbl)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L12-L6
+test_that("rownames preserved", {
+  skip("Row names are not preserved")
+  df <- data.frame(x = c(1, 2), row.names = c("a", "b"))
+  expect_dplyr_equal(input %>% mutate(y = c(3, 4)) %>% collect() %>% rownames(), df)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L18-L29
+test_that("mutations applied progressively", {
+  df <- tibble(x = 1)
+
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(x = x + 1, x = x + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+
+  df <- data.frame(x = 1, y = 2)
+  expect_equal(
+    df %>% Table$create() %>% mutate(x2 = x, x3 = x2 + 1) %>% collect(),
+    df %>% Table$create() %>% mutate(x2 = x + 0, x3 = x2 + 1) %>% collect()
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L37-L54
+test_that("can remove variables with NULL (dplyr #462)", {
+  df <- tibble(x = 1:3, y = 1:3)
+  gf <- group_by(df, x)
+
+  expect_dplyr_equal(input %>% mutate(y = NULL) %>% collect(), df)
+  expect_dplyr_equal(input %>% mutate(y = NULL) %>% collect(), gf)
+
+  # even if it doesn't exist
+  expect_dplyr_equal(input %>% mutate(z = NULL) %>% collect(), df)
+  # or was just created
+  expect_dplyr_equal(input %>% mutate(z = rep(1, nrow(input)), z = NULL) %>% collect(), df)
+
+  # regression test for https://github.com/tidyverse/dplyr/issues/4974
+  expect_dplyr_equal(
+    input %>% mutate(z = 1, x = NULL, y = NULL) %>% collect(),
+    data.frame(x = 1, y = 1)
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L71-L75
+# test_that("assignments don't overwrite variables (dplyr #315)", {
+#   expect_dplyr_equal(
+#     tibble(x = 1, y = 2) %>% mutate(z = {x <- 10; x}) %>% collect(),
+#     tibble(x = 1, y = 2, z = 10)
+#   )
+# })
+# NOT SURE ABOUT THIS!
+test_that("assignments don't overwrite variables (dplyr #315)", {
+  expect_dplyr_equal(
+    input %>% mutate(z = {x <- 10; x}) %>% collect(),
+    tibble(x = 1, y = 2, z = 10)
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L77-L81
+# NOT SURE ABOUT THIS!
+# test_that("can mutate a data frame with zero columns and `NULL` column names", {
+#   df <- vctrs::new_data_frame(n = 2L)
+#   colnames(df) <- NULL
+#   expect_dplyr_equal(
+#     input %>% mutate(x = 1) %>% collect(),
+#     df
+#   )
+# })
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L95-L100
+# glue is a dependency of tidyselect
+test_that("glue() is supported", {
+  expect_dplyr_equal(
+    input %>% mutate(y = glue::glue("")) %>% collect(),
+    tibble(x = 1, y = glue::glue(""))
+  )
+})

Review comment:
       I think you should remove this test of `glue()` because it is not really testing anything meaningful about the arrow package.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r620578035



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -415,3 +412,154 @@ test_that("mutate and write_dataset", {
       summarize(mean = mean(integer))
   )
 })
+
+# PACHA ADDITIONS ----
+# READ THIS CAREFULLY PLEASE, IT'S MY 1ST DAY WRITING THIS KIND OF SENSITIVE TESTS
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10
+# the rest of that test belongs in L55-62 here
+test_that("empty mutate returns input", {
+  # dbl2 = 5, so I'm grouping by a constant
+  gtbl <- group_by(tbl, dbl2)
+
+  expect_dplyr_equal(input %>% mutate() %>% collect(), tbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), tbl)
+  expect_dplyr_equal(input %>% mutate() %>% collect(), gtbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), gtbl)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L12-L6
+test_that("rownames preserved", {
+  skip("Row names are not preserved")
+  df <- data.frame(x = c(1, 2), row.names = c("a", "b"))
+  expect_dplyr_equal(input %>% mutate(y = c(3, 4)) %>% collect() %>% rownames(), df)
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L18-L29
+test_that("mutations applied progressively", {
+  df <- tibble(x = 1)
+
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(x = x + 1, x = x + 1) %>% collect(),
+    df
+  )
+  expect_dplyr_equal(
+    input %>% mutate(y = x + 1, z = y + 1) %>% collect(),
+    df
+  )
+
+  df <- data.frame(x = 1, y = 2)
+  expect_equal(
+    df %>% Table$create() %>% mutate(x2 = x, x3 = x2 + 1) %>% collect(),
+    df %>% Table$create() %>% mutate(x2 = x + 0, x3 = x2 + 1) %>% collect()
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L37-L54
+test_that("can remove variables with NULL (dplyr #462)", {
+  df <- tibble(x = 1:3, y = 1:3)
+  gf <- group_by(df, x)
+
+  expect_dplyr_equal(input %>% mutate(y = NULL) %>% collect(), df)
+  expect_dplyr_equal(input %>% mutate(y = NULL) %>% collect(), gf)
+
+  # even if it doesn't exist
+  expect_dplyr_equal(input %>% mutate(z = NULL) %>% collect(), df)
+  # or was just created
+  expect_dplyr_equal(input %>% mutate(z = rep(1, nrow(input)), z = NULL) %>% collect(), df)
+
+  # regression test for https://github.com/tidyverse/dplyr/issues/4974
+  expect_dplyr_equal(
+    input %>% mutate(z = 1, x = NULL, y = NULL) %>% collect(),
+    data.frame(x = 1, y = 1)
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L71-L75
+# test_that("assignments don't overwrite variables (dplyr #315)", {
+#   expect_dplyr_equal(
+#     tibble(x = 1, y = 2) %>% mutate(z = {x <- 10; x}) %>% collect(),
+#     tibble(x = 1, y = 2, z = 10)
+#   )
+# })
+# NOT SURE ABOUT THIS!
+test_that("assignments don't overwrite variables (dplyr #315)", {
+  expect_dplyr_equal(
+    input %>% mutate(z = {x <- 10; x}) %>% collect(),
+    tibble(x = 1, y = 2, z = 10)
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L77-L81
+# NOT SURE ABOUT THIS!
+# test_that("can mutate a data frame with zero columns and `NULL` column names", {
+#   df <- vctrs::new_data_frame(n = 2L)
+#   colnames(df) <- NULL
+#   expect_dplyr_equal(
+#     input %>% mutate(x = 1) %>% collect(),
+#     df
+#   )
+# })
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L95-L100
+# glue is a dependency of tidyselect
+test_that("glue() is supported", {
+  expect_dplyr_equal(
+    input %>% mutate(y = glue::glue("")) %>% collect(),
+    tibble(x = 1, y = glue::glue(""))
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L102-L106
+test_that("mutate disambiguates NA and NaN (#1448)", {
+  expect_dplyr_equal(
+    input %>% mutate(y = x * 1) %>% select(y) %>% collect(),
+    tibble(x = c(1, NA, NaN))
+  )
+})
+
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L102-L106
+# this is somewhat "contained" in the previous test
+# test_that("mutate handles data frame columns", {
+#   expect_dplyr_equal(
+#     input %>% mutate(new_col = data.frame(x = 1:3)) %>% select(new_col) %>% collect(),
+#     data.frame(x = 1:3)
+#   )
+#
+#   # mutate() on grouped data not supported in Arrow; this will be pulling data into R
+#   # expect_dplyr_equal(
+#   #   input %>%
+#   #     group_by(x) %>%
+#   #     mutate(new_col = x) %>%
+#   #     ungroup() %>%
+#   #     select(new_col) %>%
+#   #     collect(),
+#   #   data.frame(x = 1:3)
+#   # )
+#
+#   # ROWWISE IS NOT IMPLEMENTED
+#   # expect_dplyr_equal(
+#   #   input %>%
+#   #     rowwise(x) %>%
+#   #     mutate(new_col = x) %>%
+#   #     ungroup() %>%
+#   #     select(new_col) %>%
+#   #     collect(),
+#   #   data.frame(x = 1:3)
+#   # )
+# })
+
+# QUESTIONS SO FAR ----
+
+# https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L56-L59
+# https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L108-L115
+# does it make sense to create expect_dplyr_named() to mimic the behaviour from dply tests?
+
+# https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L61-L69
+# https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L83-L91
+# https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L129-L142
+# does it make sense to create expect_dplyr_identical() to mimic the behaviour from dplyr tests?

Review comment:
       I don't think we need a function `expect_dplyr_identical()`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r626080217



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -32,59 +47,23 @@ test_that("mutate() is lazy", {
   )
 })
 
-test_that("basic mutate", {
-  expect_dplyr_equal(
-    input %>%
-      select(int, chr) %>%
-      filter(int > 5) %>%
-      mutate(int = int + 6L) %>%
-      collect(),
-    tbl
-  )
-})
-
-test_that("mutate() with NULL inputs", {
-  expect_dplyr_equal(
-    input %>%
-      mutate(int = NULL) %>%
-      collect(),
-    tbl
-  )
-})
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10

Review comment:
       I don't think you addressed the second point Ian made here. You can get a permanent link like Ian mentions and use the current commit hash in the URL. One easy way to do this is to use "copy permalink" on the three dots menu when you click next to the line number in the github UI




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9999:
URL: https://github.com/apache/arrow/pull/9999#discussion_r626901825



##########
File path: r/tests/testthat/test-dplyr-mutate.R
##########
@@ -32,59 +51,23 @@ test_that("mutate() is lazy", {
   )
 })
 
-test_that("basic mutate", {
-  expect_dplyr_equal(
-    input %>%
-      select(int, chr) %>%
-      filter(int > 5) %>%
-      mutate(int = int + 6L) %>%
-      collect(),
-    tbl
-  )
-})
+# similar to https://github.com/tidyverse/dplyr/blob/master/tests/testthat/test-mutate.r#L1-L10
+test_that("empty mutate returns input", {
+  # dbl2 = 5, so I'm grouping by a constant
+  gtbl <- group_by(tbl, dbl2)
 
-test_that("mutate() with NULL inputs", {
-  expect_dplyr_equal(
-    input %>%
-      mutate(int = NULL) %>%
-      collect(),
-    tbl
-  )
+  expect_dplyr_equal(input %>% mutate() %>% collect(), tbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), tbl)
+  expect_dplyr_equal(input %>% mutate() %>% collect(), gtbl)
+  expect_dplyr_equal(input %>% mutate(!!!list()) %>% collect(), gtbl)

Review comment:
       The intent is to check that `mutate()` returns its input regardless of whether its input is grouped. @pachamaltese could you please add a concise comment saying that?
   
   Also: because of ARROW-11769, the grouping added to `gtbl` by `group_by()` is actually lost when it's converted to a `Table` or `RecordBatch`. The tests pass regardless of this because `expect_dplyr_equal()` calls `expect_equivalent()` which ignores grouping. But we should fix it regardless. To do this, remove the `gtbl` assignment above and rewrite the two tests involving `gtbl` like this:
   ```r
      expect_dplyr_equal(input %>% group_by(dbl2) %>% mutate() %>% collect(), tbl)
      expect_dplyr_equal(input %>% group_by(dbl2) %>% mutate(!!!list()) %>% collect(), tbl)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org