You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/08/02 22:57:44 UTC

[GitHub] [arrow] thisisnic opened a new pull request, #13786: ARROW-11699: [R] Implement dplyr::across()

thisisnic opened a new pull request, #13786:
URL: https://github.com/apache/arrow/pull/13786

   Currently a fairly minimal draft


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on pull request #13786: ARROW-11699: [R] Implement dplyr::across()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on PR #13786:
URL: https://github.com/apache/arrow/pull/13786#issuecomment-1209922101

   I've opened https://issues.apache.org/jira/browse/ARROW-17362 as a follow-up to implement `across()` for `summarise()` too. It doesn't quite work as-is, and I don't want to overcomplicate this PR for now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #13786: ARROW-11699: [R] Implement dplyr::across() for mutate()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on code in PR #13786:
URL: https://github.com/apache/arrow/pull/13786#discussion_r945814037


##########
r/R/dplyr-mutate.R:
##########
@@ -151,3 +153,95 @@ ensure_named_exprs <- function(exprs) {
   names(exprs)[unnamed] <- map_chr(exprs[unnamed], format_expr)
   exprs
 }
+
+# Take the input quos and unfold any instances of across()
+# into individual quosures
+unfold_across <- function(.data, quos_in) {
+  quos_out <- list()
+  # Check for any expressions starting with across
+  for (quo_i in seq_along(quos_in)) {
+    quo_in <- quos_in[quo_i]
+    quo_expr <- quo_get_expr(quo_in[[1]])
+    quo_env <- quo_get_env(quo_in[[1]])
+
+    if (is_call(quo_expr, "across")) {
+      new_quos <- list()
+      across_call <- call_match(quo_expr, dplyr::across, defaults = TRUE)
+
+      if (!all(names(across_call[-1]) %in% c(".cols", ".fns", ".names"))) {
+        abort("`...` argument to `across()` is deprecated in dplyr and not supported in Arrow")
+      }
+
+      # ARROW-17364: add support for .names argument
+      if (!is.null(across_call[[".names"]])) {
+        abort("`.names` argument to `across()` not yet supported in Arrow")
+      }
+
+      # use select() to get the columns so we can take advantage of tidyselect
+      source_cols <- names(dplyr::select(.data, !!across_call[[".cols"]]))
+      funcs <- across_call[[".fns"]]
+
+      # calling across() with .fns = NULL returns all columns unchanged
+      if (is_empty(funcs)) {
+        return()
+      }
+
+      if (!is_list(funcs) && as.character(funcs)[[1]] == "~") {
+        abort(
+          paste(
+            "purrr-style lambda functions as `.fns` argument to `across()`",

Review Comment:
   I don't think that works; `as_mapper()` get us the function, but what I want is the name of the function so that I can make a quosure, which later turns into something which a later step turns into something that Arrow can work with.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] nealrichardson commented on pull request #13786: ARROW-11699: [R] Implement dplyr::across() for mutate()

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on PR #13786:
URL: https://github.com/apache/arrow/pull/13786#issuecomment-1231853241

   Haven't looked at the PR lately but, as I understand it, `expand_across()` takes a list of quosures and returns a list of quosures. So you should be able to unit-test all of its behavior in isolation, and then you want one or two tests in `mutate()` that it does the right thing in practice (integration tests). Of course, doing everything in `mutate()` has its conveniences; you could make it easier to write the unit tests with a test helper, something like:
   
   ```
   expect_across_equal <- function(actual, expected) {
     expect_identical(expand_across(actual), expected)
   }
   
   expect_across_equal(
     quos(lgl, across(is.numeric, ~ . * 5)), 
     quos(lgl, int * 5, dbl * 5)
   )
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #13786: ARROW-11699: [R] Implement dplyr::across() for mutate()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on code in PR #13786:
URL: https://github.com/apache/arrow/pull/13786#discussion_r952232030


##########
r/R/dplyr-mutate.R:
##########
@@ -24,7 +24,9 @@ mutate.arrow_dplyr_query <- function(.data,
                                      .before = NULL,
                                      .after = NULL) {
   call <- match.call()
-  exprs <- ensure_named_exprs(quos(...))
+
+  expression_list <- unfold_across(.data, quos(...))

Review Comment:
   It's not as straightforward as I need to delve into the internals of `do_arrow_summarize()` and swap some bits around.  Given this PR is not tiny and I'd be wanting to add a lot of tests for all 3, I'd like to keep them separate to try to keep things simpler for review.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #13786: ARROW-11699: [R] Implement dplyr::across() for mutate()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on code in PR #13786:
URL: https://github.com/apache/arrow/pull/13786#discussion_r945649469


##########
r/tests/testthat/test-dplyr-mutate.R:
##########
@@ -589,3 +588,106 @@ test_that("mutate() and transmute() with namespaced functions", {
     tbl
   )
 })
+
+test_that("Can use across() within mutate()", {
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(c(dbl, dbl2), round)) %>%
+      collect(),
+    tbl
+  )
+
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(c(dbl, dbl2), list(exp, sqrt))) %>%
+      collect(),
+    tbl
+  )
+
+  # across() arguments not in default order
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(.fns = round, c(dbl, dbl2))) %>%
+      collect(),
+    tbl
+  )
+
+  # ARROW-17364: .names argument not yet supported for across()
+  expect_error(
+    tbl %>%
+      arrow_table() %>%
+      mutate(across(c(dbl, dbl2), round, .names = "{.col}.{.fn}")) %>%
+      collect(),
+    regexp = "`.names` argument to `across()` not yet supported in Arrow",
+    fixed = TRUE
+  )
+
+  # ellipses (...) are a deprecated argument
+  expect_error(
+    tbl %>%
+      arrow_table() %>%
+      mutate(across(c(dbl, dbl2), round, digits = -1)) %>%
+      collect(),
+    regexp = "`...` argument to `across()` is deprecated in dplyr and not supported in Arrow",
+    fixed = TRUE
+  )
+
+  # alternative ways of specifying .fns - as a list
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(1:dbl2, list(round))) %>%
+      collect(),
+    tbl
+  )
+

Review Comment:
   1. `across(1:dbl2, list("fun1" = round, "fun2" = exp))`
   Good catch!  I've now fixed it to properly check for the name from the list and not just the function name.
   
   2. ` across(1:dbl2, list("fun1" = round(this_is_not_cool(something_else)), "fun2" = exp))`
   This doesn't work in R either. We get the "not supported in Arrow" error, which then pushes it to R, which then gives the error it would have given anyway. I've added a test in for it, but it's a bit clunky and I'm not sure if we need it. What do you think? (It's the test which has `round(sqrt(dbl))` in it).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #13786: ARROW-11699: [R] Implement dplyr::across()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on code in PR #13786:
URL: https://github.com/apache/arrow/pull/13786#discussion_r936124019


##########
r/tests/testthat/test-dplyr-mutate.R:
##########
@@ -279,16 +279,16 @@ test_that("dplyr::mutate's examples", {
   # Examples we don't support should succeed
   # but warn that they're pulling data into R to do so
 
-  # across and autosplicing: ARROW-11699
   compare_dplyr_binding(
     .input %>%
       select(name, homeworld, species) %>%
       mutate(across(!name, as.factor)) %>%
       collect(),
     starwars,
-    warning = "Expression across.*not supported in Arrow"
   )

Review Comment:
   This test now fails with the following error:
   
   ```
   Error in `collect()`:
   ! Invalid: ExecuteScalarExpression cannot Execute non-scalar expression dictionary_encode(homeworld, {null_encoding_behavior=MASK})
   /home/nic2/arrow/cpp/src/arrow/compute/exec/project_node.cc:91  ExecuteScalarExpression(simplified_expr, target, plan()->exec_context())
   /home/nic2/arrow/cpp/src/arrow/compute/exec/exec_plan.cc:559  iterator_.Next()
   /home/nic2/arrow/cpp/src/arrow/record_batch.cc:337  ReadNext(&batch)
   /home/nic2/arrow/cpp/src/arrow/record_batch.cc:351  ToRecordBatches()
   Run `rlang::last_error()` to see where the error occurred.
   ```
   TODO: investigate more, open JIRA, wrap test in expect_error with comment linking to JIRA



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #13786: ARROW-11699: [R] Implement dplyr::across() for mutate()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on code in PR #13786:
URL: https://github.com/apache/arrow/pull/13786#discussion_r945822159


##########
r/R/dplyr-mutate.R:
##########
@@ -151,3 +153,95 @@ ensure_named_exprs <- function(exprs) {
   names(exprs)[unnamed] <- map_chr(exprs[unnamed], format_expr)
   exprs
 }
+
+# Take the input quos and unfold any instances of across()
+# into individual quosures
+unfold_across <- function(.data, quos_in) {
+  quos_out <- list()
+  # Check for any expressions starting with across
+  for (quo_i in seq_along(quos_in)) {

Review Comment:
   Added now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #13786: ARROW-11699: [R] Implement dplyr::across() for mutate()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on code in PR #13786:
URL: https://github.com/apache/arrow/pull/13786#discussion_r945660572


##########
r/R/dplyr-mutate.R:
##########
@@ -151,3 +153,95 @@ ensure_named_exprs <- function(exprs) {
   names(exprs)[unnamed] <- map_chr(exprs[unnamed], format_expr)
   exprs
 }
+
+# Take the input quos and unfold any instances of across()
+# into individual quosures
+unfold_across <- function(.data, quos_in) {
+  quos_out <- list()
+  # Check for any expressions starting with across
+  for (quo_i in seq_along(quos_in)) {
+    quo_in <- quos_in[quo_i]
+    quo_expr <- quo_get_expr(quo_in[[1]])
+
+    if (is_call(quo_expr, "across")) {
+      new_quos <- list()
+      across_call <- match.call(dplyr::across, quo_expr)
+
+      if (!all(names(across_call[-1]) %in% c(".cols", ".fns", ".names"))) {
+        abort("`...` argument to `across()` is deprecated in dplyr and not supported in Arrow")
+      }
+
+      # ARROW-17364: add support for .names argument
+      if (!is.null(across_call[[".names"]])) {
+        abort("`.names` argument to `across()` not yet supported in Arrow")
+      }
+
+      # use select() to get the column names so we can take advantage of tidyselect
+      cols <- names(select(.data, !!across_call[[".cols"]]))
+      funcs <- as.character(across_call[[".fns"]])
+
+      # calling across() with .fns = NULL returns all columns unchanged
+      if (is_empty(funcs)) {
+        return()
+      }
+
+      if (funcs[[1]] == "~") {
+        abort(
+          paste(
+            "purrr-style lambda functions as `.fns` argument to `across()`",
+            "not yet supported in Arrow"
+          )
+        )
+      }
+
+      # if only 1 function, we overwrite the old columns with the new values
+      if (length(funcs) == 1) {
+        # work out the quosures from the call
+        col_syms <- syms(cols)
+        new_quos <- map(col_syms, ~ quo(!!call2(funcs, .x)))
+        new_quos <- set_names(new_quos, cols)
+      } else {
+        # remove `c()` and `list()` which have been used to specify functions
+        extracted_funcs <- funcs[map_lgl(funcs, ~ !.x %in% c("c", "list"))]
+
+        func_list <- ensure_named_funcs(extracted_funcs)
+        new_quos <- quosures_from_func_list(func_list, cols)
+      }
+
+      quos_out <- append(quos_out, new_quos)
+    } else {
+      quos_out <- append(quos_out, quo_in)
+    }
+  }
+
+  quos_out
+}
+
+# if the function is unnamed (an empty character), use the index instead
+ensure_named_funcs <- function(funcs) {
+  func_list <- as.list(funcs)
+  func_names <- names(funcs) %||% rep("", length(funcs))
+  func_indices <- seq_along(funcs)
+  names(func_list) <- map2_chr(func_names, func_indices, max)
+  func_list
+}
+
+# given a named list of functions and column names, create a list of new quosures
+quosures_from_func_list <- function(func_list, cols) {
+  func_list_full <- rep(func_list, length(cols))
+  cols_list_full <- rep(cols, each = length(func_list))
+
+  # get names of new quosures
+  new_quo_names <- map2_chr(
+    names(func_list_full), cols_list_full,
+    ~ paste(.y, .x, sep = "_")
+  )

Review Comment:
   I've deferred it to another ticket - [ARROW-17364](https://issues.apache.org/jira/browse/ARROW-17364)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #13786: ARROW-11699: [R] Implement dplyr::across()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on code in PR #13786:
URL: https://github.com/apache/arrow/pull/13786#discussion_r942167570


##########
r/tests/testthat/test-dplyr-mutate.R:
##########
@@ -279,16 +279,16 @@ test_that("dplyr::mutate's examples", {
   # Examples we don't support should succeed
   # but warn that they're pulling data into R to do so
 
-  # across and autosplicing: ARROW-11699
   compare_dplyr_binding(
     .input %>%
       select(name, homeworld, species) %>%
       mutate(across(!name, as.factor)) %>%
       collect(),
     starwars,
-    warning = "Expression across.*not supported in Arrow"
   )

Review Comment:
   I've opened [a ticket](https://issues.apache.org/jira/browse/ARROW-17371) and [associated PR](https://github.com/apache/arrow/pull/13836) to remove the existing mapping.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #13786: ARROW-11699: [R] Implement dplyr::across() for mutate()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on code in PR #13786:
URL: https://github.com/apache/arrow/pull/13786#discussion_r954131915


##########
r/R/dplyr-mutate.R:
##########
@@ -151,3 +153,95 @@ ensure_named_exprs <- function(exprs) {
   names(exprs)[unnamed] <- map_chr(exprs[unnamed], format_expr)
   exprs
 }
+
+# Take the input quos and unfold any instances of across()
+# into individual quosures
+unfold_across <- function(.data, quos_in) {

Review Comment:
   (FWIW it was already pretty similar by chance, but I used a lot of the code there to update things like weird edge cases and how the naming stuff works)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #13786: ARROW-11699: [R] Implement dplyr::across() for mutate()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on code in PR #13786:
URL: https://github.com/apache/arrow/pull/13786#discussion_r952279131


##########
r/R/dplyr-mutate.R:
##########
@@ -151,3 +153,95 @@ ensure_named_exprs <- function(exprs) {
   names(exprs)[unnamed] <- map_chr(exprs[unnamed], format_expr)
   exprs
 }
+
+# Take the input quos and unfold any instances of across()
+# into individual quosures
+unfold_across <- function(.data, quos_in) {
+  quos_out <- list()
+  # Check for any expressions starting with across
+  for (quo_i in seq_along(quos_in)) {
+    quo_in <- quos_in[quo_i]
+    quo_expr <- quo_get_expr(quo_in[[1]])
+    quo_env <- quo_get_env(quo_in[[1]])
+
+    if (is_call(quo_expr, "across")) {
+      new_quos <- list()
+      across_call <- call_match(quo_expr, dplyr::across, defaults = TRUE)
+
+      if (!all(names(across_call[-1]) %in% c(".cols", ".fns", ".names"))) {
+        abort("`...` argument to `across()` is deprecated in dplyr and not supported in Arrow")
+      }
+
+      # ARROW-17364: add support for .names argument
+      if (!is.null(across_call[[".names"]])) {
+        abort("`.names` argument to `across()` not yet supported in Arrow")
+      }
+
+      # use select() to get the columns so we can take advantage of tidyselect
+      source_cols <- names(dplyr::select(.data, !!across_call[[".cols"]]))
+      funcs <- across_call[[".fns"]]
+
+      # calling across() with .fns = NULL returns all columns unchanged
+      if (is_empty(funcs)) {
+        return()
+      }
+
+      if (!is_list(funcs) && as.character(funcs)[[1]] == "~") {
+        abort(
+          paste(
+            "purrr-style lambda functions as `.fns` argument to `across()`",

Review Comment:
   This doesn't make sense until ARROW-14071 is merged so deferring it for now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on pull request #13786: ARROW-11699: [R] Implement dplyr::across() for mutate()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on PR #13786:
URL: https://github.com/apache/arrow/pull/13786#issuecomment-1231659849

   > I think you nailed it - the `mutate()` tests are about matching dplyr's functionality (e.g., they will fail if dplyr changes something and makes this code out of sync); the `expand_across()` tests are about `expand_across()`; and we need both.
   > 
   > I would personally just copy/paste all the places where `across()` appears in your mutate tests but `expect_identical(expand_across(quo(across(...)), ...)` instead. If you feel this is out of scope for this PR I am OK with that, too (but I think it's worth trying).
   
   I absolutely agree with the argument that this is a big function and it'd be worth having tests for it, though also wondering if copying and pasting every single test might be a bit excessive - we don't necessarily test intermediate functions in some of our other dplyr verbs (though perhaps we should).  I might just check with @nealrichardson or @jonkeane to see if they see any middle ground in what needs to be tested - if there isn't, I'm happy to copy and paste everything and approach it that way.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #13786: ARROW-11699: [R] Implement dplyr::across() for mutate()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on code in PR #13786:
URL: https://github.com/apache/arrow/pull/13786#discussion_r959320472


##########
r/R/dplyr-across.R:
##########
@@ -0,0 +1,166 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+expand_across <- function(.data, quos_in) {
+  quos_out <- list()
+  # retrieve items using their values to preserve naming of quos other than across
+  for (quo_i in seq_along(quos_in)) {
+    quo_in <- quos_in[quo_i]
+    quo_expr <- quo_get_expr(quo_in[[1]])
+    quo_env <- quo_get_env(quo_in[[1]])
+
+    if (is_call(quo_expr, "across")) {
+      new_quos <- list()
+
+      across_call <- match.call(
+        definition = dplyr::across,
+        call = quo_expr,
+        expand.dots = FALSE,
+        envir = quo_env
+      )
+
+      if (!all(names(across_call[-1]) %in% c(".cols", ".fns", ".names"))) {
+        abort("`...` argument to `across()` is deprecated in dplyr and not supported in Arrow")
+      }
+
+      if (!is.null(across_call[[".cols"]])) {
+        cols <- across_call[[".cols"]]
+      } else {
+        cols <- quote(everything())
+      }
+
+      setup <- across_setup(
+        cols = !!as_quosure(cols, quo_env),
+        fns = across_call[[".fns"]],
+        names = across_call[[".names"]],
+        .caller_env = quo_env,
+        mask = .data,
+        inline = TRUE
+      )
+
+      # calling across() with .fns = NULL returns all columns unchanged
+      if (is_empty(setup$fns)) {
+        # this needs to be updated to match dplyr's version
+        return()
+      }
+
+      if (!is_list(setup$fns) && as.character(setup$fns)[[1]] == "~") {
+        abort(
+          paste(
+            "purrr-style lambda functions as `.fns` argument to `across()`",
+            "not yet supported in Arrow"
+          )
+        )
+      }
+
+      new_quos <- quosures_from_func_list(setup, quo_env)
+
+      quos_out <- append(quos_out, new_quos)
+    } else {
+      quos_out <- append(quos_out, quo_in)
+    }
+  }
+
+  quos_out
+}
+
+# given a named list of functions and column names, create a list of new quosures
+quosures_from_func_list <- function(setup, quo_env) {
+  func_list_full <- rep(setup$fns, length(setup$vars))
+  cols_list_full <- rep(setup$vars, each = length(setup$fns))
+
+  # get new quosures
+  new_quo_list <- map2(
+    func_list_full, cols_list_full,
+    ~ quo(!!call2(.x, sym(.y)))
+  )
+
+  quosures <- set_names(new_quo_list, setup$names)
+  map(quosures, ~ quo_set_env(.x, quo_env))
+}
+
+across_setup <- function(cols, fns, names, .caller_env, mask, inline = FALSE) {
+  cols <- enquo(cols)
+
+  vars <- names(dplyr::select(mask, !!cols))
+
+  if (is.null(fns)) {
+    if (!is.null(names)) {
+      glue_mask <- across_glue_mask(.caller_env, .col = vars, .fn = "1")
+      names <- vctrs::vec_as_names(glue::glue(names, .envir = glue_mask), repair = "check_unique")
+    } else {
+      names <- vars
+    }
+    value <- list(vars = vars, fns = fns, names = names)
+    return(value)
+  }
+
+  # apply `.names` smart default
+  if (is.function(fns) || is_formula(fns) || is.name(fns)) {
+    names <- names %||% "{.col}"
+    fns <- list("1" = fns)
+  } else {
+    names <- names %||% "{.col}_{.fn}"
+    fns <- call_args(fns)
+  }
+
+  if (any(map_lgl(fns, is_formula))) {
+    abort(
+      paste(
+        "purrr-style lambda functions as `.fns` argument to `across()`",
+        "not yet supported in Arrow"
+      )
+    )
+  }
+
+  if (!is.list(fns)) {
+    msg <- c("`.fns` must be NULL, a function, a formula, or a list of functions/formulas.")
+    abort(msg)
+  }
+
+  # make sure fns has names, use number to replace unnamed
+  if (is.null(names(fns))) {
+    names_fns <- seq_along(fns)
+  } else {
+    names_fns <- names(fns)
+    empties <- which(names_fns == "")
+    if (length(empties)) {
+      names_fns[empties] <- empties
+    }
+  }
+
+  glue_mask <- across_glue_mask(.caller_env,
+    .col = rep(vars, each = length(fns)),
+    .fn  = rep(names_fns, length(vars))
+  )
+  names <- vctrs::vec_as_names(glue::glue(names, .envir = glue_mask), repair = "check_unique")
+
+  if (!inline) {
+    fns <- map(fns, as_function)
+  }
+
+  list(vars = vars, fns = fns, names = names)
+}
+
+across_glue_mask <- function(.col, .fn, .caller_env) {
+  glue_mask <- env(.caller_env, .col = .col, .fn = .fn)
+  env_bind_active(
+    glue_mask,
+    col = function() glue_mask$.col, fn = function() glue_mask$.fn
+  )
+  glue_mask

Review Comment:
   Got this from `dplyr::across()` and don't 100% understand it, but changing it to your version doesn't make any tests fails, so have updated it for now. May be due to the way that `dplyr` subsequently executes these functions, which is fundamentally different to how we do it in `arrow`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on a diff in pull request #13786: ARROW-11699: [R] Implement dplyr::across() for mutate()

Posted by GitBox <gi...@apache.org>.
paleolimbot commented on code in PR #13786:
URL: https://github.com/apache/arrow/pull/13786#discussion_r959554332


##########
r/tests/testthat/test-dplyr-mutate.R:
##########
@@ -589,3 +588,106 @@ test_that("mutate() and transmute() with namespaced functions", {
     tbl
   )
 })
+
+test_that("Can use across() within mutate()", {
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(c(dbl, dbl2), round)) %>%
+      collect(),
+    tbl
+  )
+
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(c(dbl, dbl2), list(exp, sqrt))) %>%
+      collect(),
+    tbl
+  )
+
+  # across() arguments not in default order
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(.fns = round, c(dbl, dbl2))) %>%
+      collect(),
+    tbl
+  )
+
+  # ARROW-17364: .names argument not yet supported for across()
+  expect_error(
+    tbl %>%
+      arrow_table() %>%
+      mutate(across(c(dbl, dbl2), round, .names = "{.col}.{.fn}")) %>%
+      collect(),
+    regexp = "`.names` argument to `across()` not yet supported in Arrow",
+    fixed = TRUE
+  )
+
+  # ellipses (...) are a deprecated argument
+  expect_error(
+    tbl %>%
+      arrow_table() %>%
+      mutate(across(c(dbl, dbl2), round, digits = -1)) %>%
+      collect(),
+    regexp = "`...` argument to `across()` is deprecated in dplyr and not supported in Arrow",
+    fixed = TRUE
+  )
+
+  # alternative ways of specifying .fns - as a list
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(1:dbl2, list(round))) %>%
+      collect(),
+    tbl
+  )
+

Review Comment:
   Sorry for missing this comment...I think it's a good test to have in there because you're walking the syntax tree yourself. When I wrote this, I was worried somebody would get something like `object of type LANGSXP is not subsettable` or something.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] nealrichardson commented on a diff in pull request #13786: ARROW-11699: [R] Implement dplyr::across()

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on code in PR #13786:
URL: https://github.com/apache/arrow/pull/13786#discussion_r936160255


##########
r/tests/testthat/test-dplyr-mutate.R:
##########
@@ -279,16 +279,16 @@ test_that("dplyr::mutate's examples", {
   # Examples we don't support should succeed
   # but warn that they're pulling data into R to do so
 
-  # across and autosplicing: ARROW-11699
   compare_dplyr_binding(
     .input %>%
       select(name, homeworld, species) %>%
       mutate(across(!name, as.factor)) %>%
       collect(),
     starwars,
-    warning = "Expression across.*not supported in Arrow"
   )

Review Comment:
   Though we should probably make sure that `as.factor` isn't mapped to anything in the dplyr funcs for now, so that this fails in the right way rather than try to collect.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] nealrichardson commented on a diff in pull request #13786: ARROW-11699: [R] Implement dplyr::across() for mutate()

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on code in PR #13786:
URL: https://github.com/apache/arrow/pull/13786#discussion_r945704963


##########
r/tests/testthat/test-dplyr-mutate.R:
##########
@@ -589,3 +588,127 @@ test_that("mutate() and transmute() with namespaced functions", {
     tbl
   )
 })
+
+test_that("Can use across() within mutate()", {
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(c(dbl, dbl2), round)) %>%
+      collect(),
+    tbl
+  )
+
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(c(dbl, dbl2), list(exp, sqrt))) %>%
+      collect(),
+    tbl
+  )
+
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(c(dbl, dbl2), list("fun1" = round, "fun2" = sqrt))) %>%
+      collect(),
+    tbl
+  )
+
+  # this is valid is neither R nor Arrow
+  expect_error(
+    expect_warning(
+      compare_dplyr_binding(
+        .input %>%
+          arrow_table() %>%
+          mutate(across(c(dbl, dbl2), list("fun1" = round(sqrt(dbl))))) %>%
+          collect(),
+        tbl,
+        warning = TRUE
+      )
+    )
+  )
+
+  # across() arguments not in default order
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(.fns = round, c(dbl, dbl2))) %>%
+      collect(),
+    tbl
+  )
+
+  # ARROW-17364: .names argument not yet supported for across()
+  expect_error(
+    tbl %>%
+      arrow_table() %>%
+      mutate(across(c(dbl, dbl2), round, .names = "{.col}.{.fn}")) %>%
+      collect(),
+    regexp = "`.names` argument to `across()` not yet supported in Arrow",
+    fixed = TRUE
+  )
+
+  # ellipses (...) are a deprecated argument
+  expect_error(
+    tbl %>%
+      arrow_table() %>%
+      mutate(across(c(dbl, dbl2), round, digits = -1)) %>%
+      collect(),
+    regexp = "`...` argument to `across()` is deprecated in dplyr and not supported in Arrow",
+    fixed = TRUE
+  )
+
+  # alternative ways of specifying .fns - as a list
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(1:dbl2, list(round))) %>%
+      collect(),
+    tbl
+  )
+
+  # supply .fns as a one-item vector
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(1:dbl2, c(round))) %>%
+      collect(),
+    tbl
+  )
+
+  # ARROW-17366: purrr-style lambda functions not yet supported
+  expect_error(
+    compare_dplyr_binding(
+      .input %>%
+        mutate(across(1:dbl2, ~ round(.x, digits = -1))) %>%
+        collect(),
+      tbl
+    ),
+    regexp = "purrr-style lambda functions as `.fns` argument to `across()` not yet supported in Arrow",
+    fixed = TRUE
+  )
+
+  # .fns = NULL, the default
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(1:dbl2, NULL)) %>%
+      collect(),
+    tbl
+  )
+
+  expect_error(
+    compare_dplyr_binding(
+      .input %>%
+        mutate(across(where(is.double))) %>%
+        collect(),
+      tbl
+    ),
+    "Unsupported selection helper"

Review Comment:
   Why is this unsupported? Do we have JIRA(s) for this? Seems like it should work.



##########
r/R/dplyr-mutate.R:
##########
@@ -151,3 +153,95 @@ ensure_named_exprs <- function(exprs) {
   names(exprs)[unnamed] <- map_chr(exprs[unnamed], format_expr)
   exprs
 }
+
+# Take the input quos and unfold any instances of across()
+# into individual quosures
+unfold_across <- function(.data, quos_in) {
+  quos_out <- list()
+  # Check for any expressions starting with across
+  for (quo_i in seq_along(quos_in)) {

Review Comment:
   I don't see any tests where there is more than `mutate(across(...))`, i.e. no other quosures in this list. We should probably test the corners of how across interacts with other mutate expressions.



##########
r/R/dplyr-mutate.R:
##########
@@ -24,7 +24,9 @@ mutate.arrow_dplyr_query <- function(.data,
                                      .before = NULL,
                                      .after = NULL) {
   call <- match.call()
-  exprs <- ensure_named_exprs(quos(...))
+
+  expression_list <- unfold_across(.data, quos(...))

Review Comment:
   I know you've deferred this from filter and summarize, but is that necessary? Is it more involved than adding in this `unfold_across` in them? There's no other modifications to mutate() so it seems reasonable to guess that it would be that straightforward in the other functions too.



##########
r/R/dplyr-mutate.R:
##########
@@ -151,3 +153,95 @@ ensure_named_exprs <- function(exprs) {
   names(exprs)[unnamed] <- map_chr(exprs[unnamed], format_expr)
   exprs
 }
+
+# Take the input quos and unfold any instances of across()
+# into individual quosures
+unfold_across <- function(.data, quos_in) {

Review Comment:
   How does this compare/contrast to how dplyr does this? 



##########
r/R/dplyr-mutate.R:
##########
@@ -151,3 +153,95 @@ ensure_named_exprs <- function(exprs) {
   names(exprs)[unnamed] <- map_chr(exprs[unnamed], format_expr)
   exprs
 }
+
+# Take the input quos and unfold any instances of across()
+# into individual quosures
+unfold_across <- function(.data, quos_in) {
+  quos_out <- list()
+  # Check for any expressions starting with across
+  for (quo_i in seq_along(quos_in)) {
+    quo_in <- quos_in[quo_i]
+    quo_expr <- quo_get_expr(quo_in[[1]])
+    quo_env <- quo_get_env(quo_in[[1]])
+
+    if (is_call(quo_expr, "across")) {
+      new_quos <- list()
+      across_call <- call_match(quo_expr, dplyr::across, defaults = TRUE)
+
+      if (!all(names(across_call[-1]) %in% c(".cols", ".fns", ".names"))) {
+        abort("`...` argument to `across()` is deprecated in dplyr and not supported in Arrow")
+      }
+
+      # ARROW-17364: add support for .names argument
+      if (!is.null(across_call[[".names"]])) {
+        abort("`.names` argument to `across()` not yet supported in Arrow")
+      }
+
+      # use select() to get the columns so we can take advantage of tidyselect
+      source_cols <- names(dplyr::select(.data, !!across_call[[".cols"]]))
+      funcs <- across_call[[".fns"]]
+
+      # calling across() with .fns = NULL returns all columns unchanged
+      if (is_empty(funcs)) {
+        return()
+      }
+
+      if (!is_list(funcs) && as.character(funcs)[[1]] == "~") {
+        abort(
+          paste(
+            "purrr-style lambda functions as `.fns` argument to `across()`",

Review Comment:
   Why not? We already import `purrr::as_mapper`, is this more complicated than `funcs <- map(funcs, as_mapper)`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] nealrichardson commented on a diff in pull request #13786: ARROW-11699: [R] Implement dplyr::across()

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on code in PR #13786:
URL: https://github.com/apache/arrow/pull/13786#discussion_r936158922


##########
r/tests/testthat/test-dplyr-mutate.R:
##########
@@ -279,16 +279,16 @@ test_that("dplyr::mutate's examples", {
   # Examples we don't support should succeed
   # but warn that they're pulling data into R to do so
 
-  # across and autosplicing: ARROW-11699
   compare_dplyr_binding(
     .input %>%
       select(name, homeworld, species) %>%
       mutate(across(!name, as.factor)) %>%
       collect(),
     starwars,
-    warning = "Expression across.*not supported in Arrow"
   )

Review Comment:
   There already is at least one issue for `as.factor` in Acero: ARROW-12632, and I think there was another. 
   
   I'd change this test to use a different function.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on pull request #13786: ARROW-11699: [R] Implement dplyr::across() for mutate()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on PR #13786:
URL: https://github.com/apache/arrow/pull/13786#issuecomment-1231628515

   > This is so close! I'm definitely in favor of deferring `filter()` and `summarise()` to another PR (although it will be confusing if they are not implemented before the next release). I really like how you split up the functions in dplyr-across.R...made it easy to review!
   > 
   > The _one_ thing I think this needs is a set of tests directly on `expand_across()`. I think you can very easily convert the existing tests to something like:
   > 
   > ```r
   > library(rlang, warn.conflicts = FALSE)
   > library(arrow, warn.conflicts = FALSE)
   > #> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
   > 
   > testthat::expect_identical(
   >   arrow:::expand_across(
   >     data.frame(dbl = double(), dbl2 = double()),
   >     rlang::quos(across(c(dbl, dbl2), round))
   >   ),
   >   list(
   >     dbl = quo(round(dbl)),
   >     dbl2 = quo(round(dbl2))
   >   )
   > )
   > ```
   > 
   > Created on 2022-08-26 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)
   > 
   > (With apologies for not suggesting that in my first review! I think those tests would help to communicate exactly what `expand_across()` is doing and highlight the fact that you did a really good job separating the implementation into highly testable pieces.)
   
   @paleolimbot Thanks for the review!  Quick question for you; I want to keep in the tests which have `across()` inside of `mutate()` as it makes it a lot easier to see "how are we matching dplyr's functionality" which I imagine will come up if when there are bug reports about this.  Any tips as to which functionality to include in the `expand_across()` tests vs. the `mutate(across(...)` tests? Or is this going to be a pain, and I should just swap them all over?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #13786: ARROW-11699: [R] Implement dplyr::across() for mutate()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on code in PR #13786:
URL: https://github.com/apache/arrow/pull/13786#discussion_r944522607


##########
r/tests/testthat/test-dplyr-mutate.R:
##########
@@ -589,3 +588,106 @@ test_that("mutate() and transmute() with namespaced functions", {
     tbl
   )
 })
+
+test_that("Can use across() within mutate()", {
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(c(dbl, dbl2), round)) %>%
+      collect(),
+    tbl
+  )
+
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(c(dbl, dbl2), list(exp, sqrt))) %>%
+      collect(),
+    tbl
+  )
+
+  # across() arguments not in default order
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(.fns = round, c(dbl, dbl2))) %>%
+      collect(),
+    tbl
+  )
+
+  # ARROW-17364: .names argument not yet supported for across()
+  expect_error(
+    tbl %>%
+      arrow_table() %>%
+      mutate(across(c(dbl, dbl2), round, .names = "{.col}.{.fn}")) %>%
+      collect(),
+    regexp = "`.names` argument to `across()` not yet supported in Arrow",
+    fixed = TRUE
+  )
+
+  # ellipses (...) are a deprecated argument
+  expect_error(
+    tbl %>%
+      arrow_table() %>%
+      mutate(across(c(dbl, dbl2), round, digits = -1)) %>%
+      collect(),
+    regexp = "`...` argument to `across()` is deprecated in dplyr and not supported in Arrow",
+    fixed = TRUE
+  )
+
+  # alternative ways of specifying .fns - as a list
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(1:dbl2, list(round))) %>%
+      collect(),
+    tbl
+  )
+

Review Comment:
   Good catches there, have added in a couple of failing tests for those that I will make pass.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #13786: ARROW-11699: [R] Implement dplyr::across() for mutate()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on code in PR #13786:
URL: https://github.com/apache/arrow/pull/13786#discussion_r946734930


##########
r/R/dplyr-mutate.R:
##########
@@ -151,3 +153,95 @@ ensure_named_exprs <- function(exprs) {
   names(exprs)[unnamed] <- map_chr(exprs[unnamed], format_expr)
   exprs
 }
+
+# Take the input quos and unfold any instances of across()
+# into individual quosures
+unfold_across <- function(.data, quos_in) {

Review Comment:
   > How does this compare/contrast to how dplyr does this?
   
   For clarity, why do you ask? 
   
   Or is this less of a question, and actually you're just tactfully saying "hey Nic, might be worth looking through `dplyr::across()` as you could just do it how it's done there instead of coming up with it all over again with bits missing"?  (If so, thanks for pointing this out, as I keep forgetting this!)
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] nealrichardson commented on a diff in pull request #13786: ARROW-11699: [R] Implement dplyr::across()

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on code in PR #13786:
URL: https://github.com/apache/arrow/pull/13786#discussion_r936158922


##########
r/tests/testthat/test-dplyr-mutate.R:
##########
@@ -279,16 +279,16 @@ test_that("dplyr::mutate's examples", {
   # Examples we don't support should succeed
   # but warn that they're pulling data into R to do so
 
-  # across and autosplicing: ARROW-11699
   compare_dplyr_binding(
     .input %>%
       select(name, homeworld, species) %>%
       mutate(across(!name, as.factor)) %>%
       collect(),
     starwars,
-    warning = "Expression across.*not supported in Arrow"
   )

Review Comment:
   There already is at least one issue for `as.factor` in Acero: ARROW-12632, and I think there was another. 
   
   I'd change this test to use a different function that is supported.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #13786: ARROW-11699: [R] Implement dplyr::across()

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13786:
URL: https://github.com/apache/arrow/pull/13786#issuecomment-1203292487

   https://issues.apache.org/jira/browse/ARROW-11699


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on pull request #13786: ARROW-11699: [R] Implement dplyr::across() for mutate()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on PR #13786:
URL: https://github.com/apache/arrow/pull/13786#issuecomment-1232646949

   @paleolimbot I've updated the tests as per your and Neals' useful suggestions. Mind giving it another review now?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on pull request #13786: ARROW-11699: [R] Implement dplyr::across()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on PR #13786:
URL: https://github.com/apache/arrow/pull/13786#issuecomment-1203884371

   Will come back to this once I'm back from holiday, but as a reminder to my future self, where I was up to:
   * I have a minimal working example that runs successfully in `mutate()`
   * I wanted to incorporate the `.names` argument from `across()` and so had refactor the column naming code out into its own function, to pass that into later
   * I tried to implement `across()` in `summarise()` - this broke things with the `.groups` argument and I don't yet understand what/why, other than that `.groups` works differently with Datasets versus Tables - there may be some context in https://github.com/apache/arrow/issues/11184
   * it's also worth considering whether it's worth just focussing on `across()` for `mutate()` as a first step, and extending to `summarize()` later


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #13786: ARROW-11699: [R] Implement dplyr::across() for mutate()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on code in PR #13786:
URL: https://github.com/apache/arrow/pull/13786#discussion_r945659781


##########
r/R/dplyr-mutate.R:
##########
@@ -151,3 +153,95 @@ ensure_named_exprs <- function(exprs) {
   names(exprs)[unnamed] <- map_chr(exprs[unnamed], format_expr)
   exprs
 }
+
+# Take the input quos and unfold any instances of across()
+# into individual quosures
+unfold_across <- function(.data, quos_in) {
+  quos_out <- list()
+  # Check for any expressions starting with across
+  for (quo_i in seq_along(quos_in)) {
+    quo_in <- quos_in[quo_i]
+    quo_expr <- quo_get_expr(quo_in[[1]])
+
+    if (is_call(quo_expr, "across")) {
+      new_quos <- list()
+      across_call <- match.call(dplyr::across, quo_expr)
+
+      if (!all(names(across_call[-1]) %in% c(".cols", ".fns", ".names"))) {
+        abort("`...` argument to `across()` is deprecated in dplyr and not supported in Arrow")
+      }
+
+      # ARROW-17364: add support for .names argument
+      if (!is.null(across_call[[".names"]])) {
+        abort("`.names` argument to `across()` not yet supported in Arrow")
+      }
+
+      # use select() to get the column names so we can take advantage of tidyselect
+      cols <- names(select(.data, !!across_call[[".cols"]]))
+      funcs <- as.character(across_call[[".fns"]])
+
+      # calling across() with .fns = NULL returns all columns unchanged
+      if (is_empty(funcs)) {
+        return()
+      }
+
+      if (funcs[[1]] == "~") {
+        abort(
+          paste(
+            "purrr-style lambda functions as `.fns` argument to `across()`",
+            "not yet supported in Arrow"
+          )
+        )
+      }
+
+      # if only 1 function, we overwrite the old columns with the new values
+      if (length(funcs) == 1) {
+        # work out the quosures from the call
+        col_syms <- syms(cols)
+        new_quos <- map(col_syms, ~ quo(!!call2(funcs, .x)))
+        new_quos <- set_names(new_quos, cols)
+      } else {
+        # remove `c()` and `list()` which have been used to specify functions
+        extracted_funcs <- funcs[map_lgl(funcs, ~ !.x %in% c("c", "list"))]
+
+        func_list <- ensure_named_funcs(extracted_funcs)
+        new_quos <- quosures_from_func_list(func_list, cols)
+      }
+
+      quos_out <- append(quos_out, new_quos)

Review Comment:
   Agreed, updated now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #13786: ARROW-11699: [R] Implement dplyr::across() for mutate()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on code in PR #13786:
URL: https://github.com/apache/arrow/pull/13786#discussion_r945861915


##########
r/R/dplyr-mutate.R:
##########
@@ -151,3 +153,95 @@ ensure_named_exprs <- function(exprs) {
   names(exprs)[unnamed] <- map_chr(exprs[unnamed], format_expr)
   exprs
 }
+
+# Take the input quos and unfold any instances of across()
+# into individual quosures
+unfold_across <- function(.data, quos_in) {

Review Comment:
   Didn't spend too much time looking at it, as weird things were happening with breakpoints that I didn't understand when I was trying to run it locally, but now I'm looking at it, it's pretty similar. Will take another look to see if there's any bits of the original implementation I can add in here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic closed pull request #13786: ARROW-11699: [R] Implement dplyr::across() for mutate()

Posted by GitBox <gi...@apache.org>.
thisisnic closed pull request #13786: ARROW-11699: [R] Implement dplyr::across() for mutate()
URL: https://github.com/apache/arrow/pull/13786


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #13786: ARROW-11699: [R] Implement dplyr::across() for mutate()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on code in PR #13786:
URL: https://github.com/apache/arrow/pull/13786#discussion_r958461322


##########
r/R/dplyr-across.R:
##########
@@ -0,0 +1,166 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+expand_across <- function(.data, quos_in) {
+  quos_out <- list()
+  # retrieve items using their values to preserve naming of quos other than across
+  for (quo_i in seq_along(quos_in)) {
+    quo_in <- quos_in[quo_i]
+    quo_expr <- quo_get_expr(quo_in[[1]])
+    quo_env <- quo_get_env(quo_in[[1]])
+
+    if (is_call(quo_expr, "across")) {
+      new_quos <- list()
+
+      across_call <- match.call(
+        definition = dplyr::across,
+        call = quo_expr,
+        expand.dots = FALSE,
+        envir = quo_env
+      )
+
+      if (!all(names(across_call[-1]) %in% c(".cols", ".fns", ".names"))) {
+        abort("`...` argument to `across()` is deprecated in dplyr and not supported in Arrow")
+      }
+
+      if (!is.null(across_call[[".cols"]])) {
+        cols <- across_call[[".cols"]]
+      } else {
+        cols <- quote(everything())
+      }
+
+      setup <- across_setup(
+        cols = !!as_quosure(cols, quo_env),
+        fns = across_call[[".fns"]],
+        names = across_call[[".names"]],
+        .caller_env = quo_env,
+        mask = .data,
+        inline = TRUE
+      )
+
+      # calling across() with .fns = NULL returns all columns unchanged
+      if (is_empty(setup$fns)) {
+        # this needs to be updated to match dplyr's version

Review Comment:
   I've updated the comment to be clearer.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #13786: ARROW-11699: [R] Implement dplyr::across() for mutate()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on code in PR #13786:
URL: https://github.com/apache/arrow/pull/13786#discussion_r952280780


##########
r/R/dplyr-mutate.R:
##########
@@ -151,3 +153,95 @@ ensure_named_exprs <- function(exprs) {
   names(exprs)[unnamed] <- map_chr(exprs[unnamed], format_expr)
   exprs
 }
+
+# Take the input quos and unfold any instances of across()
+# into individual quosures
+unfold_across <- function(.data, quos_in) {

Review Comment:
   OK, have now updated this using `dplyr::across()` as reference. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #13786: ARROW-11699: [R] Implement dplyr::across()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on code in PR #13786:
URL: https://github.com/apache/arrow/pull/13786#discussion_r942167570


##########
r/tests/testthat/test-dplyr-mutate.R:
##########
@@ -279,16 +279,16 @@ test_that("dplyr::mutate's examples", {
   # Examples we don't support should succeed
   # but warn that they're pulling data into R to do so
 
-  # across and autosplicing: ARROW-11699
   compare_dplyr_binding(
     .input %>%
       select(name, homeworld, species) %>%
       mutate(across(!name, as.factor)) %>%
       collect(),
     starwars,
-    warning = "Expression across.*not supported in Arrow"
   )

Review Comment:
   I've changed the test, and then opened [a ticket](https://issues.apache.org/jira/browse/ARROW-17371) and [associated PR](https://github.com/apache/arrow/pull/13836) to remove the existing mapping.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #13786: ARROW-11699: [R] Implement dplyr::across() for mutate()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on code in PR #13786:
URL: https://github.com/apache/arrow/pull/13786#discussion_r943625393


##########
r/R/dplyr-mutate.R:
##########
@@ -24,7 +24,9 @@ mutate.arrow_dplyr_query <- function(.data,
                                      .before = NULL,
                                      .after = NULL) {
   call <- match.call()
-  exprs <- ensure_named_exprs(quos(...))
+
+  expression_list <- unfold_across(.data, quos(...))
+  exprs <- ensure_named_exprs(expression_list)

Review Comment:
   Good spot, didn't see it could be used in `filter()`!  Opened a ticket now: [ARROW-17387](https://issues.apache.org/jira/browse/ARROW-17387)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #13786: ARROW-11699: [R] Implement dplyr::across() for mutate()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on code in PR #13786:
URL: https://github.com/apache/arrow/pull/13786#discussion_r945778107


##########
r/tests/testthat/test-dplyr-mutate.R:
##########
@@ -589,3 +588,127 @@ test_that("mutate() and transmute() with namespaced functions", {
     tbl
   )
 })
+
+test_that("Can use across() within mutate()", {
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(c(dbl, dbl2), round)) %>%
+      collect(),
+    tbl
+  )
+
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(c(dbl, dbl2), list(exp, sqrt))) %>%
+      collect(),
+    tbl
+  )
+
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(c(dbl, dbl2), list("fun1" = round, "fun2" = sqrt))) %>%
+      collect(),
+    tbl
+  )
+
+  # this is valid is neither R nor Arrow
+  expect_error(
+    expect_warning(
+      compare_dplyr_binding(
+        .input %>%
+          arrow_table() %>%
+          mutate(across(c(dbl, dbl2), list("fun1" = round(sqrt(dbl))))) %>%
+          collect(),
+        tbl,
+        warning = TRUE
+      )
+    )
+  )
+
+  # across() arguments not in default order
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(.fns = round, c(dbl, dbl2))) %>%
+      collect(),
+    tbl
+  )
+
+  # ARROW-17364: .names argument not yet supported for across()
+  expect_error(
+    tbl %>%
+      arrow_table() %>%
+      mutate(across(c(dbl, dbl2), round, .names = "{.col}.{.fn}")) %>%
+      collect(),
+    regexp = "`.names` argument to `across()` not yet supported in Arrow",
+    fixed = TRUE
+  )
+
+  # ellipses (...) are a deprecated argument
+  expect_error(
+    tbl %>%
+      arrow_table() %>%
+      mutate(across(c(dbl, dbl2), round, digits = -1)) %>%
+      collect(),
+    regexp = "`...` argument to `across()` is deprecated in dplyr and not supported in Arrow",
+    fixed = TRUE
+  )
+
+  # alternative ways of specifying .fns - as a list
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(1:dbl2, list(round))) %>%
+      collect(),
+    tbl
+  )
+
+  # supply .fns as a one-item vector
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(1:dbl2, c(round))) %>%
+      collect(),
+    tbl
+  )
+
+  # ARROW-17366: purrr-style lambda functions not yet supported
+  expect_error(
+    compare_dplyr_binding(
+      .input %>%
+        mutate(across(1:dbl2, ~ round(.x, digits = -1))) %>%
+        collect(),
+      tbl
+    ),
+    regexp = "purrr-style lambda functions as `.fns` argument to `across()` not yet supported in Arrow",
+    fixed = TRUE
+  )
+
+  # .fns = NULL, the default
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(1:dbl2, NULL)) %>%
+      collect(),
+    tbl
+  )
+
+  expect_error(
+    compare_dplyr_binding(
+      .input %>%
+        mutate(across(where(is.double))) %>%
+        collect(),
+      tbl
+    ),
+    "Unsupported selection helper"

Review Comment:
   `where()` isn't yet supported - have added a comment which links to the JIRA for it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #13786: ARROW-11699: [R] Implement dplyr::across() for mutate()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on code in PR #13786:
URL: https://github.com/apache/arrow/pull/13786#discussion_r945796704


##########
r/R/dplyr-mutate.R:
##########
@@ -151,3 +153,95 @@ ensure_named_exprs <- function(exprs) {
   names(exprs)[unnamed] <- map_chr(exprs[unnamed], format_expr)
   exprs
 }
+
+# Take the input quos and unfold any instances of across()
+# into individual quosures
+unfold_across <- function(.data, quos_in) {
+  quos_out <- list()
+  # Check for any expressions starting with across
+  for (quo_i in seq_along(quos_in)) {
+    quo_in <- quos_in[quo_i]
+    quo_expr <- quo_get_expr(quo_in[[1]])
+    quo_env <- quo_get_env(quo_in[[1]])
+
+    if (is_call(quo_expr, "across")) {
+      new_quos <- list()
+      across_call <- call_match(quo_expr, dplyr::across, defaults = TRUE)
+
+      if (!all(names(across_call[-1]) %in% c(".cols", ".fns", ".names"))) {
+        abort("`...` argument to `across()` is deprecated in dplyr and not supported in Arrow")
+      }
+
+      # ARROW-17364: add support for .names argument
+      if (!is.null(across_call[[".names"]])) {
+        abort("`.names` argument to `across()` not yet supported in Arrow")
+      }
+
+      # use select() to get the columns so we can take advantage of tidyselect
+      source_cols <- names(dplyr::select(.data, !!across_call[[".cols"]]))
+      funcs <- across_call[[".fns"]]
+
+      # calling across() with .fns = NULL returns all columns unchanged
+      if (is_empty(funcs)) {
+        return()
+      }
+
+      if (!is_list(funcs) && as.character(funcs)[[1]] == "~") {
+        abort(
+          paste(
+            "purrr-style lambda functions as `.fns` argument to `across()`",

Review Comment:
   Yep, if I try to do that, I get the error:
   ```
   Error in as_mapper(funcs) : 
     Formula must carry an environment.
   ```
   Currently trying to work out how to fix that, `f_env()` looks promising!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #13786: ARROW-11699: [R] Implement dplyr::across() for mutate()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on code in PR #13786:
URL: https://github.com/apache/arrow/pull/13786#discussion_r945796704


##########
r/R/dplyr-mutate.R:
##########
@@ -151,3 +153,95 @@ ensure_named_exprs <- function(exprs) {
   names(exprs)[unnamed] <- map_chr(exprs[unnamed], format_expr)
   exprs
 }
+
+# Take the input quos and unfold any instances of across()
+# into individual quosures
+unfold_across <- function(.data, quos_in) {
+  quos_out <- list()
+  # Check for any expressions starting with across
+  for (quo_i in seq_along(quos_in)) {
+    quo_in <- quos_in[quo_i]
+    quo_expr <- quo_get_expr(quo_in[[1]])
+    quo_env <- quo_get_env(quo_in[[1]])
+
+    if (is_call(quo_expr, "across")) {
+      new_quos <- list()
+      across_call <- call_match(quo_expr, dplyr::across, defaults = TRUE)
+
+      if (!all(names(across_call[-1]) %in% c(".cols", ".fns", ".names"))) {
+        abort("`...` argument to `across()` is deprecated in dplyr and not supported in Arrow")
+      }
+
+      # ARROW-17364: add support for .names argument
+      if (!is.null(across_call[[".names"]])) {
+        abort("`.names` argument to `across()` not yet supported in Arrow")
+      }
+
+      # use select() to get the columns so we can take advantage of tidyselect
+      source_cols <- names(dplyr::select(.data, !!across_call[[".cols"]]))
+      funcs <- across_call[[".fns"]]
+
+      # calling across() with .fns = NULL returns all columns unchanged
+      if (is_empty(funcs)) {
+        return()
+      }
+
+      if (!is_list(funcs) && as.character(funcs)[[1]] == "~") {
+        abort(
+          paste(
+            "purrr-style lambda functions as `.fns` argument to `across()`",

Review Comment:
   Yep, if I try to do that, I get the error:
   ```
   Error in as_mapper(funcs) : 
     Formula must carry an environment.
   ```
   Currently trying to work out how to fix that, but not sure yet.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on a diff in pull request #13786: ARROW-11699: [R] Implement dplyr::across() for mutate()

Posted by GitBox <gi...@apache.org>.
paleolimbot commented on code in PR #13786:
URL: https://github.com/apache/arrow/pull/13786#discussion_r943592958


##########
r/R/dplyr-mutate.R:
##########
@@ -151,3 +153,95 @@ ensure_named_exprs <- function(exprs) {
   names(exprs)[unnamed] <- map_chr(exprs[unnamed], format_expr)
   exprs
 }
+
+# Take the input quos and unfold any instances of across()
+# into individual quosures
+unfold_across <- function(.data, quos_in) {
+  quos_out <- list()
+  # Check for any expressions starting with across
+  for (quo_i in seq_along(quos_in)) {
+    quo_in <- quos_in[quo_i]
+    quo_expr <- quo_get_expr(quo_in[[1]])
+
+    if (is_call(quo_expr, "across")) {
+      new_quos <- list()
+      across_call <- match.call(dplyr::across, quo_expr)
+
+      if (!all(names(across_call[-1]) %in% c(".cols", ".fns", ".names"))) {
+        abort("`...` argument to `across()` is deprecated in dplyr and not supported in Arrow")
+      }
+
+      # ARROW-17364: add support for .names argument
+      if (!is.null(across_call[[".names"]])) {
+        abort("`.names` argument to `across()` not yet supported in Arrow")
+      }
+
+      # use select() to get the column names so we can take advantage of tidyselect
+      cols <- names(select(.data, !!across_call[[".cols"]]))
+      funcs <- as.character(across_call[[".fns"]])
+
+      # calling across() with .fns = NULL returns all columns unchanged
+      if (is_empty(funcs)) {
+        return()
+      }
+
+      if (funcs[[1]] == "~") {
+        abort(
+          paste(
+            "purrr-style lambda functions as `.fns` argument to `across()`",
+            "not yet supported in Arrow"
+          )
+        )
+      }
+
+      # if only 1 function, we overwrite the old columns with the new values
+      if (length(funcs) == 1) {
+        # work out the quosures from the call
+        col_syms <- syms(cols)
+        new_quos <- map(col_syms, ~ quo(!!call2(funcs, .x)))
+        new_quos <- set_names(new_quos, cols)
+      } else {
+        # remove `c()` and `list()` which have been used to specify functions
+        extracted_funcs <- funcs[map_lgl(funcs, ~ !.x %in% c("c", "list"))]
+
+        func_list <- ensure_named_funcs(extracted_funcs)
+        new_quos <- quosures_from_func_list(func_list, cols)
+      }
+
+      quos_out <- append(quos_out, new_quos)

Review Comment:
   I'm still working my head around this but I think that *somewhere* you need to `rlang::quo_set_env(quo_out, rlang::quo_get_env(quo_in))` to make sure that symbol references that are *not* columns are fetched from the calling environment. I'm struggling to come up with an example where that can happen so maybe this isn't relevant here, but it seems like somehow `rlang::quo_get_env(quo_in))` should be passed on to the output quosures? 



##########
r/R/dplyr-mutate.R:
##########
@@ -151,3 +153,95 @@ ensure_named_exprs <- function(exprs) {
   names(exprs)[unnamed] <- map_chr(exprs[unnamed], format_expr)
   exprs
 }
+
+# Take the input quos and unfold any instances of across()
+# into individual quosures
+unfold_across <- function(.data, quos_in) {
+  quos_out <- list()
+  # Check for any expressions starting with across
+  for (quo_i in seq_along(quos_in)) {
+    quo_in <- quos_in[quo_i]
+    quo_expr <- quo_get_expr(quo_in[[1]])
+
+    if (is_call(quo_expr, "across")) {
+      new_quos <- list()
+      across_call <- match.call(dplyr::across, quo_expr)
+
+      if (!all(names(across_call[-1]) %in% c(".cols", ".fns", ".names"))) {
+        abort("`...` argument to `across()` is deprecated in dplyr and not supported in Arrow")
+      }
+
+      # ARROW-17364: add support for .names argument
+      if (!is.null(across_call[[".names"]])) {
+        abort("`.names` argument to `across()` not yet supported in Arrow")
+      }
+
+      # use select() to get the column names so we can take advantage of tidyselect
+      cols <- names(select(.data, !!across_call[[".cols"]]))

Review Comment:
   ```suggestion
         cols <- names(dplyr::select(.data, !!across_call[[".cols"]]))
   ```



##########
r/R/dplyr-mutate.R:
##########
@@ -151,3 +153,95 @@ ensure_named_exprs <- function(exprs) {
   names(exprs)[unnamed] <- map_chr(exprs[unnamed], format_expr)
   exprs
 }
+
+# Take the input quos and unfold any instances of across()
+# into individual quosures
+unfold_across <- function(.data, quos_in) {
+  quos_out <- list()
+  # Check for any expressions starting with across
+  for (quo_i in seq_along(quos_in)) {
+    quo_in <- quos_in[quo_i]
+    quo_expr <- quo_get_expr(quo_in[[1]])
+
+    if (is_call(quo_expr, "across")) {
+      new_quos <- list()
+      across_call <- match.call(dplyr::across, quo_expr)
+
+      if (!all(names(across_call[-1]) %in% c(".cols", ".fns", ".names"))) {
+        abort("`...` argument to `across()` is deprecated in dplyr and not supported in Arrow")
+      }
+
+      # ARROW-17364: add support for .names argument
+      if (!is.null(across_call[[".names"]])) {
+        abort("`.names` argument to `across()` not yet supported in Arrow")
+      }
+
+      # use select() to get the column names so we can take advantage of tidyselect
+      cols <- names(select(.data, !!across_call[[".cols"]]))
+      funcs <- as.character(across_call[[".fns"]])
+
+      # calling across() with .fns = NULL returns all columns unchanged
+      if (is_empty(funcs)) {
+        return()
+      }
+
+      if (funcs[[1]] == "~") {
+        abort(
+          paste(
+            "purrr-style lambda functions as `.fns` argument to `across()`",
+            "not yet supported in Arrow"
+          )
+        )
+      }
+
+      # if only 1 function, we overwrite the old columns with the new values
+      if (length(funcs) == 1) {
+        # work out the quosures from the call
+        col_syms <- syms(cols)
+        new_quos <- map(col_syms, ~ quo(!!call2(funcs, .x)))
+        new_quos <- set_names(new_quos, cols)
+      } else {
+        # remove `c()` and `list()` which have been used to specify functions
+        extracted_funcs <- funcs[map_lgl(funcs, ~ !.x %in% c("c", "list"))]
+
+        func_list <- ensure_named_funcs(extracted_funcs)
+        new_quos <- quosures_from_func_list(func_list, cols)
+      }
+
+      quos_out <- append(quos_out, new_quos)
+    } else {
+      quos_out <- append(quos_out, quo_in)
+    }
+  }
+
+  quos_out
+}
+
+# if the function is unnamed (an empty character), use the index instead
+ensure_named_funcs <- function(funcs) {
+  func_list <- as.list(funcs)
+  func_names <- names(funcs) %||% rep("", length(funcs))
+  func_indices <- seq_along(funcs)
+  names(func_list) <- map2_chr(func_names, func_indices, max)
+  func_list
+}
+
+# given a named list of functions and column names, create a list of new quosures
+quosures_from_func_list <- function(func_list, cols) {
+  func_list_full <- rep(func_list, length(cols))
+  cols_list_full <- rep(cols, each = length(func_list))
+
+  # get names of new quosures
+  new_quo_names <- map2_chr(
+    names(func_list_full), cols_list_full,
+    ~ paste(.y, .x, sep = "_")
+  )

Review Comment:
   Can you implement the `.names` argument here? There's almost certainly a cleaner way to do this but something like:
   
   ``` r
   withr::with_environment(as.environment(list(.col = "something", .fn = "something-else")), glue::glue("{.col}_{.fn}"))
   #> something_something-else
   ```
   
   <sup>Created on 2022-08-11 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)</sup> 



##########
r/tests/testthat/test-dplyr-mutate.R:
##########
@@ -589,3 +588,106 @@ test_that("mutate() and transmute() with namespaced functions", {
     tbl
   )
 })
+
+test_that("Can use across() within mutate()", {
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(c(dbl, dbl2), round)) %>%
+      collect(),
+    tbl
+  )
+
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(c(dbl, dbl2), list(exp, sqrt))) %>%
+      collect(),
+    tbl
+  )
+
+  # across() arguments not in default order
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(.fns = round, c(dbl, dbl2))) %>%
+      collect(),
+    tbl
+  )
+
+  # ARROW-17364: .names argument not yet supported for across()
+  expect_error(
+    tbl %>%
+      arrow_table() %>%
+      mutate(across(c(dbl, dbl2), round, .names = "{.col}.{.fn}")) %>%
+      collect(),
+    regexp = "`.names` argument to `across()` not yet supported in Arrow",
+    fixed = TRUE
+  )
+
+  # ellipses (...) are a deprecated argument
+  expect_error(
+    tbl %>%
+      arrow_table() %>%
+      mutate(across(c(dbl, dbl2), round, digits = -1)) %>%
+      collect(),
+    regexp = "`...` argument to `across()` is deprecated in dplyr and not supported in Arrow",
+    fixed = TRUE
+  )
+
+  # alternative ways of specifying .fns - as a list
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(1:dbl2, list(round))) %>%
+      collect(),
+    tbl
+  )
+

Review Comment:
   It seems like this will work but what about `across(1:dbl2, list("fun1" = round, "fun2" = exp))`? Does putting something obnoxious like `across(1:dbl2, list("fun1" = round(this_is_not_cool(something_else)), "fun2" = exp))` result in an interpretable error?



##########
r/R/dplyr-mutate.R:
##########
@@ -24,7 +24,9 @@ mutate.arrow_dplyr_query <- function(.data,
                                      .before = NULL,
                                      .after = NULL) {
   call <- match.call()
-  exprs <- ensure_named_exprs(quos(...))
+
+  expression_list <- unfold_across(.data, quos(...))
+  exprs <- ensure_named_exprs(expression_list)

Review Comment:
   Should this also get copied to `filter()` (or maybe that's already another ticket?)



##########
r/tests/testthat/test-dplyr-mutate.R:
##########
@@ -589,3 +588,106 @@ test_that("mutate() and transmute() with namespaced functions", {
     tbl
   )
 })
+
+test_that("Can use across() within mutate()", {
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(c(dbl, dbl2), round)) %>%
+      collect(),
+    tbl
+  )
+
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(c(dbl, dbl2), list(exp, sqrt))) %>%
+      collect(),
+    tbl
+  )
+
+  # across() arguments not in default order
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(.fns = round, c(dbl, dbl2))) %>%
+      collect(),
+    tbl
+  )
+
+  # ARROW-17364: .names argument not yet supported for across()
+  expect_error(
+    tbl %>%
+      arrow_table() %>%
+      mutate(across(c(dbl, dbl2), round, .names = "{.col}.{.fn}")) %>%
+      collect(),
+    regexp = "`.names` argument to `across()` not yet supported in Arrow",
+    fixed = TRUE
+  )
+
+  # ellipses (...) are a deprecated argument
+  expect_error(
+    tbl %>%
+      arrow_table() %>%
+      mutate(across(c(dbl, dbl2), round, digits = -1)) %>%
+      collect(),
+    regexp = "`...` argument to `across()` is deprecated in dplyr and not supported in Arrow",
+    fixed = TRUE
+  )
+
+  # alternative ways of specifying .fns - as a list
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(1:dbl2, list(round))) %>%
+      collect(),
+    tbl
+  )
+
+  # supply .fns as a one-item vector
+  compare_dplyr_binding(
+    .input %>%
+      mutate(across(1:dbl2, c(round))) %>%
+      collect(),
+    tbl
+  )
+
+  # ARROW-17366: purrr-style lmabda functions not yet supported

Review Comment:
   ```suggestion
     # ARROW-17366: purrr-style lambda functions not yet supported
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #13786: ARROW-11699: [R] Implement dplyr::across() for mutate()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on code in PR #13786:
URL: https://github.com/apache/arrow/pull/13786#discussion_r945659781


##########
r/R/dplyr-mutate.R:
##########
@@ -151,3 +153,95 @@ ensure_named_exprs <- function(exprs) {
   names(exprs)[unnamed] <- map_chr(exprs[unnamed], format_expr)
   exprs
 }
+
+# Take the input quos and unfold any instances of across()
+# into individual quosures
+unfold_across <- function(.data, quos_in) {
+  quos_out <- list()
+  # Check for any expressions starting with across
+  for (quo_i in seq_along(quos_in)) {
+    quo_in <- quos_in[quo_i]
+    quo_expr <- quo_get_expr(quo_in[[1]])
+
+    if (is_call(quo_expr, "across")) {
+      new_quos <- list()
+      across_call <- match.call(dplyr::across, quo_expr)
+
+      if (!all(names(across_call[-1]) %in% c(".cols", ".fns", ".names"))) {
+        abort("`...` argument to `across()` is deprecated in dplyr and not supported in Arrow")
+      }
+
+      # ARROW-17364: add support for .names argument
+      if (!is.null(across_call[[".names"]])) {
+        abort("`.names` argument to `across()` not yet supported in Arrow")
+      }
+
+      # use select() to get the column names so we can take advantage of tidyselect
+      cols <- names(select(.data, !!across_call[[".cols"]]))
+      funcs <- as.character(across_call[[".fns"]])
+
+      # calling across() with .fns = NULL returns all columns unchanged
+      if (is_empty(funcs)) {
+        return()
+      }
+
+      if (funcs[[1]] == "~") {
+        abort(
+          paste(
+            "purrr-style lambda functions as `.fns` argument to `across()`",
+            "not yet supported in Arrow"
+          )
+        )
+      }
+
+      # if only 1 function, we overwrite the old columns with the new values
+      if (length(funcs) == 1) {
+        # work out the quosures from the call
+        col_syms <- syms(cols)
+        new_quos <- map(col_syms, ~ quo(!!call2(funcs, .x)))
+        new_quos <- set_names(new_quos, cols)
+      } else {
+        # remove `c()` and `list()` which have been used to specify functions
+        extracted_funcs <- funcs[map_lgl(funcs, ~ !.x %in% c("c", "list"))]
+
+        func_list <- ensure_named_funcs(extracted_funcs)
+        new_quos <- quosures_from_func_list(func_list, cols)
+      }
+
+      quos_out <- append(quos_out, new_quos)

Review Comment:
   Agreed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on pull request #13786: ARROW-11699: [R] Implement dplyr::across() for mutate()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on PR #13786:
URL: https://github.com/apache/arrow/pull/13786#issuecomment-1223701444

   This PR now also implements the `.names` argument.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on a diff in pull request #13786: ARROW-11699: [R] Implement dplyr::across() for mutate()

Posted by GitBox <gi...@apache.org>.
paleolimbot commented on code in PR #13786:
URL: https://github.com/apache/arrow/pull/13786#discussion_r956300622


##########
r/R/dplyr-across.R:
##########
@@ -0,0 +1,166 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+expand_across <- function(.data, quos_in) {
+  quos_out <- list()
+  # retrieve items using their values to preserve naming of quos other than across
+  for (quo_i in seq_along(quos_in)) {
+    quo_in <- quos_in[quo_i]
+    quo_expr <- quo_get_expr(quo_in[[1]])
+    quo_env <- quo_get_env(quo_in[[1]])
+
+    if (is_call(quo_expr, "across")) {
+      new_quos <- list()
+
+      across_call <- match.call(
+        definition = dplyr::across,
+        call = quo_expr,
+        expand.dots = FALSE,
+        envir = quo_env
+      )
+
+      if (!all(names(across_call[-1]) %in% c(".cols", ".fns", ".names"))) {
+        abort("`...` argument to `across()` is deprecated in dplyr and not supported in Arrow")
+      }
+
+      if (!is.null(across_call[[".cols"]])) {
+        cols <- across_call[[".cols"]]
+      } else {
+        cols <- quote(everything())
+      }
+
+      setup <- across_setup(
+        cols = !!as_quosure(cols, quo_env),
+        fns = across_call[[".fns"]],
+        names = across_call[[".names"]],
+        .caller_env = quo_env,
+        mask = .data,
+        inline = TRUE
+      )
+
+      # calling across() with .fns = NULL returns all columns unchanged
+      if (is_empty(setup$fns)) {
+        # this needs to be updated to match dplyr's version

Review Comment:
   Is this a TODO for this PR or a future PR?



##########
r/R/dplyr-across.R:
##########
@@ -0,0 +1,166 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+expand_across <- function(.data, quos_in) {
+  quos_out <- list()
+  # retrieve items using their values to preserve naming of quos other than across
+  for (quo_i in seq_along(quos_in)) {
+    quo_in <- quos_in[quo_i]
+    quo_expr <- quo_get_expr(quo_in[[1]])
+    quo_env <- quo_get_env(quo_in[[1]])
+
+    if (is_call(quo_expr, "across")) {
+      new_quos <- list()
+
+      across_call <- match.call(
+        definition = dplyr::across,
+        call = quo_expr,
+        expand.dots = FALSE,
+        envir = quo_env
+      )
+
+      if (!all(names(across_call[-1]) %in% c(".cols", ".fns", ".names"))) {
+        abort("`...` argument to `across()` is deprecated in dplyr and not supported in Arrow")
+      }
+
+      if (!is.null(across_call[[".cols"]])) {
+        cols <- across_call[[".cols"]]
+      } else {
+        cols <- quote(everything())
+      }
+
+      setup <- across_setup(
+        cols = !!as_quosure(cols, quo_env),
+        fns = across_call[[".fns"]],
+        names = across_call[[".names"]],
+        .caller_env = quo_env,
+        mask = .data,
+        inline = TRUE
+      )
+
+      # calling across() with .fns = NULL returns all columns unchanged
+      if (is_empty(setup$fns)) {
+        # this needs to be updated to match dplyr's version
+        return()
+      }
+
+      if (!is_list(setup$fns) && as.character(setup$fns)[[1]] == "~") {
+        abort(
+          paste(
+            "purrr-style lambda functions as `.fns` argument to `across()`",
+            "not yet supported in Arrow"
+          )
+        )
+      }
+
+      new_quos <- quosures_from_func_list(setup, quo_env)
+
+      quos_out <- append(quos_out, new_quos)
+    } else {
+      quos_out <- append(quos_out, quo_in)
+    }
+  }
+
+  quos_out
+}
+
+# given a named list of functions and column names, create a list of new quosures
+quosures_from_func_list <- function(setup, quo_env) {
+  func_list_full <- rep(setup$fns, length(setup$vars))
+  cols_list_full <- rep(setup$vars, each = length(setup$fns))
+
+  # get new quosures
+  new_quo_list <- map2(
+    func_list_full, cols_list_full,
+    ~ quo(!!call2(.x, sym(.y)))
+  )
+
+  quosures <- set_names(new_quo_list, setup$names)
+  map(quosures, ~ quo_set_env(.x, quo_env))
+}
+
+across_setup <- function(cols, fns, names, .caller_env, mask, inline = FALSE) {
+  cols <- enquo(cols)
+
+  vars <- names(dplyr::select(mask, !!cols))
+
+  if (is.null(fns)) {
+    if (!is.null(names)) {
+      glue_mask <- across_glue_mask(.caller_env, .col = vars, .fn = "1")
+      names <- vctrs::vec_as_names(glue::glue(names, .envir = glue_mask), repair = "check_unique")
+    } else {
+      names <- vars
+    }
+    value <- list(vars = vars, fns = fns, names = names)
+    return(value)
+  }
+
+  # apply `.names` smart default
+  if (is.function(fns) || is_formula(fns) || is.name(fns)) {
+    names <- names %||% "{.col}"
+    fns <- list("1" = fns)
+  } else {
+    names <- names %||% "{.col}_{.fn}"
+    fns <- call_args(fns)
+  }
+
+  if (any(map_lgl(fns, is_formula))) {
+    abort(
+      paste(
+        "purrr-style lambda functions as `.fns` argument to `across()`",
+        "not yet supported in Arrow"
+      )
+    )
+  }
+
+  if (!is.list(fns)) {
+    msg <- c("`.fns` must be NULL, a function, a formula, or a list of functions/formulas.")
+    abort(msg)
+  }
+
+  # make sure fns has names, use number to replace unnamed
+  if (is.null(names(fns))) {
+    names_fns <- seq_along(fns)
+  } else {
+    names_fns <- names(fns)
+    empties <- which(names_fns == "")
+    if (length(empties)) {
+      names_fns[empties] <- empties
+    }
+  }
+
+  glue_mask <- across_glue_mask(.caller_env,
+    .col = rep(vars, each = length(fns)),
+    .fn  = rep(names_fns, length(vars))
+  )
+  names <- vctrs::vec_as_names(glue::glue(names, .envir = glue_mask), repair = "check_unique")
+
+  if (!inline) {
+    fns <- map(fns, as_function)
+  }
+
+  list(vars = vars, fns = fns, names = names)
+}
+
+across_glue_mask <- function(.col, .fn, .caller_env) {
+  glue_mask <- env(.caller_env, .col = .col, .fn = .fn)
+  env_bind_active(
+    glue_mask,
+    col = function() glue_mask$.col, fn = function() glue_mask$.fn
+  )
+  glue_mask

Review Comment:
   Is this different than
   
   ```
   glue_mask <- env(.caller_env, .col = .col, .fn = .fn, col = .col, fn = .fn)
   ```
   
   ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #13786: ARROW-11699: [R] Implement dplyr::across() for mutate()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on code in PR #13786:
URL: https://github.com/apache/arrow/pull/13786#discussion_r955295593


##########
r/R/dplyr-mutate.R:
##########
@@ -151,3 +153,95 @@ ensure_named_exprs <- function(exprs) {
   names(exprs)[unnamed] <- map_chr(exprs[unnamed], format_expr)
   exprs
 }
+
+# Take the input quos and unfold any instances of across()
+# into individual quosures
+unfold_across <- function(.data, quos_in) {
+  quos_out <- list()
+  # Check for any expressions starting with across
+  for (quo_i in seq_along(quos_in)) {
+    quo_in <- quos_in[quo_i]
+    quo_expr <- quo_get_expr(quo_in[[1]])
+
+    if (is_call(quo_expr, "across")) {
+      new_quos <- list()
+      across_call <- match.call(dplyr::across, quo_expr)
+
+      if (!all(names(across_call[-1]) %in% c(".cols", ".fns", ".names"))) {
+        abort("`...` argument to `across()` is deprecated in dplyr and not supported in Arrow")
+      }
+
+      # ARROW-17364: add support for .names argument
+      if (!is.null(across_call[[".names"]])) {
+        abort("`.names` argument to `across()` not yet supported in Arrow")
+      }
+
+      # use select() to get the column names so we can take advantage of tidyselect
+      cols <- names(select(.data, !!across_call[[".cols"]]))
+      funcs <- as.character(across_call[[".fns"]])
+
+      # calling across() with .fns = NULL returns all columns unchanged
+      if (is_empty(funcs)) {
+        return()
+      }
+
+      if (funcs[[1]] == "~") {
+        abort(
+          paste(
+            "purrr-style lambda functions as `.fns` argument to `across()`",
+            "not yet supported in Arrow"
+          )
+        )
+      }
+
+      # if only 1 function, we overwrite the old columns with the new values
+      if (length(funcs) == 1) {
+        # work out the quosures from the call
+        col_syms <- syms(cols)
+        new_quos <- map(col_syms, ~ quo(!!call2(funcs, .x)))
+        new_quos <- set_names(new_quos, cols)
+      } else {
+        # remove `c()` and `list()` which have been used to specify functions
+        extracted_funcs <- funcs[map_lgl(funcs, ~ !.x %in% c("c", "list"))]
+
+        func_list <- ensure_named_funcs(extracted_funcs)
+        new_quos <- quosures_from_func_list(func_list, cols)
+      }
+
+      quos_out <- append(quos_out, new_quos)
+    } else {
+      quos_out <- append(quos_out, quo_in)
+    }
+  }
+
+  quos_out
+}
+
+# if the function is unnamed (an empty character), use the index instead
+ensure_named_funcs <- function(funcs) {
+  func_list <- as.list(funcs)
+  func_names <- names(funcs) %||% rep("", length(funcs))
+  func_indices <- seq_along(funcs)
+  names(func_list) <- map2_chr(func_names, func_indices, max)
+  func_list
+}
+
+# given a named list of functions and column names, create a list of new quosures
+quosures_from_func_list <- function(func_list, cols) {
+  func_list_full <- rep(func_list, length(cols))
+  cols_list_full <- rep(cols, each = length(func_list))
+
+  # get names of new quosures
+  new_quo_names <- map2_chr(
+    names(func_list_full), cols_list_full,
+    ~ paste(.y, .x, sep = "_")
+  )

Review Comment:
   Now implemented here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #13786: ARROW-11699: [R] Implement dplyr::across()

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13786:
URL: https://github.com/apache/arrow/pull/13786#issuecomment-1203292495

   :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #13786: ARROW-11699: [R] Implement dplyr::across()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on code in PR #13786:
URL: https://github.com/apache/arrow/pull/13786#discussion_r936124019


##########
r/tests/testthat/test-dplyr-mutate.R:
##########
@@ -279,16 +279,16 @@ test_that("dplyr::mutate's examples", {
   # Examples we don't support should succeed
   # but warn that they're pulling data into R to do so
 
-  # across and autosplicing: ARROW-11699
   compare_dplyr_binding(
     .input %>%
       select(name, homeworld, species) %>%
       mutate(across(!name, as.factor)) %>%
       collect(),
     starwars,
-    warning = "Expression across.*not supported in Arrow"
   )

Review Comment:
   This test now fails with the following error:
   
   ```
   Error in `collect()`:
   ! Invalid: ExecuteScalarExpression cannot Execute non-scalar expression dictionary_encode(homeworld, {null_encoding_behavior=MASK})
   /home/nic2/arrow/cpp/src/arrow/compute/exec/project_node.cc:91  ExecuteScalarExpression(simplified_expr, target, plan()->exec_context())
   /home/nic2/arrow/cpp/src/arrow/compute/exec/exec_plan.cc:559  iterator_.Next()
   /home/nic2/arrow/cpp/src/arrow/record_batch.cc:337  ReadNext(&batch)
   /home/nic2/arrow/cpp/src/arrow/record_batch.cc:351  ToRecordBatches()
   Run `rlang::last_error()` to see where the error occurred.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on pull request #13786: ARROW-11699: [R] Implement dplyr::across() for mutate()

Posted by GitBox <gi...@apache.org>.
paleolimbot commented on PR #13786:
URL: https://github.com/apache/arrow/pull/13786#issuecomment-1231639562

   I think you nailed it - the `mutate()` tests are about matching dplyr's functionality (e.g., they will fail if dplyr changes something and makes this code out of sync); the `expand_across()` tests are about `expand_across()`; and we need both.
   
   I would personally just copy/paste all the places where `across()` appears in your mutate tests but `expect_identical(expand_across(quo(across(...)), ...)` instead. If you feel this is out of scope for this PR I am OK with that, too (but I think it's worth trying).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org