You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/15 14:23:33 UTC

[GitHub] [arrow] thisisnic opened a new pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

thisisnic opened a new pull request #10724:
URL: https://github.com/apache/arrow/pull/10724


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671396058



##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,45 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  if (!is.null(missing)) {
+    arrow_not_supported("missing argument")
+  }
+
+  # TODO: if_else doesn't yet support factors/dictionaries this can be removed when
+  # ARROW-13358 merges
+  warn_r_types <- is.factor(true) || is.factor(false)
+  # However, if they are expressions, we need to use the functions from nse_funcs
+  warn_expression_types_true <- inherits(true, "Expression") &&  nse_funcs$is.factor(true)
+  warn_expression_types_false <- inherits(false, "Expression") && nse_funcs$is.factor(false)
+
+  if (warn_r_types | warn_expression_types_true | warn_expression_types_false) {
+    warning("Factors are currently converted to chracters in if_else and ifelse", call. = FALSE)
+  }

Review comment:
       Is this unique to if_else or a more general issue? If the latter, it might be better just documented somewhere (probably, tbh, in a vignette we haven't written yet). 
   
   If we're keeping the warning, I'm not sure we should (only) say "factor" or "character" since it's actually about the Arrow types.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] westonpace commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
westonpace commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r683043704



##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,53 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  # We ought to assert that the types of the true and false conditions will result
+  # in the same types. We can't compare the objects themselves directly because
+  # they might be expressions (that will result in a type) or R objects that will
+  # need to be compared to see if they are compatible with arrow types.
+  # ARROW-13186 might make this easier with a more robust way.
+  # TODO: do this ^^^
+
+  # if_else only supports boolean, numeric, or temporal types right now
+  # TODO: remove when ARROW-12955 merges
+  # If true/false are R types, we can use `is.*` directly
+  invalid_r_types <- is.character(true) || is.character(false) || is.list(true) ||
+    is.list(false) || is.factor(true) || is.factor(false)
+  # However, if they are expressions, we need to use the functions from nse_funcs
+  invalid_expression_types_true <- inherits(true, "Expression") && (
+    nse_funcs$is.character(true) || nse_funcs$is.list(true) || nse_funcs$is.factor(true)
+  )
+  invalid_expression_types_false <- inherits(false, "Expression") && (
+    nse_funcs$is.character(false) || nse_funcs$is.list(false) || nse_funcs$is.factor(false)
+  )
+  if (invalid_r_types | invalid_expression_types_true | invalid_expression_types_false) {
+    stop("`true` and `false` character values not yet supported in Arrow", call. = FALSE)
+  }

Review comment:
       > We don't have a way to emit warnings
   Only partially related but this reminded me to create ARROW-13566 which might help with these sorts of situations

##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,53 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  # We ought to assert that the types of the true and false conditions will result
+  # in the same types. We can't compare the objects themselves directly because
+  # they might be expressions (that will result in a type) or R objects that will
+  # need to be compared to see if they are compatible with arrow types.
+  # ARROW-13186 might make this easier with a more robust way.
+  # TODO: do this ^^^
+
+  # if_else only supports boolean, numeric, or temporal types right now
+  # TODO: remove when ARROW-12955 merges
+  # If true/false are R types, we can use `is.*` directly
+  invalid_r_types <- is.character(true) || is.character(false) || is.list(true) ||
+    is.list(false) || is.factor(true) || is.factor(false)
+  # However, if they are expressions, we need to use the functions from nse_funcs
+  invalid_expression_types_true <- inherits(true, "Expression") && (
+    nse_funcs$is.character(true) || nse_funcs$is.list(true) || nse_funcs$is.factor(true)
+  )
+  invalid_expression_types_false <- inherits(false, "Expression") && (
+    nse_funcs$is.character(false) || nse_funcs$is.list(false) || nse_funcs$is.factor(false)
+  )
+  if (invalid_r_types | invalid_expression_types_true | invalid_expression_types_false) {
+    stop("`true` and `false` character values not yet supported in Arrow", call. = FALSE)
+  }

Review comment:
       > We don't have a way to emit warnings
   
   Only partially related but this reminded me to create ARROW-13566 which might help with these sorts of situations




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671359140



##########
File path: r/tests/testthat/helper-expectation.R
##########
@@ -91,7 +91,7 @@ expect_dplyr_equal <- function(expr,
 
   if (isTRUE(warning)) {
     # Special-case the simple warning:
-    warning <- "not supported in Arrow; pulling data into R"
+    warning <- "not supported (in|by) Arrow; pulling data into R"

Review comment:
       Please add a TODO note (and optionally make a JIRA)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671402622



##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,45 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  if (!is.null(missing)) {
+    arrow_not_supported("missing argument")
+  }
+
+  # TODO: if_else doesn't yet support factors/dictionaries this can be removed when
+  # ARROW-13358 merges
+  warn_r_types <- is.factor(true) || is.factor(false)
+  # However, if they are expressions, we need to use the functions from nse_funcs
+  warn_expression_types_true <- inherits(true, "Expression") &&  nse_funcs$is.factor(true)
+  warn_expression_types_false <- inherits(false, "Expression") && nse_funcs$is.factor(false)
+
+  if (warn_r_types | warn_expression_types_true | warn_expression_types_false) {
+    warning("Factors are currently converted to chracters in if_else and ifelse", call. = FALSE)
+  }

Review comment:
       This is a more general issue (e.g. I believe all arithmetic kernels do the same thing). However for those kernels it's more unavoidable; for if_else and friends maybe we should implement specific dictionary handling.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671359965



##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,45 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  if (!is.null(missing)) {
+    arrow_not_supported("missing argument")
+  }
+
+  # TODO: if_else doesn't yet support factors/dictionaries this can be removed when
+  # ARROW-13358 merges
+  warn_r_types <- is.factor(true) || is.factor(false)
+  # However, if they are expressions, we need to use the functions from nse_funcs
+  warn_expression_types_true <- inherits(true, "Expression") &&  nse_funcs$is.factor(true)
+  warn_expression_types_false <- inherits(false, "Expression") && nse_funcs$is.factor(false)
+
+  if (warn_r_types | warn_expression_types_true | warn_expression_types_false) {
+    warning("Factors are currently converted to chracters in if_else and ifelse", call. = FALSE)
+  }

Review comment:
       They are converted? Where/how?
   
   Also "chracters" is misspelled here and in the test




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r682697874



##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,53 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  # We ought to assert that the types of the true and false conditions will result
+  # in the same types. We can't compare the objects themselves directly because
+  # they might be expressions (that will result in a type) or R objects that will
+  # need to be compared to see if they are compatible with arrow types.
+  # ARROW-13186 might make this easier with a more robust way.
+  # TODO: do this ^^^
+
+  # if_else only supports boolean, numeric, or temporal types right now
+  # TODO: remove when ARROW-12955 merges
+  # If true/false are R types, we can use `is.*` directly
+  invalid_r_types <- is.character(true) || is.character(false) || is.list(true) ||
+    is.list(false) || is.factor(true) || is.factor(false)
+  # However, if they are expressions, we need to use the functions from nse_funcs
+  invalid_expression_types_true <- inherits(true, "Expression") && (
+    nse_funcs$is.character(true) || nse_funcs$is.list(true) || nse_funcs$is.factor(true)
+  )
+  invalid_expression_types_false <- inherits(false, "Expression") && (
+    nse_funcs$is.character(false) || nse_funcs$is.list(false) || nse_funcs$is.factor(false)
+  )
+  if (invalid_r_types | invalid_expression_types_true | invalid_expression_types_false) {
+    stop("`true` and `false` character values not yet supported in Arrow", call. = FALSE)
+  }

Review comment:
       That sounds great. Cause the null-encode is basically the only other safe option and is also pretty common to want




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r670772307



##########
File path: r/R/dplyr-functions.R
##########
@@ -559,3 +559,24 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
   Expression$create("day_of_week", x, options = list(one_based_numbering = TRUE, week_start = week_start))
 
 }
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  # We ought to assert that the types of the true and false conditions will result
+  # in the same types. We can't compare the objects themselves directly because
+  # they might be expressions (that will result in a type) or R objects that will
+  # need to be compared to see if they are compatible with arrow types.
+  # ARROW-13186 might make this easier with a more robust way.
+  # TODO: do this ^^^
+
+  if (inherits(true, "character") || inherits(false, "character")) {
+    stop("`true` and `false` character values not yet supported in Arrow")

Review comment:
       I've added a bit more testing for this. It's a bit hacky using `nse_funcs$is.character(...)`. I haven't dug too deeply into if we should allow `is.character(var)` like was rejected here: https://issues.apache.org/jira/browse/ARROW-12781?focusedCommentId=17344170&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17344170 but even if we should, that's probably a separate ticket.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671579494



##########
File path: r/R/arrow-datum.R
##########
@@ -47,10 +47,29 @@ is.infinite.ArrowDatum <- function(x) {
 }
 
 #' @export
-is.na.ArrowDatum <- function(x) call_function("is_null", x)
+is.na.ArrowDatum <- function(x) {
+  # TODO: if an option is added to the is_null kernel to treat NaN as NA,
+  # use that to simplify the code here (ARROW-13367)
+  if (x$type_id() %in% TYPES_WITH_NAN) {
+    call_function("is_nan", x) | call_function("is_null", x)
+  } else {
+    call_function("is_null", x)
+  }
+}
 
 #' @export
-is.nan.ArrowDatum <- function(x) call_function("is_nan", x)
+is.nan.ArrowDatum <- function(x) {
+  if (x$type_id() %in% TYPES_WITH_NAN) {
+    # TODO: if an option is added to the is_nan kernel to treat NA as NaN,
+    # use that to simplify the code here (ARROW-13366)
+    call_function("is_nan", x) & call_function("is_valid", x)
+  } else {
+    # This is just a hacky way to return an ArrowDatum identical to the input
+    # in shape but with a Boolean value of false in every position.
+    # TODO: implement this more efficiently and elegantly if possible
+    call_function("is_valid", x) & call_function("is_null", x)

Review comment:
       How about this?
   
   ```suggestion
       Scalar$create(FALSE)$as_array(length(x))
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r670764086



##########
File path: r/R/dplyr-functions.R
##########
@@ -559,3 +559,24 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
   Expression$create("day_of_week", x, options = list(one_based_numbering = TRUE, week_start = week_start))
 
 }
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  # We ought to assert that the types of the true and false conditions will result
+  # in the same types. We can't compare the objects themselves directly because
+  # they might be expressions (that will result in a type) or R objects that will
+  # need to be compared to see if they are compatible with arrow types.
+  # ARROW-13186 might make this easier with a more robust way.
+  # TODO: do this ^^^
+
+  if (inherits(true, "character") || inherits(false, "character")) {
+    stop("`true` and `false` character values not yet supported in Arrow")

Review comment:
       _it turns out_ only a limited set of types are supported right now ARROW-12955 has a PR to add other types. I've added some comments + tests around this and linked to that ticket as well. The types that are currently supported are:  Boolean, Null, Numeric, Temporal

##########
File path: r/tests/testthat/test-dplyr.R
##########
@@ -945,3 +945,63 @@ test_that("abs()", {
     df
   )
 })
+
+test_that("if_else and ifelse", {
+  df <- tibble(x = c(-127, -10, -1, -0 , 0, 1, 10, 127, NA))

Review comment:
       Here are (some of) the c++ tests, which look pretty comprehensive for the types that are currently supported: https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_if_else_test.cc#L49-L94




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671580769



##########
File path: r/tests/testthat/test-dplyr.R
##########
@@ -1013,44 +1049,166 @@ test_that("log functions", {
   )
 
 })
-  
+
 test_that("trig functions", {
-  
+
   df <- tibble(x = c(seq(from = 0, to = 1, by = 0.1), NA))
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = sin(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = cos(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = tan(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = asin(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = acos(x)) %>%
       collect(),
     df
   )
 
-})
\ No newline at end of file
+})
+
+test_that("if_else and ifelse", {
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, 1, 0)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, int, 0L)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_error(
+    Table$create(example_data) %>%
+      mutate(
+        y = if_else(int > 5, 1, FALSE)
+      ) %>% collect(),
+    'NotImplemented: Function if_else has no kernel matching input types'
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, 1, NA_real_)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = ifelse(int > 5, 1, 0)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(dbl > 5, TRUE, FALSE)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(chr %in% letters[1:3], 1L, 3L)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, "one", "zero")
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, chr, chr)

Review comment:
       This is a meaningless test right? You get `chr` either way. Feel free to add more columns to the example data, or create your own ad hoc data frame to test with. I'd like to see tests where one or both of `yes`/`no` are scalars or columns in the dataset.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671525674



##########
File path: r/R/dplyr-functions.R
##########
@@ -57,6 +57,24 @@ nse_funcs$cast <- function(x, target_type, safe = TRUE, ...) {
   Expression$create("cast", x, options = opts)
 }
 
+nse_funcs$is.na <- function(x) {
+  if (is.double(x) || (inherits(x, "Expression") &&
+      x$type_id() %in% TYPES_WITH_NAN)) {
+    build_expr("is_nan", x) | build_expr("is_null", x)
+  } else {
+    build_expr("is_null", x)
+  }
+}
+
+nse_funcs$is.nan <- function(x) {
+  if (is.double(x) || (inherits(x, "Expression") &&
+      x$type_id() %in% TYPES_WITH_NAN)) {
+    build_expr("is_nan", x) & !build_expr("is_null", x)

Review comment:
       ```suggestion
       build_expr("is_nan", x) & build_expr("is_valid", x)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r682696886



##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,53 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  # We ought to assert that the types of the true and false conditions will result
+  # in the same types. We can't compare the objects themselves directly because
+  # they might be expressions (that will result in a type) or R objects that will
+  # need to be compared to see if they are compatible with arrow types.
+  # ARROW-13186 might make this easier with a more robust way.
+  # TODO: do this ^^^
+
+  # if_else only supports boolean, numeric, or temporal types right now
+  # TODO: remove when ARROW-12955 merges
+  # If true/false are R types, we can use `is.*` directly
+  invalid_r_types <- is.character(true) || is.character(false) || is.list(true) ||
+    is.list(false) || is.factor(true) || is.factor(false)
+  # However, if they are expressions, we need to use the functions from nse_funcs
+  invalid_expression_types_true <- inherits(true, "Expression") && (
+    nse_funcs$is.character(true) || nse_funcs$is.list(true) || nse_funcs$is.factor(true)
+  )
+  invalid_expression_types_false <- inherits(false, "Expression") && (
+    nse_funcs$is.character(false) || nse_funcs$is.list(false) || nse_funcs$is.factor(false)
+  )
+  if (invalid_r_types | invalid_expression_types_true | invalid_expression_types_false) {
+    stop("`true` and `false` character values not yet supported in Arrow", call. = FALSE)
+  }

Review comment:
       Yeah, that's also reasonable - error only if we see a value we can't encode (and we could add a toggle to automatically null-encode it or just error).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671514276



##########
File path: r/tests/testthat/test-dplyr.R
##########
@@ -1013,44 +1013,158 @@ test_that("log functions", {
   )
 
 })
-  
+
 test_that("trig functions", {
-  
+
   df <- tibble(x = c(seq(from = 0, to = 1, by = 0.1), NA))
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = sin(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = cos(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = tan(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = asin(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = acos(x)) %>%
       collect(),
     df
   )
 
-})
\ No newline at end of file
+})
+
+test_that("if_else and ifelse", {
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, 1, 0)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, int, 0L)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_error(
+    Table$create(example_data) %>%
+      mutate(
+        y = if_else(int > 5, 1, FALSE)
+      ) %>% collect(),
+    'NotImplemented: Function if_else has no kernel matching input types'
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, 1, NA_real_)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = ifelse(int > 5, 1, 0)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(dbl > 5, TRUE, FALSE)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(chr %in% letters[1:3], 1L, 3L)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, "one", "zero")
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, chr, chr)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, chr, chr, missing = "MISSING")
+      ) %>% collect(),
+    example_data
+  )
+
+  # TODO: remove the mutate + warning after ARROW-13358 is merged and Arrow
+  # supports factors in if(_)else
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, fct, factor("a"))
+      ) %>% collect() %>%
+      # This is a no-op on the Arrow side, but necesary to make the results equal
+      mutate(y = as.character(y)),
+    example_data,
+    warning = "Factors are currently converted to characters in if_else and ifelse"
+  )
+
+  skip("ARROW-12055 for better NaN support")
+  # currently NaNs are not NAs and so the missing argument is not correctly
+  # applied
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(dbl > 5, chr, chr, missing = "MISSING")
+      ) %>% collect(),
+    example_data_for_sorting
+  )

Review comment:
       ARROW-12055 is now resolved by this PR




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r670705908



##########
File path: r/tests/testthat/test-dplyr.R
##########
@@ -945,3 +945,63 @@ test_that("abs()", {
     df
   )
 })
+
+test_that("if_else and ifelse", {
+  df <- tibble(x = c(-127, -10, -1, -0 , 0, 1, 10, 127, NA))

Review comment:
       Good point. I'll look and see if they are, if not I'll add some.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671579805



##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +659,45 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  if (!is.null(missing)) {
+    return(nse_funcs$if_else(
+      is.na(condition),

Review comment:
       Is it better to explicitly call the nse_func here?
   
   ```suggestion
         nse_funcs$is.na(condition),
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671403052



##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,45 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  if (!is.null(missing)) {
+    arrow_not_supported("missing argument")
+  }
+
+  # TODO: if_else doesn't yet support factors/dictionaries this can be removed when
+  # ARROW-13358 merges
+  warn_r_types <- is.factor(true) || is.factor(false)
+  # However, if they are expressions, we need to use the functions from nse_funcs
+  warn_expression_types_true <- inherits(true, "Expression") &&  nse_funcs$is.factor(true)
+  warn_expression_types_false <- inherits(false, "Expression") && nse_funcs$is.factor(false)
+
+  if (warn_r_types | warn_expression_types_true | warn_expression_types_false) {
+    warning("Factors are currently converted to chracters in if_else and ifelse", call. = FALSE)
+  }

Review comment:
       e.g case_when does the same, so will coalesce when merged.  




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671334301



##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,53 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  # We ought to assert that the types of the true and false conditions will result
+  # in the same types. We can't compare the objects themselves directly because
+  # they might be expressions (that will result in a type) or R objects that will
+  # need to be compared to see if they are compatible with arrow types.
+  # ARROW-13186 might make this easier with a more robust way.
+  # TODO: do this ^^^
+
+  # if_else only supports boolean, numeric, or temporal types right now
+  # TODO: remove when ARROW-12955 merges
+  # If true/false are R types, we can use `is.*` directly
+  invalid_r_types <- is.character(true) || is.character(false) || is.list(true) ||
+    is.list(false) || is.factor(true) || is.factor(false)
+  # However, if they are expressions, we need to use the functions from nse_funcs
+  invalid_expression_types_true <- inherits(true, "Expression") && (
+    nse_funcs$is.character(true) || nse_funcs$is.list(true) || nse_funcs$is.factor(true)
+  )
+  invalid_expression_types_false <- inherits(false, "Expression") && (
+    nse_funcs$is.character(false) || nse_funcs$is.list(false) || nse_funcs$is.factor(false)
+  )
+  if (invalid_r_types | invalid_expression_types_true | invalid_expression_types_false) {
+    stop("`true` and `false` character values not yet supported in Arrow", call. = FALSE)
+  }

Review comment:
       _turns out_ we're not quite there yet — the last (common) type that's not yet implemented still is factors/dictionaries. Currently they are auto decoded into strings — which is better than nothing, but not quite right. 
   
   I can use the (fragile) `is.factor()` approach here to warn that though we are getting factors, we are returning strings (for now) or we could silently do the conversation with no checking. Or, I guess we could also disable factors entirely, but that seems extreme. I vote that we warn so that there's no confusion about the change in type, even if it is a bit fragile.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671404227



##########
File path: r/tests/testthat/test-dplyr.R
##########
@@ -1013,44 +1013,158 @@ test_that("log functions", {
   )
 
 })
-  
+
 test_that("trig functions", {
-  
+
   df <- tibble(x = c(seq(from = 0, to = 1, by = 0.1), NA))
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = sin(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = cos(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = tan(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = asin(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = acos(x)) %>%
       collect(),
     df
   )
 
-})
\ No newline at end of file
+})
+
+test_that("if_else and ifelse", {
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, 1, 0)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, int, 0L)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_error(
+    Table$create(example_data) %>%
+      mutate(
+        y = if_else(int > 5, 1, FALSE)
+      ) %>% collect(),
+    'NotImplemented: Function if_else has no kernel matching input types'
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, 1, NA_real_)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = ifelse(int > 5, 1, 0)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(dbl > 5, TRUE, FALSE)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(chr %in% letters[1:3], 1L, 3L)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, "one", "zero")
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, chr, chr)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, chr, chr, missing = "MISSING")
+      ) %>% collect(),
+    example_data
+  )
+
+  # TODO: remove the mutate + warning after ARROW-13358 is merged and Arrow
+  # supports factors in if(_)else
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, fct, factor("a"))
+      ) %>% collect() %>%
+      # This is a no-op on the Arrow side, but necesary to make the results equal
+      mutate(y = as.character(y)),
+    example_data,
+    warning = "Factors are currently converted to characters in if_else and ifelse"
+  )
+
+  skip("ARROW-12055 for better NaN support")
+  # currently NaNs are not NAs and so the missing argument is not correctly
+  # applied
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(dbl > 5, chr, chr, missing = "MISSING")
+      ) %>% collect(),
+    example_data_for_sorting
+  )

Review comment:
       This skip is not great, I could probably write up a hacky helper that basically checks to see if the condition `is.na` and if it's not then _only if the type is double_ do `is.nan()`. And maybe I should do that already so that we don't silently differ from expectations here...




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r670704661



##########
File path: r/R/dplyr-functions.R
##########
@@ -559,3 +559,24 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
   Expression$create("day_of_week", x, options = list(one_based_numbering = TRUE, week_start = week_start))
 
 }
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  # We ought to assert that the types of the true and false conditions will result
+  # in the same types. We can't compare the objects themselves directly because
+  # they might be expressions (that will result in a type) or R objects that will
+  # need to be compared to see if they are compatible with arrow types.
+  # ARROW-13186 might make this easier with a more robust way.
+  # TODO: do this ^^^
+
+  if (inherits(true, "character") || inherits(false, "character")) {
+    stop("`true` and `false` character values not yet supported in Arrow")

Review comment:
       Why?

##########
File path: r/tests/testthat/test-dplyr.R
##########
@@ -945,3 +945,63 @@ test_that("abs()", {
     df
   )
 })
+
+test_that("if_else and ifelse", {
+  df <- tibble(x = c(-127, -10, -1, -0 , 0, 1, 10, 127, NA))

Review comment:
       Is there a reason we only test with type numeric? Do we trust that all types are faithfully tested in C++?

##########
File path: r/tests/testthat/test-dplyr.R
##########
@@ -945,3 +945,63 @@ test_that("abs()", {
     df
   )
 })
+
+test_that("if_else and ifelse", {
+  df <- tibble(x = c(-127, -10, -1, -0 , 0, 1, 10, 127, NA))
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(x > 0, 1, 0)
+      ) %>% collect(),
+    df
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(x > 0, x, 0)
+      ) %>% collect(),
+    df
+  )
+
+  expect_error(
+    Table$create(df) %>%
+      mutate(
+        y = if_else(x > 0, 1, FALSE)
+      ) %>% collect(),
+    'NotImplemented: Function if_else has no kernel matching input types'
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(x > 0, 1, NA_real_)
+      ) %>% collect(),
+    df
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = ifelse(x > 0, 1, 0)
+      ) %>% collect(),
+    df
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = ifelse(x > 0, x, 0)
+      ) %>% collect(),
+    df
+  )
+
+  skip("TODO: could? should? we support the autocasting in ifelse")

Review comment:
       Every compute function should autopromote the same way right? How does this fail?

##########
File path: r/tests/testthat/test-dplyr.R
##########
@@ -945,3 +945,63 @@ test_that("abs()", {
     df
   )
 })
+
+test_that("if_else and ifelse", {
+  df <- tibble(x = c(-127, -10, -1, -0 , 0, 1, 10, 127, NA))
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(x > 0, 1, 0)
+      ) %>% collect(),
+    df
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(x > 0, x, 0)
+      ) %>% collect(),
+    df
+  )
+
+  expect_error(
+    Table$create(df) %>%
+      mutate(
+        y = if_else(x > 0, 1, FALSE)
+      ) %>% collect(),
+    'NotImplemented: Function if_else has no kernel matching input types'
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(x > 0, 1, NA_real_)
+      ) %>% collect(),
+    df
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = ifelse(x > 0, 1, 0)
+      ) %>% collect(),
+    df
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = ifelse(x > 0, x, 0)
+      ) %>% collect(),
+    df
+  )
+
+  skip("TODO: could? should? we support the autocasting in ifelse")

Review comment:
       Oh I see above. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671387944



##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,45 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  if (!is.null(missing)) {
+    arrow_not_supported("missing argument")

Review comment:
       It provides a value to be used if the `condition` is missing/NA:
   
   ```
   > vec <- c(1, 2, NA, 3, 4)
   > dplyr::if_else(vec < 4, "<4", ">=4", missing = "MISSING")
   [1] "<4"      "<4"      "MISSING" "<4"      ">=4"    
   ```
   
   Actually, we could support it now with a nested if_else... I'll work on that.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671525434



##########
File path: r/R/arrow-datum.R
##########
@@ -47,10 +47,25 @@ is.infinite.ArrowDatum <- function(x) {
 }
 
 #' @export
-is.na.ArrowDatum <- function(x) call_function("is_null", x)
+is.na.ArrowDatum <- function(x) {
+  if (x$type_id() %in% TYPES_WITH_NAN) {
+    call_function("is_nan", x) | call_function("is_null", x)
+  } else {
+    call_function("is_null", x)
+  }
+}
 
 #' @export
-is.nan.ArrowDatum <- function(x) call_function("is_nan", x)
+is.nan.ArrowDatum <- function(x) {
+  if (x$type_id() %in% TYPES_WITH_NAN) {
+    call_function("is_nan", x) & !call_function("is_null", x)

Review comment:
       Can save a function call this way:
   
   ```suggestion
       call_function("is_nan", x) & call_function("is_valid", x)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
jonkeane commented on pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#issuecomment-880871906


   Ok, I've pushed a few changes. The biggest one is that I've removed the `assert_that` call (since it wasn't quite what we wanted) and am not relying on the kernel dispatch to tell us if we have incompatible types.
   
   We might want to have our own type checking, but it's not totally trivial with what we have now. We have some of the `is.*` methods after https://issues.apache.org/jira/browse/ARROW-12781, though for this what we really need is something like an extension of `typeof()` for expressions that return the type and then a function that compares those types + the R types and their arrow mappings to ensure that those are compatible. I think this is out of the scope for this PR (and might actually be deferrable until https://issues.apache.org/jira/browse/ARROW-13186 is done or possibly forever).
   
   We don't support `ifelse`'s autocasting abilities — I'm hesitant to even try since it's not a particularly stable or good behavior, though I wish there was a way we could message or warn explaining that / why we did that. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671527574



##########
File path: r/tests/testthat/test-dplyr.R
##########
@@ -1013,44 +1013,158 @@ test_that("log functions", {
   )
 
 })
-  
+
 test_that("trig functions", {
-  
+
   df <- tibble(x = c(seq(from = 0, to = 1, by = 0.1), NA))
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = sin(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = cos(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = tan(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = asin(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = acos(x)) %>%
       collect(),
     df
   )
 
-})
\ No newline at end of file
+})
+
+test_that("if_else and ifelse", {
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, 1, 0)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, int, 0L)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_error(
+    Table$create(example_data) %>%
+      mutate(
+        y = if_else(int > 5, 1, FALSE)
+      ) %>% collect(),
+    'NotImplemented: Function if_else has no kernel matching input types'
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, 1, NA_real_)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = ifelse(int > 5, 1, 0)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(dbl > 5, TRUE, FALSE)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(chr %in% letters[1:3], 1L, 3L)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, "one", "zero")
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, chr, chr)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, chr, chr, missing = "MISSING")
+      ) %>% collect(),
+    example_data
+  )
+
+  # TODO: remove the mutate + warning after ARROW-13358 is merged and Arrow
+  # supports factors in if(_)else
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, fct, factor("a"))
+      ) %>% collect() %>%
+      # This is a no-op on the Arrow side, but necesary to make the results equal
+      mutate(y = as.character(y)),
+    example_data,
+    warning = "Factors are currently converted to characters in if_else and ifelse"
+  )
+
+  skip("ARROW-12055 for better NaN support")
+  # currently NaNs are not NAs and so the missing argument is not correctly
+  # applied
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(dbl > 5, chr, chr, missing = "MISSING")
+      ) %>% collect(),
+    example_data_for_sorting
+  )

Review comment:
       Can this skip now be removed?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671526253



##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +655,45 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  if (!is.null(missing)) {
+    return(nse_funcs$if_else(
+      is.na(condition),
+      missing,
+      nse_funcs$if_else(condition, true, false)
+    ))
+  }
+
+  # if_else doesn't yet support factors/dictionaries
+  # TODO: remove this after ARROW-13358 is merged
+  warn_types <- nse_funcs$is.factor(true) | nse_funcs$is.factor(false)
+  if (warn_types) {
+    warning("Dictionaries (in R: factors) are currently converted to strings (characters) in if_else and ifelse", call. = FALSE)
+  }
+
+  build_expr("if_else", condition, true, false)
+}
+
+# Although base R ifelse allows `yes` and `no` to be different classes
+#
+nse_funcs$ifelse <- function(test, yes, no){
+ nse_funcs$if_else(condition = test, true = yes, false = no)

Review comment:
       ```suggestion
     nse_funcs$if_else(condition = test, true = yes, false = no)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671310863



##########
File path: r/tests/testthat/test-dplyr.R
##########
@@ -1013,44 +1013,139 @@ test_that("log functions", {
   )
 
 })
-  
+
 test_that("trig functions", {
-  
+
   df <- tibble(x = c(seq(from = 0, to = 1, by = 0.1), NA))
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = sin(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = cos(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = tan(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = asin(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = acos(x)) %>%
       collect(),
     df
   )
 
-})
\ No newline at end of file
+})
+
+test_that("if_else and ifelse", {
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, 1, 0)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, int, 0L)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_error(
+    Table$create(example_data) %>%
+      mutate(
+        y = if_else(int > 5, 1, FALSE)
+      ) %>% collect(),
+    'NotImplemented: Function if_else has no kernel matching input types'
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, 1, NA_real_)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = ifelse(int > 5, 1, 0)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(dbl > 5, TRUE, FALSE)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(chr %in% letters[1:3], 1L, 3L)
+      ) %>% collect(),
+    example_data
+  )
+
+  # TODO: this should not warn / pull into R, once ARROW-12955 merges

Review comment:
       I'm doing that right now




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671258951



##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,53 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  # We ought to assert that the types of the true and false conditions will result
+  # in the same types. We can't compare the objects themselves directly because
+  # they might be expressions (that will result in a type) or R objects that will
+  # need to be compared to see if they are compatible with arrow types.
+  # ARROW-13186 might make this easier with a more robust way.
+  # TODO: do this ^^^
+
+  # if_else only supports boolean, numeric, or temporal types right now
+  # TODO: remove when ARROW-12955 merges
+  # If true/false are R types, we can use `is.*` directly

Review comment:
       @thisisnic may I push some commits to this PR that make the `is.*` functions compatible with R scalars so that this can be simplified? I've been meaning to do that in ARROW-13118 anyhow. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671309529



##########
File path: r/tests/testthat/test-dplyr.R
##########
@@ -1013,44 +1013,139 @@ test_that("log functions", {
   )
 
 })
-  
+
 test_that("trig functions", {
-  
+
   df <- tibble(x = c(seq(from = 0, to = 1, by = 0.1), NA))
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = sin(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = cos(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = tan(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = asin(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = acos(x)) %>%
       collect(),
     df
   )
 
-})
\ No newline at end of file
+})
+
+test_that("if_else and ifelse", {
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, 1, 0)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, int, 0L)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_error(
+    Table$create(example_data) %>%
+      mutate(
+        y = if_else(int > 5, 1, FALSE)
+      ) %>% collect(),
+    'NotImplemented: Function if_else has no kernel matching input types'
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, 1, NA_real_)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = ifelse(int > 5, 1, 0)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(dbl > 5, TRUE, FALSE)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(chr %in% letters[1:3], 1L, 3L)
+      ) %>% collect(),
+    example_data
+  )
+
+  # TODO: this should not warn / pull into R, once ARROW-12955 merges

Review comment:
       This issue has merged, can you rebase and clean this up?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671348517



##########
File path: r/tests/testthat/helper-expectation.R
##########
@@ -91,7 +91,7 @@ expect_dplyr_equal <- function(expr,
 
   if (isTRUE(warning)) {
     # Special-case the simple warning:
-    warning <- "not supported in Arrow; pulling data into R"
+    warning <- "not supported (in|by) Arrow; pulling data into R"

Review comment:
       Oops, we might want to harmonise `arrow_not_supported` with it's "not supported by Arrow" with the "not supported in Arrow" phrasing used elsewhere (though I think we should do that in a separate ticket), but this regex works for now.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r682676388



##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,53 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  # We ought to assert that the types of the true and false conditions will result
+  # in the same types. We can't compare the objects themselves directly because
+  # they might be expressions (that will result in a type) or R objects that will
+  # need to be compared to see if they are compatible with arrow types.
+  # ARROW-13186 might make this easier with a more robust way.
+  # TODO: do this ^^^
+
+  # if_else only supports boolean, numeric, or temporal types right now
+  # TODO: remove when ARROW-12955 merges
+  # If true/false are R types, we can use `is.*` directly
+  invalid_r_types <- is.character(true) || is.character(false) || is.list(true) ||
+    is.list(false) || is.factor(true) || is.factor(false)
+  # However, if they are expressions, we need to use the functions from nse_funcs
+  invalid_expression_types_true <- inherits(true, "Expression") && (
+    nse_funcs$is.character(true) || nse_funcs$is.list(true) || nse_funcs$is.factor(true)
+  )
+  invalid_expression_types_false <- inherits(false, "Expression") && (
+    nse_funcs$is.character(false) || nse_funcs$is.list(false) || nse_funcs$is.factor(false)
+  )
+  if (invalid_r_types | invalid_expression_types_true | invalid_expression_types_false) {
+    stop("`true` and `false` character values not yet supported in Arrow", call. = FALSE)
+  }

Review comment:
       I would say that the base R version here is wrong / never what someone actually wants.
   
   I don't think we want to get into the business of coalescing/merging the dictionaries to be the same (there are many edge cases that can lead to very funny outcomes). But emulating the dplyr behavior seems reasonable here (use the levels of the first, merge the values together and any value that's not in the levels of the first gets an NA + warning that the dictionaries didn't match)

##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,53 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  # We ought to assert that the types of the true and false conditions will result
+  # in the same types. We can't compare the objects themselves directly because
+  # they might be expressions (that will result in a type) or R objects that will
+  # need to be compared to see if they are compatible with arrow types.
+  # ARROW-13186 might make this easier with a more robust way.
+  # TODO: do this ^^^
+
+  # if_else only supports boolean, numeric, or temporal types right now
+  # TODO: remove when ARROW-12955 merges
+  # If true/false are R types, we can use `is.*` directly
+  invalid_r_types <- is.character(true) || is.character(false) || is.list(true) ||
+    is.list(false) || is.factor(true) || is.factor(false)
+  # However, if they are expressions, we need to use the functions from nse_funcs
+  invalid_expression_types_true <- inherits(true, "Expression") && (
+    nse_funcs$is.character(true) || nse_funcs$is.list(true) || nse_funcs$is.factor(true)
+  )
+  invalid_expression_types_false <- inherits(false, "Expression") && (
+    nse_funcs$is.character(false) || nse_funcs$is.list(false) || nse_funcs$is.factor(false)
+  )
+  if (invalid_r_types | invalid_expression_types_true | invalid_expression_types_false) {
+    stop("`true` and `false` character values not yet supported in Arrow", call. = FALSE)
+  }

Review comment:
       Aaah, ok. Would it be possible to do something like this then: Use the levels of the first, merge the values together, and error on any value that's not in the levels of the first? (where we could redirect the person to either re-encode the dictionaries or `NULL` the offending values)
   
   IME it's not uncommon to have a circumstance where you filter down to rows that have values that overlap (even though the full dictionaries are different) and forcing someone to re-encode there when no offending value would ever be there could be a bit frustrating.
   

##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,53 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  # We ought to assert that the types of the true and false conditions will result
+  # in the same types. We can't compare the objects themselves directly because
+  # they might be expressions (that will result in a type) or R objects that will
+  # need to be compared to see if they are compatible with arrow types.
+  # ARROW-13186 might make this easier with a more robust way.
+  # TODO: do this ^^^
+
+  # if_else only supports boolean, numeric, or temporal types right now
+  # TODO: remove when ARROW-12955 merges
+  # If true/false are R types, we can use `is.*` directly
+  invalid_r_types <- is.character(true) || is.character(false) || is.list(true) ||
+    is.list(false) || is.factor(true) || is.factor(false)
+  # However, if they are expressions, we need to use the functions from nse_funcs
+  invalid_expression_types_true <- inherits(true, "Expression") && (
+    nse_funcs$is.character(true) || nse_funcs$is.list(true) || nse_funcs$is.factor(true)
+  )
+  invalid_expression_types_false <- inherits(false, "Expression") && (
+    nse_funcs$is.character(false) || nse_funcs$is.list(false) || nse_funcs$is.factor(false)
+  )
+  if (invalid_r_types | invalid_expression_types_true | invalid_expression_types_false) {
+    stop("`true` and `false` character values not yet supported in Arrow", call. = FALSE)
+  }

Review comment:
       That sounds great. Cause the null-encode is basically the only other safe option and is also pretty common to want




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671309005



##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,53 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  # We ought to assert that the types of the true and false conditions will result
+  # in the same types. We can't compare the objects themselves directly because
+  # they might be expressions (that will result in a type) or R objects that will
+  # need to be compared to see if they are compatible with arrow types.
+  # ARROW-13186 might make this easier with a more robust way.
+  # TODO: do this ^^^
+
+  # if_else only supports boolean, numeric, or temporal types right now
+  # TODO: remove when ARROW-12955 merges
+  # If true/false are R types, we can use `is.*` directly
+  invalid_r_types <- is.character(true) || is.character(false) || is.list(true) ||
+    is.list(false) || is.factor(true) || is.factor(false)
+  # However, if they are expressions, we need to use the functions from nse_funcs
+  invalid_expression_types_true <- inherits(true, "Expression") && (
+    nse_funcs$is.character(true) || nse_funcs$is.list(true) || nse_funcs$is.factor(true)
+  )
+  invalid_expression_types_false <- inherits(false, "Expression") && (
+    nse_funcs$is.character(false) || nse_funcs$is.list(false) || nse_funcs$is.factor(false)
+  )
+  if (invalid_r_types | invalid_expression_types_true | invalid_expression_types_false) {
+    stop("`true` and `false` character values not yet supported in Arrow", call. = FALSE)
+  }

Review comment:
       Why do this at all? Why not let the C++ library tell us (by whether it succeeds or not) what it supports? This seems brittle and I don't want to have to maintain duplicated validation code if we don't have to.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671580198



##########
File path: r/tests/testthat/test-dplyr.R
##########
@@ -1013,44 +1013,158 @@ test_that("log functions", {
   )
 
 })
-  
+
 test_that("trig functions", {
-  
+
   df <- tibble(x = c(seq(from = 0, to = 1, by = 0.1), NA))
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = sin(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = cos(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = tan(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = asin(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = acos(x)) %>%
       collect(),
     df
   )
 
-})
\ No newline at end of file
+})
+
+test_that("if_else and ifelse", {
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, 1, 0)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, int, 0L)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_error(
+    Table$create(example_data) %>%
+      mutate(
+        y = if_else(int > 5, 1, FALSE)
+      ) %>% collect(),
+    'NotImplemented: Function if_else has no kernel matching input types'
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, 1, NA_real_)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = ifelse(int > 5, 1, 0)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(dbl > 5, TRUE, FALSE)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(chr %in% letters[1:3], 1L, 3L)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, "one", "zero")
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, chr, chr)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, chr, chr, missing = "MISSING")
+      ) %>% collect(),
+    example_data
+  )
+
+  # TODO: remove the mutate + warning after ARROW-13358 is merged and Arrow
+  # supports factors in if(_)else
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, fct, factor("a"))
+      ) %>% collect() %>%
+      # This is a no-op on the Arrow side, but necesary to make the results equal
+      mutate(y = as.character(y)),
+    example_data,
+    warning = "Factors are currently converted to characters in if_else and ifelse"
+  )
+
+  skip("ARROW-12055 for better NaN support")
+  # currently NaNs are not NAs and so the missing argument is not correctly
+  # applied
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(dbl > 5, chr, chr, missing = "MISSING")
+      ) %>% collect(),
+    example_data_for_sorting
+  )

Review comment:
       Gosh, missing data is hard 🤔 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson closed pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
nealrichardson closed pull request #10724:
URL: https://github.com/apache/arrow/pull/10724


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671543533



##########
File path: r/tests/testthat/test-dplyr.R
##########
@@ -524,7 +524,7 @@ test_that("is.finite(), is.infinite(), is.nan()", {
       ) %>% collect(),
     df
   )
-  skip("is.nan() evaluates to NA on NA values (ARROW-12850)")
+  # is.nan() evaluates to FALSE on NA_real_ (ARROW-12850)

Review comment:
       I also created ARROW-13367 for handling the `is.na` treatment of `NaN`s with a C++ option. `TODO` comments for these are added in 6c2876335b5b9481f8ac9edd069e1e130e486bac




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r682671015



##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,53 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  # We ought to assert that the types of the true and false conditions will result
+  # in the same types. We can't compare the objects themselves directly because
+  # they might be expressions (that will result in a type) or R objects that will
+  # need to be compared to see if they are compatible with arrow types.
+  # ARROW-13186 might make this easier with a more robust way.
+  # TODO: do this ^^^
+
+  # if_else only supports boolean, numeric, or temporal types right now
+  # TODO: remove when ARROW-12955 merges
+  # If true/false are R types, we can use `is.*` directly
+  invalid_r_types <- is.character(true) || is.character(false) || is.list(true) ||
+    is.list(false) || is.factor(true) || is.factor(false)
+  # However, if they are expressions, we need to use the functions from nse_funcs
+  invalid_expression_types_true <- inherits(true, "Expression") && (
+    nse_funcs$is.character(true) || nse_funcs$is.list(true) || nse_funcs$is.factor(true)
+  )
+  invalid_expression_types_false <- inherits(false, "Expression") && (
+    nse_funcs$is.character(false) || nse_funcs$is.list(false) || nse_funcs$is.factor(false)
+  )
+  if (invalid_r_types | invalid_expression_types_true | invalid_expression_types_false) {
+    stop("`true` and `false` character values not yet supported in Arrow", call. = FALSE)
+  }

Review comment:
       Now that I'm looking at explicit dictionary support, what is the expectation for how dictionaries behave? Do we require that all inputs have the same exact dictionary, or should we merge dictionaries? It looks like base R/dplyr behave inconsistently here when the dictionaries differ:
   
   ```r
   > library(dplyr)
   
   Attaching package: ‘dplyr’
   
   The following objects are masked from ‘package:stats’:
   
       filter, lag
   
   The following objects are masked from ‘package:base’:
   
       intersect, setdiff, setequal, union
   
   > fct1 <- factor(c("a", "b"), levels = c("a", "b", "c"))
   > fct2 <- factor(c("a", "d"), levels = c("a", "b", "d"))
   > int <- c(10, 2)
   > if_else(int > 5, fct1, fct2)
   [1] a    <NA>
   Levels: a b c
   Warning message:
   In `[<-.factor`(`*tmp*`, i, value = 3L) :
     invalid factor level, NA generated
   > ifelse(int > 5, fct1, fct2)
   [1] 1 3
   ```
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671538131



##########
File path: r/tests/testthat/test-dplyr.R
##########
@@ -524,7 +524,7 @@ test_that("is.finite(), is.infinite(), is.nan()", {
       ) %>% collect(),
     df
   )
-  skip("is.nan() evaluates to NA on NA values (ARROW-12850)")
+  # is.nan() evaluates to FALSE on NA_real_ (ARROW-12850)

Review comment:
       I created ARROW-13366 for this. I'll add `TODO` comments that reference this in the `is.nan` implementations, not here in the tests.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] westonpace commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
westonpace commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r683043704



##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,53 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  # We ought to assert that the types of the true and false conditions will result
+  # in the same types. We can't compare the objects themselves directly because
+  # they might be expressions (that will result in a type) or R objects that will
+  # need to be compared to see if they are compatible with arrow types.
+  # ARROW-13186 might make this easier with a more robust way.
+  # TODO: do this ^^^
+
+  # if_else only supports boolean, numeric, or temporal types right now
+  # TODO: remove when ARROW-12955 merges
+  # If true/false are R types, we can use `is.*` directly
+  invalid_r_types <- is.character(true) || is.character(false) || is.list(true) ||
+    is.list(false) || is.factor(true) || is.factor(false)
+  # However, if they are expressions, we need to use the functions from nse_funcs
+  invalid_expression_types_true <- inherits(true, "Expression") && (
+    nse_funcs$is.character(true) || nse_funcs$is.list(true) || nse_funcs$is.factor(true)
+  )
+  invalid_expression_types_false <- inherits(false, "Expression") && (
+    nse_funcs$is.character(false) || nse_funcs$is.list(false) || nse_funcs$is.factor(false)
+  )
+  if (invalid_r_types | invalid_expression_types_true | invalid_expression_types_false) {
+    stop("`true` and `false` character values not yet supported in Arrow", call. = FALSE)
+  }

Review comment:
       > We don't have a way to emit warnings
   Only partially related but this reminded me to create ARROW-13566 which might help with these sorts of situations

##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,53 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  # We ought to assert that the types of the true and false conditions will result
+  # in the same types. We can't compare the objects themselves directly because
+  # they might be expressions (that will result in a type) or R objects that will
+  # need to be compared to see if they are compatible with arrow types.
+  # ARROW-13186 might make this easier with a more robust way.
+  # TODO: do this ^^^
+
+  # if_else only supports boolean, numeric, or temporal types right now
+  # TODO: remove when ARROW-12955 merges
+  # If true/false are R types, we can use `is.*` directly
+  invalid_r_types <- is.character(true) || is.character(false) || is.list(true) ||
+    is.list(false) || is.factor(true) || is.factor(false)
+  # However, if they are expressions, we need to use the functions from nse_funcs
+  invalid_expression_types_true <- inherits(true, "Expression") && (
+    nse_funcs$is.character(true) || nse_funcs$is.list(true) || nse_funcs$is.factor(true)
+  )
+  invalid_expression_types_false <- inherits(false, "Expression") && (
+    nse_funcs$is.character(false) || nse_funcs$is.list(false) || nse_funcs$is.factor(false)
+  )
+  if (invalid_r_types | invalid_expression_types_true | invalid_expression_types_false) {
+    stop("`true` and `false` character values not yet supported in Arrow", call. = FALSE)
+  }

Review comment:
       > We don't have a way to emit warnings
   
   Only partially related but this reminded me to create ARROW-13566 which might help with these sorts of situations




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671527041



##########
File path: r/tests/testthat/test-dplyr.R
##########
@@ -524,7 +524,7 @@ test_that("is.finite(), is.infinite(), is.nan()", {
       ) %>% collect(),
     df
   )
-  skip("is.nan() evaluates to NA on NA values (ARROW-12850)")
+  # is.nan() evaluates to FALSE on NA_real_ (ARROW-12850)

Review comment:
       Should this comment be updated to note that we've worked around this in R but that perhaps it should be handled in C++ by ARROW-12850?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#issuecomment-880737529


   https://issues.apache.org/jira/browse/ARROW-12964


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r682694052



##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,53 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  # We ought to assert that the types of the true and false conditions will result
+  # in the same types. We can't compare the objects themselves directly because
+  # they might be expressions (that will result in a type) or R objects that will
+  # need to be compared to see if they are compatible with arrow types.
+  # ARROW-13186 might make this easier with a more robust way.
+  # TODO: do this ^^^
+
+  # if_else only supports boolean, numeric, or temporal types right now
+  # TODO: remove when ARROW-12955 merges
+  # If true/false are R types, we can use `is.*` directly
+  invalid_r_types <- is.character(true) || is.character(false) || is.list(true) ||
+    is.list(false) || is.factor(true) || is.factor(false)
+  # However, if they are expressions, we need to use the functions from nse_funcs
+  invalid_expression_types_true <- inherits(true, "Expression") && (
+    nse_funcs$is.character(true) || nse_funcs$is.list(true) || nse_funcs$is.factor(true)
+  )
+  invalid_expression_types_false <- inherits(false, "Expression") && (
+    nse_funcs$is.character(false) || nse_funcs$is.list(false) || nse_funcs$is.factor(false)
+  )
+  if (invalid_r_types | invalid_expression_types_true | invalid_expression_types_false) {
+    stop("`true` and `false` character values not yet supported in Arrow", call. = FALSE)
+  }

Review comment:
       Aaah, ok. Would it be possible to do something like this then: Use the levels of the first, merge the values together, and error on any value that's not in the levels of the first? (where we could redirect the person to either re-encode the dictionaries or `NULL` the offending values)
   
   IME it's not uncommon to have a circumstance where you filter down to rows that have values that overlap (even though the full dictionaries are different) and forcing someone to re-encode there when no offending value would ever be there could be a bit frustrating.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671705434



##########
File path: r/tests/testthat/test-dplyr.R
##########
@@ -1013,44 +1049,166 @@ test_that("log functions", {
   )
 
 })
-  
+
 test_that("trig functions", {
-  
+
   df <- tibble(x = c(seq(from = 0, to = 1, by = 0.1), NA))
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = sin(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = cos(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = tan(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = asin(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = acos(x)) %>%
       collect(),
     df
   )
 
-})
\ No newline at end of file
+})
+
+test_that("if_else and ifelse", {
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, 1, 0)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, int, 0L)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_error(
+    Table$create(example_data) %>%
+      mutate(
+        y = if_else(int > 5, 1, FALSE)
+      ) %>% collect(),
+    'NotImplemented: Function if_else has no kernel matching input types'
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, 1, NA_real_)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = ifelse(int > 5, 1, 0)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(dbl > 5, TRUE, FALSE)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(chr %in% letters[1:3], 1L, 3L)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, "one", "zero")
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, chr, chr)

Review comment:
       Yeah, absolutely, I've changed them to be more useful




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671309968



##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,53 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  # We ought to assert that the types of the true and false conditions will result
+  # in the same types. We can't compare the objects themselves directly because
+  # they might be expressions (that will result in a type) or R objects that will
+  # need to be compared to see if they are compatible with arrow types.
+  # ARROW-13186 might make this easier with a more robust way.
+  # TODO: do this ^^^
+
+  # if_else only supports boolean, numeric, or temporal types right now
+  # TODO: remove when ARROW-12955 merges
+  # If true/false are R types, we can use `is.*` directly

Review comment:
       @ianmcook I question whether this is. checking is desirable at all here




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r670706539



##########
File path: r/R/dplyr-functions.R
##########
@@ -559,3 +559,24 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
   Expression$create("day_of_week", x, options = list(one_based_numbering = TRUE, week_start = week_start))
 
 }
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  # We ought to assert that the types of the true and false conditions will result
+  # in the same types. We can't compare the objects themselves directly because
+  # they might be expressions (that will result in a type) or R objects that will
+  # need to be compared to see if they are compatible with arrow types.
+  # ARROW-13186 might make this easier with a more robust way.
+  # TODO: do this ^^^
+
+  if (inherits(true, "character") || inherits(false, "character")) {
+    stop("`true` and `false` character values not yet supported in Arrow")

Review comment:
       Hmm this might be left over from above, I'll take this out (or add why it needs to stay) when I rebase + fix conflicts as well




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r682676388



##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,53 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  # We ought to assert that the types of the true and false conditions will result
+  # in the same types. We can't compare the objects themselves directly because
+  # they might be expressions (that will result in a type) or R objects that will
+  # need to be compared to see if they are compatible with arrow types.
+  # ARROW-13186 might make this easier with a more robust way.
+  # TODO: do this ^^^
+
+  # if_else only supports boolean, numeric, or temporal types right now
+  # TODO: remove when ARROW-12955 merges
+  # If true/false are R types, we can use `is.*` directly
+  invalid_r_types <- is.character(true) || is.character(false) || is.list(true) ||
+    is.list(false) || is.factor(true) || is.factor(false)
+  # However, if they are expressions, we need to use the functions from nse_funcs
+  invalid_expression_types_true <- inherits(true, "Expression") && (
+    nse_funcs$is.character(true) || nse_funcs$is.list(true) || nse_funcs$is.factor(true)
+  )
+  invalid_expression_types_false <- inherits(false, "Expression") && (
+    nse_funcs$is.character(false) || nse_funcs$is.list(false) || nse_funcs$is.factor(false)
+  )
+  if (invalid_r_types | invalid_expression_types_true | invalid_expression_types_false) {
+    stop("`true` and `false` character values not yet supported in Arrow", call. = FALSE)
+  }

Review comment:
       I would say that the base R version here is wrong / never what someone actually wants.
   
   I don't think we want to get into the business of coalescing/merging the dictionaries to be the same (there are many edge cases that can lead to very funny outcomes). But emulating the dplyr behavior seems reasonable here (use the levels of the first, merge the values together and any value that's not in the levels of the first gets an NA + warning that the dictionaries didn't match)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671358648



##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,45 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  if (!is.null(missing)) {
+    arrow_not_supported("missing argument")

Review comment:
       What is this argument? Do we plan to support it in the future (i.e. is there a JIRA, or should there be)?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r682686202



##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,53 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  # We ought to assert that the types of the true and false conditions will result
+  # in the same types. We can't compare the objects themselves directly because
+  # they might be expressions (that will result in a type) or R objects that will
+  # need to be compared to see if they are compatible with arrow types.
+  # ARROW-13186 might make this easier with a more robust way.
+  # TODO: do this ^^^
+
+  # if_else only supports boolean, numeric, or temporal types right now
+  # TODO: remove when ARROW-12955 merges
+  # If true/false are R types, we can use `is.*` directly
+  invalid_r_types <- is.character(true) || is.character(false) || is.list(true) ||
+    is.list(false) || is.factor(true) || is.factor(false)
+  # However, if they are expressions, we need to use the functions from nse_funcs
+  invalid_expression_types_true <- inherits(true, "Expression") && (
+    nse_funcs$is.character(true) || nse_funcs$is.list(true) || nse_funcs$is.factor(true)
+  )
+  invalid_expression_types_false <- inherits(false, "Expression") && (
+    nse_funcs$is.character(false) || nse_funcs$is.list(false) || nse_funcs$is.factor(false)
+  )
+  if (invalid_r_types | invalid_expression_types_true | invalid_expression_types_false) {
+    stop("`true` and `false` character values not yet supported in Arrow", call. = FALSE)
+  }

Review comment:
       We don't have a way to emit warnings, unfortunately; we'd have to either error on mismatched dictionaries, or implement something like casts where you choose to allow/deny 'unsafe' conversions ahead of time. I think it's reasonable to require that the user explicitly merge dictionaries (and re-encode data) before calling if that's what they desire (though you could imagine fusing the two for performance reasons if desired down the line).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r682671015



##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,53 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  # We ought to assert that the types of the true and false conditions will result
+  # in the same types. We can't compare the objects themselves directly because
+  # they might be expressions (that will result in a type) or R objects that will
+  # need to be compared to see if they are compatible with arrow types.
+  # ARROW-13186 might make this easier with a more robust way.
+  # TODO: do this ^^^
+
+  # if_else only supports boolean, numeric, or temporal types right now
+  # TODO: remove when ARROW-12955 merges
+  # If true/false are R types, we can use `is.*` directly
+  invalid_r_types <- is.character(true) || is.character(false) || is.list(true) ||
+    is.list(false) || is.factor(true) || is.factor(false)
+  # However, if they are expressions, we need to use the functions from nse_funcs
+  invalid_expression_types_true <- inherits(true, "Expression") && (
+    nse_funcs$is.character(true) || nse_funcs$is.list(true) || nse_funcs$is.factor(true)
+  )
+  invalid_expression_types_false <- inherits(false, "Expression") && (
+    nse_funcs$is.character(false) || nse_funcs$is.list(false) || nse_funcs$is.factor(false)
+  )
+  if (invalid_r_types | invalid_expression_types_true | invalid_expression_types_false) {
+    stop("`true` and `false` character values not yet supported in Arrow", call. = FALSE)
+  }

Review comment:
       Now that I'm looking at explicit dictionary support, what is the expectation for how dictionaries behave? Do we require that all inputs have the same exact dictionary, or should we merge dictionaries? It looks like base R/dplyr behave inconsistently here when the dictionaries differ:
   
   ```r
   > library(dplyr)
   
   Attaching package: ‘dplyr’
   
   The following objects are masked from ‘package:stats’:
   
       filter, lag
   
   The following objects are masked from ‘package:base’:
   
       intersect, setdiff, setequal, union
   
   > fct1 <- factor(c("a", "b"), levels = c("a", "b", "c"))
   > fct2 <- factor(c("a", "d"), levels = c("a", "b", "d"))
   > int <- c(10, 2)
   > if_else(int > 5, fct1, fct2)
   [1] a    <NA>
   Levels: a b c
   Warning message:
   In `[<-.factor`(`*tmp*`, i, value = 3L) :
     invalid factor level, NA generated
   > ifelse(int > 5, fct1, fct2)
   [1] 1 3
   ```
   

##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,53 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  # We ought to assert that the types of the true and false conditions will result
+  # in the same types. We can't compare the objects themselves directly because
+  # they might be expressions (that will result in a type) or R objects that will
+  # need to be compared to see if they are compatible with arrow types.
+  # ARROW-13186 might make this easier with a more robust way.
+  # TODO: do this ^^^
+
+  # if_else only supports boolean, numeric, or temporal types right now
+  # TODO: remove when ARROW-12955 merges
+  # If true/false are R types, we can use `is.*` directly
+  invalid_r_types <- is.character(true) || is.character(false) || is.list(true) ||
+    is.list(false) || is.factor(true) || is.factor(false)
+  # However, if they are expressions, we need to use the functions from nse_funcs
+  invalid_expression_types_true <- inherits(true, "Expression") && (
+    nse_funcs$is.character(true) || nse_funcs$is.list(true) || nse_funcs$is.factor(true)
+  )
+  invalid_expression_types_false <- inherits(false, "Expression") && (
+    nse_funcs$is.character(false) || nse_funcs$is.list(false) || nse_funcs$is.factor(false)
+  )
+  if (invalid_r_types | invalid_expression_types_true | invalid_expression_types_false) {
+    stop("`true` and `false` character values not yet supported in Arrow", call. = FALSE)
+  }

Review comment:
       We don't have a way to emit warnings, unfortunately; we'd have to either error on mismatched dictionaries, or implement something like casts where you choose to allow/deny 'unsafe' conversions ahead of time. I think it's reasonable to require that the user explicitly merge dictionaries (and re-encode data) before calling if that's what they desire (though you could imagine fusing the two for performance reasons if desired down the line).

##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,53 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  # We ought to assert that the types of the true and false conditions will result
+  # in the same types. We can't compare the objects themselves directly because
+  # they might be expressions (that will result in a type) or R objects that will
+  # need to be compared to see if they are compatible with arrow types.
+  # ARROW-13186 might make this easier with a more robust way.
+  # TODO: do this ^^^
+
+  # if_else only supports boolean, numeric, or temporal types right now
+  # TODO: remove when ARROW-12955 merges
+  # If true/false are R types, we can use `is.*` directly
+  invalid_r_types <- is.character(true) || is.character(false) || is.list(true) ||
+    is.list(false) || is.factor(true) || is.factor(false)
+  # However, if they are expressions, we need to use the functions from nse_funcs
+  invalid_expression_types_true <- inherits(true, "Expression") && (
+    nse_funcs$is.character(true) || nse_funcs$is.list(true) || nse_funcs$is.factor(true)
+  )
+  invalid_expression_types_false <- inherits(false, "Expression") && (
+    nse_funcs$is.character(false) || nse_funcs$is.list(false) || nse_funcs$is.factor(false)
+  )
+  if (invalid_r_types | invalid_expression_types_true | invalid_expression_types_false) {
+    stop("`true` and `false` character values not yet supported in Arrow", call. = FALSE)
+  }

Review comment:
       Yeah, that's also reasonable - error only if we see a value we can't encode (and we could add a toggle to automatically null-encode it or just error).

##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,53 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  # We ought to assert that the types of the true and false conditions will result
+  # in the same types. We can't compare the objects themselves directly because
+  # they might be expressions (that will result in a type) or R objects that will
+  # need to be compared to see if they are compatible with arrow types.
+  # ARROW-13186 might make this easier with a more robust way.
+  # TODO: do this ^^^
+
+  # if_else only supports boolean, numeric, or temporal types right now
+  # TODO: remove when ARROW-12955 merges
+  # If true/false are R types, we can use `is.*` directly
+  invalid_r_types <- is.character(true) || is.character(false) || is.list(true) ||
+    is.list(false) || is.factor(true) || is.factor(false)
+  # However, if they are expressions, we need to use the functions from nse_funcs
+  invalid_expression_types_true <- inherits(true, "Expression") && (
+    nse_funcs$is.character(true) || nse_funcs$is.list(true) || nse_funcs$is.factor(true)
+  )
+  invalid_expression_types_false <- inherits(false, "Expression") && (
+    nse_funcs$is.character(false) || nse_funcs$is.list(false) || nse_funcs$is.factor(false)
+  )
+  if (invalid_r_types | invalid_expression_types_true | invalid_expression_types_false) {
+    stop("`true` and `false` character values not yet supported in Arrow", call. = FALSE)
+  }

Review comment:
       I'll try to also get this into the coalesce/select/if_else kernels when I get a chance (though I'll start with case_when since I'm already in there).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671404572



##########
File path: r/tests/testthat/test-dplyr.R
##########
@@ -1013,44 +1013,158 @@ test_that("log functions", {
   )
 
 })
-  
+
 test_that("trig functions", {
-  
+
   df <- tibble(x = c(seq(from = 0, to = 1, by = 0.1), NA))
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = sin(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = cos(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = tan(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = asin(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = acos(x)) %>%
       collect(),
     df
   )
 
-})
\ No newline at end of file
+})
+
+test_that("if_else and ifelse", {
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, 1, 0)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, int, 0L)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_error(
+    Table$create(example_data) %>%
+      mutate(
+        y = if_else(int > 5, 1, FALSE)
+      ) %>% collect(),
+    'NotImplemented: Function if_else has no kernel matching input types'
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, 1, NA_real_)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = ifelse(int > 5, 1, 0)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(dbl > 5, TRUE, FALSE)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(chr %in% letters[1:3], 1L, 3L)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, "one", "zero")
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, chr, chr)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, chr, chr, missing = "MISSING")
+      ) %>% collect(),
+    example_data
+  )
+
+  # TODO: remove the mutate + warning after ARROW-13358 is merged and Arrow
+  # supports factors in if(_)else
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, fct, factor("a"))
+      ) %>% collect() %>%
+      # This is a no-op on the Arrow side, but necesary to make the results equal
+      mutate(y = as.character(y)),
+    example_data,
+    warning = "Factors are currently converted to characters in if_else and ifelse"
+  )
+
+  skip("ARROW-12055 for better NaN support")
+  # currently NaNs are not NAs and so the missing argument is not correctly
+  # applied
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(dbl > 5, chr, chr, missing = "MISSING")
+      ) %>% collect(),
+    example_data_for_sorting
+  )

Review comment:
       https://issues.apache.org/jira/browse/ARROW-12055 is the "make `NaN` actually work" ticket




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671405951



##########
File path: r/tests/testthat/test-dplyr.R
##########
@@ -1013,44 +1013,158 @@ test_that("log functions", {
   )
 
 })
-  
+
 test_that("trig functions", {
-  
+
   df <- tibble(x = c(seq(from = 0, to = 1, by = 0.1), NA))
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = sin(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = cos(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = tan(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = asin(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = acos(x)) %>%
       collect(),
     df
   )
 
-})
\ No newline at end of file
+})
+
+test_that("if_else and ifelse", {
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, 1, 0)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, int, 0L)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_error(
+    Table$create(example_data) %>%
+      mutate(
+        y = if_else(int > 5, 1, FALSE)
+      ) %>% collect(),
+    'NotImplemented: Function if_else has no kernel matching input types'
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, 1, NA_real_)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = ifelse(int > 5, 1, 0)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(dbl > 5, TRUE, FALSE)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(chr %in% letters[1:3], 1L, 3L)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, "one", "zero")
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, chr, chr)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, chr, chr, missing = "MISSING")
+      ) %>% collect(),
+    example_data
+  )
+
+  # TODO: remove the mutate + warning after ARROW-13358 is merged and Arrow
+  # supports factors in if(_)else
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, fct, factor("a"))
+      ) %>% collect() %>%
+      # This is a no-op on the Arrow side, but necesary to make the results equal
+      mutate(y = as.character(y)),
+    example_data,
+    warning = "Factors are currently converted to characters in if_else and ifelse"
+  )
+
+  skip("ARROW-12055 for better NaN support")
+  # currently NaNs are not NAs and so the missing argument is not correctly
+  # applied
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(dbl > 5, chr, chr, missing = "MISSING")
+      ) %>% collect(),
+    example_data_for_sorting
+  )

Review comment:
       But `dbl` doesn't have `NaN` does it?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671347175



##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,45 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  if (!is.null(missing)) {
+    arrow_not_supported("missing argument")
+  }
+
+  # TODO: if_else doesn't yet support factors/dictionaries this can be removed when
+  # ARROW-13358 merges
+  warn_r_types <- is.factor(true) || is.factor(false)
+  # However, if they are expressions, we need to use the functions from nse_funcs
+  warn_expression_types_true <- inherits(true, "Expression") &&  nse_funcs$is.factor(true)
+  warn_expression_types_false <- inherits(false, "Expression") && nse_funcs$is.factor(false)
+
+  if (warn_r_types | warn_expression_types_true | warn_expression_types_false) {
+    warning("Factors are currently converted to chracters in if_else and ifelse", call. = FALSE)
+  }

Review comment:
       I implemented the warning, but we can take this out if we think it would be better to not.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671312280



##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,53 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  # We ought to assert that the types of the true and false conditions will result
+  # in the same types. We can't compare the objects themselves directly because
+  # they might be expressions (that will result in a type) or R objects that will
+  # need to be compared to see if they are compatible with arrow types.
+  # ARROW-13186 might make this easier with a more robust way.
+  # TODO: do this ^^^
+
+  # if_else only supports boolean, numeric, or temporal types right now
+  # TODO: remove when ARROW-12955 merges
+  # If true/false are R types, we can use `is.*` directly
+  invalid_r_types <- is.character(true) || is.character(false) || is.list(true) ||
+    is.list(false) || is.factor(true) || is.factor(false)
+  # However, if they are expressions, we need to use the functions from nse_funcs
+  invalid_expression_types_true <- inherits(true, "Expression") && (
+    nse_funcs$is.character(true) || nse_funcs$is.list(true) || nse_funcs$is.factor(true)
+  )
+  invalid_expression_types_false <- inherits(false, "Expression") && (
+    nse_funcs$is.character(false) || nse_funcs$is.list(false) || nse_funcs$is.factor(false)
+  )
+  if (invalid_r_types | invalid_expression_types_true | invalid_expression_types_false) {
+    stop("`true` and `false` character values not yet supported in Arrow", call. = FALSE)
+  }

Review comment:
       Hopefully all of this will go away once I rebase (see below) though, one benefit to having this validation + error proactively than wait for the kernel dispatching to error is that if we do it here it will abandon ship and pull into R and work that way. If we let the kernel error we only get the error (and additionally no indication that doing something like `collect()` before this step would help with the situation)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671382473



##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,45 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  if (!is.null(missing)) {
+    arrow_not_supported("missing argument")
+  }
+
+  # TODO: if_else doesn't yet support factors/dictionaries this can be removed when
+  # ARROW-13358 merges
+  warn_r_types <- is.factor(true) || is.factor(false)
+  # However, if they are expressions, we need to use the functions from nse_funcs
+  warn_expression_types_true <- inherits(true, "Expression") &&  nse_funcs$is.factor(true)
+  warn_expression_types_false <- inherits(false, "Expression") && nse_funcs$is.factor(false)
+
+  if (warn_r_types | warn_expression_types_true | warn_expression_types_false) {
+    warning("Factors are currently converted to chracters in if_else and ifelse", call. = FALSE)
+  }

Review comment:
       The likely source of this conversion is https://github.com/apache/arrow/blob/f7cac0e5866a08e946a14976ef854e8c42ac281b/cpp/src/arrow/compute/kernels/scalar_if_else.cc#L980 (h/t @lidavidm ) When `dispatchbest` ing, dictionaries are decoded (unless there is a specific kernel that takes dictionaries).
   
   I've corrected the misspelling




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671359140



##########
File path: r/tests/testthat/helper-expectation.R
##########
@@ -91,7 +91,7 @@ expect_dplyr_equal <- function(expr,
 
   if (isTRUE(warning)) {
     # Special-case the simple warning:
-    warning <- "not supported in Arrow; pulling data into R"
+    warning <- "not supported (in|by) Arrow; pulling data into R"

Review comment:
       Please add a TODO comment (and optionally make a JIRA)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671270359



##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,53 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  # We ought to assert that the types of the true and false conditions will result
+  # in the same types. We can't compare the objects themselves directly because
+  # they might be expressions (that will result in a type) or R objects that will
+  # need to be compared to see if they are compatible with arrow types.
+  # ARROW-13186 might make this easier with a more robust way.
+  # TODO: do this ^^^
+
+  # if_else only supports boolean, numeric, or temporal types right now
+  # TODO: remove when ARROW-12955 merges
+  # If true/false are R types, we can use `is.*` directly

Review comment:
       feel free, nic is off through monday so I took this over — simplifying it would be great though! 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671519775



##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,45 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  if (!is.null(missing)) {
+    arrow_not_supported("missing argument")
+  }
+
+  # TODO: if_else doesn't yet support factors/dictionaries this can be removed when
+  # ARROW-13358 merges
+  warn_r_types <- is.factor(true) || is.factor(false)
+  # However, if they are expressions, we need to use the functions from nse_funcs
+  warn_expression_types_true <- inherits(true, "Expression") &&  nse_funcs$is.factor(true)
+  warn_expression_types_false <- inherits(false, "Expression") && nse_funcs$is.factor(false)
+
+  if (warn_r_types | warn_expression_types_true | warn_expression_types_false) {
+    warning("Factors are currently converted to chracters in if_else and ifelse", call. = FALSE)
+  }

Review comment:
       I've added both dictionary+factor and string+character to the warning. I agree about talking about this type of casting/coalescing in a vignette somewhere when it happens in the cases where it's more-or-less unavoidable. Though in this case, like @lidavidm says, the casting is (hopefully) only temporary so we should be able to pull this out next release (or maybe even before then if the extra functionality is added soon enough). 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r671549980



##########
File path: r/tests/testthat/test-dplyr.R
##########
@@ -1013,44 +1013,158 @@ test_that("log functions", {
   )
 
 })
-  
+
 test_that("trig functions", {
-  
+
   df <- tibble(x = c(seq(from = 0, to = 1, by = 0.1), NA))
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = sin(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = cos(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = tan(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = asin(x)) %>%
       collect(),
     df
   )
-  
+
   expect_dplyr_equal(
     input %>%
       mutate(y = acos(x)) %>%
       collect(),
     df
   )
 
-})
\ No newline at end of file
+})
+
+test_that("if_else and ifelse", {
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, 1, 0)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, int, 0L)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_error(
+    Table$create(example_data) %>%
+      mutate(
+        y = if_else(int > 5, 1, FALSE)
+      ) %>% collect(),
+    'NotImplemented: Function if_else has no kernel matching input types'
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, 1, NA_real_)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = ifelse(int > 5, 1, 0)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(dbl > 5, TRUE, FALSE)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(chr %in% letters[1:3], 1L, 3L)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, "one", "zero")
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, chr, chr)
+      ) %>% collect(),
+    example_data
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, chr, chr, missing = "MISSING")
+      ) %>% collect(),
+    example_data
+  )
+
+  # TODO: remove the mutate + warning after ARROW-13358 is merged and Arrow
+  # supports factors in if(_)else
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(int > 5, fct, factor("a"))
+      ) %>% collect() %>%
+      # This is a no-op on the Arrow side, but necesary to make the results equal
+      mutate(y = as.character(y)),
+    example_data,
+    warning = "Factors are currently converted to characters in if_else and ifelse"
+  )
+
+  skip("ARROW-12055 for better NaN support")
+  # currently NaNs are not NAs and so the missing argument is not correctly
+  # applied
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        y = if_else(dbl > 5, chr, chr, missing = "MISSING")
+      ) %>% collect(),
+    example_data_for_sorting
+  )

Review comment:
       ~Unfortunately~ _as it turns out_, no. I've created https://issues.apache.org/jira/browse/ARROW-13364 to track this, but Arrow's comparison with `NaN`s results in `false` and not an `NA`(-like) value: 
   
   ```
   > example_data_for_sorting %>% mutate(
   +     y = if_else(dbl > 5, chr, chr, missing = "MISSING")
   + ) %>% collect()
   # A tibble: 10 x 7
              int          dbl chr    lgl   dttm                grp   y        
            <int>        <dbl> <chr>  <lgl> <dttm>              <chr> <chr>    
    1 -2147483647 -Inf         ""     FALSE 0000-01-01 00:00:00 A     ""       
    2        -101   -1.80e+308 ""     FALSE 1919-05-29 13:08:55 A     ""       
    3        -100   -2.23e-308 "\""   FALSE 1955-06-20 04:10:42 A     "\""     
    4           0    0         "&"    FALSE 1973-06-30 11:38:41 A     "&"      
    5           0    2.23e-308 "ABC"  TRUE  1987-03-29 12:49:47 A     "ABC"    
    6           1    3.14e+  0 "NULL" TRUE  1991-06-11 19:07:01 B     "NULL"   
    7         100    1.80e+308 "a"    TRUE  NA                  B     "a"      
    8        1000  Inf         "abc"  TRUE  2017-08-21 18:26:40 B     "abc"    
    9  2147483647  NaN         "zzz"  TRUE  2017-08-21 18:26:40 B     "MISSING"
   10          NA   NA          NA    NA    9999-12-31 23:59:59 B     "MISSING"
   > Table$create(example_data_for_sorting) %>% mutate(
   +     y = if_else(dbl > 5, chr, chr, missing = "MISSING")
   + ) %>% collect()
   # A tibble: 10 x 7
              int          dbl chr    lgl   dttm                grp   y        
            <int>        <dbl> <chr>  <lgl> <dttm>              <chr> <chr>    
    1 -2147483647 -Inf         ""     FALSE 0000-01-01 00:00:00 A     ""       
    2        -101   -1.80e+308 ""     FALSE 1919-05-29 13:08:55 A     ""       
    3        -100   -2.23e-308 "\""   FALSE 1955-06-20 04:10:42 A     "\""     
    4           0    0         "&"    FALSE 1973-06-30 11:38:41 A     "&"      
    5           0    2.23e-308 "ABC"  TRUE  1987-03-29 12:49:47 A     "ABC"    
    6           1    3.14e+  0 "NULL" TRUE  1991-06-11 19:07:01 B     "NULL"   
    7         100    1.80e+308 "a"    TRUE  NA                  B     "a"      
    8        1000  Inf         "abc"  TRUE  2017-08-21 18:26:40 B     "abc"    
    9  2147483647  NaN         "zzz"  TRUE  2017-08-21 18:26:40 B     "zzz"    
   10          NA   NA          NA    NA    9999-12-31 23:59:59 B     "MISSING"
   ```
   
   That 9th row `NaN > 5` is evaluated to `NA` in R and therefore gets a missing value, where as in Arrow `NaN > 5` evaluates to `false` so we get the `"zzz"` from the `chr` column




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r682698338



##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,53 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  # We ought to assert that the types of the true and false conditions will result
+  # in the same types. We can't compare the objects themselves directly because
+  # they might be expressions (that will result in a type) or R objects that will
+  # need to be compared to see if they are compatible with arrow types.
+  # ARROW-13186 might make this easier with a more robust way.
+  # TODO: do this ^^^
+
+  # if_else only supports boolean, numeric, or temporal types right now
+  # TODO: remove when ARROW-12955 merges
+  # If true/false are R types, we can use `is.*` directly
+  invalid_r_types <- is.character(true) || is.character(false) || is.list(true) ||
+    is.list(false) || is.factor(true) || is.factor(false)
+  # However, if they are expressions, we need to use the functions from nse_funcs
+  invalid_expression_types_true <- inherits(true, "Expression") && (
+    nse_funcs$is.character(true) || nse_funcs$is.list(true) || nse_funcs$is.factor(true)
+  )
+  invalid_expression_types_false <- inherits(false, "Expression") && (
+    nse_funcs$is.character(false) || nse_funcs$is.list(false) || nse_funcs$is.factor(false)
+  )
+  if (invalid_r_types | invalid_expression_types_true | invalid_expression_types_false) {
+    stop("`true` and `false` character values not yet supported in Arrow", call. = FALSE)
+  }

Review comment:
       I'll try to also get this into the coalesce/select/if_else kernels when I get a chance (though I'll start with case_when since I'm already in there).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org