Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/08/04 14:29:05 UTC

[GitHub] [arrow] lidavidm commented on a change in pull request #10724: ARROW-12964: [R] Add bindings for ifelse() and if_else()

lidavidm commented on a change in pull request #10724:
URL: https://github.com/apache/arrow/pull/10724#discussion_r682671015



##########
File path: r/R/dplyr-functions.R
##########
@@ -634,20 +634,53 @@ nse_funcs$wday <- function(x, label = FALSE, abbr = TRUE, week_start = getOption
 }
 
 nse_funcs$log <- function(x, base = exp(1)) {
-  
+
   if (base == exp(1)) {
     return(Expression$create("ln_checked", x))
   }
-  
+
   if (base == 2) {
     return(Expression$create("log2_checked", x))
   }
-  
+
   if (base == 10) {
     return(Expression$create("log10_checked", x))
-  } 
+  }
   # ARROW-13345
   stop("`base` values other than exp(1), 2 and 10 not supported in Arrow", call. = FALSE)
 }
 
 nse_funcs$logb <- nse_funcs$log
+
+nse_funcs$if_else <- function(condition, true, false, missing = NULL){
+  # We ought to assert that the types of the true and false conditions will result
+  # in the same types. We can't compare the objects themselves directly because
+  # they might be expressions (that will result in a type) or R objects that will
+  # need to be compared to see if they are compatible with arrow types.
+  # ARROW-13186 might make this easier with a more robust way.
+  # TODO: do this ^^^
+
+  # if_else only supports boolean, numeric, or temporal types right now
+  # TODO: remove when ARROW-12955 merges
+  # If true/false are R types, we can use `is.*` directly
+  invalid_r_types <- is.character(true) || is.character(false) || is.list(true) ||
+    is.list(false) || is.factor(true) || is.factor(false)
+  # However, if they are expressions, we need to use the functions from nse_funcs
+  invalid_expression_types_true <- inherits(true, "Expression") && (
+    nse_funcs$is.character(true) || nse_funcs$is.list(true) || nse_funcs$is.factor(true)
+  )
+  invalid_expression_types_false <- inherits(false, "Expression") && (
+    nse_funcs$is.character(false) || nse_funcs$is.list(false) || nse_funcs$is.factor(false)
+  )
+  if (invalid_r_types | invalid_expression_types_true | invalid_expression_types_false) {
+    stop("`true` and `false` character values not yet supported in Arrow", call. = FALSE)
+  }

Review comment:
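       Now that I'm looking at explicit dictionary support, what is the expectation for how dictionaries behave? Do we require that all inputs have exactly the same dictionary, or should we merge dictionaries? Base R and dplyr disagree with each other when the dictionaries differ: dplyr's `if_else()` keeps the levels of `true` and produces `NA` (with a warning) for values outside them, while base `ifelse()` drops the factor and returns the integer codes: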
   
   ```r
   > library(dplyr)
   
   Attaching package: ‘dplyr’
   
   The following objects are masked from ‘package:stats’:
   
       filter, lag
   
   The following objects are masked from ‘package:base’:
   
       intersect, setdiff, setequal, union
   
   > fct1 <- factor(c("a", "b"), levels = c("a", "b", "c"))
   > fct2 <- factor(c("a", "d"), levels = c("a", "b", "d"))
   > int <- c(10, 2)
   > if_else(int > 5, fct1, fct2)
   [1] a    <NA>
   Levels: a b c
   Warning message:
   In `[<-.factor`(`*tmp*`, i, value = 3L) :
     invalid factor level, NA generated
   > ifelse(int > 5, fct1, fct2)
   [1] 1 3
   ```
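   
   For comparison, here is a minimal sketch of what the "merge dictionaries" option might look like in plain R. `merge_levels_if_else()` is a hypothetical helper written only for illustration (it is not part of arrow, dplyr, or this PR), and it assumes both inputs are factors of the same length as `condition`:
   
   ```r
   # Hypothetical helper, for illustration only: re-encode both factors
   # against the union of their levels, then select element-wise.
   merge_levels_if_else <- function(condition, true, false) {
     merged <- union(levels(true), levels(false))
     # Re-encode each input so that equal labels share the same integer code
     true <- factor(as.character(true), levels = merged)
     false <- factor(as.character(false), levels = merged)
     # Select on the integer codes, then map the codes back to labels
     codes <- ifelse(condition, as.integer(true), as.integer(false))
     factor(merged[codes], levels = merged)
   }
   
   fct1 <- factor(c("a", "b"), levels = c("a", "b", "c"))
   fct2 <- factor(c("a", "d"), levels = c("a", "b", "d"))
   int <- c(10, 2)
   result <- merge_levels_if_else(int > 5, fct1, fct2)
   # result holds the values a, d with levels a b c d
   ```
   
   The stricter alternative would be to reject the inputs unless `identical(levels(fct1), levels(fct2))` is `TRUE`, which it is not for the two factors above.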
   



