You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/14 23:41:51 UTC

[GitHub] [arrow] nealrichardson commented on a change in pull request #10327: ARROW-12781: [R] Implement is.type() functions for dplyr

nealrichardson commented on a change in pull request #10327:
URL: https://github.com/apache/arrow/pull/10327#discussion_r632858178



##########
File path: r/R/dplyr-functions.R
##########
@@ -109,6 +109,39 @@ nse_funcs$as.numeric <- function(x) {
   Expression$create("cast", x, options = cast_options(to_type = float64()))
 }
 
+# is.* type functions
+nse_funcs$is.character <- function(x) {
+  x$type_id() %in% Type[c("STRING", "LARGE_STRING")]
+}
+
+nse_funcs$is.numeric <- function(x) {
+  x$type_id() %in% c(2:12, 23:24)
+}
+
+nse_funcs$is.double <- function(x) {
+  x$type_id() %in% Type["DOUBLE"]
+}
+
+nse_funcs$is.integer <- function(x) {
+  x$type_id() %in% c(2:9)
+}
+
+nse_funcs$is.integer64 <- function(x) {
+  x$type_id() %in% Type[c("UINT64", "INT64")]

Review comment:
       Technically `bit64::integer64` can't do uint64
   
   ```suggestion
     x$type_id() %in% Type["INT64"]
   ```

##########
File path: r/R/dplyr-functions.R
##########
@@ -109,6 +109,39 @@ nse_funcs$as.numeric <- function(x) {
   Expression$create("cast", x, options = cast_options(to_type = float64()))
 }
 
+# is.* type functions
+nse_funcs$is.character <- function(x) {
+  x$type_id() %in% Type[c("STRING", "LARGE_STRING")]
+}
+
+nse_funcs$is.numeric <- function(x) {
+  x$type_id() %in% c(2:12, 23:24)

Review comment:
       I would spell these out too with the Type enum, just so they're obvious

##########
File path: r/R/dplyr-functions.R
##########
@@ -109,6 +109,39 @@ nse_funcs$as.numeric <- function(x) {
   Expression$create("cast", x, options = cast_options(to_type = float64()))
 }
 
+# is.* type functions
+nse_funcs$is.character <- function(x) {
+  x$type_id() %in% Type[c("STRING", "LARGE_STRING")]
+}
+
+nse_funcs$is.numeric <- function(x) {
+  x$type_id() %in% c(2:12, 23:24)
+}
+
+nse_funcs$is.double <- function(x) {
+  x$type_id() %in% Type["DOUBLE"]
+}
+
+nse_funcs$is.integer <- function(x) {
+  x$type_id() %in% c(2:9)
+}
+
+nse_funcs$is.integer64 <- function(x) {
+  x$type_id() %in% Type[c("UINT64", "INT64")]
+}
+
+nse_funcs$is.logical <- function(x) {
+  x$type_id() %in% Type["BOOL"]
+}
+
+nse_funcs$is.factor <- function(x) {
+  x$type_id() == Type["DICTIONARY"]
+}
+
+nse_funcs$is.list <- function(x) {
+  x$type_id() %in% Type[c("LIST", "FIXED_SIZE_LIST", "LARGE_LIST")]
+}
+

Review comment:
       You may want to consider other method(s) to do the same logic but with Arrow-specific capabilities. Perhaps something as simple as this (following the base R signature)
   
   ```r
   nse_funcs$is <- function(object, class2) {
     x$type$ToString() %in% class2
   }
   ```
   
   Also, are there hms/lubridate/etc. `is_*` functions, or tidyverse-spelling versions of these base R ones, to add?

##########
File path: r/R/expression.R
##########
@@ -76,7 +76,15 @@
 Expression <- R6Class("Expression", inherit = ArrowObject,
   public = list(
     ToString = function() compute___expr__ToString(self),
-    type = function(schema) compute___expr__type(self, schema),
+    schema = NULL,
+    bind = function(schema) self$schema <- schema,
+    type = function() {
+      if (is.null(self$schema)) {
+        stop("Must bind() expression to a schema before returning its type", call. = FALSE)
+      }
+      compute___expr__type(self, self$schema)

Review comment:
       How about this?
   
   ```suggestion
       type = function(schema = self$schema) {
         assert_that(!is.null(schema))
         compute___expr__type(self, schema)
   ```
   
   I think it's misleading to have a `$bind()` method that doesn't actually Bind the Expression. I think you can achieve the same by doing `expr$schema <- schema` inside `relocate()` And also this way you retain the schema arg to `$type()` (so you don't need to modify the print method above)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org