You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/18 14:15:56 UTC

[GitHub] [arrow] nealrichardson commented on a diff in pull request #12817: ARROW-15168: [R] Add S3 generics to create main Arrow objects

nealrichardson commented on code in PR #12817:
URL: https://github.com/apache/arrow/pull/12817#discussion_r852136789


##########
r/R/chunked-array.R:
##########
@@ -153,3 +153,47 @@ ChunkedArray$create <- function(..., type = NULL) {
 #' @rdname ChunkedArray
 #' @export
 chunked_array <- ChunkedArray$create
+
+#' Convert an object to an Arrow ChunkedArray
+#'
+#' Whereas [chunked_array()] constructs a [ChunkedArray] from zero or more
+#' [Array]s, `as_chunked_array()` converts a single object to a

Review Comment:
   `ChunkedArray$create()` accepts R vectors too



##########
r/R/array.R:
##########
@@ -217,6 +217,93 @@ Array$create <- function(x, type = NULL) {
 Array$import_from_c <- ImportArray
 
 
+#' Convert an object to an Arrow Array
+#'
+#' Whereas `Array$create()` constructs an [Array] from the built-in data types
+#' for which the Arrow package implements fast converters, `as_arrow_array()`
+#' provides a means by which other packages can define conversions to Arrow
+#' objects.
+#'
+#' @param x An object to convert to an Arrow Array
+#' @param ... Passed to S3 methods
+#' @param type A [type][data-type] for the final Array. A value of `NULL`
+#'   will default to the type guessed by [type()].
+#'
+#' @return An [Array].
+#' @export
+#'
+#' @examplesIf arrow_available()
+#' as_arrow_array(1:5)
+#'
+as_arrow_array <- function(x, ..., type = NULL) {
+  UseMethod("as_arrow_array")
+}
+
+#' @rdname as_arrow_array
+#' @export
+as_arrow_array.Array <- function(x, ..., type = NULL) {
+  if (is.null(type)) {
+    x
+  } else {
+    x$cast(type)
+  }
+}
+
+#' @rdname as_arrow_array
+#' @export
+as_arrow_array.ChunkedArray <- function(x, ..., type = NULL) {
+  concat_arrays(!!! x$chunks, type = type)
+}
+
+#' @rdname as_arrow_array
+#' @export
+as_arrow_array.vctrs_vctr <- function(x, ..., type = NULL) {
+  if (is.null(type)) {
+    vctrs_extension_array(x)
+  } else if (inherits(type, "VctrsExtensionType")) {
+    vctrs_extension_array(
+      x,
+      ptype = type$ptype(),
+      storage_type = type$storage_type()
+    )
+  } else {
+    NextMethod()
+  }
+}
+
+#' @export
+as_arrow_array.POSIXlt <- function(x, ..., type = NULL) {
+  as_arrow_array.vctrs_vctr(x, ..., type = type)

Review Comment:
   Is there a risk of infinite loop if you have an object with class `c("vctrs_vctr", "POSIXlt")`? Or do we trust NextMethod() to do sane things?



##########
r/R/type.R:
##########
@@ -69,16 +70,43 @@ FLOAT_TYPES <- c("float16", "float32", "float64", "halffloat", "float", "double"
 #' type(mtcars)
 #' type(Sys.Date())
 #' @export
-type <- function(x) UseMethod("type")
+type <- function(x, ...) UseMethod("type")
 
 #' @export
-type.default <- function(x) Array__infer_type(x)
+type.default <- function(x, ..., from_array_infer_type = FALSE) {
+  # If from_array_infer_type is TRUE, this is a call from C++ and there was
+  # no S3 method defined for this object.
+  if (from_array_infer_type) {
+    abort(
+      sprintf(
+        "Can't infer Arrow data type from object inheriting from %s",
+        paste(class(x), collapse = " / ")
+      )
+    )
+  }
+
+  # If from_array_infer_type is FALSE, this is a user calling type() from R
+  # and we to call into C++. If there is no built-in conversion for this
+  # object type, C++ will call back here with from_array_infer_type = TRUE

Review Comment:
   Again, why do we need to call back to R? We can throw errors from C++, there shouldn't be any information in R that we don't have in C++.



##########
r/R/array.R:
##########
@@ -217,6 +217,93 @@ Array$create <- function(x, type = NULL) {
 Array$import_from_c <- ImportArray
 
 
+#' Convert an object to an Arrow Array
+#'
+#' Whereas `Array$create()` constructs an [Array] from the built-in data types
+#' for which the Arrow package implements fast converters, `as_arrow_array()`
+#' provides a means by which other packages can define conversions to Arrow
+#' objects.
+#'
+#' @param x An object to convert to an Arrow Array
+#' @param ... Passed to S3 methods
+#' @param type A [type][data-type] for the final Array. A value of `NULL`
+#'   will default to the type guessed by [type()].
+#'
+#' @return An [Array].
+#' @export
+#'
+#' @examplesIf arrow_available()
+#' as_arrow_array(1:5)
+#'
+as_arrow_array <- function(x, ..., type = NULL) {
+  UseMethod("as_arrow_array")
+}
+
+#' @rdname as_arrow_array
+#' @export
+as_arrow_array.Array <- function(x, ..., type = NULL) {
+  if (is.null(type)) {
+    x
+  } else {
+    x$cast(type)
+  }
+}
+
+#' @rdname as_arrow_array
+#' @export
+as_arrow_array.ChunkedArray <- function(x, ..., type = NULL) {
+  concat_arrays(!!! x$chunks, type = type)
+}
+
+#' @rdname as_arrow_array
+#' @export
+as_arrow_array.vctrs_vctr <- function(x, ..., type = NULL) {
+  if (is.null(type)) {
+    vctrs_extension_array(x)
+  } else if (inherits(type, "VctrsExtensionType")) {
+    vctrs_extension_array(
+      x,
+      ptype = type$ptype(),
+      storage_type = type$storage_type()
+    )
+  } else {
+    NextMethod()
+  }
+}
+
+#' @export
+as_arrow_array.POSIXlt <- function(x, ..., type = NULL) {
+  as_arrow_array.vctrs_vctr(x, ..., type = type)
+}
+
+
+#' @export
+as_arrow_array.default <- function(x, ..., type = NULL, from_constructor = FALSE) {
+  # If from_constructor is TRUE, this is a call from C++ for which S3 dispatch
+  # failed to find a method for the object. If this is the case, we error.
+  if (from_constructor && is.null(type)) {
+    abort(
+      sprintf(
+        "Can't create Array from object of type %s",
+        paste(class(x), collapse = " / ")
+      )
+    )
+  } else if (from_constructor) {
+    abort(
+      sprintf(
+        "Can't create Array<%s> from object of type %s",
+        format(type$code()),
+        paste(class(x), collapse = " / ")
+      )
+    )
+  }
+
+  # If from_constructor is FALSE, we use the built-in logic exposed by
+  # Array$create(). If there is no built-in conversion, C++ will call back
+  # here with from_constructor = TRUE to generate a nice error message.

Review Comment:
   Why does C++ have to call back here to create an error message?



##########
r/R/type.R:
##########
@@ -69,16 +70,43 @@ FLOAT_TYPES <- c("float16", "float32", "float64", "halffloat", "float", "double"
 #' type(mtcars)
 #' type(Sys.Date())
 #' @export
-type <- function(x) UseMethod("type")
+type <- function(x, ...) UseMethod("type")

Review Comment:
   Should this migrate to be called `data_type`? Seems inconsistent with `as_data_type`.



##########
r/R/record-batch-reader.R:
##########
@@ -176,3 +176,37 @@ RecordBatchFileReader$create <- function(file) {
   assert_is(file, "InputStream")
   ipc___RecordBatchFileReader__Open(file)
 }
+
+#' Convert an object to an Arrow RecordBatchReader
+#'
+#' @param x An object to convert to a [RecordBatchReader]
+#' @param ... Passed to S3 methods
+#'
+#' @return A [RecordBatchReader]
+#' @export
+#'
+#' @examplesIf arrow_available() && arrow_with_dataset()
+#' reader <- as_record_batch_reader(data.frame(col1 = 1, col2 = "two"))
+#' reader$read_next_batch()
+#'
+as_record_batch_reader <- function(x, ...) {
+  UseMethod("as_record_batch_reader")
+}
+
+#' @rdname as_record_batch_reader
+#' @export
+as_record_batch_reader.RecordBatchReader <- function(x, ...) {
+  x
+}
+
+#' @rdname as_arrow_table
+#' @export
+as_record_batch_reader.default <- function(x, ...) {
+  Scanner$create(x)$ToRecordBatchReader()

Review Comment:
   I'm not sure we want this. We're trying to reduce or eliminate our usage of Scanner, and this is also masking a bunch of complexity. 



##########
r/R/record-batch.R:
##########
@@ -189,3 +189,58 @@ record_batch <- RecordBatch$create
 
 #' @export
 names.RecordBatch <- function(x) x$names()
+
+#' Convert an object to an Arrow RecordBatch
+#'
+#' Whereas [record_batch()] constructs a [RecordBatch] from one or more columns,
+#' `as_record_batch()` converts a single object to an Arrow [RecordBatch].
+#'
+#' @param x An object to convert to an Arrow RecordBatch
+#' @param ... Passed to S3 methods
+#' @inheritParams record_batch
+#'
+#' @return A [RecordBatch]
+#' @export
+#'
+#' @examplesIf arrow_available()
+#' as_record_batch(data.frame(col1 = 1, col2 = "two"))
+#'
+as_record_batch <- function(x, ..., schema = NULL) {
+  UseMethod("as_record_batch")
+}
+
+#' @rdname as_record_batch
+#' @export
+as_record_batch.RecordBatch <- function(x, ..., schema = NULL) {
+  if (is.null(schema)) {
+    x
+  } else {
+    x$cast(schema)
+  }
+}
+
+#' @rdname as_record_batch
+#' @export
+as_record_batch.Table <- function(x, ..., schema = NULL) {
+  if (x$num_columns == 0) {
+    batch <- record_batch(data.frame())
+    return(batch$Take(rep_len(0, x$num_rows)))
+  }
+
+  arrays_out <- vector(mode = "list", length = x$num_columns)
+  for (i in seq_len(x$num_columns)) {
+    arrays_out[[i]] <- as_arrow_array(
+      x$column(i - 1L),
+      type = schema[[i]]$type
+    )
+  }
+  names(arrays_out) <- names(x)
+
+  record_batch(!!! arrays_out)

Review Comment:
   ```suggestion
     arrays_out <- lapply(x$columns, as_arrow_array)
     out <- RecordBatch$create(!!!arrays_out)
     if (!is.null(schema)) {
       out <- out$cast(schema)
     }
     out
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org