You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/14 22:22:30 UTC

[GitHub] [arrow] wjones127 commented on a diff in pull request #12817: ARROW-15168: [R] Add S3 generics to create main Arrow objects

wjones127 commented on code in PR #12817:
URL: https://github.com/apache/arrow/pull/12817#discussion_r850805719


##########
r/R/record-batch-reader.R:
##########
@@ -176,3 +176,37 @@ RecordBatchFileReader$create <- function(file) {
   assert_is(file, "InputStream")
   ipc___RecordBatchFileReader__Open(file)
 }
+
+#' Convert an object to an Arrow RecordBatchReader
+#'
+#' @param x An object to convert to a [RecordBatchReader]
+#' @param ... Passed to S3 methods
+#'
+#' @return A [RecordBatchReader]
+#' @export
+#'
+#' @examplesIf arrow_available() && arrow_with_dataset()
+#' reader <- as_record_batch_reader(data.frame(col1 = 1, col2 = "two"))
+#' reader$read_next_batch()
+#'
+as_record_batch_reader <- function(x, ...) {
+  UseMethod("as_record_batch_reader")
+}
+
+#' @rdname as_record_batch_reader
+#' @export
+as_record_batch_reader.RecordBatchReader <- function(x, ...) {
+  x
+}
+
+#' @rdname as_arrow_table

Review Comment:
   ```suggestion
   #' @rdname as_record_batch_reader
   ```



##########
r/R/record-batch-reader.R:
##########
@@ -176,3 +176,37 @@ RecordBatchFileReader$create <- function(file) {
   assert_is(file, "InputStream")
   ipc___RecordBatchFileReader__Open(file)
 }
+
+#' Convert an object to an Arrow RecordBatchReader
+#'
+#' @param x An object to convert to a [RecordBatchReader]
+#' @param ... Passed to S3 methods
+#'
+#' @return A [RecordBatchReader]
+#' @export
+#'
+#' @examplesIf arrow_available() && arrow_with_dataset()
+#' reader <- as_record_batch_reader(data.frame(col1 = 1, col2 = "two"))
+#' reader$read_next_batch()
+#'
+as_record_batch_reader <- function(x, ...) {
+  UseMethod("as_record_batch_reader")
+}
+
+#' @rdname as_record_batch_reader
+#' @export
+as_record_batch_reader.RecordBatchReader <- function(x, ...) {
+  x
+}
+
+#' @rdname as_arrow_table
+#' @export
+as_record_batch_reader.default <- function(x, ...) {
+  Scanner$create(x)$ToRecordBatchReader()
+}
+
+#' @rdname as_arrow_table

Review Comment:
   ```suggestion
   #' @rdname as_record_batch_reader
   ```



##########
r/R/type.R:
##########
@@ -69,16 +70,43 @@ FLOAT_TYPES <- c("float16", "float32", "float64", "halffloat", "float", "double"
 #' type(mtcars)
 #' type(Sys.Date())
 #' @export
-type <- function(x) UseMethod("type")
+type <- function(x, ...) UseMethod("type")
 
 #' @export
-type.default <- function(x) Array__infer_type(x)
+type.default <- function(x, ..., from_array_infer_type = FALSE) {
+  # If from_array_infer_type is TRUE, this is a call from C++ and there was
+  # no S3 method defined for this object.
+  if (from_array_infer_type) {
+    abort(
+      sprintf(
+        "Can't infer Arrow data type from object inheriting from %s",
+        paste(class(x), collapse = " / ")
+      )
+    )
+  }
+
+  # If from_array_infer_type is FALSE, this is a user calling type() from R
+  # and we to call into C++. If there is no built-in conversion for this
+  # object type, C++ will call back here with from_array_infer_type = TRUE
+  # to generate a nice error message.
+  Array__infer_type(x)
+}
+
+#' @export
+type.ArrowDatum <- function(x, ...) x$type
+
+#' @export
+type.Expression <- function(x, ...) x$type()
 
 #' @export
-type.ArrowDatum <- function(x) x$type
+type.vctrs_vctr <- function(x, ...) {
+  vctrs_extension_type(vctrs::vec_ptype(x))
+}
 
 #' @export

Review Comment:
   I'm not sure we care, but I noticed we don't show these other implementations in the docs, like we do with the other S3 methods.



##########
r/R/util.R:
##########
@@ -138,18 +138,17 @@ handle_parquet_io_error <- function(e, format) {
   stop(e)
 }
 
-is_writable_table <- function(x) {
-  inherits(x, c("data.frame", "ArrowTabular"))
-}
-
-# This attribute is used when is_writable is passed into assert_that, and allows
-# the call to form part of the error message when is_writable is FALSE
-attr(is_writable_table, "fail") <- function(call, env) {
-  paste0(
-    deparse(call$x),
-    " must be an object of class 'data.frame', 'RecordBatch', or 'Table', not '",
-    class(env[[deparse(call$x)]])[[1]],
-    "'."
+as_writable_table <- function(x, arg_name = "x") {

Review Comment:
   If we are passing down the `arg_name`, do we want to also take the call object to pass in? That would make it so the function calling this helper would be shown instead of `as_writable_table` in the error chain.



##########
r/src/table.cpp:
##########
@@ -226,20 +222,41 @@ arrow::Status AddMetadataFromDots(SEXP lst, int num_fields,
   cpp11::writable::list metadata(2);
   metadata.names() = arrow::r::data::names_metadata;
 
-  bool has_metadata = false;
+  bool has_top_level_metadata = false;
 
   // "top level" attributes, only relevant if the first object is not named and a data
   // frame
   cpp11::strings names = Rf_getAttrib(lst, R_NamesSymbol);
   if (names[0] == "" && Rf_inherits(VECTOR_ELT(lst, 0), "data.frame")) {
     SEXP top_level = metadata[0] = arrow_attributes(VECTOR_ELT(lst, 0), true);
     if (!Rf_isNull(top_level) && XLENGTH(top_level) > 0) {
-      has_metadata = true;
+      has_top_level_metadata = true;
     }
   }
 
   // recurse to get all columns metadata
-  metadata[1] = CollectColumnMetadata(lst, num_fields, has_metadata);
+  cpp11::writable::list metadata_columns = CollectColumnMetadata(lst, num_fields);
+
+  // Remove metadata for ExtensionType columns, because these have their own mechanism for
+  // preserving R type information
+  for (R_xlen_t i = 0; i < schema->num_fields(); i++) {
+    if (schema->field(i)->type()->id() == Type::EXTENSION) {
+      metadata_columns[i] = R_NilValue;

Review Comment:
   Does this handle nested ExtensionType columns? is that a worry at all?



##########
r/R/type.R:
##########
@@ -59,6 +59,7 @@ FLOAT_TYPES <- c("float16", "float32", "float64", "halffloat", "float", "double"
 #' infer the arrow Array type from an R vector
 #'
 #' @param x an R vector

Review Comment:
   This says "an R vector", but it seems it's much more expansive than that now. Should we update this? (Or maybe "vector" is just a very expansive term in R; is a list of dataframes a vector?)



##########
r/tests/testthat/test-metadata.R:
##########
@@ -63,6 +63,13 @@ test_that("R metadata is not stored for types that map to Arrow types (factor, D
   expect_null(Table$create(example_with_times[1:3])$metadata$r)
 })
 
+test_that("R metadata is not stored for ExtensionType columns", {
+  tab <- Table$create(
+    x = vctrs::new_vctr(1:5, class = "special_integer")
+  )

Review Comment:
   Was checking if this works if the extensiontype is nested too, but ran into error:
   
   ```r
   tab <- Table$create(
     x = data.frame(a = vctrs::new_vctr(1:5, class = "special_integer"))
   )
   # Error: NotImplemented: extension
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org