You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/05/27 04:09:34 UTC

[GitHub] [arrow] nealrichardson commented on a change in pull request #7279: ARROW-8938: [R] Provide binding for arrow::compute::CallFunction

nealrichardson commented on a change in pull request #7279:
URL: https://github.com/apache/arrow/pull/7279#discussion_r430719454



##########
File path: r/src/compute.cpp
##########
@@ -223,4 +205,66 @@ std::shared_ptr<arrow::Table> Table__FilterChunked(
   }
   return tab;
 }
+
+arrow::Datum to_datum(SEXP x) {
+  // TODO: this is repetitive, can we DRY it out?
+  if (Rf_inherits(x, "ArrowObject") && Rf_inherits(x, "Array")) {
+    Rcpp::ConstReferenceSmartPtrInputParameter<std::shared_ptr<arrow::Array>> obj(x);
+    return static_cast<std::shared_ptr<arrow::Array>>(obj);
+  } else if (Rf_inherits(x, "ArrowObject") && Rf_inherits(x, "ChunkedArray")) {
+    Rcpp::ConstReferenceSmartPtrInputParameter<std::shared_ptr<arrow::ChunkedArray>> obj(x);
+    return static_cast<std::shared_ptr<arrow::ChunkedArray>>(obj);
+  } else if (Rf_inherits(x, "ArrowObject") && Rf_inherits(x, "RecordBatch")) {
+    Rcpp::ConstReferenceSmartPtrInputParameter<std::shared_ptr<arrow::RecordBatch>> obj(x);
+    return static_cast<std::shared_ptr<arrow::RecordBatch>>(obj);
+  } else if (Rf_inherits(x, "ArrowObject") && Rf_inherits(x, "Table")) {
+    Rcpp::ConstReferenceSmartPtrInputParameter<std::shared_ptr<arrow::Table>> obj(x);
+    return static_cast<std::shared_ptr<arrow::Table>>(obj);
+  } else {
+    // TODO: scalar?

Review comment:
       Currently elsewhere we use a scalar wrapper function from the dataset namespace (see https://github.com/apache/arrow/blob/master/r/src/expression.cpp#L106-L158). What should we be doing here?

##########
File path: r/R/array.R
##########
@@ -128,17 +128,19 @@ Array <- R6Class("Array",
         i <- Array$create(i)
       }
       if (inherits(i, "ChunkedArray")) {
+        # Invalid: Kernel does not support chunked array arguments

Review comment:
       @wesm I didn't see a JIRA for this but admittedly there are a lot attached to the umbrella ticket so I may have missed it. Is there one for supporting Take with ChunkedArrays (as either the first and/or second argument)? Likewise for Take/Filter on RecordBatch and Table.

##########
File path: r/src/compute.cpp
##########
@@ -223,4 +205,66 @@ std::shared_ptr<arrow::Table> Table__FilterChunked(
   }
   return tab;
 }
+
+arrow::Datum to_datum(SEXP x) {

Review comment:
       We chose not to expose the Datum class in R at all. Datum is redundant because we already have the SEXP box for different inputs, so we can just deal with that in the interface. 
   
   Unrelated, see comment on the next line.

##########
File path: r/src/compute.cpp
##########
@@ -223,4 +205,66 @@ std::shared_ptr<arrow::Table> Table__FilterChunked(
   }
   return tab;
 }
+
+arrow::Datum to_datum(SEXP x) {
+  // TODO: this is repetitive, can we DRY it out?
+  if (Rf_inherits(x, "ArrowObject") && Rf_inherits(x, "Array")) {
+    Rcpp::ConstReferenceSmartPtrInputParameter<std::shared_ptr<arrow::Array>> obj(x);
+    return static_cast<std::shared_ptr<arrow::Array>>(obj);
+  } else if (Rf_inherits(x, "ArrowObject") && Rf_inherits(x, "ChunkedArray")) {
+    Rcpp::ConstReferenceSmartPtrInputParameter<std::shared_ptr<arrow::ChunkedArray>> obj(x);
+    return static_cast<std::shared_ptr<arrow::ChunkedArray>>(obj);
+  } else if (Rf_inherits(x, "ArrowObject") && Rf_inherits(x, "RecordBatch")) {
+    Rcpp::ConstReferenceSmartPtrInputParameter<std::shared_ptr<arrow::RecordBatch>> obj(x);
+    return static_cast<std::shared_ptr<arrow::RecordBatch>>(obj);
+  } else if (Rf_inherits(x, "ArrowObject") && Rf_inherits(x, "Table")) {
+    Rcpp::ConstReferenceSmartPtrInputParameter<std::shared_ptr<arrow::Table>> obj(x);
+    return static_cast<std::shared_ptr<arrow::Table>>(obj);
+  } else {
+    // TODO: scalar?
+    // This assumes that R objects have already been converted to Arrow objects;
+    // that seems right but should we do the wrapping here too/instead?
+    Rcpp::stop("to_datum: Not implemented");
+  }
+}
+
+SEXP from_datum(arrow::Datum datum) {
+  if (datum.is_array()) {
+    return Rcpp::wrap(datum.make_array());
+  } else if (datum.is_arraylike()) {
+    return Rcpp::wrap(datum.chunked_array());
+  } else {
+    // TODO: the other datum types
+    Rcpp::stop("from_datum: Not implemented");
+  }
+}
+
+std::shared_ptr<arrow::compute::FunctionOptions> make_compute_options(std::string func_name,
+    List_ options) {
+  if (func_name == "filter") {
+    auto out = std::make_shared<arrow::compute::FilterOptions>(arrow::compute::FilterOptions::Defaults());
+    if (!Rf_isNull(options["keep_na"]) && options["keep_na"]) {
+      out->null_selection_behavior = arrow::compute::FilterOptions::EMIT_NULL;
+    }
+    return out;
+  } else if (func_name == "take") {
+    auto out = std::make_shared<arrow::compute::TakeOptions>(arrow::compute::TakeOptions::Defaults());
+    return out;
+  } else {
+    return nullptr;
+  }
+  // TODO: make sure the correct destructor is called?

Review comment:
       This was @bkietz's note to self, I think




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org