You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/06/01 20:50:40 UTC

[GitHub] [arrow] wesm opened a new pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

wesm opened a new pull request #7318:
URL: https://github.com/apache/arrow/pull/7318


   A "metafunction" is one that dispatches to other functions based on the argument types. It does not contain any kernels.
   
   Other stuff in this PR:
   
   * Make "take" and "filter" metafunctions that also deal with RecordBatch, Table arguments
   * Delete tons of now unnecessary binding code from Python and R
   
   There is one failing R test that I wasn't able to debug easily. 
   
   GLib needs to be ported. I will leave it to Kou the best path forward on this.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kou commented on pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

Posted by GitBox <gi...@apache.org>.
kou commented on pull request #7318:
URL: https://github.com/apache/arrow/pull/7318#issuecomment-637182726


   OK.
   I'll work on the GLib part as a separated pull request and add `ARROW_DEPRECATED` in the pull request.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7318:
URL: https://github.com/apache/arrow/pull/7318#issuecomment-637130350


   Fixed


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7318:
URL: https://github.com/apache/arrow/pull/7318#issuecomment-637883867


   @kou this test failure https://github.com/apache/arrow/pull/7318/checks?check_run_id=732734225 seems unrelated to this patch, thoughts? It seems like the test case may be malformed
   
   https://github.com/apache/arrow/blob/master/c_glib/test/test-struct-array.rb#L41


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7318:
URL: https://github.com/apache/arrow/pull/7318#issuecomment-637849432


   Will merge this once the build passes


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #7318:
URL: https://github.com/apache/arrow/pull/7318#discussion_r436304103



##########
File path: r/R/chunked-array.R
##########
@@ -75,23 +75,15 @@ ChunkedArray <- R6Class("ChunkedArray", inherit = ArrowObject,
       if (is.integer(i)) {
         i <- Array$create(i)
       }
-      # Invalid: Kernel does not support chunked array arguments
-      # so use the old method for both cases
-      if (inherits(i, "ChunkedArray")) {
-        return(shared_ptr(ChunkedArray, ChunkedArray__TakeChunked(self, i)))
-      }
-      assert_is(i, "Array")
-      return(shared_ptr(ChunkedArray, ChunkedArray__Take(self, i)))
+      # Invalid: Tried executing function with non-value type: ChunkedArray

Review comment:
       Done in https://github.com/apache/arrow/pull/7308/commits/247a1048e85871214e0456f3ee347a9b3e5388da




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm edited a comment on pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

Posted by GitBox <gi...@apache.org>.
wesm edited a comment on pull request #7318:
URL: https://github.com/apache/arrow/pull/7318#issuecomment-637095245


   @nealrichardson I will need your help on the one failing R test
   
   @kou I removed 6 of the 8 `Take` APIs -- I think it would be better for GLib to use the Datum/Datum or CallFunction API if possible and perhaps net a code reduction like Python and R have achieved here. What do you think?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kiszk commented on pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

Posted by GitBox <gi...@apache.org>.
kiszk commented on pull request #7318:
URL: https://github.com/apache/arrow/pull/7318#issuecomment-637686215


   Oh, I saw https://travis-ci.org/github/apache/arrow/builds one hour ago. And, I got good feeling...
   
   According to my experience, this type of disk full error may disappear when we will run the test after 30 min. or 1 hour.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #7318:
URL: https://github.com/apache/arrow/pull/7318#discussion_r434285021



##########
File path: r/R/table.R
##########
@@ -146,21 +146,15 @@ Table <- R6Class("Table", inherit = ArrowObject,
       if (is.integer(i)) {
         i <- Array$create(i)
       }
-      if (inherits(i, "ChunkedArray")) {
-        return(shared_ptr(Table, Table__TakeChunked(self, i)))
-      }
-      assert_is(i, "Array")
-      shared_ptr(Table, Table__Take(self, i))
+      # Invalid: Tried executing function with non-value type: Table

Review comment:
       Another comment to remove?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7318:
URL: https://github.com/apache/arrow/pull/7318#issuecomment-637161793


   Done. So GLib can be refactored later.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7318:
URL: https://github.com/apache/arrow/pull/7318#issuecomment-637156379


   @kou thanks. I'll try to add deprecated APIs for the functions that were removed so that merging need not be blocked on the refactoring


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on a change in pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

Posted by GitBox <gi...@apache.org>.
wesm commented on a change in pull request #7318:
URL: https://github.com/apache/arrow/pull/7318#discussion_r433873141



##########
File path: cpp/src/arrow/compute/api_vector.h
##########
@@ -304,5 +170,41 @@ Result<std::shared_ptr<Array>> ValueCounts(const Datum& value,
 ARROW_EXPORT
 Result<Datum> DictionaryEncode(const Datum& data, ExecContext* ctx = NULLPTR);
 
+// ----------------------------------------------------------------------
+// Deprecated functions
+
+// TODO: Add deprecation warnings to these functions

Review comment:
       Kou is going to take care of it




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7318:
URL: https://github.com/apache/arrow/pull/7318#issuecomment-637095245


   @nealrichardson I will need your help on the one failing R test
   
   @kou I removed 6 of the 8 `Take` APIs -- I think it would be better for GLib to use the Datum/Datum API if possible and perhaps net a code reduction like Python and R have achieved here. What do you think?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on a change in pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

Posted by GitBox <gi...@apache.org>.
wesm commented on a change in pull request #7318:
URL: https://github.com/apache/arrow/pull/7318#discussion_r434285617



##########
File path: r/R/table.R
##########
@@ -146,21 +146,15 @@ Table <- R6Class("Table", inherit = ArrowObject,
       if (is.integer(i)) {
         i <- Array$create(i)
       }
-      if (inherits(i, "ChunkedArray")) {
-        return(shared_ptr(Table, Table__TakeChunked(self, i)))
-      }
-      assert_is(i, "Array")
-      shared_ptr(Table, Table__Take(self, i))
+      # Invalid: Tried executing function with non-value type: Table

Review comment:
       Yes probably -- I got confused about what the comment meant. Do you think https://issues.apache.org/jira/browse/ARROW-9001 is feasible?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7318:
URL: https://github.com/apache/arrow/pull/7318#issuecomment-637100263


   https://issues.apache.org/jira/browse/ARROW-8917


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7318:
URL: https://github.com/apache/arrow/pull/7318#issuecomment-637884163


   Merging as it doesn't seem like the test failure could be related to this patch (the error showed up after rebasing)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #7318:
URL: https://github.com/apache/arrow/pull/7318#discussion_r434284526



##########
File path: r/R/chunked-array.R
##########
@@ -75,23 +75,15 @@ ChunkedArray <- R6Class("ChunkedArray", inherit = ArrowObject,
       if (is.integer(i)) {
         i <- Array$create(i)
       }
-      # Invalid: Kernel does not support chunked array arguments
-      # so use the old method for both cases
-      if (inherits(i, "ChunkedArray")) {
-        return(shared_ptr(ChunkedArray, ChunkedArray__TakeChunked(self, i)))
-      }
-      assert_is(i, "Array")
-      return(shared_ptr(ChunkedArray, ChunkedArray__Take(self, i)))
+      # Invalid: Tried executing function with non-value type: ChunkedArray

Review comment:
       Is this comment (still) valid? Looks like this is calling `call_function` now




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7318:
URL: https://github.com/apache/arrow/pull/7318#issuecomment-637225192


   > OK.
   > I'll work on the GLib part as a separated pull request and add `ARROW_DEPRECATED` in the pull request.
   
   Perfect


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kou commented on pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

Posted by GitBox <gi...@apache.org>.
kou commented on pull request #7318:
URL: https://github.com/apache/arrow/pull/7318#issuecomment-637141114


   I'll take a look this in a few days.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #7318:
URL: https://github.com/apache/arrow/pull/7318#discussion_r433833418



##########
File path: cpp/src/arrow/compute/api_vector.h
##########
@@ -72,152 +72,18 @@ struct ARROW_EXPORT TakeOptions : public FunctionOptions {
 /// \param[in] values datum from which to take
 /// \param[in] indices which values to take
 /// \param[in] options options
-/// \param[in] context the function execution context, optional
+/// \param[in] ctx the function execution context, optional
 /// \return the resulting datum
 ARROW_EXPORT
 Result<Datum> Take(const Datum& values, const Datum& indices,
                    const TakeOptions& options = TakeOptions::Defaults(),
-                   ExecContext* context = NULLPTR);
+                   ExecContext* ctx = NULLPTR);
 
-/// \brief Take from an array of values at indices in another array
-///
-/// The output array will be of the same type as the input values
-/// array, with elements taken from the values array at the given
-/// indices. If an index is null then the taken element will be null.
-///
-/// For example given values = ["a", "b", "c", null, "e", "f"] and
-/// indices = [2, 1, null, 3], the output will be
-/// = [values[2], values[1], null, values[3]]
-/// = ["c", "b", null, null]

Review comment:
       should we keep this explanation+example in the docstring of the Datum-based Take above?

##########
File path: cpp/src/arrow/compute/api_vector.h
##########
@@ -304,5 +170,41 @@ Result<std::shared_ptr<Array>> ValueCounts(const Datum& value,
 ARROW_EXPORT
 Result<Datum> DictionaryEncode(const Datum& data, ExecContext* ctx = NULLPTR);
 
+// ----------------------------------------------------------------------
+// Deprecated functions
+
+// TODO: Add deprecation warnings to these functions

Review comment:
       TODO in this PR or is there already a JIRA?

##########
File path: cpp/src/arrow/compute/function.h
##########
@@ -230,5 +234,19 @@ class ARROW_EXPORT ScalarAggregateFunction
       const std::vector<ValueDescr>& values) const;
 };
 
+/// \brief A function that dispatches to other functions. Must override
+/// Function::Execute.

Review comment:
       does this mean that it must handle the different Datum kinds itself? (such as dealing with combination of array and chuned array?)

##########
File path: cpp/src/arrow/compute/kernels/vector_filter.cc
##########
@@ -176,6 +239,9 @@ void RegisterVectorFilter(FunctionRegistry* registry) {
     AddKernel(InputType::Array(value_ty->id()), *value_ty);
   }
   DCHECK_OK(registry->AddFunction(std::move(filter)));
+
+  // Add take metafunction

Review comment:
       ```suggestion
     // Add filter metafunction
   ```

##########
File path: python/pyarrow/array.pxi
##########
@@ -921,81 +921,15 @@ cdef class Array(_PandasConvertible):
 
     def take(self, object indices):
         """
-        Take elements from an array.
-
-        The resulting array will be of the same type as the input array, with
-        elements taken from the input array at the given indices. If an index
-        is null then the taken element will be null.

Review comment:
       Can you keep this "If an index is null then the taken element will be null." sentence in the docstring in the compute module?

##########
File path: python/pyarrow/array.pxi
##########
@@ -921,81 +921,15 @@ cdef class Array(_PandasConvertible):
 
     def take(self, object indices):
         """
-        Take elements from an array.
-
-        The resulting array will be of the same type as the input array, with
-        elements taken from the input array at the given indices. If an index
-        is null then the taken element will be null.

Review comment:
       and maybe copy paste the example below as well




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kou commented on pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

Posted by GitBox <gi...@apache.org>.
kou commented on pull request #7318:
URL: https://github.com/apache/arrow/pull/7318#issuecomment-637884562


   We can ignore the failure.
   It's a test code problem. It's a GC related problem. We need to keep reference to the raw buffer data.
   I'll fix it later.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm closed pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

Posted by GitBox <gi...@apache.org>.
wesm closed pull request #7318:
URL: https://github.com/apache/arrow/pull/7318


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7318:
URL: https://github.com/apache/arrow/pull/7318#issuecomment-637496088


   The CI is passing except for this s390x issue
   
   ```
   src/arrow/CMakeFiles/arrow_objlib.dir/vendored/uriparser/UriShorten.c.o  /usr/lib/s390x-linux-gnu/libcrypto.so  /usr/lib/s390x-linux-gnu/libssl.so  /usr/lib/s390x-linux-gnu/libbrotlienc.so  /usr/lib/s390x-linux-gnu/libbrotlidec.so  /usr/lib/s390x-linux-gnu/libbrotlicommon.so  orc_ep-install/lib/liborc.a  protobuf_ep-install/lib/libprotobuf.a  /usr/lib/s390x-linux-gnu/libglog.so  -ldl  jemalloc_ep-prefix/src/jemalloc_ep/dist//lib/libjemalloc_pic.a  -lrt  /usr/lib/s390x-linux-gnu/libcrypto.so  /usr/lib/s390x-linux-gnu/libssl.so  /usr/lib/s390x-linux-gnu/libbrotlienc.so  /usr/lib/s390x-linux-gnu/libbrotlidec.so  /usr/lib/s390x-linux-gnu/libbrotlicommon.so  /usr/lib/s390x-linux-gnu/libbz2.so  /usr/lib/s390x-linux-gnu/liblz4.so  /usr/lib/s390x-linux-gnu/libsnappy.so.1.1.8  /usr/lib/s390x-linux-gnu/libz.so  /usr/lib/s390x-linux-gnu/libzstd.so  orc_ep-install/lib/liborc.a  protobuf_ep-install/lib/libprotobuf.a  /usr/lib/s390x-linux-gnu/libglog.so  /usr/lib/s390x-linux-gnu/libcrypto.so  -pthread && :
   /usr/bin/ld: final link failed: No space left on device
   collect2: error: ld returned 1 exit status
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on a change in pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

Posted by GitBox <gi...@apache.org>.
wesm commented on a change in pull request #7318:
URL: https://github.com/apache/arrow/pull/7318#discussion_r434218506



##########
File path: cpp/src/arrow/compute/function.h
##########
@@ -229,5 +233,22 @@ class ARROW_EXPORT ScalarAggregateFunction
       const std::vector<ValueDescr>& values) const;
 };
 
+/// \brief A function that dispatches to other functions. Must override
+/// Function::Execute.
+///
+/// For Array, ChunkedArray, and Scalar Datum kinds, may rely on the execution
+/// of concrete Function types, but must handle other Datum kinds on its own.
+class ARROW_EXPORT MetaFunction : public Function {
+ public:
+  int num_kernels() const override { return 0; }
+
+  Result<Datum> Execute(const std::vector<Datum>& args, const FunctionOptions* options,
+                        ExecContext* ctx) const override;
+
+ protected:

Review comment:
       done, thank you




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm edited a comment on pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

Posted by GitBox <gi...@apache.org>.
wesm edited a comment on pull request #7318:
URL: https://github.com/apache/arrow/pull/7318#issuecomment-637112653


   I figured out the R issue, I'm trying to fix. It's this hack here
   
   https://github.com/apache/arrow/blob/master/r/src/compute.cpp#L180


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #7318:
URL: https://github.com/apache/arrow/pull/7318#discussion_r433845450



##########
File path: cpp/src/arrow/compute/function.h
##########
@@ -230,5 +234,19 @@ class ARROW_EXPORT ScalarAggregateFunction
       const std::vector<ValueDescr>& values) const;
 };
 
+/// \brief A function that dispatches to other functions. Must override
+/// Function::Execute.

Review comment:
       does this mean that it must handle the different Datum kinds itself? (such as dealing with combination of array and chunked array?)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7318:
URL: https://github.com/apache/arrow/pull/7318#issuecomment-637681876


   @kiszk the s390x machine on Travis appears to be in bad shape.
   
   It would be good to get this merged sometime today, let me know if there are other comments on the C++ or R side


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] bkietz commented on a change in pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

Posted by GitBox <gi...@apache.org>.
bkietz commented on a change in pull request #7318:
URL: https://github.com/apache/arrow/pull/7318#discussion_r434162682



##########
File path: cpp/src/arrow/compute/function.h
##########
@@ -229,5 +233,22 @@ class ARROW_EXPORT ScalarAggregateFunction
       const std::vector<ValueDescr>& values) const;
 };
 
+/// \brief A function that dispatches to other functions. Must override
+/// Function::Execute.
+///
+/// For Array, ChunkedArray, and Scalar Datum kinds, may rely on the execution
+/// of concrete Function types, but must handle other Datum kinds on its own.
+class ARROW_EXPORT MetaFunction : public Function {
+ public:
+  int num_kernels() const override { return 0; }
+
+  Result<Datum> Execute(const std::vector<Datum>& args, const FunctionOptions* options,
+                        ExecContext* ctx) const override;
+
+ protected:

Review comment:
       Instead of refusing the bequest by default, I think it'd be better to provide a pure virtual function for MetaFunction subclasses to override:
   ```suggestion
     Result<Datum> Execute(const std::vector<Datum>& args, const FunctionOptions* options,
                           ExecContext* ctx) const override {
       return ExecuteImpl(args, options, ctx);
     }
   
    protected:
     virtual Result<Datum> ExecuteImpl(const std::vector<Datum>& args, const FunctionOptions* options,
                           ExecContext* ctx) const = 0;
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7318:
URL: https://github.com/apache/arrow/pull/7318#issuecomment-637112653


   I figured out the R issue, I'm trying to fix


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #7318:
URL: https://github.com/apache/arrow/pull/7318#discussion_r433864179



##########
File path: cpp/src/arrow/compute/function.h
##########
@@ -230,5 +234,19 @@ class ARROW_EXPORT ScalarAggregateFunction
       const std::vector<ValueDescr>& values) const;
 };
 
+/// \brief A function that dispatches to other functions. Must override
+/// Function::Execute.

Review comment:
       From further reading the code: this indeed is the case (maybe a small comment explaining this what it means to implement Function:Execute yourself might be worth it, although people working with the compute API might be expected to know what the responsibility of `Function:Execute` is)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7318:
URL: https://github.com/apache/arrow/pull/7318#issuecomment-637541402


   > One more conceptual aspect I am wondering: compute functions on tables that basically just apply the function on each column (like take, but also if eg support for Table would be added to the sum registered function), wouldn't that start to be part of the (higher level) execution plan?
   
   Perhaps, but it's not mutually exclusive. Some applications will just want to do `CallFunction(name, args)` while others might build an expression pipeline with multiple functions and then request that it be evaluated (once we have this capability, which will be relatively soon)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #7318:
URL: https://github.com/apache/arrow/pull/7318#discussion_r434284600



##########
File path: r/R/record-batch.R
##########
@@ -121,14 +121,13 @@ RecordBatch <- R6Class("RecordBatch", inherit = ArrowObject,
       assert_is(i, "Array")
       # Invalid: Tried executing function with non-value type: RecordBatch
       # so use old methods
-      shared_ptr(RecordBatch, RecordBatch__Take(self, i))
+      shared_ptr(RecordBatch, call_function("take", self, i))

Review comment:
       Likewise should remove comment above, which is no longer valid.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7318: ARROW-8917: [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7318:
URL: https://github.com/apache/arrow/pull/7318#issuecomment-637810842


   > Would you comment here or on the JIRA about other metafunctions (aside from take and filter)?
   
   These are the only ones I can think of aside from cast at the moment 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org