You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/13 17:14:11 UTC

[GitHub] [arrow] pitrou opened a new pull request #8457: ARROW-9164: [C++] Add embedded documentation to compute functions

pitrou opened a new pull request #8457:
URL: https://github.com/apache/arrow/pull/8457


   Also re-use the embedded documentations in Python to generate nicer signatures and docstrings.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #8457: ARROW-9164: [C++] Add embedded documentation to compute functions

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8457:
URL: https://github.com/apache/arrow/pull/8457#discussion_r505564859



##########
File path: cpp/src/arrow/compute/function.h
##########
@@ -137,19 +169,23 @@ class ARROW_EXPORT Function {
   /// that default_options() is valid to pass to Execute as options.
   const FunctionOptions* default_options() const { return default_options_; }
 
+  virtual Status Validate() const;
+
  protected:
-  Function(std::string name, Function::Kind kind, const Arity& arity,
+  Function(std::string name, Function::Kind kind, const Arity& arity, FunctionDoc doc,
            const FunctionOptions* default_options)
       : name_(std::move(name)),
         kind_(kind),
         arity_(arity),
+        doc_(std::move(doc)),
         default_options_(default_options) {}
 
   Status CheckArity(int passed_num_args) const;
 
   std::string name_;
   Function::Kind kind_;
   Arity arity_;
+  FunctionDoc doc_;
   const FunctionOptions* default_options_ = NULLPTR;

Review comment:
       Why not, do you think there's any chance of them being dynamically generated?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kou commented on pull request #8457: ARROW-9164: [C++] Add embedded documentation to compute functions

Posted by GitBox <gi...@apache.org>.
kou commented on pull request #8457:
URL: https://github.com/apache/arrow/pull/8457#issuecomment-709691186


   Wow! Great!
   I want not only argument name but also acceptable argument types to generate methods automatically in Ruby. (It'll be a follow-up task.)
   For example, if the first argument of function A (`x` of `A(x, y, z)`) accepts a table and a record batch, we can add an A method to table and record batch classes:
   
   ```ruby
   x = table
   x.A(y, z)
   
   x = record_batch
   x.A(y, z)
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8457: ARROW-9164: [C++] Add embedded documentation to compute functions

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8457:
URL: https://github.com/apache/arrow/pull/8457#issuecomment-707891501


   https://issues.apache.org/jira/browse/ARROW-9164


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #8457: ARROW-9164: [C++] Add embedded documentation to compute functions

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8457:
URL: https://github.com/apache/arrow/pull/8457#discussion_r505568854



##########
File path: python/pyarrow/compute.py
##########
@@ -47,34 +47,60 @@
 import pyarrow as pa
 
 
+def _get_arg_names(func):
+    arg_names = func._doc.arg_names
+    if not arg_names:
+        if func.arity == 1:
+            arg_names = ["arg"]
+        elif func.arity == 2:
+            arg_names = ["left", "right"]
+        else:
+            raise NotImplementedError(
+                "unsupported arity: {}".format(func.arity))

Review comment:
       Since this is only a fallback for functions which don't defined a documentation, I agree.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #8457: ARROW-9164: [C++] Add embedded documentation to compute functions

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8457:
URL: https://github.com/apache/arrow/pull/8457#discussion_r505565340



##########
File path: python/pyarrow/compute.py
##########
@@ -47,34 +47,60 @@
 import pyarrow as pa
 
 
+def _get_arg_names(func):
+    arg_names = func._doc.arg_names
+    if not arg_names:
+        if func.arity == 1:
+            arg_names = ["arg"]
+        elif func.arity == 2:
+            arg_names = ["left", "right"]
+        else:
+            raise NotImplementedError(
+                "unsupported arity: {}".format(func.arity))
+
+    return arg_names
+
+
 def _decorate_compute_function(wrapper, exposed_name, func, option_class):
     wrapper.__arrow_compute_function__ = dict(name=func.name,
                                               arity=func.arity)
     wrapper.__name__ = exposed_name
     wrapper.__qualname__ = exposed_name
 
-    # TODO (ARROW-9164): expose actual docstring from C++
     doc_pieces = []
-    arg_str = "arguments" if func.arity > 1 else "argument"
+
+    cpp_doc = func._doc
+    summary = cpp_doc.summary
+    if not summary:
+        arg_str = "arguments" if func.arity > 1 else "argument"
+        summary = ("Call compute function {!r} with the given {}"
+                   .format(func.name, arg_str))
+
+    description = cpp_doc.description
+    arg_names = _get_arg_names(func)
+
     doc_pieces.append("""\
-        Call compute function {!r} with the given {}.
+        {}.
+
+        """.format(summary))
 
+    if description:
+        doc_pieces.append("{}\n\n".format(description))
+
+    doc_pieces.append("""\
         Parameters
         ----------
-        """.format(func.name, arg_str))
+        """)
 
-    if func.arity == 1:
+    for arg_name in arg_names:
+        if func.kind in ('vector', 'scalar_aggregate'):
+            arg_type = 'Array-like'
+        else:
+            arg_type = 'Array-like or scalar-like'

Review comment:
       Perhaps, yes.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #8457: ARROW-9164: [C++] Add embedded documentation to compute functions

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8457:
URL: https://github.com/apache/arrow/pull/8457#discussion_r507709352



##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -302,19 +299,38 @@ void AddSortingKernels(VectorKernel base, VectorFunction* func) {
   }
 }
 
+FunctionDoc sort_indices_doc(
+    "Return the indices that would sort an array",
+    ("Compute an array of indices that define a non-stable sort\n"
+     "of the input array.  Null values are considered greater than any\n"
+     "other value and are therefore sorted at the end of the array."),

Review comment:
       ARROW-10345




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #8457: ARROW-9164: [C++] Add embedded documentation to compute functions

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8457:
URL: https://github.com/apache/arrow/pull/8457#discussion_r505566689



##########
File path: python/pyarrow/compute.py
##########
@@ -130,32 +146,38 @@ def _handle_options(name, option_class, options, kwargs):
     return options
 
 
-def _simple_unary_function(name):
-    func = get_function(name)
-    option_class = _option_classes.get(name)
-
-    if option_class is not None:
-        def wrapper(arg, *, options=None, memory_pool=None, **kwargs):
-            options = _handle_options(name, option_class, options, kwargs)
-            return func.call([arg], options, memory_pool)
-    else:
-        def wrapper(arg, *, memory_pool=None):
-            return func.call([arg], None, memory_pool)
-
-    return _decorate_compute_function(wrapper, name, func, option_class)
-
-
-def _simple_binary_function(name):
-    func = get_function(name)
-    option_class = _option_classes.get(name)
-
+_wrapper_template = dedent("""\
+    def make_wrapper(func, option_class):
+        def {func_name}({args_sig}, *, memory_pool=None):
+            return func.call([{args_sig}], None, memory_pool)
+        return {func_name}
+    """)
+
+_wrapper_options_template = dedent("""\
+    def make_wrapper(func, option_class):
+        def {func_name}({args_sig}, *, options=None, memory_pool=None,
+                        **kwargs):
+            options = _handle_options({func_name!r}, option_class, options,
+                                      kwargs)
+            return func.call([{args_sig}], options, memory_pool)
+        return {func_name}
+    """)

Review comment:
       We can use `__signature__`, which should work for documentation but some error messages will be less nice when the wrong number of arguments is given (because the interpreter doesn't inspect `__signature__` when functions are called).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #8457: ARROW-9164: [C++] Add embedded documentation to compute functions

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8457:
URL: https://github.com/apache/arrow/pull/8457#discussion_r505690052



##########
File path: python/pyarrow/compute.py
##########
@@ -130,32 +146,38 @@ def _handle_options(name, option_class, options, kwargs):
     return options
 
 
-def _simple_unary_function(name):
-    func = get_function(name)
-    option_class = _option_classes.get(name)
-
-    if option_class is not None:
-        def wrapper(arg, *, options=None, memory_pool=None, **kwargs):
-            options = _handle_options(name, option_class, options, kwargs)
-            return func.call([arg], options, memory_pool)
-    else:
-        def wrapper(arg, *, memory_pool=None):
-            return func.call([arg], None, memory_pool)
-
-    return _decorate_compute_function(wrapper, name, func, option_class)
-
-
-def _simple_binary_function(name):
-    func = get_function(name)
-    option_class = _option_classes.get(name)
-
+_wrapper_template = dedent("""\
+    def make_wrapper(func, option_class):
+        def {func_name}({args_sig}, *, memory_pool=None):
+            return func.call([{args_sig}], None, memory_pool)
+        return {func_name}
+    """)
+
+_wrapper_options_template = dedent("""\
+    def make_wrapper(func, option_class):
+        def {func_name}({args_sig}, *, options=None, memory_pool=None,
+                        **kwargs):
+            options = _handle_options({func_name!r}, option_class, options,
+                                      kwargs)
+            return func.call([{args_sig}], options, memory_pool)
+        return {func_name}
+    """)
+
+
+def _wrap_function(name, func):
+    option_class = _get_options_class(func)
+    arg_names = _get_arg_names(func)
+    args_sig = ', '.join(arg_names)
+

Review comment:
       ARROW-10316




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #8457: ARROW-9164: [C++] Add embedded documentation to compute functions

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8457:
URL: https://github.com/apache/arrow/pull/8457#discussion_r505698824



##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -302,19 +299,38 @@ void AddSortingKernels(VectorKernel base, VectorFunction* func) {
   }
 }
 
+FunctionDoc sort_indices_doc(
+    "Return the indices that would sort an array",
+    ("Compute an array of indices that define a non-stable sort\n"
+     "of the input array.  Null values are considered greater than any\n"
+     "other value and are therefore sorted at the end of the array."),

Review comment:
       I think NaN is just not handled, and its presence makes all comparisons return false.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #8457: ARROW-9164: [C++] Add embedded documentation to compute functions

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8457:
URL: https://github.com/apache/arrow/pull/8457#discussion_r505672267



##########
File path: cpp/src/arrow/compute/kernels/vector_nested.cc
##########
@@ -65,18 +65,33 @@ Result<ValueDescr> ValuesType(KernelContext*, const std::vector<ValueDescr>& arg
   return ValueDescr::Array(list_type.value_type());
 }
 
+FunctionDoc list_flatten_doc(
+    "Flatten list values",
+    ("`lists` must have a list-like type.\n"
+     "Return an array with the top list level flattened.\n"
+     "Top-level null values in `lists` do not emit anything in the input."),
+    {"lists"});
+
+FunctionDoc list_parent_indices_doc(
+    "Compute parent indices of nested list values",
+    ("`lists` must have a list-like type.\n"
+     "For each value in each list of `lists`, the top-level list index\n"

Review comment:
       Yes, it's not obvious how to make this too unwiedly to write.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #8457: ARROW-9164: [C++] Add embedded documentation to compute functions

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8457:
URL: https://github.com/apache/arrow/pull/8457#issuecomment-712136427


   Travis-CI build: https://travis-ci.org/github/pitrou/arrow/builds/737045923
   
   CI failures are unrelated.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #8457: ARROW-9164: [C++] Add embedded documentation to compute functions

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8457:
URL: https://github.com/apache/arrow/pull/8457#issuecomment-712106591


   Will rebase and merge.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #8457: ARROW-9164: [C++] Add embedded documentation to compute functions

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8457:
URL: https://github.com/apache/arrow/pull/8457#issuecomment-709454642


   @nealrichardson @kou This may help for R and Ruby.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] bkietz commented on a change in pull request #8457: ARROW-9164: [C++] Add embedded documentation to compute functions

Posted by GitBox <gi...@apache.org>.
bkietz commented on a change in pull request #8457:
URL: https://github.com/apache/arrow/pull/8457#discussion_r505685579



##########
File path: python/pyarrow/compute.py
##########
@@ -130,32 +146,38 @@ def _handle_options(name, option_class, options, kwargs):
     return options
 
 
-def _simple_unary_function(name):
-    func = get_function(name)
-    option_class = _option_classes.get(name)
-
-    if option_class is not None:
-        def wrapper(arg, *, options=None, memory_pool=None, **kwargs):
-            options = _handle_options(name, option_class, options, kwargs)
-            return func.call([arg], options, memory_pool)
-    else:
-        def wrapper(arg, *, memory_pool=None):
-            return func.call([arg], None, memory_pool)
-
-    return _decorate_compute_function(wrapper, name, func, option_class)
-
-
-def _simple_binary_function(name):
-    func = get_function(name)
-    option_class = _option_classes.get(name)
-
+_wrapper_template = dedent("""\
+    def make_wrapper(func, option_class):
+        def {func_name}({args_sig}, *, memory_pool=None):
+            return func.call([{args_sig}], None, memory_pool)
+        return {func_name}
+    """)
+
+_wrapper_options_template = dedent("""\
+    def make_wrapper(func, option_class):
+        def {func_name}({args_sig}, *, options=None, memory_pool=None,
+                        **kwargs):
+            options = _handle_options({func_name!r}, option_class, options,
+                                      kwargs)
+            return func.call([{args_sig}], options, memory_pool)
+        return {func_name}
+    """)

Review comment:
       That can be added to the wrappers, though: `assert len(args) == func.arity`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #8457: ARROW-9164: [C++] Add embedded documentation to compute functions

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8457:
URL: https://github.com/apache/arrow/pull/8457#discussion_r505689048



##########
File path: python/pyarrow/compute.py
##########
@@ -130,32 +146,38 @@ def _handle_options(name, option_class, options, kwargs):
     return options
 
 
-def _simple_unary_function(name):
-    func = get_function(name)
-    option_class = _option_classes.get(name)
-
-    if option_class is not None:
-        def wrapper(arg, *, options=None, memory_pool=None, **kwargs):
-            options = _handle_options(name, option_class, options, kwargs)
-            return func.call([arg], options, memory_pool)
-    else:
-        def wrapper(arg, *, memory_pool=None):
-            return func.call([arg], None, memory_pool)
-
-    return _decorate_compute_function(wrapper, name, func, option_class)
-
-
-def _simple_binary_function(name):
-    func = get_function(name)
-    option_class = _option_classes.get(name)
-
+_wrapper_template = dedent("""\
+    def make_wrapper(func, option_class):
+        def {func_name}({args_sig}, *, memory_pool=None):
+            return func.call([{args_sig}], None, memory_pool)
+        return {func_name}
+    """)
+
+_wrapper_options_template = dedent("""\
+    def make_wrapper(func, option_class):
+        def {func_name}({args_sig}, *, options=None, memory_pool=None,
+                        **kwargs):
+            options = _handle_options({func_name!r}, option_class, options,
+                                      kwargs)
+            return func.call([{args_sig}], options, memory_pool)
+        return {func_name}
+    """)
+
+
+def _wrap_function(name, func):
+    option_class = _get_options_class(func)
+    arg_names = _get_arg_names(func)
+    args_sig = ', '.join(arg_names)
+

Review comment:
       Hmm, I didn't know that `__wrapped__` was taken up by `inspect.signature`. Interesting. I'd rather examine this later though.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] bkietz commented on a change in pull request #8457: ARROW-9164: [C++] Add embedded documentation to compute functions

Posted by GitBox <gi...@apache.org>.
bkietz commented on a change in pull request #8457:
URL: https://github.com/apache/arrow/pull/8457#discussion_r504891193



##########
File path: python/pyarrow/compute.py
##########
@@ -47,34 +47,60 @@
 import pyarrow as pa
 
 
+def _get_arg_names(func):
+    arg_names = func._doc.arg_names
+    if not arg_names:
+        if func.arity == 1:
+            arg_names = ["arg"]
+        elif func.arity == 2:
+            arg_names = ["left", "right"]
+        else:
+            raise NotImplementedError(
+                "unsupported arity: {}".format(func.arity))

Review comment:
       I think it'd probably be better to keep this as simple as possible:
   ```suggestion
       if not arg_names:
           arg_names = ["arg{}".format(i) for i in range(func.arity)]
   ```

##########
File path: python/pyarrow/compute.py
##########
@@ -93,20 +119,10 @@ def _decorate_compute_function(wrapper, exposed_name, func, option_class):
     return wrapper
 
 
-_option_classes = {
-    # TODO: export the option class name from C++ metadata?
-    'cast': CastOptions,
-    'filter': FilterOptions,
-    'index_in': SetLookupOptions,
-    'is_in': SetLookupOptions,
-    'match_substring': MatchSubstringOptions,
-    'min_max': MinMaxOptions,
-    'partition_nth_indices': PartitionNthOptions,
-    'stddev': VarianceOptions,
-    'strptime': StrptimeOptions,
-    'take': TakeOptions,
-    'variance': VarianceOptions,
-}
+def _get_options_class(func):
+    class_name = func._doc.options_class
+    # XXX should we raise if the class isn't exposed in Python?
+    return globals().get(class_name) if class_name else None

Review comment:
       I think we should definitely raise
   ```suggestion
       klass =  globals().get(func._doc.options_class, None)
       if klass is not None:
           return klass
       raise NotImplemented("Accessing options for {} from python".format(func))
   ```

##########
File path: python/pyarrow/compute.py
##########
@@ -130,32 +146,38 @@ def _handle_options(name, option_class, options, kwargs):
     return options
 
 
-def _simple_unary_function(name):
-    func = get_function(name)
-    option_class = _option_classes.get(name)
-
-    if option_class is not None:
-        def wrapper(arg, *, options=None, memory_pool=None, **kwargs):
-            options = _handle_options(name, option_class, options, kwargs)
-            return func.call([arg], options, memory_pool)
-    else:
-        def wrapper(arg, *, memory_pool=None):
-            return func.call([arg], None, memory_pool)
-
-    return _decorate_compute_function(wrapper, name, func, option_class)
-
-
-def _simple_binary_function(name):
-    func = get_function(name)
-    option_class = _option_classes.get(name)
-
+_wrapper_template = dedent("""\
+    def make_wrapper(func, option_class):
+        def {func_name}({args_sig}, *, memory_pool=None):
+            return func.call([{args_sig}], None, memory_pool)
+        return {func_name}
+    """)
+
+_wrapper_options_template = dedent("""\
+    def make_wrapper(func, option_class):
+        def {func_name}({args_sig}, *, options=None, memory_pool=None,
+                        **kwargs):
+            options = _handle_options({func_name!r}, option_class, options,
+                                      kwargs)
+            return func.call([{args_sig}], options, memory_pool)
+        return {func_name}
+    """)

Review comment:
       I guess there isn't a way to accomplish this without `exec()` but these wrappers seem brittle in the extreme. Could we minimize the exec'd code by setting `__wrapped__ = eval('lambda {}: None'.format(args_sig))` to get the arguments right, then set `__name__`, `__qualname__` etc directly?

##########
File path: python/pyarrow/compute.py
##########
@@ -130,32 +146,38 @@ def _handle_options(name, option_class, options, kwargs):
     return options
 
 
-def _simple_unary_function(name):
-    func = get_function(name)
-    option_class = _option_classes.get(name)
-
-    if option_class is not None:
-        def wrapper(arg, *, options=None, memory_pool=None, **kwargs):
-            options = _handle_options(name, option_class, options, kwargs)
-            return func.call([arg], options, memory_pool)
-    else:
-        def wrapper(arg, *, memory_pool=None):
-            return func.call([arg], None, memory_pool)
-
-    return _decorate_compute_function(wrapper, name, func, option_class)
-
-
-def _simple_binary_function(name):
-    func = get_function(name)
-    option_class = _option_classes.get(name)
-
+_wrapper_template = dedent("""\
+    def make_wrapper(func, option_class):
+        def {func_name}({args_sig}, *, memory_pool=None):
+            return func.call([{args_sig}], None, memory_pool)
+        return {func_name}
+    """)
+
+_wrapper_options_template = dedent("""\
+    def make_wrapper(func, option_class):
+        def {func_name}({args_sig}, *, options=None, memory_pool=None,
+                        **kwargs):
+            options = _handle_options({func_name!r}, option_class, options,
+                                      kwargs)
+            return func.call([{args_sig}], options, memory_pool)
+        return {func_name}
+    """)
+
+
+def _wrap_function(name, func):
+    option_class = _get_options_class(func)
+    arg_names = _get_arg_names(func)
+    args_sig = ', '.join(arg_names)
+

Review comment:
       ```suggestion
           if option_class is None:
               def wrapper(*args, *, memory_pool=None):
                   return func.call(args, None, memory_pool)
               wrapped_template = 'lambda {}, *, memory_pool=None: ()'
           else:
               def wrapper(*args, *, options=None, memory_pool=None, **kwargs):
                   options = _handle_options(name, option_class, options, kwargs)
                   return func.call(args, options, memory_pool)
               wrapped_template = 'lambda {}, *, options=None, memory_pool=None, **kwargs: ()'
   
           wrapper.__name__ = name
           wrapper.__qualname__ = name
           wrapper.__wrapped__ = eval(wrapped_template.format(args_sig))
   ```

##########
File path: python/pyarrow/compute.py
##########
@@ -47,34 +47,60 @@
 import pyarrow as pa
 
 
+def _get_arg_names(func):
+    arg_names = func._doc.arg_names
+    if not arg_names:
+        if func.arity == 1:
+            arg_names = ["arg"]
+        elif func.arity == 2:
+            arg_names = ["left", "right"]
+        else:
+            raise NotImplementedError(
+                "unsupported arity: {}".format(func.arity))
+
+    return arg_names
+
+
 def _decorate_compute_function(wrapper, exposed_name, func, option_class):
     wrapper.__arrow_compute_function__ = dict(name=func.name,
                                               arity=func.arity)
     wrapper.__name__ = exposed_name
     wrapper.__qualname__ = exposed_name
 
-    # TODO (ARROW-9164): expose actual docstring from C++
     doc_pieces = []
-    arg_str = "arguments" if func.arity > 1 else "argument"
+
+    cpp_doc = func._doc
+    summary = cpp_doc.summary
+    if not summary:
+        arg_str = "arguments" if func.arity > 1 else "argument"
+        summary = ("Call compute function {!r} with the given {}"
+                   .format(func.name, arg_str))
+
+    description = cpp_doc.description
+    arg_names = _get_arg_names(func)
+
     doc_pieces.append("""\
-        Call compute function {!r} with the given {}.
+        {}.
+
+        """.format(summary))
 
+    if description:
+        doc_pieces.append("{}\n\n".format(description))
+
+    doc_pieces.append("""\
         Parameters
         ----------
-        """.format(func.name, arg_str))
+        """)
 
-    if func.arity == 1:
+    for arg_name in arg_names:
+        if func.kind in ('vector', 'scalar_aggregate'):
+            arg_type = 'Array-like'
+        else:
+            arg_type = 'Array-like or scalar-like'

Review comment:
       I suppose at some point we'll want to expose `KernelSignature` or otherwise examine what inputs the kernels actually accept in order to generate this more rigorously

##########
File path: cpp/src/arrow/compute/function.h
##########
@@ -137,19 +169,23 @@ class ARROW_EXPORT Function {
   /// that default_options() is valid to pass to Execute as options.
   const FunctionOptions* default_options() const { return default_options_; }
 
+  virtual Status Validate() const;
+
  protected:
-  Function(std::string name, Function::Kind kind, const Arity& arity,
+  Function(std::string name, Function::Kind kind, const Arity& arity, FunctionDoc doc,
            const FunctionOptions* default_options)
       : name_(std::move(name)),
         kind_(kind),
         arity_(arity),
+        doc_(std::move(doc)),
         default_options_(default_options) {}
 
   Status CheckArity(int passed_num_args) const;
 
   std::string name_;
   Function::Kind kind_;
   Arity arity_;
+  FunctionDoc doc_;
   const FunctionOptions* default_options_ = NULLPTR;

Review comment:
       Since all `FunctionDoc` instances are statically allocated, we could just refer to them with non-owning pointers:
   ```suggestion
     Function(std::string name, Function::Kind kind, const Arity& arity,
              const FunctionOptions* default_options,
              const FunctionDoc* doc = FunctionDoc::empty())
         : name_(std::move(name)),
           kind_(kind),
           arity_(arity),
           default_options_(default_options),
           doc_(doc) {}
   
     Status CheckArity(int passed_num_args) const;
   
     std::string name_;
     Function::Kind kind_;
     Arity arity_;
     const FunctionOptions* default_options_ = NULLPTR;
     const FunctionDoc* doc_;
   ```

##########
File path: python/pyarrow/compute.py
##########
@@ -130,32 +146,38 @@ def _handle_options(name, option_class, options, kwargs):
     return options
 
 
-def _simple_unary_function(name):
-    func = get_function(name)
-    option_class = _option_classes.get(name)
-
-    if option_class is not None:
-        def wrapper(arg, *, options=None, memory_pool=None, **kwargs):
-            options = _handle_options(name, option_class, options, kwargs)
-            return func.call([arg], options, memory_pool)
-    else:
-        def wrapper(arg, *, memory_pool=None):
-            return func.call([arg], None, memory_pool)
-
-    return _decorate_compute_function(wrapper, name, func, option_class)
-
-
-def _simple_binary_function(name):
-    func = get_function(name)
-    option_class = _option_classes.get(name)
-
+_wrapper_template = dedent("""\
+    def make_wrapper(func, option_class):
+        def {func_name}({args_sig}, *, memory_pool=None):
+            return func.call([{args_sig}], None, memory_pool)
+        return {func_name}
+    """)
+
+_wrapper_options_template = dedent("""\
+    def make_wrapper(func, option_class):
+        def {func_name}({args_sig}, *, options=None, memory_pool=None,
+                        **kwargs):
+            options = _handle_options({func_name!r}, option_class, options,
+                                      kwargs)
+            return func.call([{args_sig}], options, memory_pool)
+        return {func_name}
+    """)

Review comment:
       Maybe we could set the `__signature__` property?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #8457: ARROW-9164: [C++] Add embedded documentation to compute functions

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8457:
URL: https://github.com/apache/arrow/pull/8457#discussion_r505674797



##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -302,19 +299,38 @@ void AddSortingKernels(VectorKernel base, VectorFunction* func) {
   }
 }
 
+FunctionDoc sort_indices_doc(
+    "Return the indices that would sort an array",
+    ("Compute an array of indices that define a non-stable sort\n"
+     "of the input array.  Null values are considered greater than any\n"
+     "other value and are therefore sorted at the end of the array."),
+    {"array"});
+
+FunctionDoc partition_nth_indices_doc(
+    "Return the indices that would partition an array around a pivot",
+    ("Compute an array of indices that define a non-stable partition\n"
+     "around the `N`th smallest element of the input array.\n"

Review comment:
       Will improve this, thank you.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou closed pull request #8457: ARROW-9164: [C++] Add embedded documentation to compute functions

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #8457:
URL: https://github.com/apache/arrow/pull/8457


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #8457: ARROW-9164: [C++] Add embedded documentation to compute functions

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8457:
URL: https://github.com/apache/arrow/pull/8457#discussion_r505684699



##########
File path: python/pyarrow/compute.py
##########
@@ -93,20 +119,10 @@ def _decorate_compute_function(wrapper, exposed_name, func, option_class):
     return wrapper
 
 
-_option_classes = {
-    # TODO: export the option class name from C++ metadata?
-    'cast': CastOptions,
-    'filter': FilterOptions,
-    'index_in': SetLookupOptions,
-    'is_in': SetLookupOptions,
-    'match_substring': MatchSubstringOptions,
-    'min_max': MinMaxOptions,
-    'partition_nth_indices': PartitionNthOptions,
-    'stddev': VarianceOptions,
-    'strptime': StrptimeOptions,
-    'take': TakeOptions,
-    'variance': VarianceOptions,
-}
+def _get_options_class(func):
+    class_name = func._doc.options_class
+    # XXX should we raise if the class isn't exposed in Python?
+    return globals().get(class_name) if class_name else None

Review comment:
       I made it emit a warning instead. Raising is not very friendly towards C++ developers that don't know Python.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #8457: ARROW-9164: [C++] Add embedded documentation to compute functions

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8457:
URL: https://github.com/apache/arrow/pull/8457#discussion_r505700265



##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -302,19 +299,38 @@ void AddSortingKernels(VectorKernel base, VectorFunction* func) {
   }
 }
 
+FunctionDoc sort_indices_doc(
+    "Return the indices that would sort an array",
+    ("Compute an array of indices that define a non-stable sort\n"
+     "of the input array.  Null values are considered greater than any\n"
+     "other value and are therefore sorted at the end of the array."),

Review comment:
       We should definitely create a JIRA.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #8457: ARROW-9164: [C++] Add embedded documentation to compute functions

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8457:
URL: https://github.com/apache/arrow/pull/8457#discussion_r505671134



##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -277,29 +278,80 @@ std::shared_ptr<ScalarFunction> MakeArithmeticFunction(std::string name) {
 // Like MakeArithmeticFunction, but for arithmetic ops that need to run
 // only on non-null output.
 template <typename Op>
-std::shared_ptr<ScalarFunction> MakeArithmeticFunctionNotNull(std::string name) {
-  auto func = std::make_shared<ScalarFunction>(name, Arity::Binary());
+std::shared_ptr<ScalarFunction> MakeArithmeticFunctionNotNull(std::string name,
+                                                              FunctionDoc doc) {
+  auto func = std::make_shared<ScalarFunction>(name, Arity::Binary(), std::move(doc));
   for (const auto& ty : NumericTypes()) {
     auto exec = NumericEqualTypesBinary<ScalarBinaryNotNullEqualTypes, Op>(ty);
     DCHECK_OK(func->AddKernel({ty, ty}, ty, exec));
   }
   return func;
 }
 
+FunctionDoc add_doc{"Add the elements of two arrays",
+                    ("Results will wrap around on integer overflow.\n"
+                     "Use function \"add_checked\" if you want overflow\n"
+                     "to return an error."),
+                    {"x", "y"}};
+
+FunctionDoc add_checked_doc{
+    "Add the elements of two arrays",
+    ("This function returns an error on overflow.  For a variant that\n"
+     "doesn't fail on overflow, use function \"add\"."),
+    {"x", "y"}};
+
+FunctionDoc sub_doc{"Substract the elements of an array from another",
+                    ("Results will wrap around on integer overflow.\n"
+                     "Use function \"subtract_checked\" if you want overflow\n"
+                     "to return an error."),
+                    {"x", "y"}};
+
+FunctionDoc sub_checked_doc{
+    "Substract the elements of an array from another",
+    ("This function returns an error on overflow.  For a variant that\n"
+     "doesn't fail on overflow, use function \"subtract\"."),
+    {"x", "y"}};
+
+FunctionDoc mul_doc{"Multiply the elements of two arrays",
+                    ("Results will wrap around on integer overflow.\n"
+                     "Use function \"multiply_checked\" if you want overflow\n"
+                     "to return an error."),
+                    {"x", "y"}};
+
+FunctionDoc mul_checked_doc{
+    "Multiply the elements of two arrays",
+    ("This function returns an error on overflow.  For a variant that\n"
+     "doesn't fail on overflow, use function \"multiply\"."),
+    {"x", "y"}};
+
+FunctionDoc div_doc{"Divide the elements of an array by another",
+                    ("An error is returned when trying to divide by zero.\n"

Review comment:
       Right, will do.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #8457: ARROW-9164: [C++] Add embedded documentation to compute functions

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8457:
URL: https://github.com/apache/arrow/pull/8457#discussion_r505685027



##########
File path: python/pyarrow/compute.py
##########
@@ -130,32 +146,38 @@ def _handle_options(name, option_class, options, kwargs):
     return options
 
 
-def _simple_unary_function(name):
-    func = get_function(name)
-    option_class = _option_classes.get(name)
-
-    if option_class is not None:
-        def wrapper(arg, *, options=None, memory_pool=None, **kwargs):
-            options = _handle_options(name, option_class, options, kwargs)
-            return func.call([arg], options, memory_pool)
-    else:
-        def wrapper(arg, *, memory_pool=None):
-            return func.call([arg], None, memory_pool)
-
-    return _decorate_compute_function(wrapper, name, func, option_class)
-
-
-def _simple_binary_function(name):
-    func = get_function(name)
-    option_class = _option_classes.get(name)
-
+_wrapper_template = dedent("""\
+    def make_wrapper(func, option_class):
+        def {func_name}({args_sig}, *, memory_pool=None):
+            return func.call([{args_sig}], None, memory_pool)
+        return {func_name}
+    """)
+
+_wrapper_options_template = dedent("""\
+    def make_wrapper(func, option_class):
+        def {func_name}({args_sig}, *, options=None, memory_pool=None,
+                        **kwargs):
+            options = _handle_options({func_name!r}, option_class, options,
+                                      kwargs)
+            return func.call([{args_sig}], options, memory_pool)
+        return {func_name}
+    """)

Review comment:
       I'm keeping the generated wrappers for now.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #8457: ARROW-9164: [C++] Add embedded documentation to compute functions

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #8457:
URL: https://github.com/apache/arrow/pull/8457#discussion_r505527509



##########
File path: cpp/src/arrow/compute/kernels/aggregate_var_std.cc
##########
@@ -180,20 +180,38 @@ void AddVarStdKernels(KernelInit init,
   }
 }
 
+FunctionDoc stddev_doc{
+    "Calculate the standard deviation of a numeric array",
+    ("The number of degrees of freedom can be controlled using VarianceOptions.\n"
+     "By default (`ddof` = 0), the population standard deviation is calculated.\n"
+     "Nulls are ignored.  If there are not enough non-null values in the array\n"
+     "to satisfy `ddof`, an error is returned."),

Review comment:
       At least on the Python side, it seems that null is returned instead of an error?
   
   ```
   In [19]: pc.stddev(pa.array([], pa.float64()))
   Out[19]: <pyarrow.DoubleScalar: None>
   ```
   ```

##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -277,29 +278,80 @@ std::shared_ptr<ScalarFunction> MakeArithmeticFunction(std::string name) {
 // Like MakeArithmeticFunction, but for arithmetic ops that need to run
 // only on non-null output.
 template <typename Op>
-std::shared_ptr<ScalarFunction> MakeArithmeticFunctionNotNull(std::string name) {
-  auto func = std::make_shared<ScalarFunction>(name, Arity::Binary());
+std::shared_ptr<ScalarFunction> MakeArithmeticFunctionNotNull(std::string name,
+                                                              FunctionDoc doc) {
+  auto func = std::make_shared<ScalarFunction>(name, Arity::Binary(), std::move(doc));
   for (const auto& ty : NumericTypes()) {
     auto exec = NumericEqualTypesBinary<ScalarBinaryNotNullEqualTypes, Op>(ty);
     DCHECK_OK(func->AddKernel({ty, ty}, ty, exec));
   }
   return func;
 }
 
+FunctionDoc add_doc{"Add the elements of two arrays",

Review comment:
       Not necessarily two "arrays" .. Can also be scalar(s), etc. Now, not sure how to clearly rephrase this while still being concise but also not overly generic (like "objects"), so maybe keeping as arrays is fine. 

##########
File path: cpp/src/arrow/compute/kernels/scalar_boolean.cc
##########
@@ -171,21 +171,67 @@ void MakeFunction(std::string name, int arity, ArrayKernelExec exec,
   DCHECK_OK(registry->AddFunction(std::move(func)));
 }
 
+FunctionDoc invert_doc{"Invert boolean values", "", {"values"}};
+
+FunctionDoc and_doc{"Logical 'and' boolean values",
+                    ("When a null is encountered is either input, a null is output.\n"
+                     "For a different null behavior, see function \"and_kleene\"."),
+                    {"x", "y"}};
+
+FunctionDoc or_doc{"Logical 'or' boolean values",
+                   ("When a null is encountered is either input, a null is output.\n"
+                    "For a different null behavior, see function \"or_kleene\"."),
+                   {"x", "y"}};
+
+FunctionDoc xor_doc{"Logical 'xor' boolean values",
+                    ("When a null is encountered is either input, a null is output."),
+                    {"x", "y"}};
+
+FunctionDoc and_kleene_doc{
+    "Logical 'and' boolean values (Kleene logic)",
+    ("This function behaves as follows with nulls:\n\n"
+     "- true and null = null\n"
+     "- null and true = null\n"
+     "- false and null = false\n"
+     "- null and false = false\n"
+     "- null and null = null\n"
+     "\n"
+     "In other words, in this context a null value really means \"unknown\",\n"
+     "and an unknown value 'and' false is always false.\n"
+     "For a different null behavior, see function \"and\"."),
+    {"x", "y"}};
+
+FunctionDoc or_kleene_doc{
+    "Logical 'or' boolean values (Kleene logic)",
+    ("This function behaves as follows with nulls:\n\n"
+     "- true or null = true\n"
+     "- null and true = true\n"
+     "- false and null = null\n"
+     "- null and false = null\n"
+     "- null and null = null\n"
+     "\n"
+     "In other words, in this context a null value really means \"unknown\",\n"
+     "and an unknown value 'or' true is always true.\n"
+     "For a different null behavior, see function \"and\"."),

Review comment:
       ```suggestion
        "For a different null behavior, see function \"or\"."),
   ```

##########
File path: cpp/src/arrow/compute/kernels/vector_nested.cc
##########
@@ -65,18 +65,33 @@ Result<ValueDescr> ValuesType(KernelContext*, const std::vector<ValueDescr>& arg
   return ValueDescr::Array(list_type.value_type());
 }
 
+FunctionDoc list_flatten_doc(
+    "Flatten list values",
+    ("`lists` must have a list-like type.\n"
+     "Return an array with the top list level flattened.\n"
+     "Top-level null values in `lists` do not emit anything in the input."),
+    {"lists"});
+
+FunctionDoc list_parent_indices_doc(
+    "Compute parent indices of nested list values",
+    ("`lists` must have a list-like type.\n"
+     "For each value in each list of `lists`, the top-level list index\n"

Review comment:
       Not for this PR, but I think that for conceptually more complex functions (like this one, for me), it might be useful to include a small example. But that might get verbose to write in this form ..
   
   For example, for this case: `[[1, 2], [3, 4, 5], [6]]` returns `[0, 0, 1, 1, 1, 2]`

##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -302,19 +299,38 @@ void AddSortingKernels(VectorKernel base, VectorFunction* func) {
   }
 }
 
+FunctionDoc sort_indices_doc(
+    "Return the indices that would sort an array",
+    ("Compute an array of indices that define a non-stable sort\n"
+     "of the input array.  Null values are considered greater than any\n"
+     "other value and are therefore sorted at the end of the array."),
+    {"array"});
+
+FunctionDoc partition_nth_indices_doc(
+    "Return the indices that would partition an array around a pivot",
+    ("Compute an array of indices that define a non-stable partition\n"
+     "around the `N`th smallest element of the input array.\n"

Review comment:
       I think what you wrote in the compute.rst guide is clearer:
   
   > * \(1) The output is an array of indices into the input array, that define
     a partial sort such that the *N*'th index points to the *N*'th element
     in sorted order, and all indices before the *N*'th point to elements
     less or equal to elements at or after the *N*'th (similar to
     :func:`std::nth_element`).  *N* is given in
     :member:`PartitionNthOptions::pivot`.

##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -277,29 +278,80 @@ std::shared_ptr<ScalarFunction> MakeArithmeticFunction(std::string name) {
 // Like MakeArithmeticFunction, but for arithmetic ops that need to run
 // only on non-null output.
 template <typename Op>
-std::shared_ptr<ScalarFunction> MakeArithmeticFunctionNotNull(std::string name) {
-  auto func = std::make_shared<ScalarFunction>(name, Arity::Binary());
+std::shared_ptr<ScalarFunction> MakeArithmeticFunctionNotNull(std::string name,
+                                                              FunctionDoc doc) {
+  auto func = std::make_shared<ScalarFunction>(name, Arity::Binary(), std::move(doc));
   for (const auto& ty : NumericTypes()) {
     auto exec = NumericEqualTypesBinary<ScalarBinaryNotNullEqualTypes, Op>(ty);
     DCHECK_OK(func->AddKernel({ty, ty}, ty, exec));
   }
   return func;
 }
 
+FunctionDoc add_doc{"Add the elements of two arrays",
+                    ("Results will wrap around on integer overflow.\n"
+                     "Use function \"add_checked\" if you want overflow\n"
+                     "to return an error."),
+                    {"x", "y"}};
+
+FunctionDoc add_checked_doc{
+    "Add the elements of two arrays",
+    ("This function returns an error on overflow.  For a variant that\n"
+     "doesn't fail on overflow, use function \"add\"."),
+    {"x", "y"}};
+
+FunctionDoc sub_doc{"Substract the elements of an array from another",
+                    ("Results will wrap around on integer overflow.\n"
+                     "Use function \"subtract_checked\" if you want overflow\n"
+                     "to return an error."),
+                    {"x", "y"}};
+
+FunctionDoc sub_checked_doc{
+    "Substract the elements of an array from another",
+    ("This function returns an error on overflow.  For a variant that\n"
+     "doesn't fail on overflow, use function \"subtract\"."),
+    {"x", "y"}};
+
+FunctionDoc mul_doc{"Multiply the elements of two arrays",
+                    ("Results will wrap around on integer overflow.\n"
+                     "Use function \"multiply_checked\" if you want overflow\n"
+                     "to return an error."),
+                    {"x", "y"}};
+
+FunctionDoc mul_checked_doc{
+    "Multiply the elements of two arrays",
+    ("This function returns an error on overflow.  For a variant that\n"
+     "doesn't fail on overflow, use function \"multiply\"."),
+    {"x", "y"}};
+
+FunctionDoc div_doc{"Divide the elements of an array by another",
+                    ("An error is returned when trying to divide by zero.\n"

Review comment:
       This seems the case when dividing integers:
   
   ```
   In [29]: pc.divide([1, 2], 0)
   ...
   ArrowInvalid: divide by zero
   ```
   
   but not when dividing floats:
   
   ```
   In [28]: pc.divide([1., 2], 0.)
   Out[28]: 
   <pyarrow.lib.DoubleArray object at 0x7fab75767588>
   [
     inf,
     inf
   ]
   ```
   
   

##########
File path: cpp/src/arrow/compute/kernels/scalar_boolean.cc
##########
@@ -171,21 +171,67 @@ void MakeFunction(std::string name, int arity, ArrayKernelExec exec,
   DCHECK_OK(registry->AddFunction(std::move(func)));
 }
 
+FunctionDoc invert_doc{"Invert boolean values", "", {"values"}};
+
+FunctionDoc and_doc{"Logical 'and' boolean values",
+                    ("When a null is encountered is either input, a null is output.\n"

Review comment:
       ```suggestion
                       ("When a null is encountered in either input, a null is output.\n"
   ```
   
   and same for the 2 docstrings below

##########
File path: cpp/src/arrow/compute/kernels/scalar_validity.cc
##########
@@ -123,13 +123,21 @@ void IsNullExec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
   }
 }
 
+FunctionDoc is_valid_doc(
+    "Return true if non-null",
+    ("For each input value, emit true iff the value is valid (non-null)."), {"input"});

Review comment:
       Regarding the argument name, for other unary functions above (where the input values were not of a specific type, like strings or lists), also "values" was used. Do we want to try to be somewhat consistent here? 

##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -277,29 +278,80 @@ std::shared_ptr<ScalarFunction> MakeArithmeticFunction(std::string name) {
 // Like MakeArithmeticFunction, but for arithmetic ops that need to run
 // only on non-null output.
 template <typename Op>
-std::shared_ptr<ScalarFunction> MakeArithmeticFunctionNotNull(std::string name) {
-  auto func = std::make_shared<ScalarFunction>(name, Arity::Binary());
+std::shared_ptr<ScalarFunction> MakeArithmeticFunctionNotNull(std::string name,
+                                                              FunctionDoc doc) {
+  auto func = std::make_shared<ScalarFunction>(name, Arity::Binary(), std::move(doc));
   for (const auto& ty : NumericTypes()) {
     auto exec = NumericEqualTypesBinary<ScalarBinaryNotNullEqualTypes, Op>(ty);
     DCHECK_OK(func->AddKernel({ty, ty}, ty, exec));
   }
   return func;
 }
 
+FunctionDoc add_doc{"Add the elements of two arrays",
+                    ("Results will wrap around on integer overflow.\n"
+                     "Use function \"add_checked\" if you want overflow\n"
+                     "to return an error."),
+                    {"x", "y"}};
+
+FunctionDoc add_checked_doc{
+    "Add the elements of two arrays",
+    ("This function returns an error on overflow.  For a variant that\n"
+     "doesn't fail on overflow, use function \"add\"."),
+    {"x", "y"}};
+
+FunctionDoc sub_doc{"Substract the elements of an array from another",
+                    ("Results will wrap around on integer overflow.\n"
+                     "Use function \"subtract_checked\" if you want overflow\n"
+                     "to return an error."),
+                    {"x", "y"}};
+
+FunctionDoc sub_checked_doc{
+    "Substract the elements of an array from another",
+    ("This function returns an error on overflow.  For a variant that\n"
+     "doesn't fail on overflow, use function \"subtract\"."),
+    {"x", "y"}};
+
+FunctionDoc mul_doc{"Multiply the elements of two arrays",
+                    ("Results will wrap around on integer overflow.\n"
+                     "Use function \"multiply_checked\" if you want overflow\n"
+                     "to return an error."),
+                    {"x", "y"}};
+
+FunctionDoc mul_checked_doc{
+    "Multiply the elements of two arrays",
+    ("This function returns an error on overflow.  For a variant that\n"
+     "doesn't fail on overflow, use function \"multiply\"."),
+    {"x", "y"}};
+
+FunctionDoc div_doc{"Divide the elements of an array by another",
+                    ("An error is returned when trying to divide by zero.\n"
+                     "However, integer overflow will silently wrap around.\n"
+                     "Use function \"divide_checked\" if you want integer\n"
+                     "overflow to also return an error."),
+                    {"dividend", "divisor"}};
+
+FunctionDoc div_checked_doc{
+    "Divide the elements of an array by another",
+    ("An error is returned when trying to divide by zero, or when\n"
+     "integer overflow is encountered.\n"),

Review comment:
       ```suggestion
        "integer overflow is encountered."),
   ```
   
   (the others don't end with a newline, and in python this results in two blank lines)

##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -302,19 +299,38 @@ void AddSortingKernels(VectorKernel base, VectorFunction* func) {
   }
 }
 
+FunctionDoc sort_indices_doc(
+    "Return the indices that would sort an array",
+    ("Compute an array of indices that define a non-stable sort\n"
+     "of the input array.  Null values are considered greater than any\n"
+     "other value and are therefore sorted at the end of the array."),

Review comment:
       I wanted to comment to maybe mention NaN as well (also sorted last, but before null). But apparently my expectation is not true, we don't really have well-defined sort behavior with NaNs present. Will open a JIRA about that
   
   For example
   
   ```
   In [39]: arr = pa.array([2, 1, np.nan, 2, 1])
   
   In [40]: pc.take(arr, pc.sort_indices(arr))
   Out[40]: 
   <pyarrow.lib.DoubleArray object at 0x7fc693d07c48>
   [
     1,
     2,
     nan,
     1,
     2
   ]
   
   In [41]: arr = pa.array([2, 1, np.nan, 2, np.nan, 1])
   
   In [42]: pc.take(arr, pc.sort_indices(arr))
   Out[42]: 
   <pyarrow.lib.DoubleArray object at 0x7fc693d10348>
   [
     1,
     1,
     2,
     nan,
     2,
     nan
   ]
   ```
   
   (for which I can't really infer the logic)

##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -302,19 +299,38 @@ void AddSortingKernels(VectorKernel base, VectorFunction* func) {
   }
 }
 
+FunctionDoc sort_indices_doc(
+    "Return the indices that would sort an array",
+    ("Compute an array of indices that define a non-stable sort\n"
+     "of the input array.  Null values are considered greater than any\n"
+     "other value and are therefore sorted at the end of the array."),
+    {"array"});
+
+FunctionDoc partition_nth_indices_doc(
+    "Return the indices that would partition an array around a pivot",
+    ("Compute an array of indices that define a non-stable partition\n"
+     "around the `N`th smallest element of the input array.\n"

Review comment:
       For me it's not fully clear what this "partition around the Nth smallest element" means.

##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -277,29 +278,80 @@ std::shared_ptr<ScalarFunction> MakeArithmeticFunction(std::string name) {
 // Like MakeArithmeticFunction, but for arithmetic ops that need to run
 // only on non-null output.
 template <typename Op>
-std::shared_ptr<ScalarFunction> MakeArithmeticFunctionNotNull(std::string name) {
-  auto func = std::make_shared<ScalarFunction>(name, Arity::Binary());
+std::shared_ptr<ScalarFunction> MakeArithmeticFunctionNotNull(std::string name,
+                                                              FunctionDoc doc) {
+  auto func = std::make_shared<ScalarFunction>(name, Arity::Binary(), std::move(doc));
   for (const auto& ty : NumericTypes()) {
     auto exec = NumericEqualTypesBinary<ScalarBinaryNotNullEqualTypes, Op>(ty);
     DCHECK_OK(func->AddKernel({ty, ty}, ty, exec));
   }
   return func;
 }
 
+FunctionDoc add_doc{"Add the elements of two arrays",

Review comment:
       Just checked, for np.add ufunc, they use "Add arguments element-wise"

##########
File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc
##########
@@ -277,29 +278,80 @@ std::shared_ptr<ScalarFunction> MakeArithmeticFunction(std::string name) {
 // Like MakeArithmeticFunction, but for arithmetic ops that need to run
 // only on non-null output.
 template <typename Op>
-std::shared_ptr<ScalarFunction> MakeArithmeticFunctionNotNull(std::string name) {
-  auto func = std::make_shared<ScalarFunction>(name, Arity::Binary());
+std::shared_ptr<ScalarFunction> MakeArithmeticFunctionNotNull(std::string name,
+                                                              FunctionDoc doc) {
+  auto func = std::make_shared<ScalarFunction>(name, Arity::Binary(), std::move(doc));
   for (const auto& ty : NumericTypes()) {
     auto exec = NumericEqualTypesBinary<ScalarBinaryNotNullEqualTypes, Op>(ty);
     DCHECK_OK(func->AddKernel({ty, ty}, ty, exec));
   }
   return func;
 }
 
+FunctionDoc add_doc{"Add the elements of two arrays",
+                    ("Results will wrap around on integer overflow.\n"
+                     "Use function \"add_checked\" if you want overflow\n"
+                     "to return an error."),
+                    {"x", "y"}};
+
+FunctionDoc add_checked_doc{
+    "Add the elements of two arrays",
+    ("This function returns an error on overflow.  For a variant that\n"
+     "doesn't fail on overflow, use function \"add\"."),
+    {"x", "y"}};
+
+FunctionDoc sub_doc{"Substract the elements of an array from another",
+                    ("Results will wrap around on integer overflow.\n"
+                     "Use function \"subtract_checked\" if you want overflow\n"
+                     "to return an error."),
+                    {"x", "y"}};
+
+FunctionDoc sub_checked_doc{
+    "Substract the elements of an array from another",
+    ("This function returns an error on overflow.  For a variant that\n"
+     "doesn't fail on overflow, use function \"subtract\"."),
+    {"x", "y"}};
+
+FunctionDoc mul_doc{"Multiply the elements of two arrays",
+                    ("Results will wrap around on integer overflow.\n"
+                     "Use function \"multiply_checked\" if you want overflow\n"
+                     "to return an error."),
+                    {"x", "y"}};
+
+FunctionDoc mul_checked_doc{
+    "Multiply the elements of two arrays",
+    ("This function returns an error on overflow.  For a variant that\n"
+     "doesn't fail on overflow, use function \"multiply\"."),
+    {"x", "y"}};
+
+FunctionDoc div_doc{"Divide the elements of an array by another",
+                    ("An error is returned when trying to divide by zero.\n"

Review comment:
       It might also be worth to mention that integer division doesn't return floats, but is basically a "rounded" division (also not a floor division it seems)
   
   (if this is intentional, of course)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on a change in pull request #8457: ARROW-9164: [C++] Add embedded documentation to compute functions

Posted by GitBox <gi...@apache.org>.
kszucs commented on a change in pull request #8457:
URL: https://github.com/apache/arrow/pull/8457#discussion_r504966687



##########
File path: python/pyarrow/compute.py
##########
@@ -130,32 +146,38 @@ def _handle_options(name, option_class, options, kwargs):
     return options
 
 
-def _simple_unary_function(name):
-    func = get_function(name)
-    option_class = _option_classes.get(name)
-
-    if option_class is not None:
-        def wrapper(arg, *, options=None, memory_pool=None, **kwargs):
-            options = _handle_options(name, option_class, options, kwargs)
-            return func.call([arg], options, memory_pool)
-    else:
-        def wrapper(arg, *, memory_pool=None):
-            return func.call([arg], None, memory_pool)
-
-    return _decorate_compute_function(wrapper, name, func, option_class)
-
-
-def _simple_binary_function(name):
-    func = get_function(name)
-    option_class = _option_classes.get(name)
-
+_wrapper_template = dedent("""\
+    def make_wrapper(func, option_class):
+        def {func_name}({args_sig}, *, memory_pool=None):
+            return func.call([{args_sig}], None, memory_pool)
+        return {func_name}
+    """)
+
+_wrapper_options_template = dedent("""\
+    def make_wrapper(func, option_class):
+        def {func_name}({args_sig}, *, options=None, memory_pool=None,
+                        **kwargs):
+            options = _handle_options({func_name!r}, option_class, options,
+                                      kwargs)
+            return func.call([{args_sig}], options, memory_pool)
+        return {func_name}
+    """)
+
+
+def _wrap_function(name, func):
+    option_class = _get_options_class(func)
+    arg_names = _get_arg_names(func)
+    args_sig = ', '.join(arg_names)
+
+    # Generate templated wrapper, so that the signature matches
+    # the documented argument names.
+    ns = {}
     if option_class is not None:
-        def wrapper(left, right, *, options=None, memory_pool=None, **kwargs):
-            options = _handle_options(name, option_class, options, kwargs)
-            return func.call([left, right], options, memory_pool)
+        template = _wrapper_options_template
     else:
-        def wrapper(left, right, *, memory_pool=None):
-            return func.call([left, right], None, memory_pool)
+        template = _wrapper_template
+    exec(template.format(func_name=name, args_sig=args_sig), globals(), ns)

Review comment:
       Wouldn't it be better to set the `__signature__` property instead? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #8457: ARROW-9164: [C++] Add embedded documentation to compute functions

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8457:
URL: https://github.com/apache/arrow/pull/8457#discussion_r505667275



##########
File path: cpp/src/arrow/compute/kernels/aggregate_var_std.cc
##########
@@ -180,20 +180,38 @@ void AddVarStdKernels(KernelInit init,
   }
 }
 
+FunctionDoc stddev_doc{
+    "Calculate the standard deviation of a numeric array",
+    ("The number of degrees of freedom can be controlled using VarianceOptions.\n"
+     "By default (`ddof` = 0), the population standard deviation is calculated.\n"
+     "Nulls are ignored.  If there are not enough non-null values in the array\n"
+     "to satisfy `ddof`, an error is returned."),

Review comment:
       Hmm, looks like you're right. I misread the tests.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] bkietz commented on a change in pull request #8457: ARROW-9164: [C++] Add embedded documentation to compute functions

Posted by GitBox <gi...@apache.org>.
bkietz commented on a change in pull request #8457:
URL: https://github.com/apache/arrow/pull/8457#discussion_r505701873



##########
File path: cpp/src/arrow/compute/kernels/vector_nested.cc
##########
@@ -65,18 +65,33 @@ Result<ValueDescr> ValuesType(KernelContext*, const std::vector<ValueDescr>& arg
   return ValueDescr::Array(list_type.value_type());
 }
 
+FunctionDoc list_flatten_doc(
+    "Flatten list values",
+    ("`lists` must have a list-like type.\n"
+     "Return an array with the top list level flattened.\n"
+     "Top-level null values in `lists` do not emit anything in the input."),
+    {"lists"});
+
+FunctionDoc list_parent_indices_doc(
+    "Compute parent indices of nested list values",
+    ("`lists` must have a list-like type.\n"
+     "For each value in each list of `lists`, the top-level list index\n"

Review comment:
       Going further down the docstring rabbithole: we could introduce a doctest-like class (of which a FunctionDoc contains a vector), which could just be a set of lambdas which generate a set of inputs, options, and maybe the expected output. These can then be rendered lazily in python




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org