Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/08 02:02:54 UTC

[GitHub] [arrow-cookbook] drin opened a new pull request, #227: 174: Adding recipe for custom compute functions

drin opened a new pull request, #227:
URL: https://github.com/apache/arrow-cookbook/pull/227

   This recipe shows the major portions of a custom, or new, compute
   function:
   - defining a compute kernel
   - creating a function instance
   - associating the kernel with the function
   - registering the function in a registry
   - calling the function
   
   The aliases are to keep the code as readable as possible, and also to frontload the various dependencies to make it obvious to the reader.
   
   The kernel implementation here is essentially the same as the new FastHash32 function that is currently being added. It was chosen because examples of much simpler functions (AbsoluteValue) already exist, and because this recipe also helps the reader understand how to call other functions and how to structure non-trivial returns.
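
   The five portions listed above can be sketched without any Arrow dependency. The block below is a toy model only: `ToyFunction`, `ToyRegistry`, `SignKernel` and the name "toy_sign" are hypothetical stand-ins invented for illustration, not Arrow's `ScalarFunction` or `FunctionRegistry` API, and the real recipe dispatches kernels on input types rather than taking the first one.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins (assumptions for illustration, NOT Arrow types).
using Column = std::vector<int32_t>;
using Kernel = std::function<Column(const Column&)>;

// A "function" is a name plus the kernels associated with it.
struct ToyFunction {
  std::string name;
  std::vector<Kernel> kernels;

  void AddKernel(Kernel kernel) { kernels.push_back(std::move(kernel)); }
};

// A "registry" maps names to functions and invokes them by name.
class ToyRegistry {
 public:
  void AddFunction(ToyFunction fn) {
    std::string key = fn.name;  // copy the key before moving the function
    functions_.insert_or_assign(std::move(key), std::move(fn));
  }

  Column Call(const std::string& name, const Column& input) const {
    auto it = functions_.find(name);
    if (it == functions_.end() || it->second.kernels.empty()) {
      throw std::invalid_argument("no such function: " + name);
    }
    // A real framework would pick the kernel matching the input types;
    // this toy simply uses the first kernel.
    return it->second.kernels.front()(input);
  }

 private:
  std::map<std::string, ToyFunction> functions_;
};

// Step 1: define a kernel (maps each value to -1, 0, or 1).
inline Column SignKernel(const Column& input) {
  Column output;
  output.reserve(input.size());
  for (int32_t value : input) {
    output.push_back((value > 0) - (value < 0));
  }
  return output;
}

// Steps 2-4: create the function, associate the kernel, register it.
inline ToyRegistry MakeToyRegistry() {
  ToyFunction fn{"toy_sign", {}};
  fn.AddKernel(SignKernel);

  ToyRegistry registry;
  registry.AddFunction(std::move(fn));
  return registry;
}
```

   Step 5, calling the function, is then `MakeToyRegistry().Call("toy_sign", {-5, 0, 7})`. Keeping lookup by name separate from the kernel body is what lets the kernel be written before the association is made, which mirrors the ordering in the recipe.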


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-cookbook] drin commented on a diff in pull request #227: Adding recipe for custom compute functions

Posted by GitBox <gi...@apache.org>.
drin commented on code in PR #227:
URL: https://github.com/apache/arrow-cookbook/pull/227#discussion_r932515503


##########
cpp/source/compute.rst:
##########
@@ -0,0 +1,151 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+====================================
+Defining and Using Compute Functions
+====================================
+
+This section contains (or will contain) a number of recipes illustrating how to
+define new "compute functions" or how to use existing ones. Arrow contains a "Compute
+API," which primarily consists of a "registry" of functions that can be invoked.
+Currently, Arrow populates a default registry with a variety of useful functions. The
+recipes provided in this section show some approaches to define a compute function as well
+as how to invoke a compute function by name, given a registry.
+
+
+.. contents::
+
+Invoke a Compute Function
+=========================
+
+When invoking a compute function, the function must exist in a function registry. In this

Review Comment:
   oh yeah, I guess I didn't think about that being where it is accessed from. I think it's a good note





[GitHub] [arrow-cookbook] ksuarez1423 commented on a diff in pull request #227: Adding recipe for custom compute functions

Posted by GitBox <gi...@apache.org>.
ksuarez1423 commented on code in PR #227:
URL: https://github.com/apache/arrow-cookbook/pull/227#discussion_r932462563


##########
cpp/source/compute.rst:
##########
@@ -0,0 +1,151 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+====================================
+Defining and Using Compute Functions
+====================================
+
+This section contains (or will contain) a number of recipes illustrating how to
+define new "compute functions" or how to use existing ones. Arrow contains a "Compute
+API," which primarily consists of a "registry" of functions that can be invoked.
+Currently, Arrow populates a default registry with a variety of useful functions. The
+recipes provided in this section show some approaches to define a compute function as well
+as how to invoke a compute function by name, given a registry.
+
+
+.. contents::
+
+Invoke a Compute Function
+=========================
+
+When invoking a compute function, the function must exist in a function registry. In this

Review Comment:
   This isn't a tutorial, so we don't need to go too deep, but I think it would be worthwhile to say that a FunctionRegistry corresponds to an ExecContext. 





[GitHub] [arrow-cookbook] drin commented on a diff in pull request #227: Adding recipe for custom compute functions

Posted by GitBox <gi...@apache.org>.
drin commented on code in PR #227:
URL: https://github.com/apache/arrow-cookbook/pull/227#discussion_r932518234


##########
cpp/source/compute.rst:
##########
@@ -0,0 +1,151 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+====================================
+Defining and Using Compute Functions
+====================================
+
+This section contains (or will contain) a number of recipes illustrating how to
+define new "compute functions" or how to use existing ones. Arrow contains a "Compute
+API," which primarily consists of a "registry" of functions that can be invoked.
+Currently, Arrow populates a default registry with a variety of useful functions. The
+recipes provided in this section show some approaches to define a compute function as well
+as how to invoke a compute function by name, given a registry.
+
+
+.. contents::
+
+Invoke a Compute Function
+=========================
+
+When invoking a compute function, the function must exist in a function registry. In this
+recipe, we use `CallFunction()` to invoke the function with name "named_scalar_fn".
+
+.. recipe:: ../code/compute_fn.cc InvokeByCallFunction
+  :caption: Use CallFunction() to invoke a compute function by name
+  :dedent: 2
+
+.. note::
+    This method allows us to specify arguments as a vector and a custom ExecContext.
+
+If an `ExecContext` is not passed to `CallFunction` (it is null), then the default
+FunctionRegistry will be used to call the function from.

Review Comment:
   yeah, my code adds it then invokes it, but I also wanted to conceptually de-couple the 2.
   
   I think separate cookbook entries would have a lot of overlap, but I suppose it would make the concepts more accessible.
   
   For now, I can maybe re-word to mention that an empty ExecContext actually causes a default one to be used, and where I mention the link between ExecContext and FunctionRegistry above, I can say that the default ExecContext references the default FunctionRegistry





[GitHub] [arrow-cookbook] drin commented on a diff in pull request #227: Adding recipe for custom compute functions

Posted by GitBox <gi...@apache.org>.
drin commented on code in PR #227:
URL: https://github.com/apache/arrow-cookbook/pull/227#discussion_r933685435


##########
cpp/code/compute_fn.cc:
##########
@@ -0,0 +1,270 @@
+// ------------------------------
+// Dependencies
+
+// standard dependencies
+#include <stdint.h>
+#include <string>
+#include <iostream>
+
+// arrow dependencies
+#include <arrow/api.h>
+#include <arrow/compute/api.h>
+#include <arrow/compute/exec/key_hash.h>
+
+#include "common.h"
+
+
+// >> aliases for types in standard library
+using std::shared_ptr;
+using std::vector;
+
+// arrow util types
+using arrow::Result;
+using arrow::Status;
+using arrow::Datum;
+
+// arrow data types and helpers
+using arrow::UInt32Builder;
+using arrow::Int32Builder;
+
+using arrow::Array;
+using arrow::ArraySpan;
+
+
+// aliases for types used in `NamedScalarFn`
+//    |> kernel parameters
+using arrow::compute::KernelContext;
+using arrow::compute::ExecSpan;
+using arrow::compute::ExecResult;
+
+//    |> other context types
+using arrow::compute::ExecContext;
+using arrow::compute::LightContext;
+
+//    |> common types for compute functions
+using arrow::compute::FunctionRegistry;
+using arrow::compute::FunctionDoc;
+using arrow::compute::InputType;
+using arrow::compute::OutputType;
+using arrow::compute::Arity;
+
+//    |> the "kind" of function we want
+using arrow::compute::ScalarFunction;
+
+//    |> structs and classes for hashing
+using arrow::util::MiniBatch;
+using arrow::util::TempVectorStack;
+
+using arrow::compute::KeyColumnArray;
+using arrow::compute::Hashing32;
+
+//    |> functions used for hashing
+using arrow::compute::ColumnArrayFromArrayData;
+
+
+// ------------------------------
+// Structs and Classes
+
+// >> Documentation for a compute function
+/**
+ * Create a const instance of `FunctionDoc` that contains 3 attributes:
+ *  1. Short description
+ *  2. Long  description (limited to 78 characters)
+ *  3. Name of input arguments
+ */
+const FunctionDoc named_scalar_fn_doc {
+   "Unary function that calculates a hash for each row of the input"
+  ,"This function uses an xxHash-like algorithm which produces 32-bit hashes."
+  ,{ "input_array" }
+};
+
+
+// >> Kernel implementations for a compute function
+/**
+ * Create implementations that will be associated with our compute function. When a
+ * compute function is invoked, the compute API framework will delegate execution to an
+ * associated kernel that matches: (1) input argument types/shapes and (2) output argument
+ * types/shapes.
+ *
+ * Kernel implementations may be functions or may be methods (functions within a class or
+ * struct).
+ */
+struct NamedScalarFn {
+
+  /**
+   * A kernel implementation that expects a single array as input, and outputs an array of
+   * uint32 values. We write this implementation knowing what function we want to
+   * associate it with ("NamedScalarFn"), but that association is made later (see
+   * `RegisterScalarFnKernels()` below).
+   */
+  static Status
+  Exec(KernelContext *ctx, const ExecSpan &input_arg, ExecResult *out) {
+    StartRecipe("DefineAComputeKernel");

Review Comment:
   for now I just tried to use literal includes. it still references the file, and then I just set line numbers. Will experiment with the recipe stuff again in the future.





[GitHub] [arrow-cookbook] ksuarez1423 commented on a diff in pull request #227: Adding recipe for custom compute functions

Posted by GitBox <gi...@apache.org>.
ksuarez1423 commented on code in PR #227:
URL: https://github.com/apache/arrow-cookbook/pull/227#discussion_r932465001


##########
cpp/source/compute.rst:
##########
@@ -0,0 +1,151 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+====================================
+Defining and Using Compute Functions
+====================================
+
+This section contains (or will contain) a number of recipes illustrating how to
+define new "compute functions" or how to use existing ones. Arrow contains a "Compute
+API," which primarily consists of a "registry" of functions that can be invoked.
+Currently, Arrow populates a default registry with a variety of useful functions. The
+recipes provided in this section show some approaches to define a compute function as well
+as how to invoke a compute function by name, given a registry.
+
+
+.. contents::
+
+Invoke a Compute Function
+=========================
+
+When invoking a compute function, the function must exist in a function registry. In this
+recipe, we use `CallFunction()` to invoke the function with name "named_scalar_fn".
+
+.. recipe:: ../code/compute_fn.cc InvokeByCallFunction
+  :caption: Use CallFunction() to invoke a compute function by name
+  :dedent: 2
+
+.. note::
+    This method allows us to specify arguments as a vector and a custom ExecContext.
+
+If an `ExecContext` is not passed to `CallFunction` (it is null), then the default
+FunctionRegistry will be used to call the function from.
+
+If we have defined a convenience function that wraps `CallFunction()`, then we can call
+that function instead. Various compute functions provided by Arrow have these convenience
+functions defined, such as `Add` or `Subtract`.
+
+.. recipe:: ../code/compute_fn.cc InvokeByConvenienceFunction
+  :caption: Use a convenience invocation function to call a compute function
+  :dedent: 2
+
+
+Adding a Custom Compute Function
+================================
+
+To make a custom compute function available, there are 3 primary steps:
+
+1. Define kernels for the function (these implement the actual logic)
+
+2. Associate the kernels with a function object
+
+3. Add the function object to a function registry
+
+
+Define Function Kernels
+-----------------------
+
+A kernel is a particular function that implements desired logic for a compute function.

Review Comment:
   The information in this paragraph feels better suited to a tutorial -- if we just want to establish how to define an execution kernel, then I think the next paragraph is enough, but if an initialization kernel is necessary, that should also be an entry here. 





[GitHub] [arrow-cookbook] ksuarez1423 commented on a diff in pull request #227: Adding recipe for custom compute functions

Posted by GitBox <gi...@apache.org>.
ksuarez1423 commented on code in PR #227:
URL: https://github.com/apache/arrow-cookbook/pull/227#discussion_r932463541


##########
cpp/source/compute.rst:
##########
@@ -0,0 +1,151 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+====================================
+Defining and Using Compute Functions
+====================================
+
+This section contains (or will contain) a number of recipes illustrating how to
+define new "compute functions" or how to use existing ones. Arrow contains a "Compute
+API," which primarily consists of a "registry" of functions that can be invoked.
+Currently, Arrow populates a default registry with a variety of useful functions. The
+recipes provided in this section show some approaches to define a compute function as well
+as how to invoke a compute function by name, given a registry.
+
+
+.. contents::
+
+Invoke a Compute Function
+=========================
+
+When invoking a compute function, the function must exist in a function registry. In this
+recipe, we use `CallFunction()` to invoke the function with name "named_scalar_fn".
+
+.. recipe:: ../code/compute_fn.cc InvokeByCallFunction
+  :caption: Use CallFunction() to invoke a compute function by name
+  :dedent: 2
+
+.. note::
+    This method allows us to specify arguments as a vector and a custom ExecContext.
+
+If an `ExecContext` is not passed to `CallFunction` (it is null), then the default
+FunctionRegistry will be used to call the function from.

Review Comment:
   Referring to the default FunctionRegistry here makes me wonder if this should be the other way around -- adding functions then invoking them. Or even if these should be two separate cookbook entries -- not quite sure, just want the idea to float around.





[GitHub] [arrow-cookbook] lidavidm commented on pull request #227: [C++] Adding recipe for custom compute functions

Posted by GitBox <gi...@apache.org>.
lidavidm commented on PR #227:
URL: https://github.com/apache/arrow-cookbook/pull/227#issuecomment-1227153041

   I kicked off CI, but not sure if I'll have time to look in more detail right now




[GitHub] [arrow-cookbook] drin commented on a diff in pull request #227: Adding recipe for custom compute functions

Posted by GitBox <gi...@apache.org>.
drin commented on code in PR #227:
URL: https://github.com/apache/arrow-cookbook/pull/227#discussion_r930117289


##########
cpp/code/compute_fn.cc:
##########
@@ -0,0 +1,270 @@
+// ------------------------------
+// Dependencies
+
+// standard dependencies
+#include <stdint.h>
+#include <string>
+#include <iostream>
+
+// arrow dependencies
+#include <arrow/api.h>
+#include <arrow/compute/api.h>
+#include <arrow/compute/exec/key_hash.h>
+
+#include "common.h"
+
+
+// >> aliases for types in standard library
+using std::shared_ptr;
+using std::vector;
+
+// arrow util types
+using arrow::Result;
+using arrow::Status;
+using arrow::Datum;
+
+// arrow data types and helpers
+using arrow::UInt32Builder;
+using arrow::Int32Builder;
+
+using arrow::Array;
+using arrow::ArraySpan;
+
+
+// aliases for types used in `NamedScalarFn`
+//    |> kernel parameters
+using arrow::compute::KernelContext;
+using arrow::compute::ExecSpan;
+using arrow::compute::ExecResult;
+
+//    |> other context types
+using arrow::compute::ExecContext;
+using arrow::compute::LightContext;
+
+//    |> common types for compute functions
+using arrow::compute::FunctionRegistry;
+using arrow::compute::FunctionDoc;
+using arrow::compute::InputType;
+using arrow::compute::OutputType;
+using arrow::compute::Arity;
+
+//    |> the "kind" of function we want
+using arrow::compute::ScalarFunction;
+
+//    |> structs and classes for hashing
+using arrow::util::MiniBatch;
+using arrow::util::TempVectorStack;
+
+using arrow::compute::KeyColumnArray;
+using arrow::compute::Hashing32;
+
+//    |> functions used for hashing
+using arrow::compute::ColumnArrayFromArrayData;
+
+
+// ------------------------------
+// Structs and Classes
+
+// >> Documentation for a compute function
+/**
+ * Create a const instance of `FunctionDoc` that contains 3 attributes:
+ *  1. Short description
+ *  2. Long  description (limited to 78 characters)
+ *  3. Name of input arguments
+ */
+const FunctionDoc named_scalar_fn_doc {
+   "Unary function that calculates a hash for each row of the input"
+  ,"This function uses an xxHash-like algorithm which produces 32-bit hashes."
+  ,{ "input_array" }
+};
+
+
+// >> Kernel implementations for a compute function
+/**
+ * Create implementations that will be associated with our compute function. When a
+ * compute function is invoked, the compute API framework will delegate execution to an
+ * associated kernel that matches: (1) input argument types/shapes and (2) output argument
+ * types/shapes.
+ *
+ * Kernel implementations may be functions or may be methods (functions within a class or
+ * struct).
+ */
+struct NamedScalarFn {
+
+  /**
+   * A kernel implementation that expects a single array as input, and outputs an array of
+   * uint32 values. We write this implementation knowing what function we want to
+   * associate it with ("NamedScalarFn"), but that association is made later (see
+   * `RegisterScalarFnKernels()` below).
+   */
+  static Status
+  Exec(KernelContext *ctx, const ExecSpan &input_arg, ExecResult *out) {
+    StartRecipe("DefineAComputeKernel");
+
+    if (input_arg.num_values() != 1 or not input_arg[0].is_array()) {
+      return Status::Invalid("Unsupported argument types or shape");
+    }
+
+    // >> Initialize stack-based memory allocator with an allocator and memory size
+    TempVectorStack stack_memallocator;
+    auto            input_dtype_width = input_arg[0].type()->bit_width();
+    if (input_dtype_width > 0) {
+      ARROW_RETURN_NOT_OK(
+        stack_memallocator.Init(
+           ctx->exec_context()->memory_pool()
+          ,input_dtype_width * max_batchsize
+        )
+      );
+    }
+
+    // >> Prepare input data structure for propagation to hash function
+    // NOTE: "start row index" and "row count" can potentially be options in the future
+    ArraySpan hash_input    = input_arg[0].array;
+    int64_t   hash_startrow = 0;
+    int64_t   hash_rowcount = hash_input.length;
+    ARROW_ASSIGN_OR_RAISE(
+       KeyColumnArray input_keycol
+      ,ColumnArrayFromArrayData(hash_input.ToArrayData(), hash_startrow, hash_rowcount)
+    );

Review Comment:
   I can update to use `util/hashing.h`. I didn't know about it at the time, but now I have some experience with it.





[GitHub] [arrow-cookbook] drin commented on pull request #227: Adding recipe for custom compute functions

Posted by GitBox <gi...@apache.org>.
drin commented on PR #227:
URL: https://github.com/apache/arrow-cookbook/pull/227#issuecomment-1195642849

   thanks for the comments! I will get around to the reST file today. I was putting it off a bit but today feels like a good day for it.




[GitHub] [arrow-cookbook] lidavidm commented on pull request #227: Adding recipe for custom compute functions

Posted by GitBox <gi...@apache.org>.
lidavidm commented on PR #227:
URL: https://github.com/apache/arrow-cookbook/pull/227#issuecomment-1195597541

   Thanks for adding this, I'll take a look.
   
   Side note: we should probably just set up clang-format for C++ examples…




[GitHub] [arrow-cookbook] lidavidm commented on a diff in pull request #227: Adding recipe for custom compute functions

Posted by GitBox <gi...@apache.org>.
lidavidm commented on code in PR #227:
URL: https://github.com/apache/arrow-cookbook/pull/227#discussion_r930077003


##########
cpp/code/compute_fn.cc:
##########
@@ -0,0 +1,270 @@
+// ------------------------------
+// Dependencies
+
+// standard dependencies
+#include <stdint.h>
+#include <string>
+#include <iostream>
+
+// arrow dependencies
+#include <arrow/api.h>
+#include <arrow/compute/api.h>
+#include <arrow/compute/exec/key_hash.h>
+
+#include "common.h"
+
+
+// >> aliases for types in standard library
+using std::shared_ptr;
+using std::vector;
+
+// arrow util types
+using arrow::Result;
+using arrow::Status;
+using arrow::Datum;
+
+// arrow data types and helpers
+using arrow::UInt32Builder;
+using arrow::Int32Builder;
+
+using arrow::Array;
+using arrow::ArraySpan;
+
+
+// aliases for types used in `NamedScalarFn`
+//    |> kernel parameters
+using arrow::compute::KernelContext;
+using arrow::compute::ExecSpan;
+using arrow::compute::ExecResult;
+
+//    |> other context types
+using arrow::compute::ExecContext;
+using arrow::compute::LightContext;
+
+//    |> common types for compute functions
+using arrow::compute::FunctionRegistry;
+using arrow::compute::FunctionDoc;
+using arrow::compute::InputType;
+using arrow::compute::OutputType;
+using arrow::compute::Arity;
+
+//    |> the "kind" of function we want
+using arrow::compute::ScalarFunction;
+
+//    |> structs and classes for hashing
+using arrow::util::MiniBatch;
+using arrow::util::TempVectorStack;
+
+using arrow::compute::KeyColumnArray;
+using arrow::compute::Hashing32;
+
+//    |> functions used for hashing
+using arrow::compute::ColumnArrayFromArrayData;
+
+
+// ------------------------------
+// Structs and Classes
+
+// >> Documentation for a compute function
+/**
+ * Create a const instance of `FunctionDoc` that contains 3 attributes:
+ *  1. Short description
+ *  2. Long  description (limited to 78 characters)

Review Comment:
   78 characters wide right?



##########
cpp/code/compute_fn.cc:
##########
@@ -0,0 +1,270 @@
+// ------------------------------
+// Dependencies
+
+// standard dependencies
+#include <stdint.h>
+#include <string>
+#include <iostream>
+
+// arrow dependencies
+#include <arrow/api.h>
+#include <arrow/compute/api.h>
+#include <arrow/compute/exec/key_hash.h>
+
+#include "common.h"
+
+
+// >> aliases for types in standard library
+using std::shared_ptr;
+using std::vector;
+
+// arrow util types
+using arrow::Result;
+using arrow::Status;
+using arrow::Datum;
+
+// arrow data types and helpers
+using arrow::UInt32Builder;
+using arrow::Int32Builder;
+
+using arrow::Array;
+using arrow::ArraySpan;
+
+
+// aliases for types used in `NamedScalarFn`
+//    |> kernel parameters
+using arrow::compute::KernelContext;
+using arrow::compute::ExecSpan;
+using arrow::compute::ExecResult;
+
+//    |> other context types
+using arrow::compute::ExecContext;
+using arrow::compute::LightContext;
+
+//    |> common types for compute functions
+using arrow::compute::FunctionRegistry;
+using arrow::compute::FunctionDoc;
+using arrow::compute::InputType;
+using arrow::compute::OutputType;
+using arrow::compute::Arity;
+
+//    |> the "kind" of function we want
+using arrow::compute::ScalarFunction;
+
+//    |> structs and classes for hashing
+using arrow::util::MiniBatch;
+using arrow::util::TempVectorStack;
+
+using arrow::compute::KeyColumnArray;
+using arrow::compute::Hashing32;
+
+//    |> functions used for hashing
+using arrow::compute::ColumnArrayFromArrayData;
+
+
+// ------------------------------
+// Structs and Classes
+
+// >> Documentation for a compute function
+/**
+ * Create a const instance of `FunctionDoc` that contains 3 attributes:
+ *  1. Short description
+ *  2. Long  description (limited to 78 characters)
+ *  3. Name of input arguments
+ */
+const FunctionDoc named_scalar_fn_doc {
+   "Unary function that calculates a hash for each row of the input"
+  ,"This function uses an xxHash-like algorithm which produces 32-bit hashes."
+  ,{ "input_array" }
+};
+
+
+// >> Kernel implementations for a compute function
+/**
+ * Create implementations that will be associated with our compute function. When a
+ * compute function is invoked, the compute API framework will delegate execution to an
+ * associated kernel that matches: (1) input argument types/shapes and (2) output argument
+ * types/shapes.
+ *
+ * Kernel implementations may be functions or may be methods (functions within a class or
+ * struct).
+ */
+struct NamedScalarFn {
+
+  /**
+   * A kernel implementation that expects a single array as input, and outputs an array of
+   * uint32 values. We write this implementation knowing what function we want to
+   * associate it with ("NamedScalarFn"), but that association is made later (see
+   * `RegisterScalarFnKernels()` below).
+   */
+  static Status
+  Exec(KernelContext *ctx, const ExecSpan &input_arg, ExecResult *out) {
+    StartRecipe("DefineAComputeKernel");
+
+    if (input_arg.num_values() != 1 or not input_arg[0].is_array()) {
+      return Status::Invalid("Unsupported argument types or shape");
+    }
+
+    // >> Initialize stack-based memory allocator with an allocator and memory size
+    TempVectorStack stack_memallocator;
+    auto            input_dtype_width = input_arg[0].type()->bit_width();
+    if (input_dtype_width > 0) {
+      ARROW_RETURN_NOT_OK(
+        stack_memallocator.Init(
+           ctx->exec_context()->memory_pool()
+          ,input_dtype_width * max_batchsize
+        )
+      );
+    }
+
+    // >> Prepare input data structure for propagation to hash function
+    // NOTE: "start row index" and "row count" can potentially be options in the future
+    ArraySpan hash_input    = input_arg[0].array;
+    int64_t   hash_startrow = 0;
+    int64_t   hash_rowcount = hash_input.length;
+    ARROW_ASSIGN_OR_RAISE(
+       KeyColumnArray input_keycol
+      ,ColumnArrayFromArrayData(hash_input.ToArrayData(), hash_startrow, hash_rowcount)
+    );

Review Comment:
   For an actual implementation we would want to reuse Arrow's hashing code, but I wonder whether this example would be clearer and more focused if we implemented a hash function inline (even a fairly trivial one) and removed the use of semi-internal APIs.
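
   To make the suggestion concrete, here is a minimal sketch of what a trivial inline hash could look like. It is deliberately independent of Arrow so that it stands alone; `MixHash32` and `HashRows` are hypothetical names, and the mixing steps are borrowed from the 32-bit finalizer of MurmurHash3 rather than from Arrow's `Hashing32`. A real kernel would read its input from the `ExecSpan` and write into the `ExecResult` instead of using raw pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not an Arrow API): scrambles one 32-bit value using
// the finalizer steps from MurmurHash3's 32-bit variant.
inline uint32_t MixHash32(uint32_t value) {
  value ^= value >> 16;
  value *= 0x85ebca6bU;
  value ^= value >> 13;
  value *= 0xc2b2ae35U;
  value ^= value >> 16;
  return value;
}

// Hypothetical kernel body: writes exactly one hash per input row.
// The caller owns both buffers; `output` must have room for `length` values.
inline void HashRows(const uint32_t* input, size_t length, uint32_t* output) {
  assert(length == 0 || (input != nullptr && output != nullptr));
  for (size_t row = 0; row < length; ++row) {
    output[row] = MixHash32(input[row]);
  }
}
```

   Something of this size would keep the recipe focused on the function, kernel, and registry plumbing rather than on the hashing internals.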



##########
cpp/code/compute_fn.cc:
##########
@@ -0,0 +1,270 @@
+// ------------------------------
+// Dependencies
+
+// standard dependencies
+#include <stdint.h>
+#include <string>
+#include <iostream>
+
+// arrow dependencies
+#include <arrow/api.h>
+#include <arrow/compute/api.h>
+#include <arrow/compute/exec/key_hash.h>
+
+#include "common.h"
+
+
+// >> aliases for types in standard library
+using std::shared_ptr;
+using std::vector;
+
+// arrow util types
+using arrow::Result;
+using arrow::Status;
+using arrow::Datum;
+
+// arrow data types and helpers
+using arrow::UInt32Builder;
+using arrow::Int32Builder;
+
+using arrow::Array;
+using arrow::ArraySpan;
+
+
+// aliases for types used in `NamedScalarFn`
+//    |> kernel parameters
+using arrow::compute::KernelContext;
+using arrow::compute::ExecSpan;
+using arrow::compute::ExecResult;
+
+//    |> other context types
+using arrow::compute::ExecContext;
+using arrow::compute::LightContext;
+
+//    |> common types for compute functions
+using arrow::compute::FunctionRegistry;
+using arrow::compute::FunctionDoc;
+using arrow::compute::InputType;
+using arrow::compute::OutputType;
+using arrow::compute::Arity;
+
+//    |> the "kind" of function we want
+using arrow::compute::ScalarFunction;
+
+//    |> structs and classes for hashing
+using arrow::util::MiniBatch;
+using arrow::util::TempVectorStack;
+
+using arrow::compute::KeyColumnArray;
+using arrow::compute::Hashing32;
+
+//    |> functions used for hashing
+using arrow::compute::ColumnArrayFromArrayData;
+
+
+// ------------------------------
+// Structs and Classes
+
+// >> Documentation for a compute function
+/**
+ * Create a const instance of `FunctionDoc` that contains 3 attributes:
+ *  1. Short description
+ *  2. Long  description (limited to 78 characters)
+ *  3. Name of input arguments
+ */
+const FunctionDoc named_scalar_fn_doc {
+   "Unary function that calculates a hash for each row of the input"
+  ,"This function uses an xxHash-like algorithm which produces 32-bit hashes."
+  ,{ "input_array" }
+};
+
+
+// >> Kernel implementations for a compute function
+/**
+ * Create implementations that will be associated with our compute function. When a
+ * compute function is invoked, the compute API framework will delegate execution to an
+ * associated kernel that matches: (1) input argument types/shapes and (2) output argument
+ * types/shapes.
+ *
+ * Kernel implementations may be functions or may be methods (functions within a class or
+ * struct).
+ */
+struct NamedScalarFn {
+
+  /**
+   * A kernel implementation that expects a single array as input, and outputs an array of
+   * uint32 values. We write this implementation knowing what function we want to
+   * associate it with ("NamedScalarFn"), but that association is made later (see
+   * `RegisterScalarFnKernels()` below).
+   */
+  static Status
+  Exec(KernelContext *ctx, const ExecSpan &input_arg, ExecResult *out) {
+    StartRecipe("DefineAComputeKernel");
+
+    if (input_arg.num_values() != 1 or not input_arg[0].is_array()) {
+      return Status::Invalid("Unsupported argument types or shape");
+    }
+
+    // >> Initialize stack-based memory allocator with an allocator and memory size
+    TempVectorStack stack_memallocator;
+    auto            input_dtype_width = input_arg[0].type()->bit_width();
+    if (input_dtype_width > 0) {
+      ARROW_RETURN_NOT_OK(
+        stack_memallocator.Init(
+           ctx->exec_context()->memory_pool()
+          ,input_dtype_width * max_batchsize
+        )
+      );
+    }
+
+    // >> Prepare input data structure for propagation to hash function
+    // NOTE: "start row index" and "row count" can potentially be options in the future

Review Comment:
   You would just slice the input (0-copy). Scalar functions have to provide a row of output per row of input so such options wouldn't make sense.
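
   As a rough illustration of the point, the sketch below uses a hypothetical non-owning view (`UInt32View`, not an Arrow type) to stand in for an array. In Arrow the caller would slice with something like `Array::Slice(offset, length)` before invoking the function; here the slice only adjusts a pointer and a length, and the hypothetical scalar-style function `PlusOne` always returns one output row per input row it is given.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an array: a non-owning view over uint32 values.
struct UInt32View {
  const uint32_t* data;
  size_t length;

  // Returns a view over rows [offset, offset + count) without copying data.
  UInt32View Slice(size_t offset, size_t count) const {
    assert(offset <= length && count <= length - offset);
    return UInt32View{data + offset, count};
  }
};

// Hypothetical scalar-style function: one output row per input row.
// It has no "start row" or "row count" options; the caller slices instead.
inline std::vector<uint32_t> PlusOne(const UInt32View& input) {
  std::vector<uint32_t> output(input.length);
  for (size_t row = 0; row < input.length; ++row) {
    output[row] = input.data[row] + 1u;
  }
  return output;
}
```

   Calling `PlusOne(whole.Slice(1, 2))` processes only the two rows in the window, which is the behavior the "start row index" and "row count" options were meant to provide.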



##########
cpp/code/compute_fn.cc:
##########
@@ -0,0 +1,270 @@
+// ------------------------------
+// Dependencies
+
+// standard dependencies
+#include <stdint.h>
+#include <string>
+#include <iostream>
+
+// arrow dependencies
+#include <arrow/api.h>
+#include <arrow/compute/api.h>
+#include <arrow/compute/exec/key_hash.h>
+
+#include "common.h"
+
+
+// >> aliases for types in standard library
+using std::shared_ptr;
+using std::vector;
+
+// arrow util types
+using arrow::Result;
+using arrow::Status;
+using arrow::Datum;
+
+// arrow data types and helpers
+using arrow::UInt32Builder;
+using arrow::Int32Builder;
+
+using arrow::Array;
+using arrow::ArraySpan;
+
+
+// aliases for types used in `NamedScalarFn`
+//    |> kernel parameters
+using arrow::compute::KernelContext;
+using arrow::compute::ExecSpan;
+using arrow::compute::ExecResult;
+
+//    |> other context types
+using arrow::compute::ExecContext;
+using arrow::compute::LightContext;
+
+//    |> common types for compute functions
+using arrow::compute::FunctionRegistry;
+using arrow::compute::FunctionDoc;
+using arrow::compute::InputType;
+using arrow::compute::OutputType;
+using arrow::compute::Arity;
+
+//    |> the "kind" of function we want
+using arrow::compute::ScalarFunction;
+
+//    |> structs and classes for hashing
+using arrow::util::MiniBatch;
+using arrow::util::TempVectorStack;
+
+using arrow::compute::KeyColumnArray;
+using arrow::compute::Hashing32;
+
+//    |> functions used for hashing
+using arrow::compute::ColumnArrayFromArrayData;
+
+
+// ------------------------------
+// Structs and Classes
+
+// >> Documentation for a compute function
+/**
+ * Create a const instance of `FunctionDoc` that contains 3 attributes:
+ *  1. Short description
+ *  2. Long  description (limited to 78 characters)
+ *  3. Name of input arguments
+ */
+const FunctionDoc named_scalar_fn_doc {
+   "Unary function that calculates a hash for each row of the input"
+  ,"This function uses an xxHash-like algorithm which produces 32-bit hashes."
+  ,{ "input_array" }
+};
+
+
+// >> Kernel implementations for a compute function
+/**
+ * Create implementations that will be associated with our compute function. When a
+ * compute function is invoked, the compute API framework will delegate execution to an
+ * associated kernel that matches: (1) input argument types/shapes and (2) output argument
+ * types/shapes.
+ *
+ * Kernel implementations may be functions or may be methods (functions within a class or
+ * struct).
+ */
+struct NamedScalarFn {
+
+  /**
+   * A kernel implementation that expects a single array as input, and outputs an array of
+   * uint32 values. We write this implementation knowing what function we want to
+   * associate it with ("NamedScalarFn"), but that association is made later (see
+   * `RegisterScalarFnKernels()` below).
+   */
+  static Status
+  Exec(KernelContext *ctx, const ExecSpan &input_arg, ExecResult *out) {
+    StartRecipe("DefineAComputeKernel");
+
+    if (input_arg.num_values() != 1 or not input_arg[0].is_array()) {
+      return Status::Invalid("Unsupported argument types or shape");
+    }
+
+    // >> Initialize stack-based memory allocator with an allocator and memory size
+    TempVectorStack stack_memallocator;
+    auto            input_dtype_width = input_arg[0].type()->bit_width();
+    if (input_dtype_width > 0) {
+      ARROW_RETURN_NOT_OK(
+        stack_memallocator.Init(
+           ctx->exec_context()->memory_pool()
+          ,input_dtype_width * max_batchsize
+        )
+      );
+    }

Review Comment:
   Wouldn't this fail for 0-width types (if that's even a thing)? Or rather, is the conditional necessary at all?



##########
cpp/code/compute_fn.cc:
##########
@@ -0,0 +1,270 @@
+// ------------------------------
+// Dependencies
+
+// standard dependencies
+#include <stdint.h>
+#include <string>
+#include <iostream>
+
+// arrow dependencies
+#include <arrow/api.h>
+#include <arrow/compute/api.h>
+#include <arrow/compute/exec/key_hash.h>
+
+#include "common.h"
+
+
+// >> aliases for types in standard library
+using std::shared_ptr;
+using std::vector;
+
+// arrow util types
+using arrow::Result;
+using arrow::Status;
+using arrow::Datum;
+
+// arrow data types and helpers
+using arrow::UInt32Builder;
+using arrow::Int32Builder;
+
+using arrow::Array;
+using arrow::ArraySpan;
+
+
+// aliases for types used in `NamedScalarFn`
+//    |> kernel parameters
+using arrow::compute::KernelContext;
+using arrow::compute::ExecSpan;
+using arrow::compute::ExecResult;
+
+//    |> other context types
+using arrow::compute::ExecContext;
+using arrow::compute::LightContext;
+
+//    |> common types for compute functions
+using arrow::compute::FunctionRegistry;
+using arrow::compute::FunctionDoc;
+using arrow::compute::InputType;
+using arrow::compute::OutputType;
+using arrow::compute::Arity;
+
+//    |> the "kind" of function we want
+using arrow::compute::ScalarFunction;
+
+//    |> structs and classes for hashing
+using arrow::util::MiniBatch;
+using arrow::util::TempVectorStack;
+
+using arrow::compute::KeyColumnArray;
+using arrow::compute::Hashing32;
+
+//    |> functions used for hashing
+using arrow::compute::ColumnArrayFromArrayData;
+
+
+// ------------------------------
+// Structs and Classes
+
+// >> Documentation for a compute function
+/**
+ * Create a const instance of `FunctionDoc` that contains 3 attributes:
+ *  1. Short description
+ *  2. Long  description (limited to 78 characters)
+ *  3. Name of input arguments
+ */
+const FunctionDoc named_scalar_fn_doc {
+   "Unary function that calculates a hash for each row of the input"
+  ,"This function uses an xxHash-like algorithm which produces 32-bit hashes."
+  ,{ "input_array" }
+};
+
+
+// >> Kernel implementations for a compute function
+/**
+ * Create implementations that will be associated with our compute function. When a
+ * compute function is invoked, the compute API framework will delegate execution to an
+ * associated kernel that matches: (1) input argument types/shapes and (2) output argument
+ * types/shapes.
+ *
+ * Kernel implementations may be functions or may be methods (functions within a class or
+ * struct).
+ */
+struct NamedScalarFn {
+
+  /**
+   * A kernel implementation that expects a single array as input, and outputs an array of
+   * uint32 values. We write this implementation knowing what function we want to
+   * associate it with ("NamedScalarFn"), but that association is made later (see
+   * `RegisterScalarFnKernels()` below).
+   */
+  static Status
+  Exec(KernelContext *ctx, const ExecSpan &input_arg, ExecResult *out) {
+    StartRecipe("DefineAComputeKernel");

Review Comment:
   The `StartRecipe`/`EndRecipe` pair might be a little too limiting in this case, since presumably we want to show the entire struct.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-cookbook] drin commented on pull request #227: [C++] Adding recipe for custom compute functions

Posted by GitBox <gi...@apache.org>.
drin commented on PR #227:
URL: https://github.com/apache/arrow-cookbook/pull/227#issuecomment-1225999482

   @wjones127 or @lidavidm if either of you have time, could you check this out?
   
   I wasn't able to locally confirm that code excerpt directives worked correctly, but I am hoping to reference this cookbook from documentation for authoring compute kernels




[GitHub] [arrow-cookbook] wjones127 commented on pull request #227: Adding recipe for custom compute functions

Posted by GitBox <gi...@apache.org>.
wjones127 commented on PR #227:
URL: https://github.com/apache/arrow-cookbook/pull/227#issuecomment-1180543034

   In order to get these recipes to appear on https://arrow.apache.org/cookbook/cpp/index.html, we should add a `compute.rst` file in `cpp/source` that provides some narrative around these examples. 




[GitHub] [arrow-cookbook] drin commented on pull request #227: [C++] Adding recipe for custom compute functions

Posted by GitBox <gi...@apache.org>.
drin commented on PR #227:
URL: https://github.com/apache/arrow-cookbook/pull/227#issuecomment-1230740207

   I am having trouble setting up a build to figure out why it couldn't find the recipe, so I'm just going to shelve it for a bit.




[GitHub] [arrow-cookbook] drin commented on a diff in pull request #227: Adding recipe for custom compute functions

Posted by GitBox <gi...@apache.org>.
drin commented on code in PR #227:
URL: https://github.com/apache/arrow-cookbook/pull/227#discussion_r930115323


##########
cpp/code/compute_fn.cc:
##########
@@ -0,0 +1,270 @@
+// ------------------------------
+// Dependencies
+
+// standard dependencies
+#include <stdint.h>
+#include <string>
+#include <iostream>
+
+// arrow dependencies
+#include <arrow/api.h>
+#include <arrow/compute/api.h>
+#include <arrow/compute/exec/key_hash.h>
+
+#include "common.h"
+
+
+// >> aliases for types in standard library
+using std::shared_ptr;
+using std::vector;
+
+// arrow util types
+using arrow::Result;
+using arrow::Status;
+using arrow::Datum;
+
+// arrow data types and helpers
+using arrow::UInt32Builder;
+using arrow::Int32Builder;
+
+using arrow::Array;
+using arrow::ArraySpan;
+
+
+// aliases for types used in `NamedScalarFn`
+//    |> kernel parameters
+using arrow::compute::KernelContext;
+using arrow::compute::ExecSpan;
+using arrow::compute::ExecResult;
+
+//    |> other context types
+using arrow::compute::ExecContext;
+using arrow::compute::LightContext;
+
+//    |> common types for compute functions
+using arrow::compute::FunctionRegistry;
+using arrow::compute::FunctionDoc;
+using arrow::compute::InputType;
+using arrow::compute::OutputType;
+using arrow::compute::Arity;
+
+//    |> the "kind" of function we want
+using arrow::compute::ScalarFunction;
+
+//    |> structs and classes for hashing
+using arrow::util::MiniBatch;
+using arrow::util::TempVectorStack;
+
+using arrow::compute::KeyColumnArray;
+using arrow::compute::Hashing32;
+
+//    |> functions used for hashing
+using arrow::compute::ColumnArrayFromArrayData;
+
+
+// ------------------------------
+// Structs and Classes
+
+// >> Documentation for a compute function
+/**
+ * Create a const instance of `FunctionDoc` that contains 3 attributes:
+ *  1. Short description
+ *  2. Long  description (limited to 78 characters)
+ *  3. Name of input arguments
+ */
+const FunctionDoc named_scalar_fn_doc {
+   "Unary function that calculates a hash for each row of the input"
+  ,"This function uses an xxHash-like algorithm which produces 32-bit hashes."
+  ,{ "input_array" }
+};
+
+
+// >> Kernel implementations for a compute function
+/**
+ * Create implementations that will be associated with our compute function. When a
+ * compute function is invoked, the compute API framework will delegate execution to an
+ * associated kernel that matches: (1) input argument types/shapes and (2) output argument
+ * types/shapes.
+ *
+ * Kernel implementations may be functions or may be methods (functions within a class or
+ * struct).
+ */
+struct NamedScalarFn {
+
+  /**
+   * A kernel implementation that expects a single array as input, and outputs an array of
+   * uint32 values. We write this implementation knowing what function we want to
+   * associate it with ("NamedScalarFn"), but that association is made later (see
+   * `RegisterScalarFnKernels()` below).
+   */
+  static Status
+  Exec(KernelContext *ctx, const ExecSpan &input_arg, ExecResult *out) {
+    StartRecipe("DefineAComputeKernel");
+
+    if (input_arg.num_values() != 1 or not input_arg[0].is_array()) {
+      return Status::Invalid("Unsupported argument types or shape");
+    }
+
+    // >> Initialize stack-based memory allocator with an allocator and memory size
+    TempVectorStack stack_memallocator;
+    auto            input_dtype_width = input_arg[0].type()->bit_width();
+    if (input_dtype_width > 0) {
+      ARROW_RETURN_NOT_OK(
+        stack_memallocator.Init(
+           ctx->exec_context()->memory_pool()
+          ,input_dtype_width * max_batchsize
+        )
+      );
+    }
+
+    // >> Prepare input data structure for propagation to hash function
+    // NOTE: "start row index" and "row count" can potentially be options in the future

Review Comment:
   oh yeah, you're right





[GitHub] [arrow-cookbook] drin commented on a diff in pull request #227: Adding recipe for custom compute functions

Posted by GitBox <gi...@apache.org>.
drin commented on code in PR #227:
URL: https://github.com/apache/arrow-cookbook/pull/227#discussion_r930112083


##########
cpp/code/compute_fn.cc:
##########
@@ -0,0 +1,270 @@
+// ------------------------------
+// Dependencies
+
+// standard dependencies
+#include <stdint.h>
+#include <string>
+#include <iostream>
+
+// arrow dependencies
+#include <arrow/api.h>
+#include <arrow/compute/api.h>
+#include <arrow/compute/exec/key_hash.h>
+
+#include "common.h"
+
+
+// >> aliases for types in standard library
+using std::shared_ptr;
+using std::vector;
+
+// arrow util types
+using arrow::Result;
+using arrow::Status;
+using arrow::Datum;
+
+// arrow data types and helpers
+using arrow::UInt32Builder;
+using arrow::Int32Builder;
+
+using arrow::Array;
+using arrow::ArraySpan;
+
+
+// aliases for types used in `NamedScalarFn`
+//    |> kernel parameters
+using arrow::compute::KernelContext;
+using arrow::compute::ExecSpan;
+using arrow::compute::ExecResult;
+
+//    |> other context types
+using arrow::compute::ExecContext;
+using arrow::compute::LightContext;
+
+//    |> common types for compute functions
+using arrow::compute::FunctionRegistry;
+using arrow::compute::FunctionDoc;
+using arrow::compute::InputType;
+using arrow::compute::OutputType;
+using arrow::compute::Arity;
+
+//    |> the "kind" of function we want
+using arrow::compute::ScalarFunction;
+
+//    |> structs and classes for hashing
+using arrow::util::MiniBatch;
+using arrow::util::TempVectorStack;
+
+using arrow::compute::KeyColumnArray;
+using arrow::compute::Hashing32;
+
+//    |> functions used for hashing
+using arrow::compute::ColumnArrayFromArrayData;
+
+
+// ------------------------------
+// Structs and Classes
+
+// >> Documentation for a compute function
+/**
+ * Create a const instance of `FunctionDoc` that contains 3 attributes:
+ *  1. Short description
+ *  2. Long  description (limited to 78 characters)

Review Comment:
   Yeah, I didn't realize this at the time, so I'll update it.





[GitHub] [arrow-cookbook] drin commented on a diff in pull request #227: Adding recipe for custom compute functions

Posted by GitBox <gi...@apache.org>.
drin commented on code in PR #227:
URL: https://github.com/apache/arrow-cookbook/pull/227#discussion_r931514049


##########
cpp/code/compute_fn.cc:
##########
@@ -0,0 +1,270 @@
+// ------------------------------
+// Dependencies
+
+// standard dependencies
+#include <stdint.h>
+#include <string>
+#include <iostream>
+
+// arrow dependencies
+#include <arrow/api.h>
+#include <arrow/compute/api.h>
+#include <arrow/compute/exec/key_hash.h>
+
+#include "common.h"
+
+
+// >> aliases for types in standard library
+using std::shared_ptr;
+using std::vector;
+
+// arrow util types
+using arrow::Result;
+using arrow::Status;
+using arrow::Datum;
+
+// arrow data types and helpers
+using arrow::UInt32Builder;
+using arrow::Int32Builder;
+
+using arrow::Array;
+using arrow::ArraySpan;
+
+
+// aliases for types used in `NamedScalarFn`
+//    |> kernel parameters
+using arrow::compute::KernelContext;
+using arrow::compute::ExecSpan;
+using arrow::compute::ExecResult;
+
+//    |> other context types
+using arrow::compute::ExecContext;
+using arrow::compute::LightContext;
+
+//    |> common types for compute functions
+using arrow::compute::FunctionRegistry;
+using arrow::compute::FunctionDoc;
+using arrow::compute::InputType;
+using arrow::compute::OutputType;
+using arrow::compute::Arity;
+
+//    |> the "kind" of function we want
+using arrow::compute::ScalarFunction;
+
+//    |> structs and classes for hashing
+using arrow::util::MiniBatch;
+using arrow::util::TempVectorStack;
+
+using arrow::compute::KeyColumnArray;
+using arrow::compute::Hashing32;
+
+//    |> functions used for hashing
+using arrow::compute::ColumnArrayFromArrayData;
+
+
+// ------------------------------
+// Structs and Classes
+
+// >> Documentation for a compute function
+/**
+ * Create a const instance of `FunctionDoc` that contains 3 attributes:
+ *  1. Short description
+ *  2. Long  description (limited to 78 characters)
+ *  3. Name of input arguments
+ */
+const FunctionDoc named_scalar_fn_doc {
+   "Unary function that calculates a hash for each row of the input"
+  ,"This function uses an xxHash-like algorithm which produces 32-bit hashes."
+  ,{ "input_array" }
+};
+
+
+// >> Kernel implementations for a compute function
+/**
+ * Create implementations that will be associated with our compute function. When a
+ * compute function is invoked, the compute API framework will delegate execution to an
+ * associated kernel that matches: (1) input argument types/shapes and (2) output argument
+ * types/shapes.
+ *
+ * Kernel implementations may be functions or may be methods (functions within a class or
+ * struct).
+ */
+struct NamedScalarFn {
+
+  /**
+   * A kernel implementation that expects a single array as input, and outputs an array of
+   * uint32 values. We write this implementation knowing what function we want to
+   * associate it with ("NamedScalarFn"), but that association is made later (see
+   * `RegisterScalarFnKernels()` below).
+   */
+  static Status
+  Exec(KernelContext *ctx, const ExecSpan &input_arg, ExecResult *out) {
+    StartRecipe("DefineAComputeKernel");

Review Comment:
   I'll look at how the Start/End Recipe functions work and figure out how to include the struct itself





[GitHub] [arrow-cookbook] drin commented on pull request #227: Adding recipe for custom compute functions

Posted by GitBox <gi...@apache.org>.
drin commented on PR #227:
URL: https://github.com/apache/arrow-cookbook/pull/227#issuecomment-1197475897

   rebased to try and build the cookbooks, but I am not sure how to build the documentation for the C++ code




[GitHub] [arrow-cookbook] drin commented on a diff in pull request #227: Adding recipe for custom compute functions

Posted by GitBox <gi...@apache.org>.
drin commented on code in PR #227:
URL: https://github.com/apache/arrow-cookbook/pull/227#discussion_r932537934


##########
cpp/source/compute.rst:
##########
@@ -0,0 +1,151 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+====================================
+Defining and Using Compute Functions
+====================================
+
+This section contains (or will contain) a number of recipes illustrating how to
+define new "compute functions" or how to use existing ones. Arrow contains a "Compute
+API," which primarily consists of a "registry" of functions that can be invoked.
+Currently, Arrow populates a default registry with a variety of useful functions. The
+recipes provided in this section show some approaches to define a compute function as well
+as how to invoke a compute function by name, given a registry.
+
+
+.. contents::
+
+Invoke a Compute Function
+=========================
+
+When invoking a compute function, the function must exist in a function registry. In this
+recipe, we use `CallFunction()` to invoke the function with name "named_scalar_fn".
+
+.. recipe:: ../code/compute_fn.cc InvokeByCallFunction
+  :caption: Use CallFunction() to invoke a compute function by name
+  :dedent: 2
+
+.. note::
+    This method allows us to specify arguments as a vector and a custom ExecContext.
+
+If an `ExecContext` is not passed to `CallFunction` (it is null), then the default
+FunctionRegistry will be used to call the function from.
+
+If we have defined a convenience function that wraps `CallFunction()`, then we can call
+that function instead. Various compute functions provided by Arrow have these convenience
+functions defined, such as `Add` or `Subtract`.
+
+.. recipe:: ../code/compute_fn.cc InvokeByConvenienceFunction
+  :caption: Use a convenience invocation function to call a compute function
+  :dedent: 2
+
+
+Adding a Custom Compute Function
+================================
+
+To make a custom compute function available, there are 3 primary steps:
+
+1. Define kernels for the function (these implement the actual logic)
+
+2. Associate the kernels with a function object
+
+3. Add the function object to a function registry
+
+
+Define Function Kernels
+-----------------------
+
+A kernel is a particular function that implements desired logic for a compute function.

Review Comment:
   Hmm. I wrote the paragraph to help contextualize what the recipe covers and what it doesn't. It also implicitly hints at why the kernel function is named `Exec`, and clarifies that an execution kernel and an initialization kernel are both basically just functions associated with "the compute function". They are associated as a pair, so initialization and execution kernels are not dispatched individually.
   
   I definitely see your point, so I don't know if I should just move this to some other documentation which may not yet exist or if I should leave it here to be moved later.





[GitHub] [arrow-cookbook] ksuarez1423 commented on a diff in pull request #227: Adding recipe for custom compute functions

Posted by GitBox <gi...@apache.org>.
ksuarez1423 commented on code in PR #227:
URL: https://github.com/apache/arrow-cookbook/pull/227#discussion_r932525354


##########
cpp/source/compute.rst:
##########
@@ -0,0 +1,151 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+====================================
+Defining and Using Compute Functions
+====================================
+
+This section contains (or will contain) a number of recipes illustrating how to
+define new "compute functions" or how to use existing ones. Arrow contains a "Compute
+API," which primarily consists of a "registry" of functions that can be invoked.
+Currently, Arrow populates a default registry with a variety of useful functions. The
+recipes provided in this section show some approaches to define a compute function as well
+as how to invoke a compute function by name, given a registry.
+
+
+.. contents::
+
+Invoke a Compute Function
+=========================
+
+When invoking a compute function, the function must exist in a function registry. In this
+recipe, we use `CallFunction()` to invoke the function with name "named_scalar_fn".
+
+.. recipe:: ../code/compute_fn.cc InvokeByCallFunction
+  :caption: Use CallFunction() to invoke a compute function by name
+  :dedent: 2
+
+.. note::
+    This method allows us to specify arguments as a vector and a custom ExecContext.
+
+If an `ExecContext` is not passed to `CallFunction` (it is null), then the default
+FunctionRegistry will be used to call the function from.

Review Comment:
   Overlap is fine -- the point of a cookbook is to let people who need just one concept or code snippet grab it without rereading the whole tutorial.
   
   But yes, for now, we can just mention that there's a FunctionRegistry, and in part 1, you're using the default one.
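
   As a rough illustration of what "using the default one" means, here is a minimal, self-contained sketch. `ToyRegistry`, `DefaultRegistry()` and `CallByName()` are hypothetical stand-ins written for this comment; they are not Arrow's `FunctionRegistry` or `CallFunction()`, and only mirror the idea that a name is looked up in a caller-supplied registry, or in a pre-populated default when none is supplied.

```cpp
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; these are NOT Arrow types.
using ToyFunction = std::function<int(const std::vector<int>&)>;

class ToyRegistry {
 public:
  // Returns false if the name is already taken, so callers can detect clashes.
  bool Add(const std::string& name, ToyFunction fn) {
    return functions_.emplace(name, std::move(fn)).second;
  }

  // Looks up a function by name; empty optional if it was never registered.
  std::optional<ToyFunction> Get(const std::string& name) const {
    auto it = functions_.find(name);
    if (it == functions_.end()) return std::nullopt;
    return it->second;
  }

 private:
  std::map<std::string, ToyFunction> functions_;
};

// Process-wide registry, populated once on first use.
ToyRegistry* DefaultRegistry() {
  static ToyRegistry registry = [] {
    ToyRegistry r;
    r.Add("sum", [](const std::vector<int>& args) {
      int total = 0;
      for (int v : args) total += v;
      return total;
    });
    return r;
  }();
  return &registry;
}

// Calls `name` from `registry`, or from the default registry when none is given.
std::optional<int> CallByName(const std::string& name,
                              const std::vector<int>& args,
                              const ToyRegistry* registry = nullptr) {
  const ToyRegistry* source = registry != nullptr ? registry : DefaultRegistry();
  auto fn = source->Get(name);
  if (!fn) return std::nullopt;
  return (*fn)(args);
}
```

   In this toy, calling `CallByName("sum", {1, 2, 3})` resolves against the default registry, while passing a pointer to a separate `ToyRegistry` restricts lookup to whatever was added there.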





[GitHub] [arrow-cookbook] drin commented on a diff in pull request #227: Adding recipe for custom compute functions

Posted by GitBox <gi...@apache.org>.
drin commented on code in PR #227:
URL: https://github.com/apache/arrow-cookbook/pull/227#discussion_r930114607


##########
cpp/code/compute_fn.cc:
##########
@@ -0,0 +1,270 @@
+// ------------------------------
+// Dependencies
+
+// standard dependencies
+#include <stdint.h>
+#include <string>
+#include <iostream>
+
+// arrow dependencies
+#include <arrow/api.h>
+#include <arrow/compute/api.h>
+#include <arrow/compute/exec/key_hash.h>
+
+#include "common.h"
+
+
+// >> aliases for types in standard library
+using std::shared_ptr;
+using std::vector;
+
+// arrow util types
+using arrow::Result;
+using arrow::Status;
+using arrow::Datum;
+
+// arrow data types and helpers
+using arrow::UInt32Builder;
+using arrow::Int32Builder;
+
+using arrow::Array;
+using arrow::ArraySpan;
+
+
+// aliases for types used in `NamedScalarFn`
+//    |> kernel parameters
+using arrow::compute::KernelContext;
+using arrow::compute::ExecSpan;
+using arrow::compute::ExecResult;
+
+//    |> other context types
+using arrow::compute::ExecContext;
+using arrow::compute::LightContext;
+
+//    |> common types for compute functions
+using arrow::compute::FunctionRegistry;
+using arrow::compute::FunctionDoc;
+using arrow::compute::InputType;
+using arrow::compute::OutputType;
+using arrow::compute::Arity;
+
+//    |> the "kind" of function we want
+using arrow::compute::ScalarFunction;
+
+//    |> structs and classes for hashing
+using arrow::util::MiniBatch;
+using arrow::util::TempVectorStack;
+
+using arrow::compute::KeyColumnArray;
+using arrow::compute::Hashing32;
+
+//    |> functions used for hashing
+using arrow::compute::ColumnArrayFromArrayData;
+
+
+// ------------------------------
+// Structs and Classes
+
+// >> Documentation for a compute function
+/**
+ * Create a const instance of `FunctionDoc` that contains 3 attributes:
+ *  1. Short description
+ *  2. Long  description (limited to 78 characters)
+ *  3. Name of input arguments
+ */
+const FunctionDoc named_scalar_fn_doc {
+   "Unary function that calculates a hash for each row of the input"
+  ,"This function uses an xxHash-like algorithm which produces 32-bit hashes."
+  ,{ "input_array" }
+};
+
+
+// >> Kernel implementations for a compute function
+/**
+ * Create implementations that will be associated with our compute function. When a
+ * compute function is invoked, the compute API framework will delegate execution to an
+ * associated kernel that matches: (1) input argument types/shapes and (2) output argument
+ * types/shapes.
+ *
+ * Kernel implementations may be functions or may be methods (functions within a class or
+ * struct).
+ */
+struct NamedScalarFn {
+
+  /**
+   * A kernel implementation that expects a single array as input, and outputs an array of
+   * uint32 values. We write this implementation knowing what function we want to
+   * associate it with ("NamedScalarFn"), but that association is made later (see
+   * `RegisterScalarFnKernels()` below).
+   */
+  static Status
+  Exec(KernelContext *ctx, const ExecSpan &input_arg, ExecResult *out) {
+    StartRecipe("DefineAComputeKernel");
+
+    if (input_arg.num_values() != 1 or not input_arg[0].is_array()) {
+      return Status::Invalid("Unsupported argument types or shape");
+    }
+
+    // >> Initialize stack-based memory allocator with an allocator and memory size
+    TempVectorStack stack_memallocator;
+    auto            input_dtype_width = input_arg[0].type()->bit_width();
+    if (input_dtype_width > 0) {
+      ARROW_RETURN_NOT_OK(
+        stack_memallocator.Init(
+           ctx->exec_context()->memory_pool()
+          ,input_dtype_width * max_batchsize
+        )
+      );
+    }

Review Comment:
   You may be right. I haven't experimented with enough data types yet, but I think I added this check so that I could handle variable-length types differently, and this recipe doesn't even use any. I should have time to experiment with it later today or tomorrow.
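
   A minimal sketch of that idea, independent of Arrow: `ScratchBytesForBatch` is a hypothetical helper, and it assumes (as the guard in the quoted diff implies) that a type's bit width is positive for fixed-width types and non-positive otherwise. It also converts bits to whole bytes, which is an assumption about what the allocator wants; the quoted code passes `input_dtype_width * max_batchsize` unconverted, which may be worth double-checking.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper, not part of Arrow: bytes of scratch space for one batch.
// `bit_width` is assumed to be positive for fixed-width types and <= 0 otherwise.
std::optional<int64_t> ScratchBytesForBatch(int bit_width, int64_t batch_rows) {
  if (bit_width <= 0 || batch_rows <= 0) {
    // Variable-length input (or an empty batch): there is no fixed size to
    // compute here, so the caller has to pick a different sizing strategy.
    return std::nullopt;
  }
  const int64_t total_bits = static_cast<int64_t>(bit_width) * batch_rows;
  return (total_bits + 7) / 8;  // round up to whole bytes
}
```

   With a helper like this, the fixed-width path and the variable-length path are separated at the call site by checking whether a size came back.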





[GitHub] [arrow-cookbook] drin commented on a diff in pull request #227: Adding recipe for custom compute functions

Posted by GitBox <gi...@apache.org>.
drin commented on code in PR #227:
URL: https://github.com/apache/arrow-cookbook/pull/227#discussion_r931511694


##########
cpp/code/compute_fn.cc:
##########
@@ -0,0 +1,270 @@
+// ------------------------------
+// Dependencies
+
+// standard dependencies
+#include <stdint.h>
+#include <string>
+#include <iostream>
+
+// arrow dependencies
+#include <arrow/api.h>
+#include <arrow/compute/api.h>
+#include <arrow/compute/exec/key_hash.h>
+
+#include "common.h"
+
+
+// >> aliases for types in standard library
+using std::shared_ptr;
+using std::vector;
+
+// arrow util types
+using arrow::Result;
+using arrow::Status;
+using arrow::Datum;
+
+// arrow data types and helpers
+using arrow::UInt32Builder;
+using arrow::Int32Builder;
+
+using arrow::Array;
+using arrow::ArraySpan;
+
+
+// aliases for types used in `NamedScalarFn`
+//    |> kernel parameters
+using arrow::compute::KernelContext;
+using arrow::compute::ExecSpan;
+using arrow::compute::ExecResult;
+
+//    |> other context types
+using arrow::compute::ExecContext;
+using arrow::compute::LightContext;
+
+//    |> common types for compute functions
+using arrow::compute::FunctionRegistry;
+using arrow::compute::FunctionDoc;
+using arrow::compute::InputType;
+using arrow::compute::OutputType;
+using arrow::compute::Arity;
+
+//    |> the "kind" of function we want
+using arrow::compute::ScalarFunction;
+
+//    |> structs and classes for hashing
+using arrow::util::MiniBatch;
+using arrow::util::TempVectorStack;
+
+using arrow::compute::KeyColumnArray;
+using arrow::compute::Hashing32;
+
+//    |> functions used for hashing
+using arrow::compute::ColumnArrayFromArrayData;
+
+
+// ------------------------------
+// Structs and Classes
+
+// >> Documentation for a compute function
+/**
+ * Create a const instance of `FunctionDoc` that contains 3 attributes:
+ *  1. Short description
+ *  2. Long  description (limited to 78 characters)
+ *  3. Name of input arguments
+ */
+const FunctionDoc named_scalar_fn_doc {
+   "Unary function that calculates a hash for each row of the input"
+  ,"This function uses an xxHash-like algorithm which produces 32-bit hashes."
+  ,{ "input_array" }
+};
+
+
+// >> Kernel implementations for a compute function
+/**
+ * Create implementations that will be associated with our compute function. When a
+ * compute function is invoked, the compute API framework will delegate execution to an
+ * associated kernel that matches: (1) input argument types/shapes and (2) output argument
+ * types/shapes.
+ *
+ * Kernel implementations may be functions or may be methods (functions within a class or
+ * struct).
+ */
+struct NamedScalarFn {
+
+  /**
+   * A kernel implementation that expects a single array as input, and outputs an array of
+   * uint32 values. We write this implementation knowing what function we want to
+   * associate it with ("NamedScalarFn"), but that association is made later (see
+   * `RegisterScalarFnKernels()` below).
+   */
+  static Status
+  Exec(KernelContext *ctx, const ExecSpan &input_arg, ExecResult *out) {
+    StartRecipe("DefineAComputeKernel");
+
+    if (input_arg.num_values() != 1 or not input_arg[0].is_array()) {
+      return Status::Invalid("Unsupported argument types or shape");
+    }
+
+    // >> Initialize stack-based memory allocator with an allocator and memory size
+    TempVectorStack stack_memallocator;
+    auto            input_dtype_width = input_arg[0].type()->bit_width();
+    if (input_dtype_width > 0) {
+      ARROW_RETURN_NOT_OK(
+        stack_memallocator.Init(
+           ctx->exec_context()->memory_pool()
+          ,input_dtype_width * max_batchsize
+        )
+      );
+    }

Review Comment:
   I guess this recipe can be specific to a single data type for conciseness, so I'll remove this check. I may write another recipe that shows how to work with various data types.





[GitHub] [arrow-cookbook] drin commented on pull request #227: Adding recipe for custom compute functions

Posted by GitBox <gi...@apache.org>.
drin commented on PR #227:
URL: https://github.com/apache/arrow-cookbook/pull/227#issuecomment-1197510115

   I am not sure how to add whole function definitions to the recipe instead of portions of the body.
   
   For example: https://github.com/apache/arrow-cookbook/blob/411d471fcecd41ee6d93c295940108c6bc76e5df/cpp/code/compute_fn.cc#L88-L90




[GitHub] [arrow-cookbook] ksuarez1423 commented on a diff in pull request #227: Adding recipe for custom compute functions

Posted by GitBox <gi...@apache.org>.
ksuarez1423 commented on code in PR #227:
URL: https://github.com/apache/arrow-cookbook/pull/227#discussion_r932465726


##########
cpp/source/compute.rst:
##########
@@ -0,0 +1,151 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+====================================
+Defining and Using Compute Functions
+====================================
+
+This section contains (or will contain) a number of recipes illustrating how to
+define new "compute functions" or how to use existing ones. Arrow contains a "Compute
+API," which primarily consists of a "registry" of functions that can be invoked.
+Currently, Arrow populates a default registry with a variety of useful functions. The
+recipes provided in this section show some approaches to define a compute function as well
+as how to invoke a compute function by name, given a registry.
+
+
+.. contents::
+
+Invoke a Compute Function
+=========================
+
+When invoking a compute function, the function must exist in a function registry. In this
+recipe, we use `CallFunction()` to invoke the function with name "named_scalar_fn".
+
+.. recipe:: ../code/compute_fn.cc InvokeByCallFunction
+  :caption: Use CallFunction() to invoke a compute function by name
+  :dedent: 2
+
+.. note::
+    This method allows us to specify arguments as a vector and a custom ExecContext.
+
+If an `ExecContext` is not passed to `CallFunction` (it is null), then the default
+FunctionRegistry will be used to call the function from.
+
+If we have defined a convenience function that wraps `CallFunction()`, then we can call
+that function instead. Various compute functions provided by Arrow have these convenience
+functions defined, such as `Add` or `Subtract`.
+
+.. recipe:: ../code/compute_fn.cc InvokeByConvenienceFunction
+  :caption: Use a convenience invocation function to call a compute function
+  :dedent: 2
+
+
+Adding a Custom Compute Function
+================================
+
+To make a custom compute function available, there are 3 primary steps:
+
+1. Define kernels for the function (these implement the actual logic)
+
+2. Associate the kernels with a function object
+
+3. Add the function object to a function registry
+
+
+Define Function Kernels
+-----------------------
+
+A kernel is a particular function that implements desired logic for a compute function.
+There are at least a couple of types of function kernels, such as initialization kernels
+and execution kernels. An initialization kernel prepares the initial state of a compute
+function, while an execution kernel executes the main processing logic of the compute
+function. The body of a function kernel may use other functions, but the kernel function
+itself is a singular instance that will be associated with the desired compute function.
+While compute functions can be associated with an initialization and execution kernel
+pair, this recipe only shows the definition of an execution kernel.
+
+The signature of an execution kernel is relatively standardized: it returns a `Status` and
+takes a context, some arguments, and a pointer to an output result. The context wraps an
+`ExecContext` and other metadata about the environment in which the kernel function should
+be executed. The input arguments are contained within an `ExecSpan` (newly added in place
+of `ExecBatch`), which holds non-owning references to argument data. Finally, the
+`ExecResult` pointed to should be set to an appropriate `ArraySpan` or `ArrayData`
+instance, depending on ownership semantics of the kernel's output.
+
+.. recipe:: ../code/compute_fn.cc DefineAComputeKernel
+  :caption: Define an example compute kernel that uses ScalarHelper from hashing.h to hash
+            input values
+  :dedent: 2
+
+This recipe shows basic validation of `input_arg` which contains a vector of input
+arguments. Then, the input `Array` is accessed from `input_arg` and a `Buffer` is
+allocated to hold output results. After the main loop is completed, the allocated `Buffer`
+is wrapped in an `ArrayData` instance and referenced by `out`.
+
+
+Associate Kernels with a Function
+---------------------------------
+
+The process of adding kernels to a compute function is easy: (1) create an appropriate
+`Function` instance--`ScalarFunction` in this case--and (2) call the `AddKernel` function.
+The more difficult part of this process is repeating for the desired data types and
+knowing how the signatures work.

Review Comment:
   For a cookbook, I don't think we need to talk about what's easy or hard -- just put forth what to do. 
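
   To show what "just put forth what to do" could look like, here is a small self-contained sketch of the two steps: create a function object, then add one kernel per supported input type. `ToyScalarFunction`, `ToyKernel` and `MakeToyWidthFunction` are hypothetical names made up for this comment; they only imitate the shape of `ScalarFunction` and `AddKernel` and are not Arrow's API.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <variant>

// Hypothetical stand-ins; none of these are Arrow types.
using ToyValue = std::variant<int, std::string>;
using ToyKernel = std::function<unsigned(const ToyValue&)>;

class ToyScalarFunction {
 public:
  explicit ToyScalarFunction(std::string name) : name_(std::move(name)) {}

  // Associates a kernel with one input type, identified here by the variant
  // index (0 = int, 1 = std::string). Returns false if that type already has one.
  bool AddKernel(std::size_t input_type, ToyKernel kernel) {
    return kernels_.emplace(input_type, std::move(kernel)).second;
  }

  // Dispatch picks the kernel whose declared input type matches the argument.
  std::optional<unsigned> Execute(const ToyValue& arg) const {
    auto it = kernels_.find(arg.index());
    if (it == kernels_.end()) return std::nullopt;  // no matching kernel
    return it->second(arg);
  }

  const std::string& name() const { return name_; }

 private:
  std::string name_;
  std::map<std::size_t, ToyKernel> kernels_;
};

// Step 1: create the function. Step 2: add a kernel, once per input type.
ToyScalarFunction MakeToyWidthFunction() {
  ToyScalarFunction fn("toy_width");
  fn.AddKernel(0, [](const ToyValue&) {
    return static_cast<unsigned>(sizeof(int));
  });
  fn.AddKernel(1, [](const ToyValue& v) {
    return static_cast<unsigned>(std::get<std::string>(v).size());
  });
  return fn;
}
```

   The repetition per data type is visible as one `AddKernel` call per type; an argument whose type has no kernel simply finds no match.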





[GitHub] [arrow-cookbook] ksuarez1423 commented on a diff in pull request #227: Adding recipe for custom compute functions

Posted by GitBox <gi...@apache.org>.
ksuarez1423 commented on code in PR #227:
URL: https://github.com/apache/arrow-cookbook/pull/227#discussion_r932595572


##########
cpp/source/compute.rst:
##########
@@ -0,0 +1,151 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+====================================
+Defining and Using Compute Functions
+====================================
+
+This section contains (or will contain) a number of recipes illustrating how to
+define new "compute functions" or how to use existing ones. Arrow contains a "Compute
+API," which primarily consists of a "registry" of functions that can be invoked.
+Currently, Arrow populates a default registry with a variety of useful functions. The
+recipes provided in this section show some approaches to define a compute function as well
+as how to invoke a compute function by name, given a registry.
+
+
+.. contents::
+
+Invoke a Compute Function
+=========================
+
+When invoking a compute function, the function must exist in a function registry. In this
+recipe, we use `CallFunction()` to invoke the function with name "named_scalar_fn".
+
+.. recipe:: ../code/compute_fn.cc InvokeByCallFunction
+  :caption: Use CallFunction() to invoke a compute function by name
+  :dedent: 2
+
+.. note::
+    This method allows us to specify arguments as a vector and a custom ExecContext.
+
+If an `ExecContext` is not passed to `CallFunction` (it is null), then the default
+FunctionRegistry will be used to call the function from.
+
+If we have defined a convenience function that wraps `CallFunction()`, then we can call
+that function instead. Various compute functions provided by Arrow have these convenience
+functions defined, such as `Add` or `Subtract`.
+
+.. recipe:: ../code/compute_fn.cc InvokeByConvenienceFunction
+  :caption: Use a convenience invocation function to call a compute function
+  :dedent: 2
+
+
+Adding a Custom Compute Function
+================================
+
+To make a custom compute function available, there are 3 primary steps:
+
+1. Define kernels for the function (these implement the actual logic)
+
+2. Associate the kernels with a function object
+
+3. Add the function object to a function registry
+
+
+Define Function Kernels
+-----------------------
+
+A kernel is a particular function that implements desired logic for a compute function.

Review Comment:
   I think the honest answer is that this needs to be in documentation that does not exist yet. That may mean it shouldn't be here, and we should just note that it is vital moving forward. I'm uncertain.


