You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "js8544 (via GitHub)" <gi...@apache.org> on 2023/06/28 11:48:47 UTC

[GitHub] [arrow] js8544 opened a new pull request, #35787: GH-35786: [C++] Add pairwise_diff function

js8544 opened a new pull request, #35787:
URL: https://github.com/apache/arrow/pull/35787

   <!--
   Thanks for opening a pull request!
   If this is your first pull request you can find detailed information on how 
   to contribute here:
     * [New Contributor's Guide](https://arrow.apache.org/docs/dev/developers/guide/step_by_step/pr_lifecycle.html#reviews-and-merge-of-the-pull-request)
     * [Contributing Overview](https://arrow.apache.org/docs/dev/developers/overview.html)
   
   
   If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes). Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose
   
   Opening GitHub issues ahead of time contributes to the [Openness](http://theapacheway.com/open/#:~:text=Openness%20allows%20new%20users%20the,must%20happen%20in%20the%20open.) of the Apache Arrow project.
   
   Then could you also rename the pull request title in the following format?
   
       GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
   
   or
   
       MINOR: [${COMPONENT}] ${SUMMARY}
   
   In the case of PARQUET issues on JIRA the title also supports:
   
       PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
   
   -->
   
   ### Rationale for this change
   
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   Add a `pairwise_diff` function similar to pandas' [Series.Diff](https://pandas.pydata.org/docs/reference/api/pandas.Series.diff.html), the function computes the first order difference of an array.
   ### What changes are included in this PR?
   
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   I followed [these instructions](https://github.com/apache/arrow/pull/12460#issuecomment-1057520554). The function is implemented for numerical, temporal and decimal types. Chuck arrays are not yet supported.
   
   ### Are these changes tested?
   
   <!--
   We typically require tests for all PRs in order to:
   1. Prevent the code from being accidentally broken by subsequent changes
   2. Serve as another way to document the expected behavior of the code
   
   If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?
   -->
   
   Yes. They are tested in vector_pairwise_test.cc and in python/pyarrow/tests/compute.py.
   
   ### Are there any user-facing changes?
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   Yes, and docs are also updated in this PR.
   
   <!--
   If there are any breaking changes to public APIs, please uncomment the line below and explain which changes are breaking.
   -->
   <!-- **This PR includes breaking changes to public APIs.** -->
   
   <!--
   Please uncomment the line below (and provide explanation) if the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld). We use this to highlight fixes to issues that may affect users without their knowledge. For this reason, fixing bugs that cause errors don't count, since those are usually obvious.
   -->
   <!-- **This PR contains a "Critical Fix".** -->
   * Closes: #35786


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] js8544 commented on pull request #35787: GH-35786: [C++] Add pairwise_diff function

Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on PR #35787:
URL: https://github.com/apache/arrow/pull/35787#issuecomment-1598024605

   @lidavidm @pitrou @westonpace Hi guys, would any of you mind taking a look at this PR? It's been sitting around for almost a month. Thanks in advance!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] js8544 commented on pull request #35787: GH-35786: [C++] Add pairwise_diff function

Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on PR #35787:
URL: https://github.com/apache/arrow/pull/35787#issuecomment-1611595843

   @bkietz I've changed it to looping over subtract's kernels and wrapping their signature and kernel exec.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] js8544 commented on a diff in pull request #35787: GH-35786: [C++] Add pairwise_diff function

Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on code in PR #35787:
URL: https://github.com/apache/arrow/pull/35787#discussion_r1245360012


##########
cpp/src/arrow/compute/kernel.h:
##########
@@ -283,7 +283,12 @@ class ARROW_EXPORT OutputType {
   ///
   /// This function SHOULD _not_ be used to check for arity, that is to be
   /// performed one or more layers above.
-  using Resolver = Result<TypeHolder> (*)(KernelContext*, const std::vector<TypeHolder>&);
+  using Resolver =
+      std::function<Result<TypeHolder>(KernelContext*, const std::vector<TypeHolder>&)>;
+
+  // For backward compatibility
+  using ResolverFuncPtr = Result<TypeHolder> (*)(KernelContext*,
+                                                 const std::vector<TypeHolder>&);

Review Comment:
   To reuse subtract's output resolver, I would have to change Resolver to std::function because I need a capturing lambda which cannot be converted to a function pointer.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #35787: GH-35786: [C++] Add pairwise_diff function

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #35787:
URL: https://github.com/apache/arrow/pull/35787#issuecomment-1564209324

   * Closes: #35786


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] js8544 commented on a diff in pull request #35787: GH-35786: [C++] Add pairwise_diff function

Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on code in PR #35787:
URL: https://github.com/apache/arrow/pull/35787#discussion_r1245431774


##########
cpp/src/arrow/compute/kernel.h:
##########
@@ -293,6 +298,10 @@ class ARROW_EXPORT OutputType {
   OutputType(Resolver resolver)  // NOLINT implicit construction
       : kind_(COMPUTED), resolver_(std::move(resolver)) {}
 
+  /// \brief For backward compatibility
+  OutputType(ResolverFuncPtr resolver)  // NOLINT implicit construction
+      : kind_(COMPUTED), resolver_(std::move(resolver)) {}

Review Comment:
   Actually making this a templated method is more elegant. Problem solved.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] bkietz merged pull request #35787: GH-35786: [C++] Add pairwise_diff function

Posted by "bkietz (via GitHub)" <gi...@apache.org>.
bkietz merged PR #35787:
URL: https://github.com/apache/arrow/pull/35787


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] js8544 commented on a diff in pull request #35787: GH-35786: [C++] Add pairwise_diff function

Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on code in PR #35787:
URL: https://github.com/apache/arrow/pull/35787#discussion_r1245355801


##########
cpp/src/arrow/compute/kernels/vector_pairwise.cc:
##########
@@ -0,0 +1,272 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// Vector kernels for pairwise computation
+
+#include <memory>
+#include "arrow/builder.h"
+#include "arrow/compute/api_vector.h"
+#include "arrow/compute/exec.h"
+#include "arrow/compute/function.h"
+#include "arrow/compute/kernel.h"
+#include "arrow/compute/kernels/base_arithmetic_internal.h"
+#include "arrow/compute/kernels/codegen_internal.h"
+#include "arrow/compute/registry.h"
+#include "arrow/status.h"
+#include "arrow/type.h"
+#include "arrow/type_fwd.h"
+#include "arrow/type_traits.h"
+#include "arrow/util/checked_cast.h"
+#include "arrow/util/logging.h"
+#include "arrow/visit_type_inline.h"
+
+namespace arrow::compute::internal {
+
+template <typename InputType>
+Result<std::shared_ptr<DataType>> GetDiffOutputType(
+    const std::shared_ptr<arrow::DataType>& type) {
+  std::shared_ptr<DataType> output_type;
+  if constexpr (is_timestamp_type<InputType>::value) {  // timestamp -> duration with same
+                                                        // time unit
+    const auto* real_type = checked_cast<const TimestampType*>(type.get());
+    return std::make_shared<DurationType>(real_type->unit());
+  } else if constexpr (is_time_type<InputType>::value) {  // time -> duration with same
+                                                          // time unit
+    const auto* real_type = checked_cast<const InputType*>(type.get());
+    return std::make_shared<DurationType>(real_type->unit());
+  } else if constexpr (is_date_type<InputType>::value) {  // date -> duration
+    if constexpr (InputType::type_id == Type::DATE32) {   // date32 -> second
+      return duration(TimeUnit::SECOND);
+    } else {  // date64 -> millisecond
+      return duration(TimeUnit::MILLI);
+    }
+  } else if constexpr (is_decimal_type<InputType>::value) {  // decimal -> decimal with
+                                                             // precision + 1
+    const auto* real_type = checked_cast<const InputType*>(type.get());
+    if constexpr (InputType::type_id == Type::DECIMAL128) {
+      return Decimal128Type::Make(real_type->precision() + 1, real_type->scale());
+    } else {
+      return Decimal256Type::Make(real_type->precision() + 1, real_type->scale());
+    }
+  } else {
+    return type;
+  }
+}
+
+/// A generic pairwise implementation that can be reused by different Ops.
+template <typename InputType, typename OutputType, typename Op>
+Status PairwiseKernelImpl(const ArraySpan& input, int64_t periods,
+                          const std::shared_ptr<DataType>& output_type,
+                          std::shared_ptr<ArrayData>* result) {
+  typename TypeTraits<OutputType>::BuilderType builder(output_type,
+                                                       default_memory_pool());
+  RETURN_NOT_OK(builder.Reserve(input.length));
+
+  Status status;
+  auto valid_func = [&](typename GetViewType<InputType>::T left,
+                        typename GetViewType<InputType>::T right) {
+    auto result = Op::template Call<typename GetOutputType<OutputType>::T>(
+        nullptr, left, right, &status);
+    builder.UnsafeAppend(result);
+  };
+  auto null_func = [&]() { builder.UnsafeAppendNull(); };
+
+  if (periods > 0) {
+    periods = std::min(periods, input.length);
+    RETURN_NOT_OK(builder.AppendNulls(periods));
+    ArraySpan left(input);
+    left.SetSlice(periods, input.length - periods);
+    ArraySpan right(input);
+    right.SetSlice(0, input.length - periods);
+    VisitTwoArrayValuesInline<InputType, InputType>(left, right, valid_func, null_func);
+    RETURN_NOT_OK(status);
+  } else {
+    periods = std::max(periods, -input.length);
+    ArraySpan left(input);
+    left.SetSlice(0, input.length + periods);
+    ArraySpan right(input);
+    right.SetSlice(-periods, input.length + periods);
+    VisitTwoArrayValuesInline<InputType, InputType>(left, right, valid_func, null_func);
+    RETURN_NOT_OK(status);
+    RETURN_NOT_OK(builder.AppendNulls(-periods));
+  }
+  RETURN_NOT_OK(builder.FinishInternal(result));
+  return Status::OK();
+}
+
+template <typename InputType, typename OutputType, typename Op>
+Status PairwiseDiffKernel(KernelContext* ctx, const ExecSpan& batch, ExecResult* out) {

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] js8544 commented on a diff in pull request #35787: GH-35786: [C++] Add pairwise_diff function

Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on code in PR #35787:
URL: https://github.com/apache/arrow/pull/35787#discussion_r1245419203


##########
cpp/src/arrow/compute/kernel.h:
##########
@@ -293,6 +298,10 @@ class ARROW_EXPORT OutputType {
   OutputType(Resolver resolver)  // NOLINT implicit construction
       : kind_(COMPUTED), resolver_(std::move(resolver)) {}
 
+  /// \brief For backward compatibility
+  OutputType(ResolverFuncPtr resolver)  // NOLINT implicit construction
+      : kind_(COMPUTED), resolver_(std::move(resolver)) {}

Review Comment:
   Nevermind, some compilation fails due to ambiguous call. I guess I'll have to change all occurences.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] bkietz commented on a diff in pull request #35787: GH-35786: [C++] Add pairwise_diff function

Posted by "bkietz (via GitHub)" <gi...@apache.org>.
bkietz commented on code in PR #35787:
URL: https://github.com/apache/arrow/pull/35787#discussion_r1245512682


##########
cpp/src/arrow/compute/kernels/vector_pairwise.cc:
##########
@@ -0,0 +1,182 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// Vector kernels for pairwise computation
+
+#include <iostream>
+#include <memory>
+#include "arrow/builder.h"
+#include "arrow/compute/api_vector.h"
+#include "arrow/compute/exec.h"
+#include "arrow/compute/function.h"
+#include "arrow/compute/kernel.h"
+#include "arrow/compute/kernels/base_arithmetic_internal.h"
+#include "arrow/compute/kernels/codegen_internal.h"
+#include "arrow/compute/registry.h"
+#include "arrow/compute/util.h"
+#include "arrow/status.h"
+#include "arrow/type.h"
+#include "arrow/type_fwd.h"
+#include "arrow/type_traits.h"
+#include "arrow/util/bit_util.h"
+#include "arrow/util/checked_cast.h"
+#include "arrow/util/logging.h"
+#include "arrow/visit_type_inline.h"
+
+namespace arrow::compute::internal {
+
+// We reuse the kernel exec function of a scalar binary function to compute pairwise
+// results. For example, for pairwise_diff, we reuse subtract's kernel exec.
+struct PairwiseState : KernelState {
+  PairwiseState(const PairwiseOptions& options, const ArrayKernelExec& scalar_exec)
+      : periods(options.periods), scalar_exec(scalar_exec) {}
+
+  int64_t periods;
+  const ArrayKernelExec& scalar_exec;
+};
+
+KernelInit GeneratePairwiseInit(const ArrayKernelExec& scalar_exec) {
+  return [&scalar_exec](KernelContext* ctx, const KernelInitArgs& args) {
+    return std::make_unique<PairwiseState>(
+        checked_cast<const PairwiseOptions&>(*args.options), scalar_exec);
+  };
+}
+
+/// A generic pairwise implementation that can be reused by different ops.
+Status PairwiseExecImpl(KernelContext* ctx, const ArraySpan& input,
+                        const ArrayKernelExec& scalar_exec, int64_t periods,
+                        ArrayData* result) {
+  auto offset = abs(periods);
+  offset = std::min(offset, input.length);
+  auto exec_length = input.length - offset;
+  // prepare bitmap
+  auto null_start = periods > 0 ? 0 : exec_length;
+  auto non_null_start = periods > 0 ? offset : 0;

Review Comment:
   Nit: could we call these two regions "computed" and "margin"? I think that'll be more obvious for the next maintainer, and it'd help if we move *all* the start/length calculation to this preamble too
   
   ```suggestion
     // We only compute values in the region where the input-with-offset overlaps
     // the original input. The margin where these do not overlap gets filled with null.
     auto margin_length = std::min(abs(periods), input.length);
     auto computed_length = input.length - margin_length;
     auto margin_start = periods > 0 ? 0 : computed_length;
     auto left_start = periods > 0 ? margin_length : 0;
     auto right_start = periods > 0 ? 0 : margin_length;
     // ...
     // prepare bitmap
   ```



##########
cpp/src/arrow/compute/kernels/vector_pairwise.cc:
##########
@@ -0,0 +1,182 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// Vector kernels for pairwise computation
+
+#include <iostream>
+#include <memory>
+#include "arrow/builder.h"
+#include "arrow/compute/api_vector.h"
+#include "arrow/compute/exec.h"
+#include "arrow/compute/function.h"
+#include "arrow/compute/kernel.h"
+#include "arrow/compute/kernels/base_arithmetic_internal.h"
+#include "arrow/compute/kernels/codegen_internal.h"
+#include "arrow/compute/registry.h"
+#include "arrow/compute/util.h"
+#include "arrow/status.h"
+#include "arrow/type.h"
+#include "arrow/type_fwd.h"
+#include "arrow/type_traits.h"
+#include "arrow/util/bit_util.h"
+#include "arrow/util/checked_cast.h"
+#include "arrow/util/logging.h"
+#include "arrow/visit_type_inline.h"
+
+namespace arrow::compute::internal {
+
+// We reuse the kernel exec function of a scalar binary function to compute pairwise
+// results. For example, for pairwise_diff, we reuse subtract's kernel exec.
+struct PairwiseState : KernelState {
+  PairwiseState(const PairwiseOptions& options, const ArrayKernelExec& scalar_exec)
+      : periods(options.periods), scalar_exec(scalar_exec) {}
+
+  int64_t periods;
+  const ArrayKernelExec& scalar_exec;
+};
+
+KernelInit GeneratePairwiseInit(const ArrayKernelExec& scalar_exec) {
+  return [&scalar_exec](KernelContext* ctx, const KernelInitArgs& args) {
+    return std::make_unique<PairwiseState>(
+        checked_cast<const PairwiseOptions&>(*args.options), scalar_exec);
+  };
+}
+
+/// A generic pairwise implementation that can be reused by different ops.
+Status PairwiseExecImpl(KernelContext* ctx, const ArraySpan& input,
+                        const ArrayKernelExec& scalar_exec, int64_t periods,
+                        ArrayData* result) {
+  auto offset = abs(periods);
+  offset = std::min(offset, input.length);
+  auto exec_length = input.length - offset;
+  // prepare bitmap
+  auto null_start = periods > 0 ? 0 : exec_length;
+  auto non_null_start = periods > 0 ? offset : 0;
+  bit_util::ClearBitmap(result->buffers[0]->mutable_data(), null_start, offset);
+  for (int64_t i = non_null_start; i < non_null_start + exec_length; i++) {
+    if (input.IsValid(i) && input.IsValid(i - periods)) {
+      bit_util::SetBit(result->buffers[0]->mutable_data(), i);
+    } else {
+      bit_util::ClearBit(result->buffers[0]->mutable_data(), i);
+    }
+  }
+  // prepare input span
+  ArraySpan left(input);
+  left.SetSlice(periods > 0 ? offset : 0, exec_length);
+  ArraySpan right(input);
+  right.SetSlice(periods > 0 ? 0 : offset, exec_length);
+  // prepare output span
+  ArraySpan output_span;
+  output_span.SetMembers(*result);
+  output_span.offset = periods > 0 ? offset : 0;
+  output_span.length = exec_length;
+  ExecResult output{output_span};
+  // execute scalar function
+  RETURN_NOT_OK(scalar_exec(ctx, ExecSpan({left, right}, exec_length), &output));
+
+  return Status::OK();
+}
+
+Status PairwiseExec(KernelContext* ctx, const ExecSpan& batch, ExecResult* out) {
+  const auto& state = checked_cast<const PairwiseState&>(*ctx->state());
+  auto input = batch[0].array;
+  RETURN_NOT_OK(PairwiseExecImpl(ctx, batch[0].array, state.scalar_exec, state.periods,
+                                 out->array_data_mutable()));
+  return Status::OK();
+}
+
+const FunctionDoc pairwise_diff_doc(
+    "Compute first order difference of an array",
+    ("Computes the first order difference of an array, It internally calls \n"
+     "the scalar function \"subtract\" to compute \n differences, so its \n"
+     "behavior and supported types are the same as \n"
+     "\"subtract\". The period can be specified in :struct:`PairwiseOptions`.\n"
+     "\n"
+     "Results will wrap around on integer overflow. Use function \n"
+     "\"pairwise_diff_checked\" if you want overflow to return an error."),
+    {"input"}, "PairwiseOptions");
+
+const FunctionDoc pairwise_diff_checked_doc(
+    "Compute first order difference of an array",
+    ("Computes the first order difference of an array, It internally calls \n"
+     "the scalar function \"subtract_checked\" (or the checked variant) to compute \n"
+     "differences, so its behavior and supported types are the same as \n"
+     "\"subtract_checked\". The period can be specified in :struct:`PairwiseOptions`.\n"
+     "\n"
+     "This function returns an error on overflow. For a variant that doesn't \n"
+     "fail on overflow, use function \"pairwise_diff\"."),
+    {"input"}, "PairwiseOptions");
+
+const PairwiseOptions* GetDefaultPairwiseOptions() {
+  static const auto kDefaultPairwiseOptions = PairwiseOptions::Defaults();
+  return &kDefaultPairwiseOptions;
+}
+
+struct PairwiseKernelData {
+  InputType input;
+  OutputType output;
+  ArrayKernelExec exec;
+};
+
+void RegisterPairwiseDiffKernels(std::string_view func_name,
+                                 std::string_view base_func_name, const FunctionDoc& doc,
+                                 FunctionRegistry* registry) {
+  VectorKernel base_kernel;
+  base_kernel.can_execute_chunkwise = false;
+  base_kernel.null_handling = NullHandling::COMPUTED_PREALLOCATE;
+  base_kernel.mem_allocation = MemAllocation::PREALLOCATE;
+  base_kernel.init = OptionsWrapper<PairwiseOptions>::Init;
+  auto func = std::make_shared<VectorFunction>(std::string(func_name), Arity::Unary(),
+                                               doc, GetDefaultPairwiseOptions());
+
+  auto base_func_result = registry->GetFunction(std::string(base_func_name));
+  DCHECK_OK(base_func_result.status());
+  const auto& base_func = checked_cast<const ScalarFunction&>(**base_func_result);
+  DCHECK_EQ(base_func.arity().num_args, 2);
+
+  for (const auto& base_func_kernel : base_func.kernels()) {
+    const auto& base_func_kernel_sig = base_func_kernel->signature;
+    if (base_func_kernel_sig->in_types()[0].Equals(base_func_kernel_sig->in_types()[1])) {

Review Comment:
   Nit: de-nest
   ```suggestion
       if (!base_func_kernel_sig->in_types()[0].Equals(base_func_kernel_sig->in_types()[1])) {
         continue;
       }
   ```



##########
cpp/src/arrow/compute/kernels/vector_pairwise.cc:
##########
@@ -0,0 +1,182 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// Vector kernels for pairwise computation
+
+#include <iostream>
+#include <memory>
+#include "arrow/builder.h"
+#include "arrow/compute/api_vector.h"
+#include "arrow/compute/exec.h"
+#include "arrow/compute/function.h"
+#include "arrow/compute/kernel.h"
+#include "arrow/compute/kernels/base_arithmetic_internal.h"
+#include "arrow/compute/kernels/codegen_internal.h"
+#include "arrow/compute/registry.h"
+#include "arrow/compute/util.h"
+#include "arrow/status.h"
+#include "arrow/type.h"
+#include "arrow/type_fwd.h"
+#include "arrow/type_traits.h"
+#include "arrow/util/bit_util.h"
+#include "arrow/util/checked_cast.h"
+#include "arrow/util/logging.h"
+#include "arrow/visit_type_inline.h"
+
+namespace arrow::compute::internal {
+
+// We reuse the kernel exec function of a scalar binary function to compute pairwise
+// results. For example, for pairwise_diff, we reuse subtract's kernel exec.
+struct PairwiseState : KernelState {
+  PairwiseState(const PairwiseOptions& options, const ArrayKernelExec& scalar_exec)
+      : periods(options.periods), scalar_exec(scalar_exec) {}
+
+  int64_t periods;
+  const ArrayKernelExec& scalar_exec;
+};
+
+KernelInit GeneratePairwiseInit(const ArrayKernelExec& scalar_exec) {
+  return [&scalar_exec](KernelContext* ctx, const KernelInitArgs& args) {
+    return std::make_unique<PairwiseState>(
+        checked_cast<const PairwiseOptions&>(*args.options), scalar_exec);
+  };
+}
+
+/// A generic pairwise implementation that can be reused by different ops.
+Status PairwiseExecImpl(KernelContext* ctx, const ArraySpan& input,
+                        const ArrayKernelExec& scalar_exec, int64_t periods,
+                        ArrayData* result) {
+  auto offset = abs(periods);
+  offset = std::min(offset, input.length);
+  auto exec_length = input.length - offset;
+  // prepare bitmap
+  auto null_start = periods > 0 ? 0 : exec_length;
+  auto non_null_start = periods > 0 ? offset : 0;
+  bit_util::ClearBitmap(result->buffers[0]->mutable_data(), null_start, offset);
+  for (int64_t i = non_null_start; i < non_null_start + exec_length; i++) {
+    if (input.IsValid(i) && input.IsValid(i - periods)) {
+      bit_util::SetBit(result->buffers[0]->mutable_data(), i);
+    } else {
+      bit_util::ClearBit(result->buffers[0]->mutable_data(), i);
+    }
+  }
+  // prepare input span
+  ArraySpan left(input);
+  left.SetSlice(periods > 0 ? offset : 0, exec_length);
+  ArraySpan right(input);
+  right.SetSlice(periods > 0 ? 0 : offset, exec_length);
+  // prepare output span
+  ArraySpan output_span;
+  output_span.SetMembers(*result);
+  output_span.offset = periods > 0 ? offset : 0;
+  output_span.length = exec_length;
+  ExecResult output{output_span};
+  // execute scalar function
+  RETURN_NOT_OK(scalar_exec(ctx, ExecSpan({left, right}, exec_length), &output));
+
+  return Status::OK();
+}
+
+Status PairwiseExec(KernelContext* ctx, const ExecSpan& batch, ExecResult* out) {
+  const auto& state = checked_cast<const PairwiseState&>(*ctx->state());
+  auto input = batch[0].array;
+  RETURN_NOT_OK(PairwiseExecImpl(ctx, batch[0].array, state.scalar_exec, state.periods,
+                                 out->array_data_mutable()));
+  return Status::OK();
+}
+
+const FunctionDoc pairwise_diff_doc(
+    "Compute first order difference of an array",
+    ("Computes the first order difference of an array, It internally calls \n"
+     "the scalar function \"subtract\" to compute \n differences, so its \n"
+     "behavior and supported types are the same as \n"
+     "\"subtract\". The period can be specified in :struct:`PairwiseOptions`.\n"
+     "\n"
+     "Results will wrap around on integer overflow. Use function \n"
+     "\"pairwise_diff_checked\" if you want overflow to return an error."),
+    {"input"}, "PairwiseOptions");
+
+const FunctionDoc pairwise_diff_checked_doc(
+    "Compute first order difference of an array",
+    ("Computes the first order difference of an array, It internally calls \n"
+     "the scalar function \"subtract_checked\" (or the checked variant) to compute \n"
+     "differences, so its behavior and supported types are the same as \n"
+     "\"subtract_checked\". The period can be specified in :struct:`PairwiseOptions`.\n"
+     "\n"
+     "This function returns an error on overflow. For a variant that doesn't \n"
+     "fail on overflow, use function \"pairwise_diff\"."),
+    {"input"}, "PairwiseOptions");
+
+const PairwiseOptions* GetDefaultPairwiseOptions() {
+  static const auto kDefaultPairwiseOptions = PairwiseOptions::Defaults();
+  return &kDefaultPairwiseOptions;
+}
+
+struct PairwiseKernelData {
+  InputType input;
+  OutputType output;
+  ArrayKernelExec exec;
+};
+
+void RegisterPairwiseDiffKernels(std::string_view func_name,
+                                 std::string_view base_func_name, const FunctionDoc& doc,
+                                 FunctionRegistry* registry) {
+  VectorKernel base_kernel;

Review Comment:
   Nit: since the base function here is "subtract", this naming could cause readers to conclude that the kernel is one of "subtract"'s kernels.
   ```suggestion
     VectorKernel kernel;
   ```



##########
docs/source/cpp/compute.rst:
##########
@@ -1847,3 +1847,25 @@ replaced, based on the remaining inputs.
   results in a corresponding null in the output.
 
   Also see: :ref:`if_else <cpp-compute-scalar-selections>`.
+
+Pairwise functions
+~~~~~~~~~~~~~~~~~~~~
+Pairwise functions are unary vector functions that perform a binary operation on 
+a pair of elements in the input array, typically on adjacent elements. The n-th
+output is computed by applying the binary operation on the n-th and (n-p)-th, 
+where p is the period. The default period is 1. The period can also be negative.

Review Comment:
   ```suggestion
   output is computed by applying the binary operation to the n-th and (n-p)-th inputs, 
   where p is the period. The default period is 1, in which case the binary
   operation is applied to adjacent pairs of inputs. The period can also be
   negative, in which case the n-th output is computed by applying the binary
   operation to the n-th and (n+abs(p))-th inputs.
   ```



##########
cpp/src/arrow/compute/kernels/vector_pairwise.cc:
##########
@@ -0,0 +1,182 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// Vector kernels for pairwise computation
+
+#include <iostream>
+#include <memory>
+#include "arrow/builder.h"
+#include "arrow/compute/api_vector.h"
+#include "arrow/compute/exec.h"
+#include "arrow/compute/function.h"
+#include "arrow/compute/kernel.h"
+#include "arrow/compute/kernels/base_arithmetic_internal.h"
+#include "arrow/compute/kernels/codegen_internal.h"
+#include "arrow/compute/registry.h"
+#include "arrow/compute/util.h"
+#include "arrow/status.h"
+#include "arrow/type.h"
+#include "arrow/type_fwd.h"
+#include "arrow/type_traits.h"
+#include "arrow/util/bit_util.h"
+#include "arrow/util/checked_cast.h"
+#include "arrow/util/logging.h"
+#include "arrow/visit_type_inline.h"
+
+namespace arrow::compute::internal {
+
+// We reuse the kernel exec function of a scalar binary function to compute pairwise
+// results. For example, for pairwise_diff, we reuse subtract's kernel exec.
+struct PairwiseState : KernelState {
+  PairwiseState(const PairwiseOptions& options, const ArrayKernelExec& scalar_exec)
+      : periods(options.periods), scalar_exec(scalar_exec) {}
+
+  int64_t periods;
+  const ArrayKernelExec& scalar_exec;
+};
+
+KernelInit GeneratePairwiseInit(const ArrayKernelExec& scalar_exec) {
+  return [&scalar_exec](KernelContext* ctx, const KernelInitArgs& args) {

Review Comment:
   `ArrayKernelExec` is just a pointer so I think it's better to capture by value to make the lifetime obviously okay
   ```suggestion
     return [scalar_exec](KernelContext* ctx, const KernelInitArgs& args) {
   ```
   
   ... it may be worthwhile to remove `GeneratePairwiseInit` too; it's only used once and writing the lambda inline is quite clear



##########
cpp/src/arrow/compute/kernels/vector_pairwise.cc:
##########
@@ -0,0 +1,182 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// Vector kernels for pairwise computation
+
+#include <iostream>
+#include <memory>
+#include "arrow/builder.h"
+#include "arrow/compute/api_vector.h"
+#include "arrow/compute/exec.h"
+#include "arrow/compute/function.h"
+#include "arrow/compute/kernel.h"
+#include "arrow/compute/kernels/base_arithmetic_internal.h"
+#include "arrow/compute/kernels/codegen_internal.h"
+#include "arrow/compute/registry.h"
+#include "arrow/compute/util.h"
+#include "arrow/status.h"
+#include "arrow/type.h"
+#include "arrow/type_fwd.h"
+#include "arrow/type_traits.h"
+#include "arrow/util/bit_util.h"
+#include "arrow/util/checked_cast.h"
+#include "arrow/util/logging.h"
+#include "arrow/visit_type_inline.h"
+
+namespace arrow::compute::internal {
+
+// We reuse the kernel exec function of a scalar binary function to compute pairwise
+// results. For example, for pairwise_diff, we reuse subtract's kernel exec.
+struct PairwiseState : KernelState {
+  PairwiseState(const PairwiseOptions& options, const ArrayKernelExec& scalar_exec)
+      : periods(options.periods), scalar_exec(scalar_exec) {}
+
+  int64_t periods;
+  const ArrayKernelExec& scalar_exec;
+};
+
+KernelInit GeneratePairwiseInit(const ArrayKernelExec& scalar_exec) {
+  return [&scalar_exec](KernelContext* ctx, const KernelInitArgs& args) {
+    return std::make_unique<PairwiseState>(
+        checked_cast<const PairwiseOptions&>(*args.options), scalar_exec);
+  };
+}
+
+/// A generic pairwise implementation that can be reused by different ops.
+Status PairwiseExecImpl(KernelContext* ctx, const ArraySpan& input,
+                        const ArrayKernelExec& scalar_exec, int64_t periods,
+                        ArrayData* result) {
+  auto offset = abs(periods);
+  offset = std::min(offset, input.length);
+  auto exec_length = input.length - offset;
+  // prepare bitmap
+  auto null_start = periods > 0 ? 0 : exec_length;
+  auto non_null_start = periods > 0 ? offset : 0;
+  bit_util::ClearBitmap(result->buffers[0]->mutable_data(), null_start, offset);
+  for (int64_t i = non_null_start; i < non_null_start + exec_length; i++) {
+    if (input.IsValid(i) && input.IsValid(i - periods)) {
+      bit_util::SetBit(result->buffers[0]->mutable_data(), i);
+    } else {
+      bit_util::ClearBit(result->buffers[0]->mutable_data(), i);
+    }
+  }
+  // prepare input span
+  ArraySpan left(input);
+  left.SetSlice(periods > 0 ? offset : 0, exec_length);
+  ArraySpan right(input);
+  right.SetSlice(periods > 0 ? 0 : offset, exec_length);
+  // prepare output span
+  ArraySpan output_span;
+  output_span.SetMembers(*result);
+  output_span.offset = periods > 0 ? offset : 0;
+  output_span.length = exec_length;
+  ExecResult output{output_span};
+  // execute scalar function
+  RETURN_NOT_OK(scalar_exec(ctx, ExecSpan({left, right}, exec_length), &output));
+
+  return Status::OK();
+}
+
+Status PairwiseExec(KernelContext* ctx, const ExecSpan& batch, ExecResult* out) {
+  const auto& state = checked_cast<const PairwiseState&>(*ctx->state());
+  auto input = batch[0].array;
+  RETURN_NOT_OK(PairwiseExecImpl(ctx, batch[0].array, state.scalar_exec, state.periods,
+                                 out->array_data_mutable()));
+  return Status::OK();
+}
+
+const FunctionDoc pairwise_diff_doc(
+    "Compute first order difference of an array",
+    ("Computes the first order difference of an array, It internally calls \n"
+     "the scalar function \"subtract\" to compute \n differences, so its \n"
+     "behavior and supported types are the same as \n"
+     "\"subtract\". The period can be specified in :struct:`PairwiseOptions`.\n"
+     "\n"
+     "Results will wrap around on integer overflow. Use function \n"
+     "\"pairwise_diff_checked\" if you want overflow to return an error."),
+    {"input"}, "PairwiseOptions");
+
+const FunctionDoc pairwise_diff_checked_doc(
+    "Compute first order difference of an array",
+    ("Computes the first order difference of an array, It internally calls \n"
+     "the scalar function \"subtract_checked\" (or the checked variant) to compute \n"
+     "differences, so its behavior and supported types are the same as \n"
+     "\"subtract_checked\". The period can be specified in :struct:`PairwiseOptions`.\n"
+     "\n"
+     "This function returns an error on overflow. For a variant that doesn't \n"
+     "fail on overflow, use function \"pairwise_diff\"."),
+    {"input"}, "PairwiseOptions");
+
+const PairwiseOptions* GetDefaultPairwiseOptions() {
+  static const auto kDefaultPairwiseOptions = PairwiseOptions::Defaults();
+  return &kDefaultPairwiseOptions;
+}
+
+struct PairwiseKernelData {
+  InputType input;
+  OutputType output;
+  ArrayKernelExec exec;
+};
+
+void RegisterPairwiseDiffKernels(std::string_view func_name,
+                                 std::string_view base_func_name, const FunctionDoc& doc,
+                                 FunctionRegistry* registry) {
+  VectorKernel base_kernel;
+  base_kernel.can_execute_chunkwise = false;
+  base_kernel.null_handling = NullHandling::COMPUTED_PREALLOCATE;
+  base_kernel.mem_allocation = MemAllocation::PREALLOCATE;
+  base_kernel.init = OptionsWrapper<PairwiseOptions>::Init;
+  auto func = std::make_shared<VectorFunction>(std::string(func_name), Arity::Unary(),
+                                               doc, GetDefaultPairwiseOptions());
+
+  auto base_func_result = registry->GetFunction(std::string(base_func_name));
+  DCHECK_OK(base_func_result.status());
+  const auto& base_func = checked_cast<const ScalarFunction&>(**base_func_result);
+  DCHECK_EQ(base_func.arity().num_args, 2);
+
+  for (const auto& base_func_kernel : base_func.kernels()) {
+    const auto& base_func_kernel_sig = base_func_kernel->signature;
+    if (base_func_kernel_sig->in_types()[0].Equals(base_func_kernel_sig->in_types()[1])) {
+      OutputType out_type(base_func_kernel_sig->out_type());
+      // Need to wrap base output resolver
+      if (out_type.kind() == OutputType::COMPUTED) {
+        const auto& base_resolver = base_func_kernel_sig->out_type().resolver();
+        auto resolver = [&base_resolver](KernelContext* ctx,
+                                         const std::vector<TypeHolder>& input_types) {
+          return base_resolver(ctx, {input_types[0], input_types[0]});
+        };
+        out_type = OutputType(resolver);

Review Comment:
   Again, I think it's better to err on the side of safety when we can and just copy:
   ```suggestion
           out_type = OutputType([base_resolver=base_func_kernel_sig->out_type().resolver()
           ](KernelContext* ctx,
                                            const std::vector<TypeHolder>& input_types) {
             return base_resolver(ctx, {input_types[0], input_types[0]});
           });
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] js8544 commented on pull request #35787: GH-35786: [C++] Add pairwise_diff function

Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on PR #35787:
URL: https://github.com/apache/arrow/pull/35787#issuecomment-1611056184

   > This mostly looks good to me. However, in light of the strong relationship between this Function and subtract I'd prefer to see more reuse of that function's existing logic. For example when retrieving the output type for pairwise_diff($in_type), couldn't we just as easily call subtract's DispatchExact with ($in_type, $in_type)? And instead of referencing Subtract's Ops directly, couldn't we use the kernel retrieved from subtract's DispatchExact- just passing the relevant slices of the input as the arguments to the subtract kernel? I think this would have negligible impact on performance and would greatly reduce the future maintenance burden for this function, since any new types added to subtract will then automatically be supported by pairwise_diff.
   
   Hi @bkietz, do you mean something like this:
   ```cpp
     auto subtract_func = registry->GetFunction("subtract").ValueOrDie();
     for (const auto& type : types) {
       auto kernel = subtract_func->DispatchExact({type, type}).ValueOrDie();
       // reuse kernel's exec and signature
     }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] bkietz closed pull request #35787: GH-35786: [C++] Add pairwise_diff function

Posted by "bkietz (via GitHub)" <gi...@apache.org>.
bkietz closed pull request #35787: GH-35786: [C++] Add pairwise_diff function
URL: https://github.com/apache/arrow/pull/35787


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] js8544 commented on pull request #35787: GH-35786: [C++] Add pairwise_diff function

Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on PR #35787:
URL: https://github.com/apache/arrow/pull/35787#issuecomment-1607040941

   @pitrou @bkietz Would you mind having some extra look at this PR? Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] js8544 commented on a diff in pull request #35787: GH-35786: [C++] Add pairwise_diff function

Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on code in PR #35787:
URL: https://github.com/apache/arrow/pull/35787#discussion_r1245364297


##########
cpp/src/arrow/compute/kernel.h:
##########
@@ -293,6 +298,10 @@ class ARROW_EXPORT OutputType {
   OutputType(Resolver resolver)  // NOLINT implicit construction
       : kind_(COMPUTED), resolver_(std::move(resolver)) {}
 
+  /// \brief For backward compatibility
+  OutputType(ResolverFuncPtr resolver)  // NOLINT implicit construction
+      : kind_(COMPUTED), resolver_(std::move(resolver)) {}

Review Comment:
   Since the constructors are not marked explicit, many existing codes implicitly pass a function pointer as `OutputType`. To make them compile, I'll have to either add this constructor or change all occurences. I chose the easier path.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] bkietz commented on pull request #35787: GH-35786: [C++] Add pairwise_diff function

Posted by "bkietz (via GitHub)" <gi...@apache.org>.
bkietz commented on PR #35787:
URL: https://github.com/apache/arrow/pull/35787#issuecomment-1611257128

   @js8544 yes, exactly. I think we could go a step further and loop over subtracts kernels directly, without the need to go through dispatchExact


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] js8544 commented on a diff in pull request #35787: GH-35786: [C++] Add pairwise_diff function

Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on code in PR #35787:
URL: https://github.com/apache/arrow/pull/35787#discussion_r1246045629


##########
cpp/src/arrow/compute/kernels/vector_pairwise.cc:
##########
@@ -0,0 +1,182 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// Vector kernels for pairwise computation
+
+#include <iostream>
+#include <memory>
+#include "arrow/builder.h"
+#include "arrow/compute/api_vector.h"
+#include "arrow/compute/exec.h"
+#include "arrow/compute/function.h"
+#include "arrow/compute/kernel.h"
+#include "arrow/compute/kernels/base_arithmetic_internal.h"
+#include "arrow/compute/kernels/codegen_internal.h"
+#include "arrow/compute/registry.h"
+#include "arrow/compute/util.h"
+#include "arrow/status.h"
+#include "arrow/type.h"
+#include "arrow/type_fwd.h"
+#include "arrow/type_traits.h"
+#include "arrow/util/bit_util.h"
+#include "arrow/util/checked_cast.h"
+#include "arrow/util/logging.h"
+#include "arrow/visit_type_inline.h"
+
+namespace arrow::compute::internal {
+
+// We reuse the kernel exec function of a scalar binary function to compute pairwise
+// results. For example, for pairwise_diff, we reuse subtract's kernel exec.
+struct PairwiseState : KernelState {
+  PairwiseState(const PairwiseOptions& options, const ArrayKernelExec& scalar_exec)
+      : periods(options.periods), scalar_exec(scalar_exec) {}
+
+  int64_t periods;
+  const ArrayKernelExec& scalar_exec;
+};
+
+KernelInit GeneratePairwiseInit(const ArrayKernelExec& scalar_exec) {
+  return [&scalar_exec](KernelContext* ctx, const KernelInitArgs& args) {

Review Comment:
   Updated to inline lambda and capturing by value.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #35787: GH-35786: [C++] Add pairwise_diff function

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #35787:
URL: https://github.com/apache/arrow/pull/35787#issuecomment-1564209370

   :warning: GitHub issue #35786 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] conbench-apache-arrow[bot] commented on pull request #35787: GH-35786: [C++] Add pairwise_diff function

Posted by "conbench-apache-arrow[bot] (via GitHub)" <gi...@apache.org>.
conbench-apache-arrow[bot] commented on PR #35787:
URL: https://github.com/apache/arrow/pull/35787#issuecomment-1619863519

   Conbench analyzed the 6 benchmark runs on commit `26c25d1b`.
   
   There were 4 benchmark results indicating a performance regression:
   
   - Commit Run on `arm64-m6g-linux-compute` at [2023-06-29 14:28:22Z](http://conbench.ursa.dev/compare/runs/33d388e65ccb4035b5d601bf4fb08d46...7e3643806db244338cd79a1694bed8d3/)
     - [params=1048576/100/min_time:1.000, source=cpp-micro, suite=arrow-compute-vector-partition-benchmark](http://conbench.ursa.dev/compare/benchmarks/0649d8efe5f2711880009ee58b9df503...0649d9553d107494800054f53279a515)
   
   - Commit Run on `ursa-thinkcentre-m75q` at [2023-07-02 13:25:33Z](http://conbench.ursa.dev/compare/runs/6a52b07cae0344528e43bc9d64f26724...8fe6806401474e1aaca474dc7b2a36be/)
     - [params=1048576/10000, source=cpp-micro, suite=arrow-acero-aggregate-benchmark](http://conbench.ursa.dev/compare/benchmarks/064a1508669a7a3080006989a567110a...064a17aff26e7f1780005e8d208aba08)
   - and 2 more (see the report linked below)
   
   The [full Conbench report](https://github.com/apache/arrow/runs/14761556435) has more details.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] bkietz commented on a diff in pull request #35787: GH-35786: [C++] Add pairwise_diff function

Posted by "bkietz (via GitHub)" <gi...@apache.org>.
bkietz commented on code in PR #35787:
URL: https://github.com/apache/arrow/pull/35787#discussion_r1242346402


##########
cpp/src/arrow/compute/api_vector.h:
##########
@@ -603,6 +614,24 @@ Result<Datum> CumulativeSum(
     const CumulativeSumOptions& options = CumulativeSumOptions::Defaults(),
     ExecContext* ctx = NULLPTR);
 
+/// \brief Return the first order difference of an array.
+///
+/// Computes the first order difference of an array, i.e. output[i] = input[i] - input[i -
+/// p] if i >= p, otherwise output[i] = null, where p is the period. For example, with p =
+/// 1, Diff([1, 4, 9, 10, 15]) = [null, 3, 5, 1, 5]. With p = 2, Diff([1, 4, 9, 10, 15]) =
+/// [null, null, 8, 6, 6]

Review Comment:
   Looks like you had formatting here which got munged, guessing:
   ```suggestion
   /// Computes the first order difference of an array, i.e.
   ///   output[i] = input[i] - input[i - p]  if i >= p
   ///   output[i] = null                     otherwise
   /// where p is the period. For example, with p = 1,
   ///   Diff([1, 4, 9, 10, 15]) = [null, 3, 5, 1, 5].
   /// With p = 2,
   ///   Diff([1, 4, 9, 10, 15]) = [null, null, 8, 6, 6]
   ```



##########
docs/source/cpp/compute.rst:
##########
@@ -1835,3 +1835,25 @@ replaced, based on the remaining inputs.
   results in a corresponding null in the output.
 
   Also see: :ref:`if_else <cpp-compute-scalar-selections>`.
+
+Pairwise functions
+~~~~~~~~~~~~~~~~~~~~
+Pairwise functions are unary vector functions that performs a binary operation on 

Review Comment:
   ```suggestion
   Pairwise functions are unary vector functions that perform a binary operation on 
   ```



##########
cpp/src/arrow/compute/kernels/vector_pairwise.cc:
##########
@@ -0,0 +1,272 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// Vector kernels for pairwise computation
+
+#include <memory>
+#include "arrow/builder.h"
+#include "arrow/compute/api_vector.h"
+#include "arrow/compute/exec.h"
+#include "arrow/compute/function.h"
+#include "arrow/compute/kernel.h"
+#include "arrow/compute/kernels/base_arithmetic_internal.h"
+#include "arrow/compute/kernels/codegen_internal.h"
+#include "arrow/compute/registry.h"
+#include "arrow/status.h"
+#include "arrow/type.h"
+#include "arrow/type_fwd.h"
+#include "arrow/type_traits.h"
+#include "arrow/util/checked_cast.h"
+#include "arrow/util/logging.h"
+#include "arrow/visit_type_inline.h"
+
+namespace arrow::compute::internal {
+
+template <typename InputType>
+Result<std::shared_ptr<DataType>> GetDiffOutputType(
+    const std::shared_ptr<arrow::DataType>& type) {
+  std::shared_ptr<DataType> output_type;
+  if constexpr (is_timestamp_type<InputType>::value) {  // timestamp -> duration with same
+                                                        // time unit
+    const auto* real_type = checked_cast<const TimestampType*>(type.get());
+    return std::make_shared<DurationType>(real_type->unit());
+  } else if constexpr (is_time_type<InputType>::value) {  // time -> duration with same
+                                                          // time unit
+    const auto* real_type = checked_cast<const InputType*>(type.get());
+    return std::make_shared<DurationType>(real_type->unit());
+  } else if constexpr (is_date_type<InputType>::value) {  // date -> duration
+    if constexpr (InputType::type_id == Type::DATE32) {   // date32 -> second
+      return duration(TimeUnit::SECOND);
+    } else {  // date64 -> millisecond
+      return duration(TimeUnit::MILLI);
+    }
+  } else if constexpr (is_decimal_type<InputType>::value) {  // decimal -> decimal with
+                                                             // precision + 1
+    const auto* real_type = checked_cast<const InputType*>(type.get());
+    if constexpr (InputType::type_id == Type::DECIMAL128) {
+      return Decimal128Type::Make(real_type->precision() + 1, real_type->scale());
+    } else {
+      return Decimal256Type::Make(real_type->precision() + 1, real_type->scale());
+    }
+  } else {
+    return type;
+  }
+}
+
+/// A generic pairwise implementation that can be reused by different Ops.
+template <typename InputType, typename OutputType, typename Op>
+Status PairwiseKernelImpl(const ArraySpan& input, int64_t periods,
+                          const std::shared_ptr<DataType>& output_type,
+                          std::shared_ptr<ArrayData>* result) {
+  typename TypeTraits<OutputType>::BuilderType builder(output_type,
+                                                       default_memory_pool());
+  RETURN_NOT_OK(builder.Reserve(input.length));
+
+  Status status;
+  auto valid_func = [&](typename GetViewType<InputType>::T left,
+                        typename GetViewType<InputType>::T right) {
+    auto result = Op::template Call<typename GetOutputType<OutputType>::T>(
+        nullptr, left, right, &status);
+    builder.UnsafeAppend(result);
+  };
+  auto null_func = [&]() { builder.UnsafeAppendNull(); };
+
+  if (periods > 0) {
+    periods = std::min(periods, input.length);
+    RETURN_NOT_OK(builder.AppendNulls(periods));
+    ArraySpan left(input);
+    left.SetSlice(periods, input.length - periods);
+    ArraySpan right(input);
+    right.SetSlice(0, input.length - periods);
+    VisitTwoArrayValuesInline<InputType, InputType>(left, right, valid_func, null_func);
+    RETURN_NOT_OK(status);
+  } else {
+    periods = std::max(periods, -input.length);
+    ArraySpan left(input);
+    left.SetSlice(0, input.length + periods);
+    ArraySpan right(input);
+    right.SetSlice(-periods, input.length + periods);
+    VisitTwoArrayValuesInline<InputType, InputType>(left, right, valid_func, null_func);
+    RETURN_NOT_OK(status);
+    RETURN_NOT_OK(builder.AppendNulls(-periods));
+  }
+  RETURN_NOT_OK(builder.FinishInternal(result));
+  return Status::OK();
+}
+
+template <typename InputType, typename OutputType, typename Op>
+Status PairwiseDiffKernel(KernelContext* ctx, const ExecSpan& batch, ExecResult* out) {

Review Comment:
   Nit:
   ```suggestion
   Status PairwiseDiffExec(KernelContext* ctx, const ExecSpan& batch, ExecResult* out) {
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #35787: Add pairwise diff function

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #35787:
URL: https://github.com/apache/arrow/pull/35787#issuecomment-1564208909

   <!--
     Licensed to the Apache Software Foundation (ASF) under one
     or more contributor license agreements.  See the NOTICE file
     distributed with this work for additional information
     regarding copyright ownership.  The ASF licenses this file
     to you under the Apache License, Version 2.0 (the
     "License"); you may not use this file except in compliance
     with the License.  You may obtain a copy of the License at
   
       http://www.apache.org/licenses/LICENSE-2.0
   
     Unless required by applicable law or agreed to in writing,
     software distributed under the License is distributed on an
     "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     KIND, either express or implied.  See the License for the
     specific language governing permissions and limitations
     under the License.
   -->
   
   Thanks for opening a pull request!
   
   If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes). Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose
   
   Opening GitHub issues ahead of time contributes to the [Openness](http://theapacheway.com/open/#:~:text=Openness%20allows%20new%20users%20the,must%20happen%20in%20the%20open.) of the Apache Arrow project.
   
   Then could you also rename the pull request title in the following format?
   
       GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
   
   or
   
       MINOR: [${COMPONENT}] ${SUMMARY}
   
   In the case of PARQUET issues on JIRA the title also supports:
   
       PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
     * [Other pull requests](https://github.com/apache/arrow/pulls/)
     * [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] js8544 commented on pull request #35787: GH-35786: [C++] Add pairwise_diff function

Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on PR #35787:
URL: https://github.com/apache/arrow/pull/35787#issuecomment-1611263547

   > I think we might be able to go a step further and loop over subtract's kernels directly, without the need to list input types explicitly and go through DispatchExact.
   
   That sounds better indeed! I'll try this now. 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] js8544 commented on a diff in pull request #35787: GH-35786: [C++] Add pairwise_diff function

Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on code in PR #35787:
URL: https://github.com/apache/arrow/pull/35787#discussion_r1246048838


##########
cpp/src/arrow/compute/kernels/vector_pairwise.cc:
##########
@@ -0,0 +1,182 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// Vector kernels for pairwise computation
+
+#include <iostream>
+#include <memory>
+#include "arrow/builder.h"
+#include "arrow/compute/api_vector.h"
+#include "arrow/compute/exec.h"
+#include "arrow/compute/function.h"
+#include "arrow/compute/kernel.h"
+#include "arrow/compute/kernels/base_arithmetic_internal.h"
+#include "arrow/compute/kernels/codegen_internal.h"
+#include "arrow/compute/registry.h"
+#include "arrow/compute/util.h"
+#include "arrow/status.h"
+#include "arrow/type.h"
+#include "arrow/type_fwd.h"
+#include "arrow/type_traits.h"
+#include "arrow/util/bit_util.h"
+#include "arrow/util/checked_cast.h"
+#include "arrow/util/logging.h"
+#include "arrow/visit_type_inline.h"
+
+namespace arrow::compute::internal {
+
+// We reuse the kernel exec function of a scalar binary function to compute pairwise
+// results. For example, for pairwise_diff, we reuse subtract's kernel exec.
+struct PairwiseState : KernelState {
+  PairwiseState(const PairwiseOptions& options, const ArrayKernelExec& scalar_exec)
+      : periods(options.periods), scalar_exec(scalar_exec) {}
+
+  int64_t periods;
+  const ArrayKernelExec& scalar_exec;
+};
+
+KernelInit GeneratePairwiseInit(const ArrayKernelExec& scalar_exec) {
+  return [&scalar_exec](KernelContext* ctx, const KernelInitArgs& args) {
+    return std::make_unique<PairwiseState>(
+        checked_cast<const PairwiseOptions&>(*args.options), scalar_exec);
+  };
+}
+
+/// A generic pairwise implementation that can be reused by different ops.
+Status PairwiseExecImpl(KernelContext* ctx, const ArraySpan& input,
+                        const ArrayKernelExec& scalar_exec, int64_t periods,
+                        ArrayData* result) {
+  auto offset = abs(periods);
+  offset = std::min(offset, input.length);
+  auto exec_length = input.length - offset;
+  // prepare bitmap
+  auto null_start = periods > 0 ? 0 : exec_length;
+  auto non_null_start = periods > 0 ? offset : 0;
+  bit_util::ClearBitmap(result->buffers[0]->mutable_data(), null_start, offset);
+  for (int64_t i = non_null_start; i < non_null_start + exec_length; i++) {
+    if (input.IsValid(i) && input.IsValid(i - periods)) {
+      bit_util::SetBit(result->buffers[0]->mutable_data(), i);
+    } else {
+      bit_util::ClearBit(result->buffers[0]->mutable_data(), i);
+    }
+  }
+  // prepare input span
+  ArraySpan left(input);
+  left.SetSlice(periods > 0 ? offset : 0, exec_length);
+  ArraySpan right(input);
+  right.SetSlice(periods > 0 ? 0 : offset, exec_length);
+  // prepare output span
+  ArraySpan output_span;
+  output_span.SetMembers(*result);
+  output_span.offset = periods > 0 ? offset : 0;
+  output_span.length = exec_length;
+  ExecResult output{output_span};
+  // execute scalar function
+  RETURN_NOT_OK(scalar_exec(ctx, ExecSpan({left, right}, exec_length), &output));
+
+  return Status::OK();
+}
+
+Status PairwiseExec(KernelContext* ctx, const ExecSpan& batch, ExecResult* out) {
+  const auto& state = checked_cast<const PairwiseState&>(*ctx->state());
+  auto input = batch[0].array;
+  RETURN_NOT_OK(PairwiseExecImpl(ctx, batch[0].array, state.scalar_exec, state.periods,
+                                 out->array_data_mutable()));
+  return Status::OK();
+}
+
+const FunctionDoc pairwise_diff_doc(
+    "Compute first order difference of an array",
+    ("Computes the first order difference of an array, It internally calls \n"
+     "the scalar function \"subtract\" to compute \n differences, so its \n"
+     "behavior and supported types are the same as \n"
+     "\"subtract\". The period can be specified in :struct:`PairwiseOptions`.\n"
+     "\n"
+     "Results will wrap around on integer overflow. Use function \n"
+     "\"pairwise_diff_checked\" if you want overflow to return an error."),
+    {"input"}, "PairwiseOptions");
+
+const FunctionDoc pairwise_diff_checked_doc(
+    "Compute first order difference of an array",
+    ("Computes the first order difference of an array, It internally calls \n"
+     "the scalar function \"subtract_checked\" (or the checked variant) to compute \n"
+     "differences, so its behavior and supported types are the same as \n"
+     "\"subtract_checked\". The period can be specified in :struct:`PairwiseOptions`.\n"
+     "\n"
+     "This function returns an error on overflow. For a variant that doesn't \n"
+     "fail on overflow, use function \"pairwise_diff\"."),
+    {"input"}, "PairwiseOptions");
+
+const PairwiseOptions* GetDefaultPairwiseOptions() {
+  static const auto kDefaultPairwiseOptions = PairwiseOptions::Defaults();
+  return &kDefaultPairwiseOptions;
+}
+
+struct PairwiseKernelData {
+  InputType input;
+  OutputType output;
+  ArrayKernelExec exec;
+};
+
+void RegisterPairwiseDiffKernels(std::string_view func_name,
+                                 std::string_view base_func_name, const FunctionDoc& doc,
+                                 FunctionRegistry* registry) {
+  VectorKernel base_kernel;
+  base_kernel.can_execute_chunkwise = false;
+  base_kernel.null_handling = NullHandling::COMPUTED_PREALLOCATE;
+  base_kernel.mem_allocation = MemAllocation::PREALLOCATE;
+  base_kernel.init = OptionsWrapper<PairwiseOptions>::Init;
+  auto func = std::make_shared<VectorFunction>(std::string(func_name), Arity::Unary(),
+                                               doc, GetDefaultPairwiseOptions());
+
+  auto base_func_result = registry->GetFunction(std::string(base_func_name));
+  DCHECK_OK(base_func_result.status());
+  const auto& base_func = checked_cast<const ScalarFunction&>(**base_func_result);
+  DCHECK_EQ(base_func.arity().num_args, 2);
+
+  for (const auto& base_func_kernel : base_func.kernels()) {
+    const auto& base_func_kernel_sig = base_func_kernel->signature;
+    if (base_func_kernel_sig->in_types()[0].Equals(base_func_kernel_sig->in_types()[1])) {
+      OutputType out_type(base_func_kernel_sig->out_type());
+      // Need to wrap base output resolver
+      if (out_type.kind() == OutputType::COMPUTED) {
+        const auto& base_resolver = base_func_kernel_sig->out_type().resolver();
+        auto resolver = [&base_resolver](KernelContext* ctx,
+                                         const std::vector<TypeHolder>& input_types) {
+          return base_resolver(ctx, {input_types[0], input_types[0]});
+        };
+        out_type = OutputType(resolver);

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] bkietz commented on pull request #35787: GH-35786: [C++] Add pairwise_diff function

Posted by "bkietz (via GitHub)" <gi...@apache.org>.
bkietz commented on PR #35787:
URL: https://github.com/apache/arrow/pull/35787#issuecomment-1613108543

   macOs glib&python failures are network failures while communicating with brew https://github.com/apache/arrow/actions/runs/5408688637/jobs/9835499198?pr=35787
   
   macOS c++ failure is a known issue https://github.com/apache/arrow/issues/36329
   
   ubuntu c++ failure is an s3 flake https://github.com/apache/arrow/actions/runs/5408688635/jobs/9828017525?pr=35787#step:7:4892
   
   R test failure is a known issue https://github.com/apache/arrow/issues/36346


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] js8544 commented on a diff in pull request #35787: GH-35786: [C++] Add pairwise_diff function

Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on code in PR #35787:
URL: https://github.com/apache/arrow/pull/35787#discussion_r1246046104


##########
cpp/src/arrow/compute/kernels/vector_pairwise.cc:
##########
@@ -0,0 +1,182 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// Vector kernels for pairwise computation
+
+#include <iostream>
+#include <memory>
+#include "arrow/builder.h"
+#include "arrow/compute/api_vector.h"
+#include "arrow/compute/exec.h"
+#include "arrow/compute/function.h"
+#include "arrow/compute/kernel.h"
+#include "arrow/compute/kernels/base_arithmetic_internal.h"
+#include "arrow/compute/kernels/codegen_internal.h"
+#include "arrow/compute/registry.h"
+#include "arrow/compute/util.h"
+#include "arrow/status.h"
+#include "arrow/type.h"
+#include "arrow/type_fwd.h"
+#include "arrow/type_traits.h"
+#include "arrow/util/bit_util.h"
+#include "arrow/util/checked_cast.h"
+#include "arrow/util/logging.h"
+#include "arrow/visit_type_inline.h"
+
+namespace arrow::compute::internal {
+
+// We reuse the kernel exec function of a scalar binary function to compute pairwise
+// results. For example, for pairwise_diff, we reuse subtract's kernel exec.
+struct PairwiseState : KernelState {
+  PairwiseState(const PairwiseOptions& options, const ArrayKernelExec& scalar_exec)
+      : periods(options.periods), scalar_exec(scalar_exec) {}
+
+  int64_t periods;
+  const ArrayKernelExec& scalar_exec;
+};
+
+KernelInit GeneratePairwiseInit(const ArrayKernelExec& scalar_exec) {
+  return [&scalar_exec](KernelContext* ctx, const KernelInitArgs& args) {
+    return std::make_unique<PairwiseState>(
+        checked_cast<const PairwiseOptions&>(*args.options), scalar_exec);
+  };
+}
+
+/// A generic pairwise implementation that can be reused by different ops.
+Status PairwiseExecImpl(KernelContext* ctx, const ArraySpan& input,
+                        const ArrayKernelExec& scalar_exec, int64_t periods,
+                        ArrayData* result) {
+  auto offset = abs(periods);
+  offset = std::min(offset, input.length);
+  auto exec_length = input.length - offset;
+  // prepare bitmap
+  auto null_start = periods > 0 ? 0 : exec_length;
+  auto non_null_start = periods > 0 ? offset : 0;
+  bit_util::ClearBitmap(result->buffers[0]->mutable_data(), null_start, offset);
+  for (int64_t i = non_null_start; i < non_null_start + exec_length; i++) {
+    if (input.IsValid(i) && input.IsValid(i - periods)) {
+      bit_util::SetBit(result->buffers[0]->mutable_data(), i);
+    } else {
+      bit_util::ClearBit(result->buffers[0]->mutable_data(), i);
+    }
+  }
+  // prepare input span
+  ArraySpan left(input);
+  left.SetSlice(periods > 0 ? offset : 0, exec_length);
+  ArraySpan right(input);
+  right.SetSlice(periods > 0 ? 0 : offset, exec_length);
+  // prepare output span
+  ArraySpan output_span;
+  output_span.SetMembers(*result);
+  output_span.offset = periods > 0 ? offset : 0;
+  output_span.length = exec_length;
+  ExecResult output{output_span};
+  // execute scalar function
+  RETURN_NOT_OK(scalar_exec(ctx, ExecSpan({left, right}, exec_length), &output));
+
+  return Status::OK();
+}
+
+Status PairwiseExec(KernelContext* ctx, const ExecSpan& batch, ExecResult* out) {
+  const auto& state = checked_cast<const PairwiseState&>(*ctx->state());
+  auto input = batch[0].array;
+  RETURN_NOT_OK(PairwiseExecImpl(ctx, batch[0].array, state.scalar_exec, state.periods,
+                                 out->array_data_mutable()));
+  return Status::OK();
+}
+
+const FunctionDoc pairwise_diff_doc(
+    "Compute first order difference of an array",
+    ("Computes the first order difference of an array, It internally calls \n"
+     "the scalar function \"subtract\" to compute \n differences, so its \n"
+     "behavior and supported types are the same as \n"
+     "\"subtract\". The period can be specified in :struct:`PairwiseOptions`.\n"
+     "\n"
+     "Results will wrap around on integer overflow. Use function \n"
+     "\"pairwise_diff_checked\" if you want overflow to return an error."),
+    {"input"}, "PairwiseOptions");
+
+const FunctionDoc pairwise_diff_checked_doc(
+    "Compute first order difference of an array",
+    ("Computes the first order difference of an array, It internally calls \n"
+     "the scalar function \"subtract_checked\" (or the checked variant) to compute \n"
+     "differences, so its behavior and supported types are the same as \n"
+     "\"subtract_checked\". The period can be specified in :struct:`PairwiseOptions`.\n"
+     "\n"
+     "This function returns an error on overflow. For a variant that doesn't \n"
+     "fail on overflow, use function \"pairwise_diff\"."),
+    {"input"}, "PairwiseOptions");
+
+const PairwiseOptions* GetDefaultPairwiseOptions() {
+  static const auto kDefaultPairwiseOptions = PairwiseOptions::Defaults();
+  return &kDefaultPairwiseOptions;
+}
+
+struct PairwiseKernelData {
+  InputType input;
+  OutputType output;
+  ArrayKernelExec exec;
+};
+
+void RegisterPairwiseDiffKernels(std::string_view func_name,
+                                 std::string_view base_func_name, const FunctionDoc& doc,
+                                 FunctionRegistry* registry) {
+  VectorKernel base_kernel;

Review Comment:
   Updated. Sorry for the confusion!



##########
cpp/src/arrow/compute/kernels/vector_pairwise.cc:
##########
@@ -0,0 +1,182 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// Vector kernels for pairwise computation
+
+#include <iostream>
+#include <memory>
+#include "arrow/builder.h"
+#include "arrow/compute/api_vector.h"
+#include "arrow/compute/exec.h"
+#include "arrow/compute/function.h"
+#include "arrow/compute/kernel.h"
+#include "arrow/compute/kernels/base_arithmetic_internal.h"
+#include "arrow/compute/kernels/codegen_internal.h"
+#include "arrow/compute/registry.h"
+#include "arrow/compute/util.h"
+#include "arrow/status.h"
+#include "arrow/type.h"
+#include "arrow/type_fwd.h"
+#include "arrow/type_traits.h"
+#include "arrow/util/bit_util.h"
+#include "arrow/util/checked_cast.h"
+#include "arrow/util/logging.h"
+#include "arrow/visit_type_inline.h"
+
+namespace arrow::compute::internal {
+
+// We reuse the kernel exec function of a scalar binary function to compute pairwise
+// results. For example, for pairwise_diff, we reuse subtract's kernel exec.
+struct PairwiseState : KernelState {
+  PairwiseState(const PairwiseOptions& options, const ArrayKernelExec& scalar_exec)
+      : periods(options.periods), scalar_exec(scalar_exec) {}
+
+  int64_t periods;
+  const ArrayKernelExec& scalar_exec;
+};
+
+KernelInit GeneratePairwiseInit(const ArrayKernelExec& scalar_exec) {
+  return [&scalar_exec](KernelContext* ctx, const KernelInitArgs& args) {
+    return std::make_unique<PairwiseState>(
+        checked_cast<const PairwiseOptions&>(*args.options), scalar_exec);
+  };
+}
+
+/// A generic pairwise implementation that can be reused by different ops.
+Status PairwiseExecImpl(KernelContext* ctx, const ArraySpan& input,
+                        const ArrayKernelExec& scalar_exec, int64_t periods,
+                        ArrayData* result) {
+  auto offset = abs(periods);
+  offset = std::min(offset, input.length);
+  auto exec_length = input.length - offset;
+  // prepare bitmap
+  auto null_start = periods > 0 ? 0 : exec_length;
+  auto non_null_start = periods > 0 ? offset : 0;
+  bit_util::ClearBitmap(result->buffers[0]->mutable_data(), null_start, offset);
+  for (int64_t i = non_null_start; i < non_null_start + exec_length; i++) {
+    if (input.IsValid(i) && input.IsValid(i - periods)) {
+      bit_util::SetBit(result->buffers[0]->mutable_data(), i);
+    } else {
+      bit_util::ClearBit(result->buffers[0]->mutable_data(), i);
+    }
+  }
+  // prepare input span
+  ArraySpan left(input);
+  left.SetSlice(periods > 0 ? offset : 0, exec_length);
+  ArraySpan right(input);
+  right.SetSlice(periods > 0 ? 0 : offset, exec_length);
+  // prepare output span
+  ArraySpan output_span;
+  output_span.SetMembers(*result);
+  output_span.offset = periods > 0 ? offset : 0;
+  output_span.length = exec_length;
+  ExecResult output{output_span};
+  // execute scalar function
+  RETURN_NOT_OK(scalar_exec(ctx, ExecSpan({left, right}, exec_length), &output));
+
+  return Status::OK();
+}
+
+Status PairwiseExec(KernelContext* ctx, const ExecSpan& batch, ExecResult* out) {
+  const auto& state = checked_cast<const PairwiseState&>(*ctx->state());
+  auto input = batch[0].array;
+  RETURN_NOT_OK(PairwiseExecImpl(ctx, batch[0].array, state.scalar_exec, state.periods,
+                                 out->array_data_mutable()));
+  return Status::OK();
+}
+
+const FunctionDoc pairwise_diff_doc(
+    "Compute first order difference of an array",
+    ("Computes the first order difference of an array, It internally calls \n"
+     "the scalar function \"subtract\" to compute \n differences, so its \n"
+     "behavior and supported types are the same as \n"
+     "\"subtract\". The period can be specified in :struct:`PairwiseOptions`.\n"
+     "\n"
+     "Results will wrap around on integer overflow. Use function \n"
+     "\"pairwise_diff_checked\" if you want overflow to return an error."),
+    {"input"}, "PairwiseOptions");
+
+const FunctionDoc pairwise_diff_checked_doc(
+    "Compute first order difference of an array",
+    ("Computes the first order difference of an array, It internally calls \n"
+     "the scalar function \"subtract_checked\" (or the checked variant) to compute \n"
+     "differences, so its behavior and supported types are the same as \n"
+     "\"subtract_checked\". The period can be specified in :struct:`PairwiseOptions`.\n"
+     "\n"
+     "This function returns an error on overflow. For a variant that doesn't \n"
+     "fail on overflow, use function \"pairwise_diff\"."),
+    {"input"}, "PairwiseOptions");
+
+const PairwiseOptions* GetDefaultPairwiseOptions() {
+  static const auto kDefaultPairwiseOptions = PairwiseOptions::Defaults();
+  return &kDefaultPairwiseOptions;
+}
+
+struct PairwiseKernelData {
+  InputType input;
+  OutputType output;
+  ArrayKernelExec exec;
+};
+
+void RegisterPairwiseDiffKernels(std::string_view func_name,
+                                 std::string_view base_func_name, const FunctionDoc& doc,
+                                 FunctionRegistry* registry) {
+  VectorKernel base_kernel;
+  base_kernel.can_execute_chunkwise = false;
+  base_kernel.null_handling = NullHandling::COMPUTED_PREALLOCATE;
+  base_kernel.mem_allocation = MemAllocation::PREALLOCATE;
+  base_kernel.init = OptionsWrapper<PairwiseOptions>::Init;
+  auto func = std::make_shared<VectorFunction>(std::string(func_name), Arity::Unary(),
+                                               doc, GetDefaultPairwiseOptions());
+
+  auto base_func_result = registry->GetFunction(std::string(base_func_name));
+  DCHECK_OK(base_func_result.status());
+  const auto& base_func = checked_cast<const ScalarFunction&>(**base_func_result);
+  DCHECK_EQ(base_func.arity().num_args, 2);
+
+  for (const auto& base_func_kernel : base_func.kernels()) {
+    const auto& base_func_kernel_sig = base_func_kernel->signature;
+    if (base_func_kernel_sig->in_types()[0].Equals(base_func_kernel_sig->in_types()[1])) {

Review Comment:
   Updated



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] js8544 commented on a diff in pull request #35787: GH-35786: [C++] Add pairwise_diff function

Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on code in PR #35787:
URL: https://github.com/apache/arrow/pull/35787#discussion_r1246045952


##########
cpp/src/arrow/compute/kernels/vector_pairwise.cc:
##########
@@ -0,0 +1,182 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// Vector kernels for pairwise computation
+
+#include <iostream>
+#include <memory>
+#include "arrow/builder.h"
+#include "arrow/compute/api_vector.h"
+#include "arrow/compute/exec.h"
+#include "arrow/compute/function.h"
+#include "arrow/compute/kernel.h"
+#include "arrow/compute/kernels/base_arithmetic_internal.h"
+#include "arrow/compute/kernels/codegen_internal.h"
+#include "arrow/compute/registry.h"
+#include "arrow/compute/util.h"
+#include "arrow/status.h"
+#include "arrow/type.h"
+#include "arrow/type_fwd.h"
+#include "arrow/type_traits.h"
+#include "arrow/util/bit_util.h"
+#include "arrow/util/checked_cast.h"
+#include "arrow/util/logging.h"
+#include "arrow/visit_type_inline.h"
+
+namespace arrow::compute::internal {
+
+// We reuse the kernel exec function of a scalar binary function to compute pairwise
+// results. For example, for pairwise_diff, we reuse subtract's kernel exec.
+struct PairwiseState : KernelState {
+  PairwiseState(const PairwiseOptions& options, const ArrayKernelExec& scalar_exec)
+      : periods(options.periods), scalar_exec(scalar_exec) {}
+
+  int64_t periods;
+  const ArrayKernelExec& scalar_exec;
+};
+
+KernelInit GeneratePairwiseInit(const ArrayKernelExec& scalar_exec) {
+  return [&scalar_exec](KernelContext* ctx, const KernelInitArgs& args) {
+    return std::make_unique<PairwiseState>(
+        checked_cast<const PairwiseOptions&>(*args.options), scalar_exec);
+  };
+}
+
+/// A generic pairwise implementation that can be reused by different ops.
+Status PairwiseExecImpl(KernelContext* ctx, const ArraySpan& input,
+                        const ArrayKernelExec& scalar_exec, int64_t periods,
+                        ArrayData* result) {
+  auto offset = abs(periods);
+  offset = std::min(offset, input.length);
+  auto exec_length = input.length - offset;
+  // prepare bitmap
+  auto null_start = periods > 0 ? 0 : exec_length;
+  auto non_null_start = periods > 0 ? offset : 0;

Review Comment:
   Updated. It's much clearer now. thanks for the suggestion!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org