Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/20 10:12:44 UTC

[GitHub] [arrow] cyb70289 commented on a change in pull request #10358: ARROW-2665: [C++][Python] Add index() kernel

cyb70289 commented on a change in pull request #10358:
URL: https://github.com/apache/arrow/pull/10358#discussion_r635964852



##########
File path: cpp/src/arrow/compute/kernels/aggregate_basic.cc
##########
@@ -218,6 +218,65 @@ Result<std::unique_ptr<KernelState>> AllInit(KernelContext*, const KernelInitArg
   return ::arrow::internal::make_unique<BooleanAllImpl>();
 }
 
+// ----------------------------------------------------------------------
+// Index implementation
+
+struct IndexImpl : public ScalarAggregator {
+  explicit IndexImpl(IndexOptions options, int64_t seen, int64_t index)
+      : options(std::move(options)), seen{seen}, index{index} {}
+
+  Status Consume(KernelContext* ctx, const ExecBatch& batch) override {
+    // short-circuit
+    if (index >= 0 || !options.value->is_valid) {
+      return Status::OK();
+    }
+
+    const auto& data = *batch[0].array();
+    seen = data.length;
+    ARROW_ASSIGN_OR_RAISE(
+        auto result, CallFunction("equal", {data, options.value}, ctx->exec_context()));

Review comment:
   This does not look very efficient: "equal" scans the whole input array (and builds a bitmap), but we only want the index of the first match.
   I think we can leave it as a follow-up task to benchmark whether visiting the array directly gives better performance.
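
   To illustrate the difference, here is a minimal standalone sketch. It does not use Arrow types or kernels: `NullableInt64Array`, `IndexViaMask` and `IndexViaDirectScan` are hypothetical names made up for this example, and a `std::vector<bool>` stands in for the validity bitmap. The first function mimics the shape of the current approach (compute a full match mask, then search it); the second visits the values directly and stops at the first match.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a nullable int64 array; not an Arrow type.
struct NullableInt64Array {
  std::vector<int64_t> values;
  std::vector<bool> valid;  // valid[i] == false means slot i is null
};

// Shape of the "equal"-based approach: materialize a match mask for the
// whole input, then look for the first set entry. Always touches every slot.
int64_t IndexViaMask(const NullableInt64Array& arr, int64_t target) {
  std::vector<bool> mask(arr.values.size());
  for (std::size_t i = 0; i < arr.values.size(); ++i) {
    mask[i] = arr.valid[i] && arr.values[i] == target;
  }
  for (std::size_t i = 0; i < mask.size(); ++i) {
    if (mask[i]) return static_cast<int64_t>(i);
  }
  return -1;  // not found
}

// Direct visit: no intermediate mask, and the loop exits at the first match.
int64_t IndexViaDirectScan(const NullableInt64Array& arr, int64_t target) {
  for (std::size_t i = 0; i < arr.values.size(); ++i) {
    if (arr.valid[i] && arr.values[i] == target) {
      return static_cast<int64_t>(i);
    }
  }
  return -1;  // not found
}
```

   Both functions return the same index; whether the direct scan is actually faster in the real kernel (where "equal" may be vectorized) is exactly what the follow-up benchmark would need to show.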



