You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/05 07:48:35 UTC

[GitHub] [arrow] 9prady9 opened a new pull request #12080: ARROW-15118: [C++] Avoid bitmap buffer if all inputs are all valid for Scalar Kernels

9prady9 opened a new pull request #12080:
URL: https://github.com/apache/arrow/pull/12080


   Don't set bitmap buffer for scalar kernels when all values of inputs are valid


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] 9prady9 commented on a change in pull request #12080: ARROW-15118: [C++] Avoid bitmap buffer if all inputs are all valid for Scalar Kernels

Posted by GitBox <gi...@apache.org>.
9prady9 commented on a change in pull request #12080:
URL: https://github.com/apache/arrow/pull/12080#discussion_r779375284



##########
File path: cpp/src/arrow/compute/exec.cc
##########
@@ -370,12 +370,15 @@ class NullPropagator {
  public:
   NullPropagator(KernelContext* ctx, const ExecBatch& batch, ArrayData* output)
       : ctx_(ctx), batch_(batch), output_(output) {
+    bool are_all_ins_valids = false;
     for (const Datum& datum : batch_.values) {
       auto null_generalization = NullGeneralization::Get(datum);
 
       if (null_generalization == NullGeneralization::ALL_NULL) {
         is_all_null_ = true;
       }
+      are_all_ins_valids =
+          are_all_ins_valids && (null_generalization == NullGeneralization::ALL_VALID);

Review comment:
       yes, I have corrected that locally already. In fact, I have reverted this change. Now. I am trying to set `validity_preallocate_` inside `SetupPreallocation` by passing `vector<Datum> args` to it.
   
   With the new change, I am hitting some other issue - working on that currently




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] 9prady9 commented on a change in pull request #12080: ARROW-15118: [C++] Avoid bitmap buffer if all inputs are all valid for Scalar Kernels

Posted by GitBox <gi...@apache.org>.
9prady9 commented on a change in pull request #12080:
URL: https://github.com/apache/arrow/pull/12080#discussion_r813914506



##########
File path: cpp/src/arrow/compute/exec.cc
##########
@@ -772,14 +769,27 @@ class ScalarExecutor : public KernelExecutorImpl<ScalarKernel> {
     return Status::OK();
   }
 
-  Status SetupPreallocation(int64_t total_length) {
+  Status SetupPreallocation(int64_t total_length, const std::vector<Datum>& args) {
     output_num_buffers_ = static_cast<int>(output_descr_.type->layout().buffers.size());
-
-    // Decide if we need to preallocate memory for this kernel
-    validity_preallocated_ =
-        (kernel_->null_handling != NullHandling::COMPUTED_NO_PREALLOCATE &&
-         kernel_->null_handling != NullHandling::OUTPUT_NOT_NULL &&
-         output_descr_.type->id() != Type::NA);
+    auto out_type_id = output_descr_.type->id();
+    // Default to no validity pre-allocation for following cases:
+    // - Output Array is NullArray
+    // - kernel_->null_handling is COMPUTED_NO_PREALLOCATE or OUTPUT_NOT_NULL
+    validity_preallocated_ = false;
+    if (out_type_id != Type::NA) {
+      if (kernel_->null_handling == NullHandling::COMPUTED_PREALLOCATE) {
+        // Override the flag if kernel asks for pre-allocation
+        validity_preallocated_ = true;
+      } else if (kernel_->null_handling == NullHandling::INTERSECTION) {
+        bool are_all_inputs_valid = true;
+        for (auto& arg : args) {
+          auto null_gen = NullGeneralization::Get(arg) == NullGeneralization::ALL_VALID;
+          are_all_inputs_valid = are_all_inputs_valid && null_gen;
+        }
+        validity_preallocated_ =
+            !(are_all_inputs_valid || kernel_->can_write_into_slices);

Review comment:
       Good catch, I forgot to post the question about this one - some how without this some kernels are failing tests, I will provide errors soon. I didn't expect this to cause such failures because the name doesn't indicate anything related to allocation. Just added it temporarily to see if it resolves anything.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #12080: ARROW-15118: [C++] Avoid bitmap buffer if all inputs are all valid for Scalar Kernels

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #12080:
URL: https://github.com/apache/arrow/pull/12080#issuecomment-1049923761


   The kernels infrastructure can execute the kernel over smaller chunks of the input. In this case, either the kernel needs to be able to write into a slice of a preallocated region, or else the output has to be chunked.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #12080: ARROW-15118: [C++] Avoid bitmap buffer if all inputs are all valid for Scalar Kernels

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #12080:
URL: https://github.com/apache/arrow/pull/12080#discussion_r815918542



##########
File path: cpp/src/arrow/compute/exec.cc
##########
@@ -334,23 +334,20 @@ struct NullGeneralization {
     if (datum.type()->id() == Type::NA) {
       return ALL_NULL;
     }
-

Review comment:
       Note one might want to add:
   ```c++
     if (!HasValidityBitmap(datum.type()->id()))  {
       return ALL_VALID;
     }
   ```
   
   This will only have an effect for union types, so not very important though.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12080: ARROW-15118: [C++] Avoid bitmap buffer if all inputs are all valid for Scalar Kernels

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12080:
URL: https://github.com/apache/arrow/pull/12080#issuecomment-1055437057


   Benchmark runs are scheduled for baseline = abb6067a16ef7982a29020e6e5461b699631ad51 and contender = a6effbac98a5e3de675d256524f9c91c360f0cf8. a6effbac98a5e3de675d256524f9c91c360f0cf8 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/d447eb924d664e9ca5f51f5bbd3f449f...21de6611681d4acc8f2bddafb8f020fd/)
   [Scheduled] [test-mac-arm](https://conbench.ursa.dev/compare/runs/7da858a8cafc4a20a4e7b6ad471d6761...52edebcb2b894e979fc442c77ad4da33/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/d4bdd11506bf49c8b02c65593524d630...4d8cc21113fc4245ac8c66b2042edbe8/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/80528155374346efa328e45dd19c3902...6e775d37c1894e3d839961796135777c/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce commented on a change in pull request #12080: ARROW-15118: [C++] Avoid bitmap buffer if all inputs are all valid for Scalar Kernels

Posted by GitBox <gi...@apache.org>.
edponce commented on a change in pull request #12080:
URL: https://github.com/apache/arrow/pull/12080#discussion_r779376706



##########
File path: cpp/src/arrow/compute/exec.cc
##########
@@ -370,12 +370,15 @@ class NullPropagator {
  public:
   NullPropagator(KernelContext* ctx, const ExecBatch& batch, ArrayData* output)
       : ctx_(ctx), batch_(batch), output_(output) {
+    bool are_all_ins_valids = false;

Review comment:
       True, so `are_all_inputs_valid`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] 9prady9 commented on a change in pull request #12080: ARROW-15118: [C++] Avoid bitmap buffer if all inputs are all valid for Scalar Kernels

Posted by GitBox <gi...@apache.org>.
9prady9 commented on a change in pull request #12080:
URL: https://github.com/apache/arrow/pull/12080#discussion_r813557396



##########
File path: cpp/src/arrow/compute/kernels/scalar_if_else_test.cc
##########
@@ -3424,5 +3424,31 @@ TEST(TestChooseKernel, Errors) {
                    {ArrayFromJSON(int64(), "[-1]"), ArrayFromJSON(int32(), "[0]")}));
 }
 
+TYPED_TEST(TestIfElsePrimitive, DISABLED_Debug) {

Review comment:
       I will remove this before my final push.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] 9prady9 commented on pull request #12080: ARROW-15118: [C++] Avoid bitmap buffer if all inputs are all valid for Scalar Kernels

Posted by GitBox <gi...@apache.org>.
9prady9 commented on pull request #12080:
URL: https://github.com/apache/arrow/pull/12080#issuecomment-1005454957


   Ideally, I think `validity_preallocated_` should be set to `false` inside `ScalarExecutor::SetupPreallocation`. But I am not sure how to fetch null count details inside ScalarExecutor. Any alternative suggestions or help is appreciated. Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12080: ARROW-15118: [C++] Avoid bitmap buffer if all inputs are all valid for Scalar Kernels

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12080:
URL: https://github.com/apache/arrow/pull/12080#issuecomment-1055437057


   Benchmark runs are scheduled for baseline = abb6067a16ef7982a29020e6e5461b699631ad51 and contender = a6effbac98a5e3de675d256524f9c91c360f0cf8. a6effbac98a5e3de675d256524f9c91c360f0cf8 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/d447eb924d664e9ca5f51f5bbd3f449f...21de6611681d4acc8f2bddafb8f020fd/)
   [Finished :arrow_down:0.63% :arrow_up:0.42%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/7da858a8cafc4a20a4e7b6ad471d6761...52edebcb2b894e979fc442c77ad4da33/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/d4bdd11506bf49c8b02c65593524d630...4d8cc21113fc4245ac8c66b2042edbe8/)
   [Finished :arrow_down:0.34% :arrow_up:0.43%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/80528155374346efa328e45dd19c3902...6e775d37c1894e3d839961796135777c/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #12080: ARROW-15118: [C++] Avoid bitmap buffer if all inputs are all valid for Scalar Kernels

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12080:
URL: https://github.com/apache/arrow/pull/12080#discussion_r815884215



##########
File path: cpp/src/arrow/compute/exec.cc
##########
@@ -772,14 +769,26 @@ class ScalarExecutor : public KernelExecutorImpl<ScalarKernel> {
     return Status::OK();
   }
 
-  Status SetupPreallocation(int64_t total_length) {
+  Status SetupPreallocation(int64_t total_length, const std::vector<Datum>& args) {
     output_num_buffers_ = static_cast<int>(output_descr_.type->layout().buffers.size());
-
-    // Decide if we need to preallocate memory for this kernel
-    validity_preallocated_ =
-        (kernel_->null_handling != NullHandling::COMPUTED_NO_PREALLOCATE &&
-         kernel_->null_handling != NullHandling::OUTPUT_NOT_NULL &&
-         output_descr_.type->id() != Type::NA);
+    auto out_type_id = output_descr_.type->id();
+    // Default to no validity pre-allocation for following cases:
+    // - Output Array is NullArray
+    // - kernel_->null_handling is COMPUTED_NO_PREALLOCATE or OUTPUT_NOT_NULL
+    validity_preallocated_ = false;
+    if (out_type_id != Type::NA) {
+      if (kernel_->null_handling == NullHandling::COMPUTED_PREALLOCATE) {
+        // Override the flag if kernel asks for pre-allocation
+        validity_preallocated_ = true;
+      } else if (kernel_->null_handling == NullHandling::INTERSECTION) {
+        bool are_all_inputs_valid = true;
+        for (auto& arg : args) {

Review comment:
       nit: const auto&?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] 9prady9 commented on pull request #12080: ARROW-15118: [C++] Avoid bitmap buffer if all inputs are all valid for Scalar Kernels

Posted by GitBox <gi...@apache.org>.
9prady9 commented on pull request #12080:
URL: https://github.com/apache/arrow/pull/12080#issuecomment-1051803064


   `C++ /AMD64 Windows Ming=GW 64 C++` job failed due to some sort io stream issue it seems


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce commented on a change in pull request #12080: ARROW-15118: [C++] Avoid bitmap buffer if all inputs are all valid for Scalar Kernels

Posted by GitBox <gi...@apache.org>.
edponce commented on a change in pull request #12080:
URL: https://github.com/apache/arrow/pull/12080#discussion_r779356911



##########
File path: cpp/src/arrow/compute/exec.cc
##########
@@ -370,12 +370,15 @@ class NullPropagator {
  public:
   NullPropagator(KernelContext* ctx, const ExecBatch& batch, ArrayData* output)
       : ctx_(ctx), batch_(batch), output_(output) {
+    bool are_all_ins_valids = false;

Review comment:
       Maybe `are_all_ins_valids` --> `are_all_values_valid`

##########
File path: cpp/src/arrow/compute/exec.cc
##########
@@ -370,12 +370,15 @@ class NullPropagator {
  public:
   NullPropagator(KernelContext* ctx, const ExecBatch& batch, ArrayData* output)
       : ctx_(ctx), batch_(batch), output_(output) {
+    bool are_all_ins_valids = false;
     for (const Datum& datum : batch_.values) {
       auto null_generalization = NullGeneralization::Get(datum);
 
       if (null_generalization == NullGeneralization::ALL_NULL) {
         is_all_null_ = true;
       }
+      are_all_ins_valids =
+          are_all_ins_valids && (null_generalization == NullGeneralization::ALL_VALID);

Review comment:
       This logic is not correct bc `are_all_ins_valids` is initialized with `false` and performing and AND will always result in `false`.
   
   Also, since `null_generalization` is already checked against "not ALL_VALID", you can invert the logic to reduce the number of operations and restructure the checks.
   ```c++
   // This would also reset the bitmap if there are no buffer values, which AFAIK should not occur and if it occurs, then resetting the null bitmap also makes sense.
   bool are_all_ins_valids = true;
   for (...) {
     if (null_generalization != NullGeneralization::ALL_VALID) {
       are_all_ins_valids = false;
       if (null_generalization == NullGeneralization::ALL_NULL) {
         is_all_null_ = true;
       }
       if (datum.kind() == Datum::ARRAY) {
         arrays_with_nulls_.push_back(datum.array().get());
       }
     }
     ...
   }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] 9prady9 edited a comment on pull request #12080: ARROW-15118: [C++] Avoid bitmap buffer if all inputs are all valid for Scalar Kernels

Posted by GitBox <gi...@apache.org>.
9prady9 edited a comment on pull request #12080:
URL: https://github.com/apache/arrow/pull/12080#issuecomment-1005454957


   Ideally, I think `validity_preallocated_` should be set to `false` inside `ScalarExecutor::SetupPreallocation`. But I am not sure how to fetch null count details inside ScalarExecutor. Any alternative suggestions or help is appreciated. Thank you! @edponce @lidavidm @pitrou


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce commented on a change in pull request #12080: ARROW-15118: [C++] Avoid bitmap buffer if all inputs are all valid for Scalar Kernels

Posted by GitBox <gi...@apache.org>.
edponce commented on a change in pull request #12080:
URL: https://github.com/apache/arrow/pull/12080#discussion_r779366434



##########
File path: cpp/src/arrow/compute/exec.cc
##########
@@ -370,12 +370,15 @@ class NullPropagator {
  public:
   NullPropagator(KernelContext* ctx, const ExecBatch& batch, ArrayData* output)
       : ctx_(ctx), batch_(batch), output_(output) {
+    bool are_all_ins_valids = false;
     for (const Datum& datum : batch_.values) {
       auto null_generalization = NullGeneralization::Get(datum);
 
       if (null_generalization == NullGeneralization::ALL_NULL) {
         is_all_null_ = true;
       }
+      are_all_ins_valids =
+          are_all_ins_valids && (null_generalization == NullGeneralization::ALL_VALID);

Review comment:
       This logic is not correct bc `are_all_ins_valids` is initialized with `false` and performing an AND will always result in `false`.
   
   Also, since `null_generalization` is already checked against "not ALL_VALID", you can invert the logic to reduce the number of operations and restructure the checks.
   ```c++
   // This would also reset the bitmap if there are no buffer values, which AFAIK should not occur and if it occurs, then resetting the null bitmap also makes sense.
   bool are_all_ins_valids = true;
   for (...) {
     if (null_generalization != NullGeneralization::ALL_VALID) {
       are_all_ins_valids = false;
       if (null_generalization == NullGeneralization::ALL_NULL) {
         is_all_null_ = true;
       }
       if (datum.kind() == Datum::ARRAY) {
         arrays_with_nulls_.push_back(datum.array().get());
       }
     }
     ...
   }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #12080: ARROW-15118: [C++] Avoid bitmap buffer if all inputs are all valid for Scalar Kernels

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12080:
URL: https://github.com/apache/arrow/pull/12080#discussion_r813887878



##########
File path: cpp/src/arrow/compute/exec.cc
##########
@@ -772,14 +769,27 @@ class ScalarExecutor : public KernelExecutorImpl<ScalarKernel> {
     return Status::OK();
   }
 
-  Status SetupPreallocation(int64_t total_length) {
+  Status SetupPreallocation(int64_t total_length, const std::vector<Datum>& args) {
     output_num_buffers_ = static_cast<int>(output_descr_.type->layout().buffers.size());
-
-    // Decide if we need to preallocate memory for this kernel
-    validity_preallocated_ =
-        (kernel_->null_handling != NullHandling::COMPUTED_NO_PREALLOCATE &&
-         kernel_->null_handling != NullHandling::OUTPUT_NOT_NULL &&
-         output_descr_.type->id() != Type::NA);
+    auto out_type_id = output_descr_.type->id();
+    // Default to no validity pre-allocation for following cases:
+    // - Output Array is NullArray
+    // - kernel_->null_handling is COMPUTED_NO_PREALLOCATE or OUTPUT_NOT_NULL
+    validity_preallocated_ = false;
+    if (out_type_id != Type::NA) {
+      if (kernel_->null_handling == NullHandling::COMPUTED_PREALLOCATE) {
+        // Override the flag if kernel asks for pre-allocation
+        validity_preallocated_ = true;
+      } else if (kernel_->null_handling == NullHandling::INTERSECTION) {
+        bool are_all_inputs_valid = true;
+        for (auto& arg : args) {
+          auto null_gen = NullGeneralization::Get(arg) == NullGeneralization::ALL_VALID;
+          are_all_inputs_valid = are_all_inputs_valid && null_gen;
+        }
+        validity_preallocated_ =
+            !(are_all_inputs_valid || kernel_->can_write_into_slices);

Review comment:
       Why does `can_write_into_slices` disable preallocation?

##########
File path: cpp/src/arrow/compute/exec.cc
##########
@@ -792,7 +802,7 @@ class ScalarExecutor : public KernelExecutorImpl<ScalarKernel> {
     preallocate_contiguous_ =
         (exec_context()->preallocate_contiguous() && kernel_->can_write_into_slices &&
          validity_preallocated_ && !is_nested(output_descr_.type->id()) &&
-         !is_dictionary(output_descr_.type->id()) &&
+         !is_dictionary(out_type_id) &&

Review comment:
       nit: if we're going to update this here, also update the usage of `output_descr_.type->id()` on the previous line?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #12080: ARROW-15118: [C++] Avoid bitmap buffer if all inputs are all valid for Scalar Kernels

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #12080:
URL: https://github.com/apache/arrow/pull/12080#issuecomment-1049878605


   Reading the test case, the context is requesting that the kernel be executed in smaller chunks. And inspecting the value in a debugger, the result is now a ChunkedArray. So it seems before, the output was preallocated and now it is not.
   
   The reason relates to my question above - why does `can_write_into_slices` disable preallocation? If `true`, what is expected to happen is that the output will be allocated all at once, and the kernel will be executed on small slices of the input, writing into slices of the output. But that breaks down if we prevent preallocation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] 9prady9 commented on pull request #12080: ARROW-15118: [C++] Avoid bitmap buffer if all inputs are all valid for Scalar Kernels

Posted by GitBox <gi...@apache.org>.
9prady9 commented on pull request #12080:
URL: https://github.com/apache/arrow/pull/12080#issuecomment-1049900659


    @lidavidm  An Array as input , irrespective of the size, can result in output type of chucked array ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm closed pull request #12080: ARROW-15118: [C++] Avoid bitmap buffer if all inputs are all valid for Scalar Kernels

Posted by GitBox <gi...@apache.org>.
lidavidm closed pull request #12080:
URL: https://github.com/apache/arrow/pull/12080


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] 9prady9 commented on pull request #12080: ARROW-15118: [C++] Avoid bitmap buffer if all inputs are all valid for Scalar Kernels

Posted by GitBox <gi...@apache.org>.
9prady9 commented on pull request #12080:
URL: https://github.com/apache/arrow/pull/12080#issuecomment-1049507634


   I have been able to successfully avoid bitmap allocation for the case where null handling is `INTERSECTION` and all input Datums have all valid values. One simple test case has been added for cast kernel, I would think adding such tests for all compute functions would be prudent.
   
   I did however encounter one unexpected error when in an existing unit test `TestCallScalarFunction.PreallocationCases`. At this moment, I am not sure why a modified context would change Datum kind. Please see the error message of test failure 
   ```
   /home/pradeep/gitroot/ArrowWorkspace/worktrees/15118/cpp/src/arrow/datum.cc:71:  
   Check failed: (Datum::ARRAY) == (this->kind()) 
   ```
   This is triggered from line 789 of `exec_test.cc`. There is an identical test just few lines above where context argument isn't passed along and that runs fine.
   
   Any hints at what I am missing here is appreciated. Thank you!
   
   Command to run the specific test to see the problem on my branch is given below:
   
   ```
   ./debug/arrow-compute-internals-test --gtest_filter=TestCallScalarFunction.PreallocationCases
   ```
   assuming build type is `debug`
   
   @lidavidm  @edponce 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #12080: ARROW-15118: [C++] Avoid bitmap buffer if all inputs are all valid for Scalar Kernels

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #12080:
URL: https://github.com/apache/arrow/pull/12080#issuecomment-1054290742


   Thanks a lot @9prady9, this is a nice improvement.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] 9prady9 commented on pull request #12080: ARROW-15118: [C++] Avoid bitmap buffer if all inputs are all valid for Scalar Kernels

Posted by GitBox <gi...@apache.org>.
9prady9 commented on pull request #12080:
URL: https://github.com/apache/arrow/pull/12080#issuecomment-1054980273


   Rebased from latest master and added the improvement @pitrou suggested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce commented on a change in pull request #12080: ARROW-15118: [C++] Avoid bitmap buffer if all inputs are all valid for Scalar Kernels

Posted by GitBox <gi...@apache.org>.
edponce commented on a change in pull request #12080:
URL: https://github.com/apache/arrow/pull/12080#discussion_r779376706



##########
File path: cpp/src/arrow/compute/exec.cc
##########
@@ -370,12 +370,15 @@ class NullPropagator {
  public:
   NullPropagator(KernelContext* ctx, const ExecBatch& batch, ArrayData* output)
       : ctx_(ctx), batch_(batch), output_(output) {
+    bool are_all_ins_valids = false;

Review comment:
       True, so `are_all_inputs_valid` is more readable and grammatically correct.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] 9prady9 commented on a change in pull request #12080: ARROW-15118: [C++] Avoid bitmap buffer if all inputs are all valid for Scalar Kernels

Posted by GitBox <gi...@apache.org>.
9prady9 commented on a change in pull request #12080:
URL: https://github.com/apache/arrow/pull/12080#discussion_r779374250



##########
File path: cpp/src/arrow/compute/exec.cc
##########
@@ -370,12 +370,15 @@ class NullPropagator {
  public:
   NullPropagator(KernelContext* ctx, const ExecBatch& batch, ArrayData* output)
       : ctx_(ctx), batch_(batch), output_(output) {
+    bool are_all_ins_valids = false;

Review comment:
       I would think it will be a bit ambiguous, since it is checking if all inputs have all valid values. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #12080: ARROW-15118: [C++] Avoid bitmap buffer if all inputs are all valid for Scalar Kernels

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #12080:
URL: https://github.com/apache/arrow/pull/12080#discussion_r815917257



##########
File path: cpp/src/arrow/compute/exec.cc
##########
@@ -334,23 +334,20 @@ struct NullGeneralization {
     if (datum.type()->id() == Type::NA) {

Review comment:
       Note this might be replaced by `!HasValidityBitmap(datum.type()->id())`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12080: ARROW-15118: [C++] Avoid bitmap buffer if all inputs are all valid for Scalar Kernels

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12080:
URL: https://github.com/apache/arrow/pull/12080#issuecomment-1055437057


   Benchmark runs are scheduled for baseline = abb6067a16ef7982a29020e6e5461b699631ad51 and contender = a6effbac98a5e3de675d256524f9c91c360f0cf8. a6effbac98a5e3de675d256524f9c91c360f0cf8 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/d447eb924d664e9ca5f51f5bbd3f449f...21de6611681d4acc8f2bddafb8f020fd/)
   [Finished :arrow_down:0.63% :arrow_up:0.42%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/7da858a8cafc4a20a4e7b6ad471d6761...52edebcb2b894e979fc442c77ad4da33/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/d4bdd11506bf49c8b02c65593524d630...4d8cc21113fc4245ac8c66b2042edbe8/)
   [Finished :arrow_down:0.34% :arrow_up:0.43%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/80528155374346efa328e45dd19c3902...6e775d37c1894e3d839961796135777c/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot commented on pull request #12080: ARROW-15118: [C++] Avoid bitmap buffer if all inputs are all valid for Scalar Kernels

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #12080:
URL: https://github.com/apache/arrow/pull/12080#issuecomment-1055437057


   Benchmark runs are scheduled for baseline = abb6067a16ef7982a29020e6e5461b699631ad51 and contender = a6effbac98a5e3de675d256524f9c91c360f0cf8. a6effbac98a5e3de675d256524f9c91c360f0cf8 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/d447eb924d664e9ca5f51f5bbd3f449f...21de6611681d4acc8f2bddafb8f020fd/)
   [Scheduled] [test-mac-arm](https://conbench.ursa.dev/compare/runs/7da858a8cafc4a20a4e7b6ad471d6761...52edebcb2b894e979fc442c77ad4da33/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/d4bdd11506bf49c8b02c65593524d630...4d8cc21113fc4245ac8c66b2042edbe8/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/80528155374346efa328e45dd19c3902...6e775d37c1894e3d839961796135777c/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #12080: ARROW-15118: [C++] Avoid bitmap buffer if all inputs are all valid for Scalar Kernels

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #12080:
URL: https://github.com/apache/arrow/pull/12080#issuecomment-1005691660


   Right, you don't get to see the actual data during preallocation. However it's called from here: https://github.com/apache/arrow/blob/ecacff0f4e1059887ea6ff78254c4ae9ff3ea102/cpp/src/arrow/compute/exec.cc#L732
   
   It seems you could pass through the actual data in that case.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] 9prady9 commented on a change in pull request #12080: ARROW-15118: [C++] Avoid bitmap buffer if all inputs are all valid for Scalar Kernels

Posted by GitBox <gi...@apache.org>.
9prady9 commented on a change in pull request #12080:
URL: https://github.com/apache/arrow/pull/12080#discussion_r779375284



##########
File path: cpp/src/arrow/compute/exec.cc
##########
@@ -370,12 +370,15 @@ class NullPropagator {
  public:
   NullPropagator(KernelContext* ctx, const ExecBatch& batch, ArrayData* output)
       : ctx_(ctx), batch_(batch), output_(output) {
+    bool are_all_ins_valids = false;
     for (const Datum& datum : batch_.values) {
       auto null_generalization = NullGeneralization::Get(datum);
 
       if (null_generalization == NullGeneralization::ALL_NULL) {
         is_all_null_ = true;
       }
+      are_all_ins_valids =
+          are_all_ins_valids && (null_generalization == NullGeneralization::ALL_VALID);

Review comment:
       yes, I have corrected that locally already. In fact, I have reverted this change. Now. I am trying to set `validity_preallocate_` inside `SetupPreallocation` by passing `vector<Datum> args` to it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #12080: ARROW-15118: [C++] Avoid bitmap buffer if all inputs are all valid for Scalar Kernels

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12080:
URL: https://github.com/apache/arrow/pull/12080#issuecomment-1005454339


   https://issues.apache.org/jira/browse/ARROW-15118


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org