You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/15 19:38:49 UTC

[GitHub] [arrow] arw2019 opened a new pull request #8474: ARROW-10301: [C++] Implement "all" reduction kernel for boolean data

arw2019 opened a new pull request #8474:
URL: https://github.com/apache/arrow/pull/8474


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #8474: ARROW-10301: [C++][Compute] Implement "all" reduction kernel for boolean data

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #8474:
URL: https://github.com/apache/arrow/pull/8474#discussion_r522953360



##########
File path: cpp/src/arrow/compute/kernels/aggregate_basic.cc
##########
@@ -151,6 +151,45 @@ std::unique_ptr<KernelState> MinMaxInit(KernelContext* ctx, const KernelInitArgs
   return visitor.Create();
 }
 
+// ----------------------------------------------------------------------
+// All implementation
+
+struct BooleanAllImpl : public ScalarAggregator {
+  void Consume(KernelContext*, const ExecBatch& batch) override {
+    // short-circuit if seen a false already
+    if (this->all == false) {
+      return;
+    }
+
+    const auto& data = *batch[0].array();
+    arrow::internal::OptionalBinaryBitBlockCounter counter(
+        data.buffers[1], data.offset, data.buffers[0], data.offset, data.length);
+    int64_t position = 0;
+    while (position < data.length) {
+      const auto block = counter.NextOrNotBlock();

Review comment:
       Just wondering but should it in theory not be a "AndNot" instead of "OrNot" ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #8474: ARROW-10301: [C++][Compute] Implement "all" reduction kernel for boolean data

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8474:
URL: https://github.com/apache/arrow/pull/8474#discussion_r530200367



##########
File path: cpp/src/arrow/util/bit_block_counter.h
##########
@@ -244,6 +244,30 @@ class ARROW_EXPORT OptionalBinaryBitBlockCounter {
     }
   }
 
+  BitBlockCount NextOrNotBlock() {

Review comment:
       I think it's ok for now, we can revisit later if desired.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io commented on pull request #8474: ARROW-10301: [C++][Compute] Implement "all" reduction kernel for boolean data

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #8474:
URL: https://github.com/apache/arrow/pull/8474#issuecomment-733393262


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/8474?src=pr&el=h1) Report
   > Merging [#8474](https://codecov.io/gh/apache/arrow/pull/8474?src=pr&el=desc) (bd10727) into [master](https://codecov.io/gh/apache/arrow/commit/42564e49b2ae69f9a2e1c192ffad710853a1a8c5?el=desc) (42564e4) will **decrease** coverage by `0.21%`.
   > The diff coverage is `40.48%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8474/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8474?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #8474      +/-   ##
   ==========================================
   - Coverage   84.40%   84.18%   -0.22%     
   ==========================================
     Files         186      186              
     Lines       45241    45445     +204     
   ==========================================
   + Hits        38184    38260      +76     
   - Misses       7057     7185     +128     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8474?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/benchmarks/src/bin/tpch.rs](https://codecov.io/gh/apache/arrow/pull/8474/diff?src=pr&el=tree#diff-cnVzdC9iZW5jaG1hcmtzL3NyYy9iaW4vdHBjaC5ycw==) | `0.00% <0.00%> (ø)` | |
   | [rust/datafusion/src/physical\_plan/expressions.rs](https://codecov.io/gh/apache/arrow/pull/8474/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2V4cHJlc3Npb25zLnJz) | `83.42% <ø> (ø)` | |
   | [rust/datafusion/src/physical\_plan/planner.rs](https://codecov.io/gh/apache/arrow/pull/8474/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3BsYW5uZXIucnM=) | `71.81% <0.00%> (-5.64%)` | :arrow_down: |
   | [rust/datafusion/src/optimizer/utils.rs](https://codecov.io/gh/apache/arrow/pull/8474/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvdXRpbHMucnM=) | `58.89% <4.83%> (-14.90%)` | :arrow_down: |
   | [rust/datafusion/src/logical\_plan/expr.rs](https://codecov.io/gh/apache/arrow/pull/8474/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vZXhwci5ycw==) | `70.68% <48.78%> (-8.27%)` | :arrow_down: |
   | [rust/arrow/src/compute/kernels/comparison.rs](https://codecov.io/gh/apache/arrow/pull/8474/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2NvbXBhcmlzb24ucnM=) | `96.27% <96.42%> (+0.49%)` | :arrow_up: |
   | [rust/datafusion/src/optimizer/filter\_push\_down.rs](https://codecov.io/gh/apache/arrow/pull/8474/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvZmlsdGVyX3B1c2hfZG93bi5ycw==) | `99.11% <100.00%> (+<0.01%)` | :arrow_up: |
   | [...t/datafusion/src/optimizer/projection\_push\_down.rs](https://codecov.io/gh/apache/arrow/pull/8474/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvcHJvamVjdGlvbl9wdXNoX2Rvd24ucnM=) | `87.84% <100.00%> (-0.15%)` | :arrow_down: |
   | [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/8474/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `92.99% <0.00%> (-0.18%)` | :arrow_down: |
   | [rust/arrow/src/array/transform/mod.rs](https://codecov.io/gh/apache/arrow/pull/8474/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvdHJhbnNmb3JtL21vZC5ycw==) | `70.61% <0.00%> (+1.52%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8474?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8474?src=pr&el=footer). Last update [b699541...5ef9cff](https://codecov.io/gh/apache/arrow/pull/8474?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8474: ARROW-10301: [C++] Implement "all" reduction kernel for boolean data

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8474:
URL: https://github.com/apache/arrow/pull/8474#issuecomment-709553991


   https://issues.apache.org/jira/browse/ARROW-10301


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] arw2019 edited a comment on pull request #8474: ARROW-10301: [C++][Compute] Implement "all" reduction kernel for boolean data

Posted by GitBox <gi...@apache.org>.
arw2019 edited a comment on pull request #8474:
URL: https://github.com/apache/arrow/pull/8474#issuecomment-720653983


   Ready for re-review. CI all green


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on pull request #8474: ARROW-10301: [C++] Implement "all" reduction kernel for boolean data

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #8474:
URL: https://github.com/apache/arrow/pull/8474#issuecomment-717670843


   @arw2019 it looks like quite a few CI failures? Is this still WIP?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] arw2019 commented on pull request #8474: ARROW-10301: [C++][Compute] Implement "all" reduction kernel for boolean data

Posted by GitBox <gi...@apache.org>.
arw2019 commented on pull request #8474:
URL: https://github.com/apache/arrow/pull/8474#issuecomment-733984955


   Thanks @pitrou @jorisvandenbossche for reviewing!!!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] arw2019 commented on a change in pull request #8474: ARROW-10301: [C++][Compute] Implement "all" reduction kernel for boolean data

Posted by GitBox <gi...@apache.org>.
arw2019 commented on a change in pull request #8474:
URL: https://github.com/apache/arrow/pull/8474#discussion_r523365931



##########
File path: cpp/src/arrow/compute/kernels/aggregate_basic.cc
##########
@@ -151,6 +151,45 @@ std::unique_ptr<KernelState> MinMaxInit(KernelContext* ctx, const KernelInitArgs
   return visitor.Create();
 }
 
+// ----------------------------------------------------------------------
+// All implementation
+
+struct BooleanAllImpl : public ScalarAggregator {
+  void Consume(KernelContext*, const ExecBatch& batch) override {
+    // short-circuit if seen a false already
+    if (this->all == false) {
+      return;
+    }
+
+    const auto& data = *batch[0].array();
+    arrow::internal::OptionalBinaryBitBlockCounter counter(
+        data.buffers[1], data.offset, data.buffers[0], data.offset, data.length);
+    int64_t position = 0;
+    while (position < data.length) {
+      const auto block = counter.NextOrNotBlock();

Review comment:
       I named it after the bit operation we're using, which in this case is 
   https://github.com/apache/arrow/blob/7efb373ace6489dbebf2367dd91b05208e48a1bb/cpp/src/arrow/util/bit_block_counter.h#L196-L197




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] arw2019 commented on pull request #8474: ARROW-10301: [C++] Implement "all" reduction kernel for boolean data

Posted by GitBox <gi...@apache.org>.
arw2019 commented on pull request #8474:
URL: https://github.com/apache/arrow/pull/8474#issuecomment-720653983


   Ready for re-review. Most CI runs green, the outstanding failures look unrelated


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] arw2019 commented on pull request #8474: ARROW-10301: [C++] Implement "all" reduction kernel for boolean data

Posted by GitBox <gi...@apache.org>.
arw2019 commented on pull request #8474:
URL: https://github.com/apache/arrow/pull/8474#issuecomment-717691414


   > @arw2019 it looks like quite a few CI failures? Is this still WIP?
   
   Failures crept in at the recent rebase (was all green before). Marking as draft while I debug & will comment when ready for review


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] arw2019 commented on a change in pull request #8474: ARROW-10301: [C++][Compute] Implement "all" reduction kernel for boolean data

Posted by GitBox <gi...@apache.org>.
arw2019 commented on a change in pull request #8474:
URL: https://github.com/apache/arrow/pull/8474#discussion_r529918880



##########
File path: cpp/src/arrow/util/bit_block_counter.h
##########
@@ -244,6 +244,30 @@ class ARROW_EXPORT OptionalBinaryBitBlockCounter {
     }
   }
 
+  BitBlockCount NextOrNotBlock() {

Review comment:
       I'm duplicating code here. `NextOrBlock` is identical to 
   https://github.com/apache/arrow/blob/bd107272c076687212720a10514e7698eff25ffe/cpp/src/arrow/util/bit_block_counter.h#L223-L231
   except for `case HasBitmap::BOTH:` where `NextOrNotWord` is called. Is there a good way to avoid the repetition?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] arw2019 commented on pull request #8474: ARROW-10301: [C++][Compute] Implement "all" reduction kernel for boolean data

Posted by GitBox <gi...@apache.org>.
arw2019 commented on pull request #8474:
URL: https://github.com/apache/arrow/pull/8474#issuecomment-727132950


   > Since this is similar to #8294, and most review on the code happened there, does it makes sense to get that PR merged first?
   
   Yes, agreed - since this follows the pattern in #8294 very closely


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] arw2019 commented on pull request #8474: ARROW-10301: [C++][Compute] Implement "all" reduction kernel for boolean data

Posted by GitBox <gi...@apache.org>.
arw2019 commented on pull request #8474:
URL: https://github.com/apache/arrow/pull/8474#issuecomment-733262911


   @pitrou rebased + green so ready for re-review. 
   
   xref https://github.com/apache/arrow/pull/8474#discussion_r529918880 is a question about `OptionalBinaryBitBlockCounter`


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou closed pull request #8474: ARROW-10301: [C++][Compute] Implement "all" reduction kernel for boolean data

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #8474:
URL: https://github.com/apache/arrow/pull/8474


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #8474: ARROW-10301: [C++][Compute] Implement "all" reduction kernel for boolean data

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8474:
URL: https://github.com/apache/arrow/pull/8474#issuecomment-733136694


   @arw2019  Do you want to revisit this now that the "any" kernel has been merged?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org