Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/12 10:06:54 UTC

[GitHub] [arrow] ovr opened a new pull request #9174: ARROW-11220: [Rust] Implement GROUP BY support for Boolean

ovr opened a new pull request #9174:
URL: https://github.com/apache/arrow/pull/9174


   Introduce support in DataFusion for GROUP BY on Boolean values. Rust's `bool` type implements the `Eq` and `Hash` traits, which allows Boolean values to be used in `GroupByScalar`.
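As a minimal sketch of why `Eq` + `Hash` is sufficient: an enum whose payloads all implement both traits can derive them and then serve directly as a `HashMap` key for grouping. The enum `GroupKey` and the function `count_by_key` below are hypothetical, simplified stand-ins and not DataFusion's actual `GroupByScalar` definition.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a group-by key enum.
/// Deriving `Eq` and `Hash` compiles because every payload type
/// (`bool`, `i64`, `String`) implements both traits.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum GroupKey {
    Boolean(bool),
    Int64(i64),
    Utf8(String),
}

/// Hypothetical helper: counts rows per distinct boolean value,
/// roughly what `SELECT b, COUNT(*) FROM t GROUP BY b` computes.
fn count_by_key(values: &[bool]) -> HashMap<GroupKey, usize> {
    let mut groups = HashMap::new();
    for &v in values {
        // The Boolean variant is hashed and compared like any other key.
        *groups.entry(GroupKey::Boolean(v)).or_insert(0) += 1;
    }
    groups
}

fn main() {
    let groups = count_by_key(&[true, false, true, true]);
    assert_eq!(groups[&GroupKey::Boolean(true)], 3);
    assert_eq!(groups[&GroupKey::Boolean(false)], 1);
    println!("{}", groups.len()); // prints 2
}
```

Only two groups can ever exist for a non-null Boolean column; null handling is outside the scope of this sketch.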


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9174: ARROW-11220: [Rust] Implement GROUP BY support for Boolean

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9174:
URL: https://github.com/apache/arrow/pull/9174#issuecomment-761564205


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9174?src=pr&el=h1) Report
   > Merging [#9174](https://codecov.io/gh/apache/arrow/pull/9174?src=pr&el=desc) (3db6560) into [master](https://codecov.io/gh/apache/arrow/commit/1393188e1aa1b3d59993ce7d4ade7f7ac8570959?el=desc) (1393188) will **decrease** coverage by `0.00%`.
   > The diff coverage is `85.71%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9174/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9174?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9174      +/-   ##
   ==========================================
   - Coverage   81.61%   81.60%   -0.01%     
   ==========================================
     Files         215      215              
     Lines       51867    51885      +18     
   ==========================================
   + Hits        42329    42343      +14     
   - Misses       9538     9542       +4     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9174?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/datafusion/src/physical\_plan/group\_scalar.rs](https://codecov.io/gh/apache/arrow/pull/9174/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2dyb3VwX3NjYWxhci5ycw==) | `68.00% <0.00%> (-2.84%)` | :arrow_down: |
   | [...ust/datafusion/src/physical\_plan/hash\_aggregate.rs](https://codecov.io/gh/apache/arrow/pull/9174/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2hhc2hfYWdncmVnYXRlLnJz) | `84.91% <100.00%> (+0.11%)` | :arrow_up: |
   | [rust/datafusion/src/physical\_plan/hash\_join.rs](https://codecov.io/gh/apache/arrow/pull/9174/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2hhc2hfam9pbi5ycw==) | `84.78% <100.00%> (+0.07%)` | :arrow_up: |
   | [rust/datafusion/tests/sql.rs](https://codecov.io/gh/apache/arrow/pull/9174/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3Rlc3RzL3NxbC5ycw==) | `99.84% <100.00%> (+<0.01%)` | :arrow_up: |
   | [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9174/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `94.86% <0.00%> (-0.20%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9174?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9174?src=pr&el=footer). Last update [eaa7b7a...3db6560](https://codecov.io/gh/apache/arrow/pull/9174?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [arrow] Dandandan commented on a change in pull request #9174: ARROW-11220: [Rust] Implement GROUP BY support for Boolean

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9174:
URL: https://github.com/apache/arrow/pull/9174#discussion_r555958013



##########
File path: rust/datafusion/src/physical_plan/hash_join.rs
##########
@@ -447,6 +447,11 @@ pub(crate) fn create_key(
                 // store the string value
                 vec.extend_from_slice(value.as_bytes());
             }
+            DataType::Boolean => {
+                let array = col.as_any().downcast_ref::<BooleanArray>().unwrap();
+                let x: u8 = if array.value(row) { 1 } else { 0 };

Review comment:
       This could probably also use `array.value(row) as u8`?
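The suggestion relies on Rust's primitive cast: `bool as u8` is defined to yield `1` for `true` and `0` for `false`, so it produces the same byte as the explicit `if`/`else`. A minimal sketch, assuming a simplified key builder that receives plain `bool` values rather than a `BooleanArray` (the function `append_bool_key` is hypothetical, not part of DataFusion):

```rust
/// Hypothetical helper: appends one boolean column value to a row's
/// composite hash key, encoding it as a single byte.
fn append_bool_key(key: &mut Vec<u8>, value: bool) {
    // `bool as u8` is a well-defined cast: true -> 1, false -> 0.
    key.push(value as u8);
}

fn main() {
    for &v in &[true, false] {
        // The cast agrees with the explicit branch used in the diff.
        let explicit: u8 = if v { 1 } else { 0 };
        assert_eq!(v as u8, explicit);
    }

    // Two boolean columns in one row become two key bytes.
    let mut key = Vec::new();
    append_bool_key(&mut key, true);
    append_bool_key(&mut key, false);
    assert_eq!(key, vec![1u8, 0u8]);
}
```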







[GitHub] [arrow] alamb closed pull request #9174: ARROW-11220: [Rust] Implement GROUP BY support for Boolean

Posted by GitBox <gi...@apache.org>.
alamb closed pull request #9174:
URL: https://github.com/apache/arrow/pull/9174


   





[GitHub] [arrow] github-actions[bot] commented on pull request #9174: ARROW-11220: [Rust] Implement GROUP BY support for Boolean

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9174:
URL: https://github.com/apache/arrow/pull/9174#issuecomment-758549335


   https://issues.apache.org/jira/browse/ARROW-11220





[GitHub] [arrow] alamb commented on pull request #9174: ARROW-11220: [Rust] Implement GROUP BY support for Boolean

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9174:
URL: https://github.com/apache/arrow/pull/9174#issuecomment-761773418


   I apologize for the delay in merging Rust PRs -- the 3.0 release is being finalized now, and we are planning to minimize entropy by postponing the merge of changes not critical to the release until the process is complete. I hope the process will be complete in the next few days. There is more [discussion](https://lists.apache.org/list.html?dev@arrow.apache.org) on the mailing list.





[GitHub] [arrow] ovr commented on pull request #9174: ARROW-11220: [Rust] Implement GROUP BY support for Boolean

Posted by GitBox <gi...@apache.org>.
ovr commented on pull request #9174:
URL: https://github.com/apache/arrow/pull/9174#issuecomment-761564117


   @alamb Rebased, and I added a test. Thanks





[GitHub] [arrow] codecov-io commented on pull request #9174: ARROW-11220: [Rust] Implement GROUP BY support for Boolean

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #9174:
URL: https://github.com/apache/arrow/pull/9174#issuecomment-761564205


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9174?src=pr&el=h1) Report
   > Merging [#9174](https://codecov.io/gh/apache/arrow/pull/9174?src=pr&el=desc) (03985bb) into [master](https://codecov.io/gh/apache/arrow/commit/1393188e1aa1b3d59993ce7d4ade7f7ac8570959?el=desc) (1393188) will **decrease** coverage by `0.01%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9174/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9174?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9174      +/-   ##
   ==========================================
   - Coverage   81.61%   81.59%   -0.02%     
   ==========================================
     Files         215      215              
     Lines       51867    51876       +9     
   ==========================================
     Hits        42329    42329              
   - Misses       9538     9547       +9     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9174?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/datafusion/src/physical\_plan/group\_scalar.rs](https://codecov.io/gh/apache/arrow/pull/9174/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2dyb3VwX3NjYWxhci5ycw==) | `68.00% <0.00%> (-2.84%)` | :arrow_down: |
   | [...ust/datafusion/src/physical\_plan/hash\_aggregate.rs](https://codecov.io/gh/apache/arrow/pull/9174/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2hhc2hfYWdncmVnYXRlLnJz) | `84.14% <0.00%> (-0.66%)` | :arrow_down: |
   | [rust/datafusion/src/physical\_plan/hash\_join.rs](https://codecov.io/gh/apache/arrow/pull/9174/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2hhc2hfam9pbi5ycw==) | `84.07% <0.00%> (-0.64%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9174?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9174?src=pr&el=footer). Last update [eaa7b7a...3db6560](https://codecov.io/gh/apache/arrow/pull/9174?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [arrow] alamb commented on pull request #9174: ARROW-11220: [Rust] Implement GROUP BY support for Boolean

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9174:
URL: https://github.com/apache/arrow/pull/9174#issuecomment-761140983


   I think this PR needs a rebase and perhaps an end-to-end test for grouping (as you did in https://github.com/apache/arrow/pull/9175), and then it will be ready to go. Thanks again @ovr!





[GitHub] [arrow] alamb commented on pull request #9174: ARROW-11220: [Rust] Implement GROUP BY support for Boolean

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9174:
URL: https://github.com/apache/arrow/pull/9174#issuecomment-764600978


   Thanks again for the contribution @ovr  





[GitHub] [arrow] codecov-io edited a comment on pull request #9174: ARROW-11220: [Rust] Implement GROUP BY support for Boolean

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9174:
URL: https://github.com/apache/arrow/pull/9174#issuecomment-761564205


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9174?src=pr&el=h1) Report
   > Merging [#9174](https://codecov.io/gh/apache/arrow/pull/9174?src=pr&el=desc) (9eb40b5) into [master](https://codecov.io/gh/apache/arrow/commit/1393188e1aa1b3d59993ce7d4ade7f7ac8570959?el=desc) (1393188) will **increase** coverage by `0.00%`.
   > The diff coverage is `86.36%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9174/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9174?src=pr&el=tree)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master    #9174   +/-   ##
   =======================================
     Coverage   81.61%   81.61%           
   =======================================
     Files         215      215           
     Lines       51867    51886   +19     
   =======================================
   + Hits        42329    42345   +16     
   - Misses       9538     9541    +3     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9174?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/datafusion/src/physical\_plan/group\_scalar.rs](https://codecov.io/gh/apache/arrow/pull/9174/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2dyb3VwX3NjYWxhci5ycw==) | `68.00% <0.00%> (-2.84%)` | :arrow_down: |
   | [...ust/datafusion/src/physical\_plan/hash\_aggregate.rs](https://codecov.io/gh/apache/arrow/pull/9174/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2hhc2hfYWdncmVnYXRlLnJz) | `84.91% <100.00%> (+0.11%)` | :arrow_up: |
   | [rust/datafusion/src/physical\_plan/hash\_join.rs](https://codecov.io/gh/apache/arrow/pull/9174/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2hhc2hfam9pbi5ycw==) | `84.82% <100.00%> (+0.11%)` | :arrow_up: |
   | [rust/datafusion/tests/sql.rs](https://codecov.io/gh/apache/arrow/pull/9174/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3Rlc3RzL3NxbC5ycw==) | `99.84% <100.00%> (+<0.01%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9174?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9174?src=pr&el=footer). Last update [eaa7b7a...3db6560](https://codecov.io/gh/apache/arrow/pull/9174?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   





[GitHub] [arrow] ovr commented on a change in pull request #9174: ARROW-11220: [Rust] Implement GROUP BY support for Boolean

Posted by GitBox <gi...@apache.org>.
ovr commented on a change in pull request #9174:
URL: https://github.com/apache/arrow/pull/9174#discussion_r558883490



##########
File path: rust/datafusion/src/physical_plan/hash_join.rs
##########
@@ -447,6 +447,11 @@ pub(crate) fn create_key(
                 // store the string value
                 vec.extend_from_slice(value.as_bytes());
             }
+            DataType::Boolean => {
+                let array = col.as_any().downcast_ref::<BooleanArray>().unwrap();
+                let x: u8 = if array.value(row) { 1 } else { 0 };

Review comment:
       Thanks



