You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/28 18:17:00 UTC

[GitHub] [arrow-datafusion] tustvold opened a new pull request, #2808: Use specialized dictionary kernels (#1178)

tustvold opened a new pull request, #2808:
URL: https://github.com/apache/arrow-datafusion/pull/2808

   _Draft as likely needs a bit more testing_
   
   # Which issue does this PR close?
   
   Part of #1178
   
    # Rationale for this change
   
   Arrow-rs supports native evaluation on dictionaries for comparison operations against other dictionaries and scalars. We should make use of this to avoid hydrating dictionaries unnecessarily
   
   # What changes are included in this PR?
   
   Tweaks the coercion rules to coerce to the dictionary type if supported by the operator 
   
   # Are there any user-facing changes?
   
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on pull request #2808: Use specialized dictionary kernels (#1178)

Posted by GitBox <gi...@apache.org>.
tustvold commented on PR #2808:
URL: https://github.com/apache/arrow-datafusion/pull/2808#issuecomment-1169072866

   Unsurprisingly the performance benefits of this are quite pronounced
   
   
   ```
   scheduled: select count(*) from t where dict_10_required = 'prefix#0'                                                                             
                           time:   [4.0683 ms 4.0732 ms 4.0783 ms]
                           change: [-40.646% -40.476% -40.299%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high mild
   
   tokio: select count(*) from t where dict_100_required = 'prefix#0'                                                                             
                           time:   [4.4917 ms 4.5479 ms 4.6022 ms]
                           change: [-39.027% -38.266% -37.502%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   scheduled: select count(*) from t where dict_100_required = 'prefix#0'                                                                             
                           time:   [3.8694 ms 3.8755 ms 3.8815 ms]
                           change: [-33.176% -32.985% -32.795%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   tokio: select count(*) from t where dict_1000_required = 'prefix#0'                                                                             
                           time:   [4.7944 ms 4.8326 ms 4.8687 ms]
                           change: [-31.344% -30.719% -30.083%] (p = 0.00 < 0.05)
                           Performance has improved.
   ```
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #2808: Use specialized dictionary kernels (#1178)

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2808:
URL: https://github.com/apache/arrow-datafusion/pull/2808#discussion_r908804986


##########
datafusion/physical-expr/src/expressions/binary.rs:
##########
@@ -1009,11 +1009,28 @@ impl PhysicalExpr for BinaryExpr {
         let left_data_type = left_value.data_type();
         let right_data_type = right_value.data_type();
 
-        if left_data_type != right_data_type {
-            return Err(DataFusionError::Internal(format!(
-                "Cannot evaluate binary expression {:?} with types {:?} and {:?}",
-                self.op, left_data_type, right_data_type
-            )));
+        match (&left_value, &left_data_type, &right_value, &right_data_type) {
+            // Types are equal => valid
+            (_, l, _, r) if l == r => {}
+            // Allow comparing a dictionary value with its corresponding scalar value

Review Comment:
   This is actually necessary for correctness in addition to being beneficial for performance, because ScalarValue does not have a way to encode a dictionary data type



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2808: Use specialized dictionary kernels (#1178)

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #2808:
URL: https://github.com/apache/arrow-datafusion/pull/2808#discussion_r910508235


##########
datafusion/expr/src/binary_rule.rs:
##########
@@ -155,14 +155,12 @@ pub fn comparison_eq_coercion(
     lhs_type: &DataType,
     rhs_type: &DataType,
 ) -> Option<DataType> {
-    // can't compare dictionaries directly due to
-    // https://github.com/apache/arrow-rs/issues/1201

Review Comment:
   ❤️ 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb merged pull request #2808: Use specialized dictionary kernels (#1178)

Posted by GitBox <gi...@apache.org>.
alamb merged PR #2808:
URL: https://github.com/apache/arrow-datafusion/pull/2808


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #2808: Use specialized dictionary kernels (#1178)

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #2808:
URL: https://github.com/apache/arrow-datafusion/pull/2808#issuecomment-1169122276

   # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2808?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#2808](https://codecov.io/gh/apache/arrow-datafusion/pull/2808?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (1513b88) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/7617d78809d4ff5bde31142e0744c70024e40635?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (7617d78) will **increase** coverage by `0.11%`.
   > The diff coverage is `96.15%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #2808      +/-   ##
   ==========================================
   + Coverage   85.11%   85.22%   +0.11%     
   ==========================================
     Files         273      274       +1     
     Lines       48242    48634     +392     
   ==========================================
   + Hits        41059    41449     +390     
   - Misses       7183     7185       +2     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/2808?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [datafusion/physical-expr/src/expressions/binary.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2808/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9waHlzaWNhbC1leHByL3NyYy9leHByZXNzaW9ucy9iaW5hcnkucnM=) | `95.12% <75.00%> (+<0.01%)` | :arrow_up: |
   | [datafusion/expr/src/binary\_rule.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2808/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9leHByL3NyYy9iaW5hcnlfcnVsZS5ycw==) | `84.76% <100.00%> (+0.47%)` | :arrow_up: |
   | [datafusion/core/src/physical\_plan/join\_utils.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2808/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9jb3JlL3NyYy9waHlzaWNhbF9wbGFuL2pvaW5fdXRpbHMucnM=) | `93.61% <0.00%> (-3.20%)` | :arrow_down: |
   | [datafusion/sql/src/planner.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2808/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcWwvc3JjL3BsYW5uZXIucnM=) | `81.36% <0.00%> (ø)` | |
   | [...fusion/optimizer/src/single\_distinct\_to\_groupby.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2808/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9vcHRpbWl6ZXIvc3JjL3NpbmdsZV9kaXN0aW5jdF90b19ncm91cGJ5LnJz) | `98.80% <0.00%> (ø)` | |
   | [...on/core/src/physical\_optimizer/coalesce\_batches.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2808/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9jb3JlL3NyYy9waHlzaWNhbF9vcHRpbWl6ZXIvY29hbGVzY2VfYmF0Y2hlcy5ycw==) | `100.00% <0.00%> (ø)` | |
   | [datafusion/optimizer/src/reduce\_outer\_join.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2808/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9vcHRpbWl6ZXIvc3JjL3JlZHVjZV9vdXRlcl9qb2luLnJz) | `99.39% <0.00%> (ø)` | |
   | [datafusion/core/tests/sql/joins.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2808/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9jb3JlL3Rlc3RzL3NxbC9qb2lucy5ycw==) | `99.31% <0.00%> (+0.20%)` | :arrow_up: |
   | [datafusion/expr/src/logical\_plan/plan.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2808/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9leHByL3NyYy9sb2dpY2FsX3BsYW4vcGxhbi5ycw==) | `74.40% <0.00%> (+0.29%)` | :arrow_up: |
   | [datafusion/core/src/config.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2808/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9jb3JlL3NyYy9jb25maWcucnM=) | `90.76% <0.00%> (+0.44%)` | :arrow_up: |
   | ... and [3 more](https://codecov.io/gh/apache/arrow-datafusion/pull/2808/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2808?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2808?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [7617d78...1513b88](https://codecov.io/gh/apache/arrow-datafusion/pull/2808?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org