You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/08/05 15:30:14 UTC

[GitHub] [arrow-datafusion] alamb opened a new pull request, #3044: Fix `IsNull` pruning expression generation without null_count statistics

alamb opened a new pull request, #3044:
URL: https://github.com/apache/arrow-datafusion/pull/3044

   (this is a one line code change PR, the rest is tests)
   
   # Which issue does this PR close?
   
   Closes https://github.com/apache/arrow-datafusion/issues/3042
   
    # Rationale for this change
   See https://github.com/apache/arrow-datafusion/issues/3042
   
   # What changes are included in this PR?
   1. Set the `nullability` annotation correctly
   2. Add test coverage
   
   # Are there any user-facing changes?
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb merged pull request #3044: Fix `IsNull` pruning expression generation without null_count statistics

Posted by GitBox <gi...@apache.org>.
alamb merged PR #3044:
URL: https://github.com/apache/arrow-datafusion/pull/3044


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #3044: Fix `IsNull` pruning expression generation without null_count statistics

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #3044:
URL: https://github.com/apache/arrow-datafusion/pull/3044#discussion_r938932402


##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -639,7 +639,7 @@ fn build_is_null_column_expr(
         Expr::Column(ref col) => {
             let field = schema.field_with_name(&col.name).ok()?;
 
-            let null_count_field = &Field::new(field.name(), DataType::UInt64, false);
+            let null_count_field = &Field::new(field.name(), DataType::UInt64, true);

Review Comment:
   This is the fix



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #3044: Fix `IsNull` pruning expression generation without null_count statistics

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #3044:
URL: https://github.com/apache/arrow-datafusion/pull/3044#issuecomment-1208034550

   Thanks for the review @Dandandan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #3044: Fix `IsNull` pruning expression generation without null_count statistics

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #3044:
URL: https://github.com/apache/arrow-datafusion/pull/3044#issuecomment-1206610580

   # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/3044?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#3044](https://codecov.io/gh/apache/arrow-datafusion/pull/3044?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a7ff091) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/581934d73dfca7b99e6cc66767b3af3bbad7755f?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (581934d) will **increase** coverage by `0.01%`.
   > The diff coverage is `100.00%`.
   
   > :exclamation: Current head a7ff091 differs from pull request most recent head ceb9bfd. Consider uploading reports for the commit ceb9bfd to get more accurate results
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #3044      +/-   ##
   ==========================================
   + Coverage   85.85%   85.86%   +0.01%     
   ==========================================
     Files         286      286              
     Lines       51670    51704      +34     
   ==========================================
   + Hits        44359    44395      +36     
   + Misses       7311     7309       -2     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/3044?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [datafusion/core/src/physical\_optimizer/pruning.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/3044/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9jb3JlL3NyYy9waHlzaWNhbF9vcHRpbWl6ZXIvcHJ1bmluZy5ycw==) | `94.75% <100.00%> (+0.58%)` | :arrow_up: |
   | [datafusion/core/src/physical\_plan/metrics/value.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/3044/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9jb3JlL3NyYy9waHlzaWNhbF9wbGFuL21ldHJpY3MvdmFsdWUucnM=) | `86.93% <0.00%> (-0.51%)` | :arrow_down: |
   | [datafusion/expr/src/logical\_plan/plan.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/3044/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9leHByL3NyYy9sb2dpY2FsX3BsYW4vcGxhbi5ycw==) | `77.60% <0.00%> (ø)` | |
   | [datafusion/expr/src/window\_frame.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/3044/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9leHByL3NyYy93aW5kb3dfZnJhbWUucnM=) | `93.27% <0.00%> (+0.84%)` | :arrow_up: |
   
   :mega: Codecov can now indicate which changes are the most critical in Pull Requests. [Learn more](https://about.codecov.io/product/feature/runtime-insights/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #3044: Fix `IsNull` pruning expression generation without null_count statistics

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #3044:
URL: https://github.com/apache/arrow-datafusion/pull/3044#discussion_r938932706


##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -1761,4 +1819,39 @@ mod tests {
         let result = p.prune(&statistics).unwrap();
         assert_eq!(result, expected_ret);
     }
+
+    #[test]
+    fn prune_int32_is_null() {
+        let (schema, statistics) = int32_setup();
+
+        // Expression "i IS NULL" when there are no null statistics,
+        // should all be kept
+        let expected_ret = vec![true, true, true, true, true];
+
+        // i IS NULL, no null statistics
+        let expr = col("i").is_null();
+        let p = PruningPredicate::try_new(expr, schema.clone()).unwrap();

Review Comment:
   prior to the fix, this line would panic



##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -1761,4 +1819,39 @@ mod tests {
         let result = p.prune(&statistics).unwrap();
         assert_eq!(result, expected_ret);
     }
+
+    #[test]
+    fn prune_int32_is_null() {
+        let (schema, statistics) = int32_setup();
+
+        // Expression "i IS NULL" when there are no null statistics,
+        // should all be kept
+        let expected_ret = vec![true, true, true, true, true];
+
+        // i IS NULL, no null statistics
+        let expr = col("i").is_null();
+        let p = PruningPredicate::try_new(expr, schema.clone()).unwrap();
+        let result = p.prune(&statistics).unwrap();
+        assert_eq!(result, expected_ret);
+
+        // provide null counts for each column
+        let statistics = statistics.with_null_counts(
+            "i",
+            vec![
+                Some(0), // no nulls (don't keep)
+                Some(1), // 1 null
+                None,    // unknown nulls
+                None, // unknown nulls (min/max are both null too, like no stats at all)
+                Some(0), // 0 nulls (max=null too which means no known max) (don't keep)
+            ],
+        );
+
+        let expected_ret = vec![false, true, true, true, false];

Review Comment:
   This case simply didn't have coverage before that I could find



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ursabot commented on pull request #3044: Fix `IsNull` pruning expression generation without null_count statistics

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #3044:
URL: https://github.com/apache/arrow-datafusion/pull/3044#issuecomment-1208036244

   Benchmark runs are scheduled for baseline = acd1f40e7b5ba5d2100a9147785d28079bea48e6 and contender = 6e6f3bf9dec82b332e21f11699552c34f72493ac. 6e6f3bf9dec82b332e21f11699552c34f72493ac is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/3ed350d502e344dab9b63bede15cd5b1...0c52d33a4cb54f87b93a6d5567cb0617/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/799efd3376e146f787637a6813f48a4c...63dd871fe2ff4d24b7bce90ac06230d3/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/cfb503b79ed847bca7a023c44f7a1a9d...182dc179a58f49debb905ea7c7ca07b0/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/1324ec776b6d4800963f4aedfaded3b0...8ba8dbe62dbe493a85fc89d50b57f32d/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org