You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/08/05 15:30:14 UTC
[GitHub] [arrow-datafusion] alamb opened a new pull request, #3044: Fix `IsNull` pruning expression generation without null_count statistics
alamb opened a new pull request, #3044:
URL: https://github.com/apache/arrow-datafusion/pull/3044
(this is a one line code change PR, the rest is tests)
# Which issue does this PR close?
Closes https://github.com/apache/arrow-datafusion/issues/3042
# Rationale for this change
See https://github.com/apache/arrow-datafusion/issues/3042
# What changes are included in this PR?
1. Set the `nullability` annotation correctly
2. Add test coverage
# Are there any user-facing changes?
<!--
If there are user-facing changes then we may require documentation to be updated before approving the PR.
-->
<!--
If there are any breaking changes to public APIs, please add the `api change` label.
-->
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] alamb merged pull request #3044: Fix `IsNull` pruning expression generation without null_count statistics
Posted by GitBox <gi...@apache.org>.
alamb merged PR #3044:
URL: https://github.com/apache/arrow-datafusion/pull/3044
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #3044: Fix `IsNull` pruning expression generation without null_count statistics
Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #3044:
URL: https://github.com/apache/arrow-datafusion/pull/3044#discussion_r938932402
##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -639,7 +639,7 @@ fn build_is_null_column_expr(
Expr::Column(ref col) => {
let field = schema.field_with_name(&col.name).ok()?;
- let null_count_field = &Field::new(field.name(), DataType::UInt64, false);
+ let null_count_field = &Field::new(field.name(), DataType::UInt64, true);
Review Comment:
This is the fix
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] alamb commented on pull request #3044: Fix `IsNull` pruning expression generation without null_count statistics
Posted by GitBox <gi...@apache.org>.
alamb commented on PR #3044:
URL: https://github.com/apache/arrow-datafusion/pull/3044#issuecomment-1208034550
Thanks for the review @Dandandan
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #3044: Fix `IsNull` pruning expression generation without null_count statistics
Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #3044:
URL: https://github.com/apache/arrow-datafusion/pull/3044#issuecomment-1206610580
# [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/3044?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#3044](https://codecov.io/gh/apache/arrow-datafusion/pull/3044?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a7ff091) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/581934d73dfca7b99e6cc66767b3af3bbad7755f?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (581934d) will **increase** coverage by `0.01%`.
> The diff coverage is `100.00%`.
> :exclamation: Current head a7ff091 differs from pull request most recent head ceb9bfd. Consider uploading reports for the commit ceb9bfd to get more accurate results
```diff
@@ Coverage Diff @@
## master #3044 +/- ##
==========================================
+ Coverage 85.85% 85.86% +0.01%
==========================================
Files 286 286
Lines 51670 51704 +34
==========================================
+ Hits 44359 44395 +36
+ Misses 7311 7309 -2
```
| [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/3044?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [datafusion/core/src/physical\_optimizer/pruning.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/3044/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9jb3JlL3NyYy9waHlzaWNhbF9vcHRpbWl6ZXIvcHJ1bmluZy5ycw==) | `94.75% <100.00%> (+0.58%)` | :arrow_up: |
| [datafusion/core/src/physical\_plan/metrics/value.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/3044/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9jb3JlL3NyYy9waHlzaWNhbF9wbGFuL21ldHJpY3MvdmFsdWUucnM=) | `86.93% <0.00%> (-0.51%)` | :arrow_down: |
| [datafusion/expr/src/logical\_plan/plan.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/3044/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9leHByL3NyYy9sb2dpY2FsX3BsYW4vcGxhbi5ycw==) | `77.60% <0.00%> (ø)` | |
| [datafusion/expr/src/window\_frame.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/3044/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9leHByL3NyYy93aW5kb3dfZnJhbWUucnM=) | `93.27% <0.00%> (+0.84%)` | :arrow_up: |
:mega: Codecov can now indicate which changes are the most critical in Pull Requests. [Learn more](https://about.codecov.io/product/feature/runtime-insights/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #3044: Fix `IsNull` pruning expression generation without null_count statistics
Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #3044:
URL: https://github.com/apache/arrow-datafusion/pull/3044#discussion_r938932706
##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -1761,4 +1819,39 @@ mod tests {
let result = p.prune(&statistics).unwrap();
assert_eq!(result, expected_ret);
}
+
+ #[test]
+ fn prune_int32_is_null() {
+ let (schema, statistics) = int32_setup();
+
+ // Expression "i IS NULL" when there are no null statistics,
+ // should all be kept
+ let expected_ret = vec![true, true, true, true, true];
+
+ // i IS NULL, no null statistics
+ let expr = col("i").is_null();
+ let p = PruningPredicate::try_new(expr, schema.clone()).unwrap();
Review Comment:
prior to the fix, this line would panic
##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -1761,4 +1819,39 @@ mod tests {
let result = p.prune(&statistics).unwrap();
assert_eq!(result, expected_ret);
}
+
+ #[test]
+ fn prune_int32_is_null() {
+ let (schema, statistics) = int32_setup();
+
+ // Expression "i IS NULL" when there are no null statistics,
+ // should all be kept
+ let expected_ret = vec![true, true, true, true, true];
+
+ // i IS NULL, no null statistics
+ let expr = col("i").is_null();
+ let p = PruningPredicate::try_new(expr, schema.clone()).unwrap();
+ let result = p.prune(&statistics).unwrap();
+ assert_eq!(result, expected_ret);
+
+ // provide null counts for each column
+ let statistics = statistics.with_null_counts(
+ "i",
+ vec![
+ Some(0), // no nulls (don't keep)
+ Some(1), // 1 null
+ None, // unknown nulls
+ None, // unknown nulls (min/max are both null too, like no stats at all)
+ Some(0), // 0 nulls (max=null too which means no known max) (don't keep)
+ ],
+ );
+
+ let expected_ret = vec![false, true, true, true, false];
Review Comment:
This case simply didn't have coverage before that I could find
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] ursabot commented on pull request #3044: Fix `IsNull` pruning expression generation without null_count statistics
Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #3044:
URL: https://github.com/apache/arrow-datafusion/pull/3044#issuecomment-1208036244
Benchmark runs are scheduled for baseline = acd1f40e7b5ba5d2100a9147785d28079bea48e6 and contender = 6e6f3bf9dec82b332e21f11699552c34f72493ac. 6e6f3bf9dec82b332e21f11699552c34f72493ac is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/3ed350d502e344dab9b63bede15cd5b1...0c52d33a4cb54f87b93a6d5567cb0617/)
[Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/799efd3376e146f787637a6813f48a4c...63dd871fe2ff4d24b7bce90ac06230d3/)
[Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/cfb503b79ed847bca7a023c44f7a1a9d...182dc179a58f49debb905ea7c7ca07b0/)
[Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/1324ec776b6d4800963f4aedfaded3b0...8ba8dbe62dbe493a85fc89d50b57f32d/)
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org