You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/06 15:53:34 UTC

[GitHub] [arrow] bkietz commented on a change in pull request #10253: ARROW-12659: [C++][Compute] Support is_valid as a guarantee

bkietz commented on a change in pull request #10253:
URL: https://github.com/apache/arrow/pull/10253#discussion_r627547092



##########
File path: cpp/src/arrow/dataset/file_parquet.cc
##########
@@ -198,17 +198,22 @@ static util::optional<compute::Expression> ColumnChunkStatisticsAsExpression(
   auto maybe_min = min->CastTo(field->type());
   auto maybe_max = max->CastTo(field->type());
   if (maybe_min.ok() && maybe_max.ok()) {
-    auto col_min = maybe_min.MoveValueUnsafe();
-    auto col_max = maybe_max.MoveValueUnsafe();
-    if (col_min->Equals(col_max)) {
-      return compute::equal(std::move(field_expr), compute::literal(std::move(col_min)));
+    min = maybe_min.MoveValueUnsafe();
+    max = maybe_max.MoveValueUnsafe();
+    if (min->Equals(max)) {
+      return compute::equal(std::move(field_expr), compute::literal(std::move(min)));

Review comment:
       Equality expressions will be picked up by the ExtractKnownFieldValues pass, which assumes the field will `min` and never null (questionably correct)
   
   Longer term: I think we need to be more rigorous with the guarantee contract. Specifically: any guarantee should be usable as a filter, and in that case will evaluate to an array containing only `true` (and no nulls).
   
   | Data, Guarantee | Filter | Evaluation of guarantee on Data | |
   |---|---|---|---|
   | [2, 2], i32 == 2 | i32 == 2 | [true, true] | can simplify to `true` |
   | [2, null], i32 == 2 or is_null(i32) | i32 == 2 | [true, null] | the second row should be excluded from the filter |
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org