You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/10/06 07:46:21 UTC

[GitHub] [iceberg] Fokko opened a new pull request, #5926: API: Simplify the ManifestEvaluator

Fokko opened a new pull request, #5926:
URL: https://github.com/apache/iceberg/pull/5926

   While writing tests for Python, I noticed some unreachable code. This also allows us to simplify the logic here by inlining the `allValuesAreNull` function.
   
   The `allValuesAreNull` check is redundant for the `isNan` check.
   
   Because we first check if there are NaN values:
   
   ```java
   @Override
   public <T> Boolean isNaN(BoundReference<T> ref) {
     int pos = Accessors.toPosition(ref.accessor());
   
     if (stats.get(pos).containsNaN() != null && !stats.get(pos).containsNaN()) {
       return ROWS_CANNOT_MATCH;
     }
   
     if (allValuesAreNull(stats.get(pos), ref.type().typeId())) {
       return ROWS_CANNOT_MATCH; // Unreachable code
     }
   
     return ROWS_MIGHT_MATCH;
   }
   ```
   
   And then we do the same in the `allValuesAreNull`:
   
   ```java
   private boolean allValuesAreNull(PartitionFieldSummary summary, Type.TypeID typeId) {
     // containsNull encodes whether at least one partition value is null,
     // lowerBound is null if all partition values are null
     boolean allNull = summary.containsNull() && summary.lowerBound() == null;
   
     if (allNull && (Type.TypeID.DOUBLE.equals(typeId) || Type.TypeID.FLOAT.equals(typeId))) {
       // floating point types may include NaN values, which we check separately.
       // In case bounds don't include NaN value, containsNaN needs to be checked against.
       allNull = summary.containsNaN() != null && !summary.containsNaN();
     }
     return allNull;
   }
   ```
   
   Since the `isNan` can only be applied to Floats and Doubles, we always take the branch. And then we come to the same conclusion.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5926: API: Simplify the ManifestEvaluator

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5926:
URL: https://github.com/apache/iceberg/pull/5926#discussion_r989431779


##########
api/src/main/java/org/apache/iceberg/expressions/ManifestEvaluator.java:
##########
@@ -399,18 +406,5 @@ public <T> Boolean notStartsWith(BoundReference<T> ref, Literal<T> lit) {
 
       return ROWS_MIGHT_MATCH;
     }
-
-    private boolean allValuesAreNull(PartitionFieldSummary summary, Type.TypeID typeId) {

Review Comment:
   I think I'd still prefer to keep this as a separate function, rather than possibly recreating it (incorrectly) later.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #5926: API: Simplify the ManifestEvaluator

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #5926:
URL: https://github.com/apache/iceberg/pull/5926#discussion_r989435730


##########
api/src/main/java/org/apache/iceberg/expressions/ManifestEvaluator.java:
##########
@@ -399,18 +406,5 @@ public <T> Boolean notStartsWith(BoundReference<T> ref, Literal<T> lit) {
 
       return ROWS_MIGHT_MATCH;
     }
-
-    private boolean allValuesAreNull(PartitionFieldSummary summary, Type.TypeID typeId) {

Review Comment:
   That is a fair point, I've reinstated the function and reverted the changes on the null check. Now it just removes the redundant check on `isNaN`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #5926: API: Simplify the ManifestEvaluator

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #5926:
URL: https://github.com/apache/iceberg/pull/5926#discussion_r989731857


##########
api/src/main/java/org/apache/iceberg/expressions/ManifestEvaluator.java:
##########
@@ -148,10 +148,6 @@ public <T> Boolean isNaN(BoundReference<T> ref) {
         return ROWS_CANNOT_MATCH;
       }
 
-      if (allValuesAreNull(stats.get(pos), ref.type().typeId())) {

Review Comment:
   Yes, you are right on that one! It didn't think of encoding NaN as None. Then this makes sense



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko closed pull request #5926: API: Simplify the ManifestEvaluator

Posted by GitBox <gi...@apache.org>.
Fokko closed pull request #5926: API: Simplify the ManifestEvaluator
URL: https://github.com/apache/iceberg/pull/5926


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5926: API: Simplify the ManifestEvaluator

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5926:
URL: https://github.com/apache/iceberg/pull/5926#discussion_r989431779


##########
api/src/main/java/org/apache/iceberg/expressions/ManifestEvaluator.java:
##########
@@ -399,18 +406,5 @@ public <T> Boolean notStartsWith(BoundReference<T> ref, Literal<T> lit) {
 
       return ROWS_MIGHT_MATCH;
     }
-
-    private boolean allValuesAreNull(PartitionFieldSummary summary, Type.TypeID typeId) {

Review Comment:
   I think I'd still prefer to keep this as a separate function.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5926: API: Simplify the ManifestEvaluator

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5926:
URL: https://github.com/apache/iceberg/pull/5926#discussion_r989539742


##########
api/src/main/java/org/apache/iceberg/expressions/ManifestEvaluator.java:
##########
@@ -148,10 +148,6 @@ public <T> Boolean isNaN(BoundReference<T> ref) {
         return ROWS_CANNOT_MATCH;
       }
 
-      if (allValuesAreNull(stats.get(pos), ref.type().typeId())) {

Review Comment:
   I don't think this is correct. The `containsNaN` field was introduced later, so it can be null for metadata files that were written before we tracked NaN values in partitions. In that case, if we know that all values are null then none of them are NaN and can then eliminate scanning the manifest.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org