Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/05/11 09:31:30 UTC

[GitHub] [iceberg] hililiwei commented on a diff in pull request #4734: ORC: Upgrade orc version to use the built-in estimate memory method

hililiwei commented on code in PR #4734:
URL: https://github.com/apache/iceberg/pull/4734#discussion_r870081657


##########
data/src/test/java/org/apache/iceberg/data/TestMetricsRowGroupFilter.java:
##########
@@ -298,17 +298,32 @@ public void testNoNulls() {
 
   @Test
   public void testIsNaN() {
+    Assume.assumeTrue("Avro files do not have row group statistics", format != FileFormat.AVRO);
+
     boolean shouldRead = shouldRead(isNaN("all_nans"));
-    Assert.assertTrue("Should read: NaN counts are not tracked in Parquet metrics", shouldRead);
+    Assert.assertTrue(String.format("Should read: NaN counts are not tracked in %s metrics", format), shouldRead);
 
     shouldRead = shouldRead(isNaN("some_nans"));
-    Assert.assertTrue("Should read: NaN counts are not tracked in Parquet metrics", shouldRead);
-
-    shouldRead = shouldRead(isNaN("no_nans"));
-    Assert.assertTrue("Should read: NaN counts are not tracked in Parquet metrics", shouldRead);
+    Assert.assertTrue(String.format("Should read: NaN counts are not tracked in %s metrics", format), shouldRead);
 
     shouldRead = shouldRead(isNaN("all_nulls"));
     Assert.assertFalse("Should skip: all null column will not contain nan value", shouldRead);
+
+    if (format == FileFormat.ORC) {
+      testIsNanOrc();
+    } else {
+      testIsNanParquet();
+    }
+  }
+
+  private void testIsNanParquet() {
+    boolean shouldRead = shouldRead(isNaN("no_nans"));
+    Assert.assertTrue("Should read: NaN counts are not tracked in Parquet metrics", shouldRead);
+  }
+
+  private void testIsNanOrc() {
+    boolean shouldRead = shouldRead(isNaN("no_nans"));
+    Assert.assertFalse("Should skip: no nans will not contain nan value", shouldRead);

Review Comment:
   ORC statistics seem to expose only a `hasNull` flag, which helps answer `IS NULL`. I ran a demo and found nothing related to NaN values: for the `all_nans` column, `hasNull` is still `false`, and for the `all_nulls` column, it is `true`.
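   To illustrate the point, here is a minimal sketch that assumes row-group statistics carry only a non-null value count and a `hasNull` flag, with no NaN count. `NullOnlyStatsSketch`, `ColumnStats`, `shouldReadIsNull`, and `shouldReadIsNaN` are hypothetical names, not Iceberg's or ORC's API. The sketch also ignores min/max statistics, which a real ORC reader may consult as well.

```java
/**
 * Hypothetical sketch (not Iceberg's or ORC's API): row-group statistics that
 * carry only a non-null value count and a hasNull flag, with no NaN count.
 */
public class NullOnlyStatsSketch {

  /** Hypothetical per-column statistics for one row group. */
  static final class ColumnStats {
    final long nonNullValueCount;
    final boolean hasNull;

    ColumnStats(long nonNullValueCount, boolean hasNull) {
      this.nonNullValueCount = nonNullValueCount;
      this.hasNull = hasNull;
    }
  }

  /** IS NULL: the row group can be skipped only when it is known to contain no nulls. */
  static boolean shouldReadIsNull(ColumnStats stats) {
    return stats.hasNull;
  }

  /**
   * IS NaN: without a NaN count, the only safe skip is a column with zero
   * non-null values, i.e. every value is null, so none can be NaN.
   */
  static boolean shouldReadIsNaN(ColumnStats stats) {
    return stats.nonNullValueCount > 0;
  }

  public static void main(String[] args) {
    // Hypothetical stats for 10-row groups mirroring the test's columns.
    ColumnStats allNans = new ColumnStats(10, false);
    ColumnStats noNans = new ColumnStats(10, false);
    ColumnStats allNulls = new ColumnStats(0, true);

    System.out.println("all_nans  IS NaN -> read: " + shouldReadIsNaN(allNans));
    System.out.println("no_nans   IS NaN -> read: " + shouldReadIsNaN(noNans));
    System.out.println("all_nulls IS NaN -> read: " + shouldReadIsNaN(allNulls));
  }
}
```

   Under this assumption, `all_nans` and `no_nans` produce identical statistics, so an `isNaN` evaluator that relies on `hasNull` alone cannot skip `no_nans`. Only `all_nulls` can be skipped.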



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


For additional commands, e-mail: issues-help@iceberg.apache.org