You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2023/01/17 07:28:24 UTC

[GitHub] [iceberg] youngxinler commented on a diff in pull request #6554: Parquet: Improve Test Coverage of RowGroupFilter Code with Nans #6518

youngxinler commented on code in PR #6554:
URL: https://github.com/apache/iceberg/pull/6554#discussion_r1071834035


##########
data/src/test/java/org/apache/iceberg/data/TestMetricsRowGroupFilter.java:
##########
@@ -341,6 +345,25 @@ public void testNotNaN() {
     Assert.assertTrue("Should read: NaN counts are not tracked in Parquet metrics", shouldRead);
   }
 
+
+  @Test
+  public void testDoubleWithNan() {

Review Comment:
   thanks for @RussellSpitzer suggestion,  but when i run test as your suggestion.  
   I found ORC also can't skip read.  so i think the negative test is work for parquet and ORC? 
   
   as for #6517,  "// Only ORC should be able to distinguish using min/max when NaN is present",  i add these tests.
   ```java
         // just for "format == FileFormat.ORC"
         // record.setField("_some_double_nans", (i % 10 == 0) ? Double.NaN : 2D); // includes some nan values
         shouldRead = shouldRead(equal("some_double_nans", 10.0));
         Assert.assertFalse("Should skip: column with some nans contains target value", shouldRead);
   
         shouldRead = shouldRead(greaterThan("some_double_nans", 10.0));
         Assert.assertFalse("Should skip: column with some nans contains target value", shouldRead);
   ```
   bacause the some_double_nans Max Value is 2D,  assert showRead will return false,  but all shouldRead is true,  it looks strange? 
   If my guess is correct, the pushdown of ORC containing nans will not work. Is this as expected? Should we open a new issue to discuss?
   
   ![image](https://user-images.githubusercontent.com/36697727/212830411-dbe3dd02-56ca-4c38-8a54-f53277957798.png)
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org