You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "zhongyujiang (via GitHub)" <gi...@apache.org> on 2023/02/16 02:56:17 UTC

[GitHub] [iceberg] zhongyujiang commented on a diff in pull request #6836: Parquet: Optimize dictionary filter evaluation on notIn and notEq

zhongyujiang commented on code in PR #6836:
URL: https://github.com/apache/iceberg/pull/6836#discussion_r1107953666


##########
parquet/src/test/java/org/apache/iceberg/parquet/TestDictionaryRowGroupFilter.java:
##########
@@ -1030,7 +1030,7 @@ public void testMissingDictionaryPageForColumn() {
         IllegalStateException.class,
         "Failed to read required dictionary page for id: 5",
         () ->
-            new ParquetDictionaryRowGroupFilter(SCHEMA, notEqual("some_nulls", "some"))
+            new ParquetDictionaryRowGroupFilter(SCHEMA, equal("some_nulls", "some"))

Review Comment:
   Thanks for reviewing!
   
   I checked the PR #93 that introduced this UT and the relevant [discussion](https://github.com/apache/iceberg/pull/86#discussion_r253802993). Basically it wants to throw an exception when no dict is available to avoid NPE after calling `dict`. But as @rdblue mentioned [here](https://github.com/apache/iceberg/pull/86#discussion_r253943635),  all runs of `dict` in `ParquetDictionaryRowGroupFilter` is guaranteed that there must be a dictionary. So under normal circumstances, this exception will not be thrown out I think. The reason why this UT can catch the exception is because it mocked an abnormal [DictionaryPageReadStore](https://github.com/apache/iceberg/blob/49d833aa83f6d4681c6c9f8473080f6e78124a0e/parquet/src/test/java/org/apache/iceberg/parquet/TestDictionaryRowGroupFilter.java#L1034).
   Above is the context about this UT. So I think replacing it with `eq` can still serve the original purpose of verification, right? So I think moving null-check is reasonable because the run of `notEq` or `notIn` is guaranteed there will be a dictionary there. Moving it forward can help avoiding the dictionary decoding overhead when the column has nulls.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org