Posted to issues@iceberg.apache.org by "stevenzwu (via GitHub)" <gi...@apache.org> on 2023/05/25 20:22:04 UTC

[GitHub] [iceberg] stevenzwu commented on a diff in pull request #7617: Parquet: skip writing bloom filter for deletes

stevenzwu commented on code in PR #7617:
URL: https://github.com/apache/iceberg/pull/7617#discussion_r1205969186


##########
parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java:
##########
@@ -519,8 +512,8 @@ static Context deleteContext(Map<String, String> config) {
             compressionLevel,
             rowGroupCheckMinRecordCount,
             rowGroupCheckMaxRecordCount,
-            bloomFilterMaxBytes,
-            columnBloomFilterEnabled,
+            PARQUET_BLOOM_FILTER_MAX_BYTES_DEFAULT,
+            ImmutableMap.of(),

Review Comment:
   For most use cases, this change makes sense.
   
   However, if some datasets have high update rates and generate a lot of large delete files, would a bloom filter on the delete files be useful too?
   
   If yes, we could introduce a config to enable/disable bloom filters for delete files only. If not, this change looks good to me.
   
   @huaxingao @hililiwei @chenjunjiedada 
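
   A minimal sketch of what such an opt-in config could look like. The property name `write.delete.parquet.bloom-filter.enabled` and the class below are hypothetical, not part of the actual Iceberg API; the idea is just that the delete-file write path would consult its own flag, defaulting to off:

   ```java
   import java.util.Map;

   public class DeleteBloomFilterConfig {
     // Hypothetical property key, not an actual Iceberg table property.
     static final String DELETE_BLOOM_FILTER_ENABLED = "write.delete.parquet.bloom-filter.enabled";

     // Returns true only when the user explicitly opts in; delete files
     // skip bloom filters by default, matching the behavior in this PR.
     static boolean deleteBloomFilterEnabled(Map<String, String> config) {
       return Boolean.parseBoolean(config.getOrDefault(DELETE_BLOOM_FILTER_ENABLED, "false"));
     }
   }
   ```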



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-help@iceberg.apache.org