Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/08/22 09:57:17 UTC

[GitHub] [iceberg] deadwind4 commented on a diff in pull request #5313: Orc: Support row group bloom filters

deadwind4 commented on code in PR #5313:
URL: https://github.com/apache/iceberg/pull/5313#discussion_r951244160


##########
docs/configuration.md:
##########
@@ -64,6 +64,8 @@ Iceberg tables support table properties to configure table behavior, like the de
 | write.orc.block-size-bytes         | 268435456 (256 MB) | Define the default file system block size for ORC files |
 | write.orc.compression-codec        | zlib               | ORC compression codec: zstd, lz4, lzo, zlib, snappy, none |
 | write.orc.compression-strategy     | speed              | ORC compression strategy: speed, compression |
+| write.orc.bloom.filter.columns     | (not set)          | Comma separated list of column names for which a Bloom filter must be created |
+| write.orc.bloom.filter.fpp         | 0.05               | False positive probability for Bloom filter (must > 0.0 and < 1.0) |

Review Comment:
   IMO, in this PR we should align `write.parquet.bloom-filter-enabled.column` and `write.orc.bloom.filter.columns` with the native Parquet and ORC options.
   
   We can create a new issue to unify these options. 
   In that issue, we can discuss a new format-neutral option, one without the parquet or orc tag, such as `write.rowgroup.bloom-filter.columns` or `write.rowgroup.bloom-filter-enabled.column`, to unify the file format options.
   Maybe `write.parquet.bloom-filter-enabled.column = id, name` or `[id, name]` is more elegant than `write.rowgroup.bloom-filter.columns.id = true`.
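
To make the two styles concrete, here is a minimal Python sketch of how a reader of the table properties might resolve the Bloom filter columns under each one. The keys `write.rowgroup.bloom-filter.columns` and `write.rowgroup.bloom-filter.columns.<name>` are only the proposals from this comment, not existing Iceberg options, and both helper functions are hypothetical.

```python
from typing import Dict, List

# Hypothetical keys taken from the proposals in this comment;
# neither is an existing Iceberg table property.
LIST_KEY = "write.rowgroup.bloom-filter.columns"
PER_COLUMN_PREFIX = "write.rowgroup.bloom-filter.columns."


def columns_from_list_style(props: Dict[str, str]) -> List[str]:
    """List style: one key whose value is `id, name` or `[id, name]`."""
    raw = props.get(LIST_KEY, "").strip().strip("[]")
    return [col.strip() for col in raw.split(",") if col.strip()]


def columns_from_per_column_style(props: Dict[str, str]) -> List[str]:
    """Per-column style: one `<prefix><column> = true` key per column."""
    columns = []
    for key, value in props.items():
        if key.startswith(PER_COLUMN_PREFIX) and value.strip().lower() == "true":
            columns.append(key[len(PER_COLUMN_PREFIX):])
    return columns
```

The list style keeps everything in one property, so adding a column means rewriting the whole value. The per-column style lets each column be toggled independently, at the cost of scanning every key for the prefix.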
   
   PS: 
   I found that the Trino Iceberg connector uses a style like `orc_bloom_filter_columns = [id, name]`:
   https://trino.io/docs/current/connector/iceberg.html#iceberg-table-properties
   
   I'm happy to drive this.
   @kbendick What do you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

