Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/05/19 09:48:44 UTC

[GitHub] [iceberg] Zhangg7723 opened a new issue, #4813: [FEATURE REQUEST] The Bloom Filter for Parquet formats is necessary

Zhangg7723 opened a new issue, #4813:
URL: https://github.com/apache/iceberg/issues/4813

   Bloom filters in data files are useful for the Iceberg reader and writer, and we could also use them to optimize delete compaction. However, Iceberg has no Bloom filter properties for the Parquet format. I found PRs #2582, #2642, and #2643 from @jshmchenxi, but they are still open.
   What's the plan for this feature? @rdblue @openinx @jshmchenxi
   
   Thanks
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on issue #4813: [FEATURE REQUEST] The Bloom Filter for Parquet formats is necessary

Posted by GitBox <gi...@apache.org>.
kbendick commented on issue #4813:
URL: https://github.com/apache/iceberg/issues/4813#issuecomment-1133160841

   Hi @Zhangg7723! You are right that bloom filters in the data files would be useful.
   
   It is, however, somewhat difficult to get right: it takes a lot of tuning, and the number of distinct values (NDV) may need to be known ahead of time, otherwise the bloom filter can waste a potentially significant amount of space.
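
   For a rough sense of why NDV matters, here is a minimal sketch based on the textbook sizing formula for a classic Bloom filter, m = -n ln(p) / (ln 2)^2. It is an illustration only: Parquet uses split-block Bloom filters, whose sizing differs in detail, and the function name `bloom_filter_size` is hypothetical, not an Iceberg or Parquet API.

```python
import math


def bloom_filter_size(ndv: int, fpp: float) -> tuple[int, int]:
    """Return (bits, hash_count) for a classic Bloom filter.

    ndv: expected number of distinct values to insert (assumed known up front).
    fpp: target false-positive probability, strictly between 0 and 1.
    """
    if ndv <= 0:
        raise ValueError("ndv must be positive")
    if not 0.0 < fpp < 1.0:
        raise ValueError("fpp must be strictly between 0 and 1")
    # Textbook formula: m = -n * ln(p) / (ln 2)^2
    bits = math.ceil(-ndv * math.log(fpp) / (math.log(2) ** 2))
    # Optimal number of hash functions: k = (m / n) * ln 2
    hashes = max(1, round(bits / ndv * math.log(2)))
    return bits, hashes


# Compare an accurate NDV estimate with one that is 10x too high.
accurate_bits, _ = bloom_filter_size(1_000_000, 0.01)
inflated_bits, _ = bloom_filter_size(10_000_000, 0.01)
print(f"accurate NDV: {accurate_bits / 8 / 1024:.0f} KiB")
print(f"10x overestimate: {inflated_bits / 8 / 1024:.0f} KiB")
```

   Because the bit count grows linearly with the NDV estimate, overestimating NDV by 10x makes the filter roughly 10x larger than it needs to be, while underestimating it pushes the false-positive rate above the target.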
   
   I can say, however, that this issue is being worked on.
   
   @huaxingao from Apple is working on this and has reached out to the original PR author. I believe they are going to merge Apple's code with that of @jshmchenxi. The two of them would know more about it than I would, but
   
   **TLDR** - This is an area of active work and not something that has been forgotten 🙂




[GitHub] [iceberg] kbendick commented on issue #4813: [FEATURE REQUEST] The Bloom Filter for Parquet formats is necessary

Posted by GitBox <gi...@apache.org>.
kbendick commented on issue #4813:
URL: https://github.com/apache/iceberg/issues/4813#issuecomment-1133446713

   Please do reopen this issue if you think that's necessary. I'm just looking to close issues that have been addressed and I'd think any additional optimizations that need _extra_ code should be considered after the new implementation is opened (and ideally in a new issue so as not to bug too many folks who were tagged here 😅).
   
   Thank you!




[GitHub] [iceberg] kbendick closed issue #4813: [FEATURE REQUEST] The Bloom Filter for Parquet formats is necessary

Posted by GitBox <gi...@apache.org>.
kbendick closed issue #4813: [FEATURE REQUEST] The Bloom Filter for Parquet formats is necessary
URL: https://github.com/apache/iceberg/issues/4813




[GitHub] [iceberg] kbendick commented on issue #4813: [FEATURE REQUEST] The Bloom Filter for Parquet formats is necessary

Posted by GitBox <gi...@apache.org>.
kbendick commented on issue #4813:
URL: https://github.com/apache/iceberg/issues/4813#issuecomment-1133445982

   Thank you @huaxingao! I will close this issue for now to reduce noise amongst the issues, as this work is definitely ongoing.
   
   Feel free to open a new issue regarding the delete compaction optimization etc. @Zhangg7723, but I would suggest waiting until the new PR from Huaxing and Xi is up, as anything we do will have to take that into consideration.
   
   I know Huaxing said that hopefully the PR would be up today, but I would ask that you please give them another week or so at least.
   
   Huaxing is really amazing at what she does (and I'm sure Xi is too), but there's always something that comes up. And open source always has extra things to consider.
   
   But this feature is very actively being worked on and should be under review in a very reasonable time frame. This is a priority for many people. 🙂




[GitHub] [iceberg] huaxingao commented on issue #4813: [FEATURE REQUEST] The Bloom Filter for Parquet formats is necessary

Posted by GitBox <gi...@apache.org>.
huaxingao commented on issue #4813:
URL: https://github.com/apache/iceberg/issues/4813#issuecomment-1133168823

   I am collaborating with @jshmchenxi and will submit the new PR soon (hopefully today).
   cc @RussellSpitzer @aokolnychyi @flyrain @szehon-ho 

