Posted to issues-all@impala.apache.org by "Zoltán Borók-Nagy (Jira)" <ji...@apache.org> on 2020/03/06 17:08:00 UTC

[jira] [Created] (IMPALA-9470) Use Parquet bloom filters

Zoltán Borók-Nagy created IMPALA-9470:
-----------------------------------------

             Summary: Use Parquet bloom filters
                 Key: IMPALA-9470
                 URL: https://issues.apache.org/jira/browse/IMPALA-9470
             Project: IMPALA
          Issue Type: New Feature
            Reporter: Zoltán Borók-Nagy


PARQUET-41 has been closed recently. That means Parquet-MR is capable of writing and reading bloom filters.

Currently, bloom filters are stored per column chunk, which means that with their help we can filter out entire row groups.

We already filter row groups in HdfsParquetScanner::NextRowGroup() based on column chunk statistics and dictionaries. Skipping row groups based on bloom filters could also be added to this function.
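
To illustrate the idea, here is a minimal, hypothetical sketch of skipping row groups for an equality predicate. It is not Impala code and does not use the bloom filter layout defined by the Parquet format; ToyBloomFilter, RowGroupsToScan and the hashing scheme are all made up for this example. It only shows the property the scanner would rely on: a bloom filter never gives a false negative, so a row group whose filter rejects the literal can be skipped safely.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy bloom filter for illustration only. This is NOT the bloom filter
// layout from the Parquet format and NOT an Impala class.
class ToyBloomFilter {
 public:
  // Assumes num_bits > 0.
  explicit ToyBloomFilter(std::size_t num_bits) : bits_(num_bits, false) {}

  // Writer side: record a value that occurs in the column chunk.
  void Insert(const std::string& value) {
    for (std::size_t i = 0; i < kNumHashes; ++i) bits_[Index(value, i)] = true;
  }

  // Reader side: false means "definitely absent", true means "possibly
  // present" (false positives are allowed, false negatives are not).
  bool MightContain(const std::string& value) const {
    for (std::size_t i = 0; i < kNumHashes; ++i) {
      if (!bits_[Index(value, i)]) return false;
    }
    return true;
  }

 private:
  static constexpr std::size_t kNumHashes = 3;

  // Derives the i-th probe position from two base hashes (double hashing).
  std::size_t Index(const std::string& value, std::size_t i) const {
    const std::size_t h1 = std::hash<std::string>{}(value);
    const std::size_t h2 = std::hash<std::string>{}(value + '#');
    return (h1 + i * (h2 | 1)) % bits_.size();
  }

  std::vector<bool> bits_;
};

// Hypothetical helper: given one filter per row group (for the column in
// the predicate "col = literal"), returns the indices of the row groups
// that still have to be scanned, in ascending order.
std::vector<std::size_t> RowGroupsToScan(
    const std::vector<ToyBloomFilter>& filters, const std::string& literal) {
  std::vector<std::size_t> to_scan;
  for (std::size_t i = 0; i < filters.size(); ++i) {
    if (filters[i].MightContain(literal)) to_scan.push_back(i);
  }
  return to_scan;
}
```

In a real scanner this check would presumably sit next to the existing statistics and dictionary checks, and it would only apply to predicates that a bloom filter can answer, such as equality or IN-list predicates. Range predicates would still need the min/max statistics.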

Impala could also write bloom filters.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)