Posted to dev@parquet.apache.org by "Csaba Ringhofer (Jira)" <ji...@apache.org> on 2021/02/16 13:46:00 UTC

[jira] [Created] (PARQUET-1981) Consider adding BloomFilterHeader to ColumnMetaData

Csaba Ringhofer created PARQUET-1981:
----------------------------------------

             Summary: Consider adding BloomFilterHeader to ColumnMetaData
                 Key: PARQUET-1981
                 URL: https://issues.apache.org/jira/browse/PARQUET-1981
             Project: Parquet
          Issue Type: Improvement
          Components: parquet-format
            Reporter: Csaba Ringhofer


Currently, ColumnMetaData only contains bloom_filter_offset, which points to a BloomFilterHeader followed by the bloom filter data.

This layout is not optimal for reading, because two IO reads are needed once bloom_filter_offset is known: one to read the header, which contains the size of the bloom filter, and another to read the actual bloom filter into a buffer. Storing the size next to bloom_filter_offset would allow this to be done in a single read.
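
The difference between the two access patterns can be illustrated with a minimal sketch. It assumes a simplified, hypothetical on-disk layout (a fixed 4-byte little-endian length followed by the bitset) in place of the real Thrift-encoded BloomFilterHeader, and the helper names (CountingReader, read_filter_two_reads, read_filter_single_read) are made up for illustration only.

```python
import io
import struct

# ASSUMPTION: a fixed-size stand-in header (uint32 little-endian bitset length).
# The real BloomFilterHeader is a variable-length Thrift struct.
HEADER_SIZE = 4


class CountingReader:
    """Wraps a seekable file object and counts positioned reads."""

    def __init__(self, raw):
        self.raw = raw
        self.reads = 0

    def read_at(self, offset, size):
        self.raw.seek(offset)
        self.reads += 1
        return self.raw.read(size)


def read_filter_two_reads(reader, bloom_filter_offset):
    """Current situation: the size is only known after reading the header."""
    header = reader.read_at(bloom_filter_offset, HEADER_SIZE)
    (num_bytes,) = struct.unpack("<I", header)
    return reader.read_at(bloom_filter_offset + HEADER_SIZE, num_bytes)


def read_filter_single_read(reader, bloom_filter_offset, num_bytes):
    """Proposed situation: the size is already known from the column metadata."""
    blob = reader.read_at(bloom_filter_offset, HEADER_SIZE + num_bytes)
    return blob[HEADER_SIZE:]


# Build a tiny fake file: some padding, then header + bitset.
bitset = bytes(range(32))
padding = b"\x00" * 10
file_bytes = padding + struct.pack("<I", len(bitset)) + bitset
offset = len(padding)

r1 = CountingReader(io.BytesIO(file_bytes))
two = read_filter_two_reads(r1, offset)

r2 = CountingReader(io.BytesIO(file_bytes))
one = read_filter_single_read(r2, offset, len(bitset))
```

Both functions return the same bitset; the first issues two positioned reads, the second only one. On remote or high-latency storage each avoided read is a saved round trip.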

Having algorithm/hash/compression available there could also be useful, since a reader could skip reading the bloom filter entirely if one of those parameters is not supported.
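
A rough sketch of such a check, assuming the header fields were exposed to the reader as a plain dictionary. The function name, the dictionary shape, and the SUPPORTED table are hypothetical; the values BLOCK, XXHASH and UNCOMPRESSED are assumed here to match the options defined in parquet-format.

```python
# ASSUMPTION: what this hypothetical reader implementation supports.
SUPPORTED = {
    "algorithm": {"BLOCK"},
    "hash": {"XXHASH"},
    "compression": {"UNCOMPRESSED"},
}


def should_read_bloom_filter(header_fields, supported=SUPPORTED):
    """Return True only if every parameter is present and supported.

    A missing field is treated as unsupported, so the filter is skipped.
    """
    return all(
        header_fields.get(name) in allowed
        for name, allowed in supported.items()
    )


ok = should_read_bloom_filter(
    {"algorithm": "BLOCK", "hash": "XXHASH", "compression": "UNCOMPRESSED"}
)
unknown_hash = should_read_bloom_filter(
    {"algorithm": "BLOCK", "hash": "SOME_FUTURE_HASH", "compression": "UNCOMPRESSED"}
)
```

With the parameters available in the column metadata, this decision could be made before any IO is issued for the bloom filter itself.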



--
This message was sent by Atlassian Jira
(v8.3.4#803005)