Posted to issues@arrow.apache.org by "Jorge Leitão (Jira)" <ji...@apache.org> on 2022/08/18 21:06:00 UTC

[jira] [Created] (ARROW-17465) [Parquet] DELTA_BINARY_PACKED constraint on num_bits is too restrictive?

Jorge Leitão created ARROW-17465:
------------------------------------

             Summary: [Parquet] DELTA_BINARY_PACKED constraint on num_bits is too restrictive?
                 Key: ARROW-17465
                 URL: https://issues.apache.org/jira/browse/ARROW-17465
             Project: Apache Arrow
          Issue Type: Bug
            Reporter: Jorge Leitão


Consider the sequence of (int32) values

[863490391,-816295192,1613070492,-1166045478,1856530847]

This sequence can be encoded as a single block, single miniblock with a bit_width of 33.
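
The 33-bit figure can be checked with a minimal sketch. It assumes the deltas are computed with exact (non-wrapping) integer arithmetic and that each delta is stored relative to the block's minimum delta; the helper name `delta_bit_width` is hypothetical and is not part of the Arrow codebase.

```python
def delta_bit_width(values):
    """Return (min_delta, bit_width) for a single miniblock.

    Assumption: deltas use exact integer arithmetic, with no 32-bit wraparound.
    """
    # Differences between consecutive values.
    deltas = [b - a for a, b in zip(values, values[1:])]
    min_delta = min(deltas)
    # Each delta is stored as a non-negative offset from the minimum delta.
    adjusted = [d - min_delta for d in deltas]
    # Bits needed to hold the largest offset.
    return min_delta, max(adjusted).bit_length()


values = [863490391, -816295192, 1613070492, -1166045478, 1856530847]
min_delta, bit_width = delta_bit_width(values)
print(min_delta, bit_width)  # -2779115970 33
```

The largest offset is 5801692295, which lies between 2^32 and 2^33, hence the 33 bits.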

However, we currently require [1] the `bit_width` of each miniblock to be smaller than the bit width of the type it encodes, which rules out the 33-bit miniblock above for int32 values.

We could consider lifting this constraint: as the example above shows, the `bit_width` needed to represent the values can be smaller than the `bit_width` needed to represent their deltas.

[1] https://github.com/apache/arrow/blob/a376968089d7310f4a88d054822fa1eaf96c46f5/cpp/src/parquet/encoding.cc#L2173


