Posted to issues@arrow.apache.org by "Jorge Leitão (Jira)" <ji...@apache.org> on 2022/08/18 21:06:00 UTC
[jira] [Created] (ARROW-17465) [Parquet] DELTA_BINARY_PACKED constraint on num_bits is too restrictive?
Jorge Leitão created ARROW-17465:
------------------------------------
Summary: [Parquet] DELTA_BINARY_PACKED constraint on num_bits is too restrictive?
Key: ARROW-17465
URL: https://issues.apache.org/jira/browse/ARROW-17465
Project: Apache Arrow
Issue Type: Bug
Reporter: Jorge Leitão
Consider the sequence of (int32) values
[863490391,-816295192,1613070492,-1166045478,1856530847]
This sequence can be encoded as a single block containing a single miniblock with a `bit_width` of 33.
However, we currently require [1] the `bit_width` of each miniblock to be no larger than the bit width of the type it encodes.
We could consider lifting this constraint: as the example above shows, the `bit_width` needed to represent the deltas can be larger than the `bit_width` needed to represent the values themselves.
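The 33-bit figure for the example sequence can be checked with a short sketch. It assumes the deltas are computed without wrapping (Python integers are arbitrary precision) and that each delta is stored relative to the block's minimum delta, as DELTA_BINARY_PACKED does. The helper name `delta_bit_width` is hypothetical and is not part of Arrow or any Parquet library.

```python
def delta_bit_width(values):
    """Bit width needed to pack the deltas of `values` in one miniblock.

    Hypothetical helper for illustration only; deltas are computed
    without integer wrapping.
    """
    # Differences between consecutive values.
    deltas = [b - a for a, b in zip(values, values[1:])]
    # Each delta is stored minus the block's minimum delta,
    # so every packed value is non-negative.
    min_delta = min(deltas)
    adjusted = [d - min_delta for d in deltas]
    # Number of bits required by the largest adjusted delta.
    return max(adjusted).bit_length()


values = [863490391, -816295192, 1613070492, -1166045478, 1856530847]
width = delta_bit_width(values)
print(width)  # 33
```

Every input fits in an int32, yet the largest adjusted delta is 5801692295, which lies between 2^32 and 2^33 and therefore needs 33 bits.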
[1] https://github.com/apache/arrow/blob/a376968089d7310f4a88d054822fa1eaf96c46f5/cpp/src/parquet/encoding.cc#L2173