Posted to dev@parquet.apache.org by "Jan Finis (Jira)" <ji...@apache.org> on 2023/02/04 11:33:00 UTC
[jira] [Updated] (PARQUET-2238) Spec and parquet-mr disagree on DELTA_BYTE_ARRAY encoding
[ https://issues.apache.org/jira/browse/PARQUET-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jan Finis updated PARQUET-2238:
-------------------------------
Description:
The spec in parquet-format specifies that [DELTA_BYTE_ARRAY is only supported for the physical type BYTE_ARRAY|https://parquet.apache.org/docs/file-format/data-pages/encodings/#delta-length-byte-array-delta_length_byte_array--6]. Yet, [parquet-mr also uses it to encode FIXED_LEN_BYTE_ARRAY|https://github.com/apache/parquet-mr/blob/fd1326a8a56174320ea2f36d7c6c4e78105ab108/parquet-column/src/main/java/org/apache/parquet/column/values/factory/DefaultV2ValuesWriterFactory.java#L83].
So, I guess the spec should be updated to include FIXED_LEN_BYTE_ARRAY in the supported types of DELTA_BYTE_ARRAY encoding, or the code should be changed to no longer write this encoding for FIXED_LEN_BYTE_ARRAY.
was:
The spec in parquet-format specifies that [DELTA_BYTE_ARRAY is only supported for the physical type BYTE_ARRAY|https://parquet.apache.org/docs/file-format/data-pages/encodings/#delta-length-byte-array-delta_length_byte_array--6]. Yet, [parquet-mr also uses it to encode FIXED_LENGTH_BYTE_ARRAY|https://github.com/apache/parquet-mr/blob/fd1326a8a56174320ea2f36d7c6c4e78105ab108/parquet-column/src/main/java/org/apache/parquet/column/values/factory/DefaultV2ValuesWriterFactory.java#L83].
So, I guess the spec should be updated or the code should be changed to no longer write this encoding for FIXED_LENGTH_BYTE_ARRAY.
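For context, the parquet-format spec describes DELTA_BYTE_ARRAY as incremental (prefix) encoding: each value is stored as the length of the prefix it shares with the previous value plus the remaining suffix. Prefix lengths are encoded with DELTA_BINARY_PACKED and suffixes with DELTA_LENGTH_BYTE_ARRAY. The sketch below is illustrative only and is not the parquet-mr implementation. The names prefix_encode, prefix_decode and fixed_values are hypothetical, and the sub-encodings for prefix lengths and suffixes are left out. It suggests that the prefix/suffix split itself does not depend on values having variable length, which is presumably why a writer can apply it to FIXED_LEN_BYTE_ARRAY data even though the spec lists only BYTE_ARRAY.

```python
from typing import List, Tuple


def prefix_encode(values: List[bytes]) -> Tuple[List[int], List[bytes]]:
    """Split each value into (shared-prefix length, remaining suffix)."""
    prefix_lengths: List[int] = []
    suffixes: List[bytes] = []
    previous = b""
    for value in values:
        # Count the leading bytes this value shares with the previous one.
        shared = 0
        limit = min(len(previous), len(value))
        while shared < limit and previous[shared] == value[shared]:
            shared += 1
        prefix_lengths.append(shared)
        suffixes.append(value[shared:])
        previous = value
    return prefix_lengths, suffixes


def prefix_decode(prefix_lengths: List[int], suffixes: List[bytes]) -> List[bytes]:
    """Rebuild the values from prefix lengths and suffixes."""
    values: List[bytes] = []
    previous = b""
    for shared, suffix in zip(prefix_lengths, suffixes):
        value = previous[:shared] + suffix
        values.append(value)
        previous = value
    return values


# Hypothetical values of a FIXED_LEN_BYTE_ARRAY(4) column (assumption for illustration).
fixed_values = [b"\x00\x00\x01\x02", b"\x00\x00\x01\x07", b"\x00\x00\x02\x00"]
prefix_lengths, suffixes = prefix_encode(fixed_values)

# The round trip restores the fixed-length values unchanged.
assert prefix_decode(prefix_lengths, suffixes) == fixed_values
```

Note that the suffixes generally have varying lengths even when the input values are fixed-length, so a reader still has to decode them as length-prefixed data. A reader that follows the spec strictly may reject this encoding for FIXED_LEN_BYTE_ARRAY columns outright, which is the interoperability concern raised here.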
> Spec and parquet-mr disagree on DELTA_BYTE_ARRAY encoding
> ---------------------------------------------------------
>
> Key: PARQUET-2238
> URL: https://issues.apache.org/jira/browse/PARQUET-2238
> Project: Parquet
> Issue Type: Bug
> Components: parquet-format, parquet-mr
> Reporter: Jan Finis
> Priority: Minor