Posted to jira@arrow.apache.org by "Roman Karlstetter (Jira)" <ji...@apache.org> on 2021/06/09 15:46:00 UTC

[jira] [Updated] (ARROW-13024) [C++][Parquet] Decoding byte stream split encoded columns fails when parquet file has nulls

     [ https://issues.apache.org/jira/browse/ARROW-13024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roman Karlstetter updated ARROW-13024:
--------------------------------------
    Summary: [C++][Parquet] Decoding byte stream split encoded columns fails when parquet file has nulls  (was: Decoding byte stream split encoded parquet columns fails when file has nulls)

> [C++][Parquet] Decoding byte stream split encoded columns fails when parquet file has nulls
> -------------------------------------------------------------------------------------------
>
>                 Key: ARROW-13024
>                 URL: https://issues.apache.org/jira/browse/ARROW-13024
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Parquet
>    Affects Versions: 2.0.0, 3.0.0, 4.0.0
>            Reporter: Roman Karlstetter
>            Priority: Major
>
> Reading a Parquet file fails with the following error:
> {{Data size too small for number of values (corrupted file?)}}
> This happens when a {{BYTE_STREAM_SPLIT}}-encoded column stores fewer values than the number of rows, which is the case when the column contains null values (definition levels are present).
> The problematic part is the condition checked in {{ByteStreamSplitDecoder<DType>::SetData}}, which raises the error if the number of values does not match the size of the data array.
> I'm unsure whether I have enough experience with the internals of the encoding/decoding part of this implementation to fix this issue, but my suggestion would be to initialize {{num_values_in_buffer_}} with {{len/static_cast<int64_t>(sizeof(T))}}.
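
To illustrate the report, here is a minimal standalone sketch, not Arrow's actual code. {{SketchDecoder}}, {{EncodeByteStreamSplit}} and the {{derive_count_from_len}} flag are hypothetical names introduced only for this example, and the size check is an assumption modelled on the error message quoted above. It shows why a value count that includes nulls trips the check, and how deriving the count from {{len}} (the reporter's suggestion) avoids it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical helper: writes byte k of every value into stream k and
// concatenates the streams, which is the BYTE_STREAM_SPLIT layout.
template <typename T>
std::vector<uint8_t> EncodeByteStreamSplit(const std::vector<T>& values) {
  const size_t n = values.size();
  std::vector<uint8_t> out(n * sizeof(T));
  for (size_t i = 0; i < n; ++i) {
    uint8_t bytes[sizeof(T)];
    std::memcpy(bytes, &values[i], sizeof(T));
    for (size_t k = 0; k < sizeof(T); ++k) out[k * n + i] = bytes[k];
  }
  return out;
}

// Hypothetical stand-in for the decoder discussed in the report.
template <typename T>
class SketchDecoder {
 public:
  // num_values: value count from the page header (assumed to include nulls).
  // len: size in bytes of the encoded buffer (non-null values only).
  void SetData(int num_values, const uint8_t* data, int len,
               bool derive_count_from_len) {
    if (derive_count_from_len) {
      // Suggested initialization: trust the buffer size.
      num_values_in_buffer_ = len / static_cast<int64_t>(sizeof(T));
    } else {
      // Assumed form of the check that rejects pages containing nulls.
      if (num_values * static_cast<int64_t>(sizeof(T)) > len) {
        throw std::runtime_error(
            "Data size too small for number of values (corrupted file?)");
      }
      num_values_in_buffer_ = num_values;
    }
    data_ = data;
  }

  int64_t num_values_in_buffer() const { return num_values_in_buffer_; }

  // Reassembles every value by gathering one byte from each stream.
  std::vector<T> DecodeAll() const {
    assert(data_ != nullptr || num_values_in_buffer_ == 0);
    const size_t n = static_cast<size_t>(num_values_in_buffer_);
    std::vector<T> out(n);
    for (size_t i = 0; i < n; ++i) {
      uint8_t bytes[sizeof(T)];
      for (size_t k = 0; k < sizeof(T); ++k) bytes[k] = data_[k * n + i];
      std::memcpy(&out[i], bytes, sizeof(T));
    }
    return out;
  }

 private:
  const uint8_t* data_ = nullptr;
  int64_t num_values_in_buffer_ = 0;
};
```

For a column with 5 rows of which 2 are null, the buffer holds 3 floats (12 bytes). Calling {{SetData(5, data, 12, false)}} throws in this sketch, while {{SetData(5, data, 12, true)}} sets the count to 3 and decoding returns the 3 stored values. Whether this is the right fix for Arrow itself is not established here.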


