Posted to dev@parquet.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2016/10/10 15:20:20 UTC

[jira] [Resolved] (PARQUET-739) Rle-decoding uses static buffer that is shared across threads

     [ https://issues.apache.org/jira/browse/PARQUET-739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved PARQUET-739.
----------------------------------
       Resolution: Fixed
    Fix Version/s: cpp-0.1

Issue resolved by pull request 175
[https://github.com/apache/parquet-cpp/pull/175]

> Rle-decoding uses static buffer that is shared across threads
> -------------------------------------------------------------
>
>                 Key: PARQUET-739
>                 URL: https://issues.apache.org/jira/browse/PARQUET-739
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cpp
>            Reporter: Florian Scheibner
>            Assignee: Florian Scheibner
>             Fix For: cpp-0.1
>
>
> Reading two parquet files in parallel led to memory corruption that caused a crash. The columns are RLE dictionary-encoded strings in an uncompressed page, created with parquet-mr.
> Initial debugging showed that the dictionary indices returned by the RLE decoder are garbage, so the data page got corrupted in memory. Reading the files in a single thread works.
> I have a ColumnReader for each column and read one element from each column to get a complete row.
> The indices are decoded into one global static buffer, so multiple threads all use the same buffer and overwrite each other's indices.
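
For illustration only, here is a minimal sketch of one way to avoid this kind of race: give each decoder instance its own index buffer instead of a function-level or global static one. IndexDecoder, Decode and DecodeInParallel are hypothetical names made up for this example; they are not the parquet-cpp API, and this is not necessarily how pull request 175 fixed the issue.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for a dictionary-index decoder (not the parquet-cpp API).
// It "decodes" a single RLE run: `count` copies of one dictionary index.
class IndexDecoder {
 public:
  explicit IndexDecoder(int32_t run_value) : run_value_(run_value) {}

  // Writes the decoded indices into a buffer owned by this instance, so
  // decoders running on different threads never touch the same memory.
  const std::vector<int32_t>& Decode(std::size_t count) {
    assert(count > 0);
    indices_.assign(count, run_value_);
    return indices_;
  }

 private:
  int32_t run_value_;
  std::vector<int32_t> indices_;  // per-instance buffer instead of a shared static one
};

// Runs two decoders on separate threads and reports whether each thread
// only ever saw its own indices.
bool DecodeInParallel() {
  bool ok[2] = {false, false};
  auto work = [&ok](int id) {
    IndexDecoder decoder(id + 1);  // thread 0 decodes 1s, thread 1 decodes 2s
    bool all_match = true;
    for (int iter = 0; iter < 1000; ++iter) {
      for (int32_t v : decoder.Decode(1024)) {
        all_match = all_match && (v == id + 1);
      }
    }
    ok[id] = all_match;  // each thread writes only its own slot
  };
  std::thread t0(work, 0);
  std::thread t1(work, 1);
  t0.join();
  t1.join();
  return ok[0] && ok[1];
}
```

With a single static buffer shared by both decoders, the two threads would write 1s and 2s into the same memory and the check above could fail. Declaring the buffer thread_local would be another possible approach, at the cost of tying the buffer's lifetime to the thread rather than to the decoder.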



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)