Posted to dev@parquet.apache.org by "Wes McKinney (Jira)" <ji...@apache.org> on 2020/07/12 21:15:00 UTC

[jira] [Resolved] (PARQUET-1882) [C++] Writing an all-null column and then reading it with buffered_stream aborts the process

     [ https://issues.apache.org/jira/browse/PARQUET-1882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved PARQUET-1882.
-----------------------------------
    Fix Version/s: cpp-1.6.0
       Resolution: Fixed

Issue resolved by pull request 7718
[https://github.com/apache/arrow/pull/7718]

> [C++] Writing an all-null column and then reading it with buffered_stream aborts the process
> --------------------------------------------------------------------------------------------
>
>                 Key: PARQUET-1882
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1882
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cpp
>         Environment: Windows 10 64-bit, MSVC
>            Reporter: Eric Gorelik
>            Assignee: Micah Kornfield
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: cpp-1.6.0
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When a column containing only nulls is written unbuffered, a 0-byte dictionary page is written. When the resulting file is then read with buffered_stream enabled, the column reader takes the page length from the page header (0) and tries to read that many bytes from the underlying input stream.
> parquet/column_reader.cc, SerializedPageReader::NextPage
>  
> {code:cpp}
> int compressed_len = current_page_header_.compressed_page_size;
> int uncompressed_len = current_page_header_.uncompressed_page_size;
> // Read the compressed data page.
> std::shared_ptr<Buffer> page_buffer;
> PARQUET_THROW_NOT_OK(stream_->Read(compressed_len, &page_buffer));
> {code}
>  
> However, BufferedInputStream::Read asserts that the number of bytes to read is strictly positive, so the zero-length read fails the assertion and aborts the process.
> arrow/io/buffered.cc, BufferedInputStream::Impl
>  
> {code:cpp}
> Status Read(int64_t nbytes, int64_t* bytes_read, void* out) {
>   ARROW_CHECK_GT(nbytes, 0);
>   // ... remainder of the buffered read logic
> }
> {code}
>  
>  
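The following is a minimal C++ sketch of the failure mode described above. It assumes the problem can be avoided by treating a zero-length read as a valid no-op that returns an empty buffer. Under that assumption, a 0-byte dictionary page read the way SerializedPageReader::NextPage reads pages no longer trips a strictly-positive size check. ToyBufferedStream and ReadPages are hypothetical names invented for illustration. They are not the Arrow or parquet-cpp API, and this sketch is not necessarily the change made in pull request 7718.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a buffered input stream (NOT the Arrow API).
// It serves reads from an in-memory source through a fixed-size buffer.
class ToyBufferedStream {
 public:
  ToyBufferedStream(std::string source, std::size_t buffer_size)
      : source_(std::move(source)), buffer_size_(buffer_size) {
    if (buffer_size_ == 0) throw std::invalid_argument("buffer_size must be > 0");
  }

  // Reads up to nbytes bytes. A zero-length request is treated as a valid
  // no-op that returns an empty buffer instead of failing a check; only
  // negative sizes are rejected. Returns fewer bytes at end of source.
  std::string Read(int64_t nbytes) {
    if (nbytes < 0) throw std::invalid_argument("negative read size");
    std::string out;
    if (nbytes == 0) return out;
    const std::size_t wanted = static_cast<std::size_t>(nbytes);
    while (out.size() < wanted) {
      // Refill the buffer when it is used up; stop if the source is exhausted.
      if (buffer_pos_ == buffer_.size() && !Refill()) break;
      const std::size_t take =
          std::min(wanted - out.size(), buffer_.size() - buffer_pos_);
      out.append(buffer_, buffer_pos_, take);
      buffer_pos_ += take;
    }
    return out;
  }

 private:
  // Loads the next chunk of at most buffer_size_ bytes from the source.
  bool Refill() {
    if (source_pos_ >= source_.size()) return false;
    buffer_ = source_.substr(source_pos_, buffer_size_);
    source_pos_ += buffer_.size();
    buffer_pos_ = 0;
    return true;
  }

  std::string source_;
  std::size_t buffer_size_;
  std::string buffer_;
  std::size_t buffer_pos_ = 0;
  std::size_t source_pos_ = 0;
};

// Hypothetical page loop mirroring the NextPage excerpt. Each page's
// compressed size comes from its header, and an all-null column may
// produce a 0-byte dictionary page.
std::vector<std::string> ReadPages(ToyBufferedStream& stream,
                                   const std::vector<int>& compressed_sizes) {
  std::vector<std::string> pages;
  for (int compressed_len : compressed_sizes) {
    pages.push_back(stream.Read(compressed_len));  // Read(0) yields an empty page
  }
  return pages;
}
```

Consider a source of "abcdefgh", a buffer size of 3, and page sizes {0, 5, 3}. ReadPages would return an empty first page, followed by "abcde" and then "fgh". A different, equally hypothetical, approach guards at the reader instead: skip the stream call whenever compressed_page_size is 0.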



--
This message was sent by Atlassian Jira
(v8.3.4#803005)