Posted to dev@parquet.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2017/01/25 17:56:26 UTC

[jira] [Commented] (PARQUET-843) [C++] Impala unable to read files created by parquet-cpp

    [ https://issues.apache.org/jira/browse/PARQUET-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838241#comment-15838241 ] 

Wes McKinney commented on PARQUET-843:
--------------------------------------

I was able to get an error trace:

{code}
I0125 12:54:51.033644  7786 status.cc:50] File hdfs://localhost:20500/tmp/parquet-test-1/example.parquet corrupt. RLE level data bytes = -2011166459
    @     0x7f82276ad2a2  impala::Status::Status()
    @     0x7f82273d933a  impala::HdfsParquetScanner::LevelDecoder::Init()
    @     0x7f82273de478  impala::HdfsParquetScanner::BaseScalarColumnReader::ReadDataPage()
    @     0x7f82273de9e8  impala::HdfsParquetScanner::BaseScalarColumnReader::NextPage()
    @     0x7f82273ea6fd  impala::HdfsParquetScanner::BaseScalarColumnReader::NextLevels<>()
    @     0x7f82273e2781  impala::HdfsParquetScanner::ProcessSplit()
    @     0x7f82273a5866  impala::HdfsScanNode::ProcessSplit()
    @     0x7f82273a630b  impala::HdfsScanNode::ScannerThread()
    @     0x7f82250e4b87  impala::Thread::SuperviseThread()
    @     0x7f82250e5564  boost::detail::thread_data<>::run()
    @           0x6133fa  thread_proxy
    @     0x7f8224de2184  start_thread
    @     0x7f822215237d  (unknown)
I0125 12:54:51.038568  7786 status.cc:50] Could not read definition level, even though metadata states there are 29 values remaining in data page. file=hdfs://localhost:20500/tmp/parquet-test-1/example.parquet
    @     0x7f82276ad2a2  impala::Status::Status()
    @     0x7f82273e8e19  impala::HdfsParquetScanner::BaseScalarColumnReader::SetLevelError()
    @     0x7f82273ea6b5  impala::HdfsParquetScanner::BaseScalarColumnReader::NextLevels<>()
    @     0x7f82273e2781  impala::HdfsParquetScanner::ProcessSplit()
    @     0x7f82273a5866  impala::HdfsScanNode::ProcessSplit()
    @     0x7f82273a630b  impala::HdfsScanNode::ScannerThread()
    @     0x7f82250e4b87  impala::Thread::SuperviseThread()
    @     0x7f82250e5564  boost::detail::thread_data<>::run()
    @           0x6133fa  thread_proxy
    @     0x7f8224de2184  start_thread
    @     0x7f822215237d  (unknown)
I0125 12:54:51.038578  7786 runtime-state.
{code}
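
For context on the first error: in Parquet data page v1, RLE-encoded repetition and definition levels are preceded by a 4-byte little-endian length. The trace suggests the reader took a negative value (-2011166459) from that position and rejected the page. The sketch below shows that kind of check in Python. It is not Impala's actual code: the function name `read_rle_level_length` and the exact bounds test are assumptions for illustration, and the "bad" page is synthetic, not taken from example.parquet.

```python
import struct


def read_rle_level_length(page: bytes, offset: int = 0) -> int:
    """Read the 4-byte little-endian length that prefixes RLE level data.

    Hypothetical helper: interprets the prefix as a signed 32-bit integer
    and rejects values that are negative or larger than the bytes that
    remain in the page after the prefix.
    """
    remaining = len(page) - offset
    if remaining < 4:
        raise ValueError("page too short for RLE level length prefix")
    (num_bytes,) = struct.unpack_from("<i", page, offset)
    if num_bytes < 0 or num_bytes > remaining - 4:
        raise ValueError(f"corrupt page: RLE level data bytes = {num_bytes}")
    return num_bytes


# A well-formed prefix: 3 bytes of level data follow the length.
good_page = struct.pack("<i", 3) + b"\x01\x02\x03"
print(read_rle_level_length(good_page))

# A synthetic page whose prefix decodes to the value seen in the log.
bad_page = struct.pack("<i", -2011166459) + b"\x00" * 8
try:
    read_rle_level_length(bad_page)
except ValueError as err:
    print(err)
```

If the writer and reader disagree about whether the length prefix is present, or about where the level data starts, the reader would end up interpreting unrelated bytes as the length, which is one way a value like this could appear. That is a guess at the failure mode, not a confirmed diagnosis.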

> [C++] Impala unable to read files created by parquet-cpp
> --------------------------------------------------------
>
>                 Key: PARQUET-843
>                 URL: https://issues.apache.org/jira/browse/PARQUET-843
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cpp
>            Reporter: Wes McKinney
>            Priority: Blocker
>         Attachments: example.parquet
>
>
> See the attached example file. parquet-tools is able to read it. I have only tested on Impala 2.5.0; with some effort I could check on a newer Impala, but it would be good to figure out what the issue is with older versions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)