Posted to dev@orc.apache.org by "Norbert Luksa (Jira)" <ji...@apache.org> on 2020/02/14 14:19:00 UTC

[jira] [Created] (ORC-601) Add more debug info to error messages in the scanner

Norbert Luksa created ORC-601:
---------------------------------

             Summary: Add more debug info to error messages in the scanner
                 Key: ORC-601
                 URL: https://issues.apache.org/jira/browse/ORC-601
             Project: ORC
          Issue Type: Improvement
            Reporter: Norbert Luksa


Some exceptions would be easier to debug if their messages carried more information. For instance, a frequently encountered error when Impala has stale metadata for an ORC file is:
{code:java}
Invalid ORC postscript length
{code}
It would be better to also print the postscript length that was read and the file size, so users can tell whether the file is corrupt (and the data needs to be regenerated) or the metadata is stale (and needs a refresh).
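
A minimal sketch of what such a message could look like. The helper names below are hypothetical and not part of the ORC API, std::runtime_error stands in for ParseError, and the check assumes only that the postscript plus the single trailing length byte must fit inside the file:

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not an ORC function): builds an error message that
// carries both values a user needs to tell corruption from stale metadata.
std::string describeBadPostscriptLength(uint64_t postscriptLength,
                                        uint64_t fileLength) {
  std::ostringstream msg;
  msg << "Invalid ORC postscript length: " << postscriptLength
      << ", file length = " << fileLength;
  return msg.str();
}

// Hypothetical check: the postscript and the one-byte length field that
// follows it must both fit inside the file. std::runtime_error is used here
// as a stand-in for the scanner's own exception type.
void checkPostscriptLength(uint64_t postscriptLength, uint64_t fileLength) {
  if (postscriptLength + 1 > fileLength) {
    throw std::runtime_error(
        describeBadPostscriptLength(postscriptLength, fileLength));
  }
}
```

With both numbers in the message, a postscript length larger than the file size hints at stale metadata (the file was replaced by a shorter one), while other mismatches are more likely to indicate a corrupt file.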

Also, there are cases where the same kind of failure results in differently worded messages. E.g. in ColumnReader.cc, [DoubleColumnReader::readByte|https://github.com/apache/orc/blob/master/c%2B%2B/src/ColumnReader.cc#L417] throws {code:c++}ParseError("bad read in DoubleColumnReader::next()");{code} on failing to read from the stream, while [Decimal64ColumnReader::readBuffer|https://github.com/apache/orc/blob/master/c%2B%2B/src/ColumnReader.cc#L1401] throws {code:c++}ParseError("Read past end of stream in Decimal64ColumnReader " + valueStream->getName());{code}
It would be nice to unify these.
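
One possible way to unify them is to route every "ran out of input" failure through a single helper. The sketch below is only an illustration: readPastEndMessage and throwReadPastEnd are hypothetical names, std::runtime_error stands in for ParseError, and the wording is borrowed from the Decimal64ColumnReader message quoted above:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: one message format shared by all column readers,
// always naming the reader and the stream that ran out of data.
std::string readPastEndMessage(const std::string& readerName,
                               const std::string& streamName) {
  return "Read past end of stream in " + readerName + " " + streamName;
}

// Hypothetical helper: throws with the unified message.
// std::runtime_error is a stand-in for the scanner's own exception type.
[[noreturn]] void throwReadPastEnd(const std::string& readerName,
                                   const std::string& streamName) {
  throw std::runtime_error(readPastEndMessage(readerName, streamName));
}
```

Each reader would then pass its own class name and the stream name, so the two call sites above would produce messages that differ only in the reader and stream they mention.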



--
This message was sent by Atlassian Jira
(v8.3.4#803005)