Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2019/12/09 03:55:00 UTC

[jira] [Updated] (IMPALA-9175) Revisit the error handling logics in ORC scanner

     [ https://issues.apache.org/jira/browse/IMPALA-9175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-9175:
-----------------------------------
    Description: 
This task is to revisit the error-handling logic in the ORC scanner and compare it with the Parquet scanner. For each kind of error the Parquet scanner handles, confirm that the ORC scanner already handles it too; otherwise, create a separate JIRA for each gap.

We also need to check whether the exposed error messages are sufficient for debugging. For instance, one error frequently encountered when Impala has stale metadata for an ORC file is:
{code:java}
Encountered parse error in tail of ORC file hdfs://hadoop2cluster/user/hive-0.13.1/warehouse/bi_ucar.db/alliance_driver_stat_hour_api/dt=2019-08-09/part-00006: Invalid ORC postscript length
{code}
It would be better to also print the postscript length we read and the file size, so that users can tell whether the file is corrupt (and the data needs to be regenerated) or the metadata is stale (and a refresh is needed). This may require changes in the ORC library.
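
To illustrate the kind of message this asks for, here is a minimal sketch in Python. The real scanner and ORC library are C++, so this is not their code. It assumes only what the ORC format specifies: the last byte of the file holds the postscript length. The function name diagnose_orc_tail, the cached_size parameter (the file length recorded in possibly stale metadata), and the message wording are all hypothetical, and the validity check is simplified.

```python
def diagnose_orc_tail(file_bytes: bytes, cached_size: int) -> str:
    """Build a diagnostic message for the ORC tail check.

    file_bytes:  current contents of the file (a stand-in for an HDFS read).
    cached_size: file length recorded in the reader's metadata (hypothetical input).
    """
    actual_size = len(file_bytes)
    sizes = f"cached file size={cached_size}, actual file size={actual_size}"

    if cached_size != actual_size:
        # The file changed after its metadata was loaded.
        return f"File size mismatch ({sizes}): metadata is likely stale, try a refresh"

    if actual_size == 0:
        return f"Empty file ({sizes}): file is likely corrupt"

    # Per the ORC format, the last byte of the file holds the postscript length.
    postscript_len = file_bytes[-1]

    # Simplified check: the postscript plus its length byte must fit in the file.
    if postscript_len + 1 > actual_size:
        return (
            f"Invalid ORC postscript length: postscript length={postscript_len}, "
            f"{sizes}: file is likely corrupt"
        )

    return f"Postscript length looks valid: postscript length={postscript_len}, {sizes}"
```

With both numbers in the message, a size mismatch points to stale metadata, while matching sizes with an impossible postscript length point to a corrupt file.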

> Revisit the error handling logics in ORC scanner
> ------------------------------------------------
>
>                 Key: IMPALA-9175
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9175
>             Project: IMPALA
>          Issue Type: Task
>            Reporter: Quanlong Huang
>            Assignee: Norbert Luksa
>            Priority: Blocker


