Posted to issues@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2022/08/15 00:38:00 UTC

[jira] [Resolved] (IMPALA-11489) Async IO cannot handle >2GB ORC files

     [ https://issues.apache.org/jira/browse/IMPALA-11489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang resolved IMPALA-11489.
-------------------------------------
     Fix Version/s: Impala 4.2.0
    Target Version: Impala 4.2.0
        Resolution: Fixed

Thanks [~csringhofer]! I'll also port this to 4.1.1.

> Async IO cannot handle >2GB ORC files
> -------------------------------------
>
>                 Key: IMPALA-11489
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11489
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Csaba Ringhofer
>            Assignee: Csaba Ringhofer
>            Priority: Major
>             Fix For: Impala 4.2.0
>
>
> We assume that the size fits in an int:
> https://github.com/apache/impala/blob/308fda110758b0fc58e5b1f477d635aac29aea75/be/src/exec/hdfs-orc-scanner.cc#L253
> If the size overflows, we can incorrectly hit the following error check (the check is meant to avoid crashing on corrupt metadata). I see no other way this could cause problems: if the check still succeeds (because the overflow produced a valid-looking length), the data will be read correctly.
> This looks like a trivial fix, but I am concerned about the lack of testing for >2GB files.
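
For illustration only, here is a minimal C++ sketch of the overflow described in the issue. It is not the code in hdfs-orc-scanner.cc: the helper names NarrowLength, FitsInInt32 and LooksValid are hypothetical, and the bounds check is an assumed stand-in for the real corrupt-metadata check. It shows that narrowing a 64-bit size to 32 bits can yield either a negative value or a valid-looking positive one.

```cpp
#include <cstdint>

// Hypothetical helper, not Impala's code: narrows a 64-bit byte count to
// 32 bits, as happens when a size is stored in an int. On mainstream
// two's-complement platforms only the low 32 bits are kept.
inline int32_t NarrowLength(int64_t length) {
  return static_cast<int32_t>(length);
}

// Returns true when the 64-bit value survives narrowing unchanged.
inline bool FitsInInt32(int64_t length) {
  return length >= INT32_MIN && length <= INT32_MAX;
}

// Assumed stand-in for a corrupt-metadata guard: a length is accepted only
// if it is non-negative and no larger than the file.
inline bool LooksValid(int64_t length, int64_t file_size) {
  return length >= 0 && length <= file_size;
}
```

With these definitions, a 3 GiB size narrows to a negative number and would be rejected by the assumed guard, while a 5 GiB size narrows to 1 GiB and would pass it. Keeping the size in an int64_t throughout avoids both cases.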



--
This message was sent by Atlassian Jira
(v8.20.10#820010)