
Posted to issues@impala.apache.org by "Csaba Ringhofer (Jira)" <ji...@apache.org> on 2022/08/09 15:22:00 UTC

[jira] [Created] (IMPALA-11489) Async IO cannot handle >2GB ORC files

Csaba Ringhofer created IMPALA-11489:
----------------------------------------

             Summary: Async IO cannot handle >2GB ORC files
                 Key: IMPALA-11489
                 URL: https://issues.apache.org/jira/browse/IMPALA-11489
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
            Reporter: Csaba Ringhofer


We assume that the size fits in an int:
https://github.com/apache/impala/blob/308fda110758b0fc58e5b1f477d635aac29aea75/be/src/exec/hdfs-orc-scanner.cc#L253

If the size overflows, we can incorrectly hit an error check. That check is meant to avoid crashing on corrupt metadata. I see no other way this could cause problems: if the check still passes (because the overflowed value happens to look like a valid length), the data will be read correctly.

This looks like a trivial fix, but I am concerned about the lack of test coverage for >2GB files.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)