Posted to issues@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2018/01/10 22:35:00 UTC

[jira] [Created] (IMPALA-6383) Memory from previous row groups can accumulate in Parquet scanner

Tim Armstrong created IMPALA-6383:
-------------------------------------

             Summary: Memory from previous row groups can accumulate in Parquet scanner
                 Key: IMPALA-6383
                 URL: https://issues.apache.org/jira/browse/IMPALA-6383
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
    Affects Versions: Impala 2.10.0, Impala 2.11.0, Impala 2.12.0
            Reporter: Tim Armstrong
            Assignee: Tim Armstrong


I ran across this bug while porting the scanners to the new buffer pool. Before that change, the only symptom was excessive memory consumption, but with buffer pool reservations it becomes an easy-to-detect hard failure.

The problem is in HdfsParquetScanner::NextRowGroup(), which calls InitColumns() on the column readers. InitColumns() starts scans, and those scans allocate memory. If the row group is then skipped because of dictionary predicates or an error, the scans aren't cancelled and the I/O buffers aren't released.
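
To illustrate the pattern, here is a minimal standalone sketch. It is not Impala code: MemTracker, ColumnReader, ScanRowGroups(), kBufferBytes and the release_on_skip flag are all hypothetical names invented for this example, and the 8 MB buffer size is an arbitrary assumption. It only models the idea that each row group starts a scan that holds a buffer, and that a skipped row group needs to give that buffer back before the next one starts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical buffer size, chosen only for illustration.
constexpr std::size_t kBufferBytes = 8 * 1024 * 1024;

// Hypothetical stand-in for memory accounting; not Impala's MemTracker.
struct MemTracker {
  std::size_t bytes_in_use = 0;
  std::size_t peak_bytes = 0;
  void Consume(std::size_t n) {
    bytes_in_use += n;
    if (bytes_in_use > peak_bytes) peak_bytes = bytes_in_use;
  }
  void Release(std::size_t n) { bytes_in_use -= n; }
};

// Hypothetical column reader: starting a scan allocates an I/O buffer,
// and the buffer is only returned when the scan is cancelled or finished.
class ColumnReader {
 public:
  explicit ColumnReader(MemTracker* tracker) : tracker_(tracker) {}
  void StartScan() {
    tracker_->Consume(kBufferBytes);
    held_bytes_ += kBufferBytes;
  }
  void CancelScan() {
    tracker_->Release(held_bytes_);
    held_bytes_ = 0;
  }

 private:
  MemTracker* tracker_;
  std::size_t held_bytes_ = 0;
};

// Walks a sequence of row groups. skip[i] is true when row group i is
// skipped after its scan has already been started. Returns peak memory use.
std::size_t ScanRowGroups(const std::vector<bool>& skip, bool release_on_skip,
                          MemTracker* tracker) {
  ColumnReader reader(tracker);
  for (bool skipped : skip) {
    reader.StartScan();  // models the scan started during column init
    if (skipped) {
      // Without this release, buffers from skipped row groups pile up.
      if (release_on_skip) reader.CancelScan();
      continue;
    }
    // ... the row group would be read here ...
    reader.CancelScan();  // normal path: buffers are released when done
  }
  return tracker->peak_bytes;
}
```

In this model, three skipped row groups followed by one that is read give a peak of four buffers when release_on_skip is false, and a peak of one buffer when it is true.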



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)