Posted to issues@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2018/01/16 05:02:00 UTC

[jira] [Resolved] (IMPALA-6383) Memory from previous row groups can accumulate in Parquet scanner

     [ https://issues.apache.org/jira/browse/IMPALA-6383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-6383.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.12.0

IMPALA-6383: free memory after skipping parquet row groups

Before this patch, resources were only flushed after breaking out of
NextRowGroup(). This is a problem because resources can be allocated
for skipped row groups (e.g. for reading dictionaries).
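
A minimal sketch of the pattern described above, not the actual Impala
code: ToyScanner, RowGroup, live_buffers_ and the free_on_skip flag are
hypothetical stand-ins, and only the names NextRowGroup(), InitColumns()
and FlushResources() mirror the terms used in this message. It models each
InitColumns() call as holding one buffer and compares the peak number of
live buffers with and without freeing on skip.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model of a row group: 'skipped' stands in for a group that is
// eliminated (e.g. by a dictionary predicate) after its columns were set up.
struct RowGroup {
  bool skipped;
};

// Toy scanner, not Impala's HdfsParquetScanner. 'free_on_skip' toggles
// between the pre-patch behaviour (false) and the patched behaviour (true).
class ToyScanner {
 public:
  ToyScanner(std::vector<RowGroup> groups, bool free_on_skip)
      : groups_(std::move(groups)), free_on_skip_(free_on_skip) {}

  // Advances to the next row group that must be read; false when exhausted.
  bool NextRowGroup() {
    while (next_ < groups_.size()) {
      const RowGroup& group = groups_[next_++];
      // With the fix, nothing from an earlier group may still be held here
      // (plays the role of the DCHECKs mentioned under "Testing").
      if (free_on_skip_) assert(live_buffers_ == 0);
      InitColumns();
      if (!group.skipped) return true;
      // The fix: release what the skipped group allocated before moving on.
      if (free_on_skip_) FlushResources();
    }
    return false;
  }

  // Releases everything currently held by the scanner.
  void FlushResources() { live_buffers_ = 0; }

  std::size_t peak_buffers() const { return peak_buffers_; }

 private:
  // Models starting the column scans: each call holds one more buffer.
  void InitColumns() {
    ++live_buffers_;
    peak_buffers_ = std::max(peak_buffers_, live_buffers_);
  }

  std::vector<RowGroup> groups_;
  bool free_on_skip_;
  std::size_t next_ = 0;
  std::size_t live_buffers_ = 0;
  std::size_t peak_buffers_ = 0;
};

// Drives a scan over three skipped groups followed by one readable group and
// reports the peak number of buffers held at any one time.
std::size_t PeakBuffers(bool free_on_skip) {
  std::vector<RowGroup> groups = {{true}, {true}, {true}, {false}};
  ToyScanner scanner(std::move(groups), free_on_skip);
  while (scanner.NextRowGroup()) {
    // The caller flushes after each row group it actually processed, which
    // was the only flush point before the patch.
    scanner.FlushResources();
  }
  return scanner.peak_buffers();
}
```

In this toy model, PeakBuffers(false) reaches 4 because the three skipped
groups keep their buffers until the first readable group is finished, while
PeakBuffers(true) stays at 1.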

Testing:
Tested in conjunction with a prototype buffer pool patch that was
DCHECKing before the change.

Added DCHECKs to the current version to ensure the streams are cleared
up as expected.

Change-Id: Ibc2f8f27c9b238be60261539f8d4be2facb57a2b
Reviewed-on: http://gerrit.cloudera.org:8080/9002
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Impala Public Jenkins

> Memory from previous row groups can accumulate in Parquet scanner
> -----------------------------------------------------------------
>
>                 Key: IMPALA-6383
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6383
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.10.0, Impala 2.11.0, Impala 2.12.0
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Major
>              Labels: parquet, resource-management
>             Fix For: Impala 2.12.0
>
>
> I ran across this bug when working on porting scanners to the new buffer pool. Before that the only symptom of the failures was excessive memory consumption, but with the reservations they become easy-to-detect hard failures.
> The problem is in HdfsParquetScanner::NextRowGroup(), which calls InitColumns() on the column readers; this starts scans, which allocate memory. If the row group is then skipped because of dictionary predicates or because of an error, the scans aren't cancelled and the I/O buffers aren't released.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)