Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2022/09/15 07:24:00 UTC

[jira] [Commented] (IMPALA-11580) Memory leak in legacy catalog mode when applying incremental partition updates

    [ https://issues.apache.org/jira/browse/IMPALA-11580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17605160#comment-17605160 ] 

ASF subversion and git services commented on IMPALA-11580:
----------------------------------------------------------

Commit cfd79b40beab86f08ad72e0bea41eabf736d0a99 in impala's branch refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=cfd79b40b ]

IMPALA-11580: Fix memory leak in legacy catalog mode when applying incremental partition updates

In the legacy catalog mode, catalogd propagates incremental metadata
updates at the partition level. While applying the updates, impalad
reuses the existing partition objects and moves them to a new HdfsTable
object. However, the partition objects are immutable, which means their
reference to the old table object remains unchanged. The JVM GC cannot
collect the stale table objects because the partitions still hold active
references to them, which results in a memory leak.

This patch fixes the issue by creating a new partition object from the
existing one, with its table field set to the new table object.
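
The difference between reusing and rebuilding partitions can be sketched
in plain Java. This is only an illustration under stated assumptions, not
the code from the patch: Table, Partition, withTable, applyUpdateReusing,
applyUpdateRebuilding and pointsToOwner are hypothetical stand-ins and do
not mirror the real HdfsTable or partition classes in Impala.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionLeakSketch {
  // Hypothetical stand-in for a table object; not Impala's HdfsTable.
  static final class Table {
    final long version;
    final List<Partition> partitions = new ArrayList<>();
    Table(long version) { this.version = version; }
  }

  // Hypothetical immutable partition: the table back-reference is final.
  static final class Partition {
    final Table table;
    final String name;
    Partition(Table table, String name) {
      this.table = table;
      this.name = name;
    }
    // Copy of this partition whose back-reference points at newTable.
    Partition withTable(Table newTable) {
      return new Partition(newTable, name);
    }
  }

  // Leaky variant: the new table reuses the old partition objects, so each
  // partition still references the old table and keeps it reachable.
  static Table applyUpdateReusing(Table old, long newVersion) {
    Table fresh = new Table(newVersion);
    fresh.partitions.addAll(old.partitions);
    return fresh;
  }

  // Fixed variant: every partition is rebuilt against the new table, so
  // nothing reachable from the new table references the old one.
  static Table applyUpdateRebuilding(Table old, long newVersion) {
    Table fresh = new Table(newVersion);
    for (Partition p : old.partitions) {
      fresh.partitions.add(p.withTable(fresh));
    }
    return fresh;
  }

  // True if every partition in t refers back to t itself.
  static boolean pointsToOwner(Table t) {
    for (Partition p : t.partitions) {
      if (p.table != t) return false;
    }
    return true;
  }

  public static void main(String[] args) {
    Table v1 = new Table(1);
    v1.partitions.add(new Partition(v1, "p=1"));
    System.out.println("reusing:    " + pointsToOwner(applyUpdateReusing(v1, 2)));
    System.out.println("rebuilding: " + pointsToOwner(applyUpdateRebuilding(v1, 2)));
  }
}
```

In this sketch the reusing variant reports false, because its partitions
still point at the version 1 table, while the rebuilding variant reports
true. Each update applied the leaky way leaves one more stale table
reachable through the reused partitions.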

Tests:
 - Verified locally that after applying the patch, the number of live
   HdfsTable objects no longer keeps growing.

Change-Id: Ie04ff243c6b82c1a06c489da74353f2d8afe423a
Reviewed-on: http://gerrit.cloudera.org:8080/18978
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Csaba Ringhofer <cs...@cloudera.com>


> Memory leak in legacy catalog mode when applying incremental partition updates
> ------------------------------------------------------------------------------
>
>                 Key: IMPALA-11580
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11580
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 4.0.0, Impala 4.1.0
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> Since IMPALA-3127, catalogd propagates incremental metadata updates at the partition level. In the legacy catalog mode, while applying the updates, impalad reuses the existing partition objects and moves them to a new HdfsTable object. However, the partition objects are immutable, which means their reference to the old table object remains unchanged. The JVM cannot collect the stale table objects because the partitions still hold active references to them.
> To reproduce the issue, create a partitioned table and add new partitions to it at a rate close to the catalog update frequency (2s by default):
> {code:sql}
> impala-shell> drop table if exists my_part_tbl;
> impala-shell> create external table my_part_tbl (id int) partitioned by (p int) stored as textfile;
> {code}
> Add a partition every 2s:
> {code:bash}
> for i in `seq 1000`; do impala-shell.sh -q "alter table my_part_tbl add partition (p=$i)"; sleep 2; done
> {code}
> Then monitor the live table objects in impalad JVM:
> {code:bash}
> for p in `pidof impalad`; do echo PID=$p; jmap -histo:live $p | grep 'org.apache.impala.catalog.HdfsTable$'; done
> {code}
> You can see that the count stays unchanged in only one impalad. The count in the other two impalads keeps growing.
> {noformat}
> $ for p in `pidof impalad`; do echo PID=$p; jmap -histo:live $p | grep 'org.apache.impala.catalog.HdfsTable$'; done
> PID=27677
>  136:            14           3360  org.apache.impala.catalog.HdfsTable
> PID=27671
>  136:            14           3360  org.apache.impala.catalog.HdfsTable
> PID=27668
>  474:             1            240  org.apache.impala.catalog.HdfsTable
> $ for p in `pidof impalad`; do echo PID=$p; jmap -histo:live $p | grep 'org.apache.impala.catalog.HdfsTable$'; done
> PID=27677
>  113:            21           5040  org.apache.impala.catalog.HdfsTable
> PID=27671
>  113:            21           5040  org.apache.impala.catalog.HdfsTable
> PID=27668
>  474:             1            240  org.apache.impala.catalog.HdfsTable
> {noformat}
> This only happens in the legacy catalog mode and doesn't occur in the local-catalog mode. To work around this, use the startup flag {{--enable_incremental_metadata_updates=false}} in catalogd to disable incremental catalog updates.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
