Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/09/14 16:10:00 UTC

[jira] [Commented] (IMPALA-7448) Periodically evict recently unused table from catalogd

    [ https://issues.apache.org/jira/browse/IMPALA-7448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16615027#comment-16615027 ] 

ASF subversion and git services commented on IMPALA-7448:
---------------------------------------------------------

Commit 49095c7e8b8ba1f8e69a68f15a322cc1ead13b7e in impala's branch refs/heads/master from [~tianyiwang]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=49095c7 ]

IMPALA-7448: Invalidate recently unused tables from catalogd

This patch implements an automatic invalidation mechanism in catalogd.
There are two invalidation strategies:
1. Periodically, HDFS tables that have not been used within the
   configured period "invalidate_tables_timeout_s" are invalidated from
   catalogd.
2. If the old GC generation is almost full, a certain percentage of LRU
   tables are invalidated. This can be enabled by the backend flag
   "invalidate_tables_on_memory_pressure".

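The two strategies above can be sketched roughly as follows. This is an
illustrative Python model, not the code from the commit (catalogd itself
is not written in Python). The class TableUsage, its methods, and the
"fraction" parameter are hypothetical names; only the idea of a timeout
and of invalidating a percentage of LRU tables comes from the message.

```python
import time
from typing import Dict, List, Optional


class TableUsage:
    """Hypothetical tracker of a last-used timestamp per loaded table."""

    def __init__(self) -> None:
        self._last_used: Dict[str, float] = {}

    def touch(self, table: str, now: Optional[float] = None) -> None:
        # Record that a table was used (for example when usage is reported).
        self._last_used[table] = time.monotonic() if now is None else now

    def invalidate_unused(self, timeout_s: float, now: float) -> List[str]:
        # Strategy 1: drop every table that has been idle longer than timeout_s.
        stale = sorted(
            t for t, ts in self._last_used.items() if now - ts > timeout_s
        )
        for t in stale:
            del self._last_used[t]
        return stale

    def invalidate_lru_fraction(self, fraction: float) -> List[str]:
        # Strategy 2: under memory pressure, drop the least recently used
        # `fraction` of the tracked tables (count rounded down).
        by_age = sorted(self._last_used, key=lambda t: self._last_used[t])
        victims = by_age[: int(len(by_age) * fraction)]
        for t in victims:
            del self._last_used[t]
        return victims
```

In this model the first method would run on a timer and the second only
when the old GC generation is reported as almost full.
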
The table usage is reported by impalad to catalogd when the tables are
used during planning.
Tests for time-based invalidation are added. It was manually verified
that the GC callback is invoked when catalogd is filled with randomly
generated strings.
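
As a rough illustration of the usage reporting described above, the
Python sketch below buffers the tables touched during planning and hands
them over in batches. UsageReporter, apply_report, and the choice to
ignore tables that are no longer loaded are assumptions made for the
example; the message does not describe the actual RPC or data layout.

```python
from collections import Counter
from typing import Dict, Iterable


class UsageReporter:
    """Hypothetical impalad-side buffer of tables used during planning."""

    def __init__(self) -> None:
        self._pending: Counter = Counter()

    def record_planned_tables(self, tables: Iterable[str]) -> None:
        # Called once per planned query with the tables it referenced.
        self._pending.update(tables)

    def flush(self) -> Dict[str, int]:
        # Return the accumulated use counts and start a fresh batch.
        batch, self._pending = dict(self._pending), Counter()
        return batch


def apply_report(last_used: Dict[str, float], report: Dict[str, int],
                 now: float) -> None:
    # Hypothetical catalogd side: refresh the timestamp of each reported
    # table that is still loaded. Unknown tables are ignored (assumption).
    for table in report:
        if table in last_used:
            last_used[table] = now
```

Batching keeps the reporting off the planning path: planning only
updates a local counter, and the report is sent separately.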

Change-Id: Ib549717abefcffb14d9a3814ee8cf0de8bd49e89
Reviewed-on: http://gerrit.cloudera.org:8080/11224
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Tianyi Wang <tw...@cloudera.com>


> Periodically evict recently unused table from catalogd
> ------------------------------------------------------
>
>                 Key: IMPALA-7448
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7448
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>    Affects Versions: Impala 3.1.0
>            Reporter: Tianyi Wang
>            Assignee: Tianyi Wang
>            Priority: Major
>
> To limit the memory consumption of the catalog, we should experiment with a mechanism that automatically evicts recently unused tables from catalogd. Initial design:
> - impalad to report periodically/asynchronously the set of catalog objects that were accessed
> - catalogd to record some kind of last access time
> - catalogd to have some facility to scan over all catalog objects, collect some number of not-recently-used ones (e.g. to reach a target amount of evicted memory), and issue invalidate commands to itself
> - no need to have exact LRU behavior -- to simplify, we probably shouldn't try to do a classical LRU linked list between all catalog objects.
> - initial patch probably just triggered manually. Discussed either running this on a schedule or running this based on JMX GC notifications if we see that the catalogd finished an old-gen GC and the old gen is more than some target percentage full.
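
The scan-and-collect step and the old-gen check from the design list
could look roughly like the Python sketch below. All names here
(old_gen_over_threshold, pick_victims, the per-table size estimate) are
hypothetical. The full sort is used only for brevity; the design
explicitly allows approximate rather than exact LRU ordering.

```python
from typing import Dict, List, Tuple


def old_gen_over_threshold(used_bytes: int, max_bytes: int,
                           threshold: float) -> bool:
    # True when the old generation is more than `threshold` (0..1) full.
    return max_bytes > 0 and used_bytes / max_bytes > threshold


def pick_victims(tables: Dict[str, Tuple[float, int]],
                 target_bytes: int) -> List[str]:
    """Pick not-recently-used tables until target_bytes would be freed.

    `tables` maps a table name to (last_access_time, estimated_size_bytes).
    """
    victims: List[str] = []
    freed = 0
    # Visit tables from oldest to newest last access time.
    for name, (_, size) in sorted(tables.items(), key=lambda kv: kv[1][0]):
        if freed >= target_bytes:
            break
        victims.append(name)
        freed += size
    return victims
```

A caller would evaluate the first function after an old-gen GC and, if
it returns True, invalidate the tables returned by the second.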



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org