Posted to issues@impala.apache.org by "Tim Armstrong (Jira)" <ji...@apache.org> on 2020/01/21 21:12:00 UTC

[jira] [Resolved] (IMPALA-5896) Add safeguards to Catalog

     [ https://issues.apache.org/jira/browse/IMPALA-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-5896.
-----------------------------------
    Resolution: Won't Fix

The local catalog changes should address a lot of these problems. There hasn't been any movement on this anyway, so I'm going to close it out.

> Add safeguards to Catalog
> -------------------------
>
>                 Key: IMPALA-5896
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5896
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>    Affects Versions: Impala 2.5.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0
>            Reporter: Juan Yu
>            Priority: Major
>              Labels: supportability, usability
>
> Although we are actively working on improving metadata cache usage and efficiency, the data may grow faster. Without any safeguard to protect the catalog server, we might hit catalog limits sooner or later.
> Here are some suggestions:
> 1. Stop loading more metadata after loading a large number of files/blocks for a single table, to avoid exceeding the 2GB Java array limit.
>    Note: A proper limit is hard to define. The metadata size also depends on the number of partitions and on whether incremental stats metadata is present, but 5~6M seems a conservative and large enough limit for most use cases (consider that the best practice for a DN is no more than 1M blocks per node).
> 2. Reject new metadata ops (e.g. loading metadata for a new table) if metadata cache heap usage is close to the mem_limit setting, to avoid OOM. This usage should not include objects that can be GCed.
> This change can lead to wrong results or to tables not being updated.
> We need to make sure we show proper warnings to the user that there is a problem and that they need to fix those metadata issues ASAP,
> e.g. table *** doesn't have updated metadata or has incomplete metadata due to memory limit.
> Not enough memory to load table ***, please remove some unused tables from cache by "invalidate metadata <table>"
> More diagnostic information that could help debug catalog issues:
> - Which table has caused the crash?
> - Which table, or how many partitions from a specific table, do we need to remove to get it back up again?
> - Given a huge topic size, what is the contribution of each of the tables to it?
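
The two safeguards and the per-table breakdown proposed in the issue could be sketched roughly as below. This is only an illustration in Python, not Impala code: CatalogGuard, MetadataLimitError, admit(), contributions() and the 0.9 headroom fraction are all hypothetical names and values, and the size estimates are assumed to be supplied by the caller. For simplicity the sketch rejects an oversized table outright rather than stopping partway through a load as suggestion 1 describes.

```python
from dataclasses import dataclass, field

# Hypothetical defaults; the issue only suggests "5~6M" files/blocks per table.
MAX_FILES_PER_TABLE = 5_000_000
# Assumed meaning of "close to mem_limit": 90% of the configured limit.
HEAP_HEADROOM_FRACTION = 0.9


class MetadataLimitError(Exception):
    """Raised when a safeguard refuses a metadata load."""


@dataclass
class CatalogGuard:
    mem_limit_bytes: int
    max_files_per_table: int = MAX_FILES_PER_TABLE
    # table name -> estimated live (non-collectable) metadata bytes
    table_bytes: dict = field(default_factory=dict)

    def used_bytes(self) -> int:
        return sum(self.table_bytes.values())

    def admit(self, table: str, num_files: int, est_bytes: int) -> None:
        """Record a table's metadata, or raise with a user-facing warning."""
        # Safeguard 1: cap the number of files/blocks for a single table.
        if num_files > self.max_files_per_table:
            raise MetadataLimitError(
                f"Table {table} has incomplete metadata: {num_files} files "
                f"exceeds the limit of {self.max_files_per_table}."
            )
        # Safeguard 2: refuse the load if it would push usage near mem_limit.
        # A reload replaces the table's previous estimate, so subtract it first.
        projected = self.used_bytes() - self.table_bytes.get(table, 0) + est_bytes
        if projected > self.mem_limit_bytes * HEAP_HEADROOM_FRACTION:
            raise MetadataLimitError(
                f"Not enough memory to load table {table}, please remove some "
                f'unused tables from cache by "invalidate metadata <table>"'
            )
        self.table_bytes[table] = est_bytes

    def contributions(self) -> list:
        """Return (table, bytes, share of total) tuples, largest first."""
        total = self.used_bytes()
        if total == 0:
            return []
        ranked = sorted(self.table_bytes.items(), key=lambda kv: kv[1], reverse=True)
        return [(name, size, size / total) for name, size in ranked]
```

A caller would invoke admit() before loading a table and surface the exception message to the user. contributions() addresses the last diagnostic question by ranking tables by their share of the tracked metadata size.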



--
This message was sent by Atlassian Jira
(v8.3.4#803005)