Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2021/04/30 02:44:00 UTC

[jira] [Created] (IMPALA-10686) Add threshold for catalogd to give up loading large tables

Quanlong Huang created IMPALA-10686:
---------------------------------------

             Summary: Add threshold for catalogd to give up loading large tables
                 Key: IMPALA-10686
                 URL: https://issues.apache.org/jira/browse/IMPALA-10686
             Project: IMPALA
          Issue Type: New Feature
            Reporter: Quanlong Huang


Catalogd can hit the JVM's 2GB array size limit when serializing a large HdfsTable object, which results in an OOM error. Although catalogd has sent partition metadata individually, outside the table object, in catalog updates since IMPALA-3127, it still sends the whole table object in DDL/DML/Refresh responses. IMPALA-9937 aims to fix this, but we tend to address the problem via local catalog mode, so IMPALA-9937 has low priority.

Because of this, it would be helpful for users who are still using the legacy catalog mode to have a configurable threshold that stops catalogd from loading the metadata of a large table.

We can provide thresholds on the number of partitions, the number of files, or the estimated metadata size of the whole table. Catalogd should give up loading the table metadata if any of these exceeds its threshold.
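
To make the proposal concrete, here is a minimal sketch of such a check. It is an illustration only, not Impala code: the class {{TableLoadThresholds}}, its fields, and the method {{exceededReason}} are hypothetical names, and the sketch assumes the caller can already obtain the partition count, file count, and an estimated metadata size. It also assumes a non-positive threshold means "check disabled".

```java
import java.util.Optional;

/**
 * Hypothetical sketch of the proposed thresholds. None of these names
 * exist in Impala; they are assumptions made for illustration.
 */
public class TableLoadThresholds {
    // Assumption: a non-positive limit disables the corresponding check.
    private final long maxPartitions;
    private final long maxFiles;
    private final long maxEstimatedMetadataBytes;

    public TableLoadThresholds(long maxPartitions, long maxFiles,
                               long maxEstimatedMetadataBytes) {
        this.maxPartitions = maxPartitions;
        this.maxFiles = maxFiles;
        this.maxEstimatedMetadataBytes = maxEstimatedMetadataBytes;
    }

    /**
     * Returns a description of the first exceeded threshold, or an empty
     * Optional if the table is within all enabled limits.
     */
    public Optional<String> exceededReason(long numPartitions, long numFiles,
                                           long estimatedMetadataBytes) {
        if (exceeds(numPartitions, maxPartitions)) {
            return Optional.of("partitions " + numPartitions + " > " + maxPartitions);
        }
        if (exceeds(numFiles, maxFiles)) {
            return Optional.of("files " + numFiles + " > " + maxFiles);
        }
        if (exceeds(estimatedMetadataBytes, maxEstimatedMetadataBytes)) {
            return Optional.of("estimated metadata bytes " + estimatedMetadataBytes
                + " > " + maxEstimatedMetadataBytes);
        }
        return Optional.empty();
    }

    private static boolean exceeds(long value, long limit) {
        return limit > 0 && value > limit;
    }

    public static void main(String[] args) {
        // Example limits: 100,000 partitions, file check disabled, 1 GiB of metadata.
        TableLoadThresholds limits = new TableLoadThresholds(100_000L, 0L, 1L << 30);

        // A small table passes; a table with 250,000 partitions is rejected.
        System.out.println(limits.exceededReason(500L, 20_000L, 64L << 20).isPresent());
        System.out.println(limits.exceededReason(250_000L, 20_000L, 64L << 20).isPresent());
    }
}
```

In a real implementation, the loader would run a check like this before building the full table object and fail the load with a clear error message instead of risking an OOM during serialization. How the three inputs are computed cheaply, in particular the estimated metadata size, is left open here.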

Note that a simpler workaround is to use the {{--blacklisted_dbs}} and {{--blacklisted_tables}} flags to block such tables directly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org