Posted to issues-all@impala.apache.org by "Quanlong Huang (JIRA)" <ji...@apache.org> on 2019/03/07 03:09:00 UTC

[jira] [Updated] (IMPALA-7224) UpdateCatalogMetrics very slow when there are many tables

     [ https://issues.apache.org/jira/browse/IMPALA-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-7224:
-----------------------------------
    Fix Version/s: Impala 2.13.0

> UpdateCatalogMetrics very slow when there are many tables
> ---------------------------------------------------------
>
>                 Key: IMPALA-7224
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7224
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Major
>             Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> impalad calls UpdateCatalogMetrics after every statement that it treats as DDL. This includes statements such as USE, SHOW TABLES, and DESCRIBE, which don't change the number of tables in the catalog and therefore probably don't need to update metrics. That aside, even when the metrics _do_ need to be updated, the implementation is very slow. It calls getTableNames on each database, which results in (a) creating an array of all the names, (b) sorting that array, and (c) encoding/decoding the whole array through Thrift. This is very expensive: in a use case with approximately 8M tables, each such call takes 10-12 seconds of CPU, most of which is spent sorting and encoding. All that's really needed is a _count_ of tables, which could be fetched directly.
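
The difference between the two approaches in the description can be illustrated with a minimal sketch. This is not Impala's code: the `Db` class, `num_tables`, `count_tables_slow`, `count_tables_fast`, and `needs_metrics_update` are hypothetical names invented for illustration, and the Thrift encode/decode step is left out entirely. Only the three statement types named in the description are treated as not changing the table count.

```python
from typing import Iterable, List, Set


class Db:
    """Hypothetical stand-in for one catalog database (not Impala's class)."""

    def __init__(self, name: str, tables: Set[str]) -> None:
        self.name = name
        self._tables = set(tables)

    def get_table_names(self) -> List[str]:
        # Mimics the costly path: copy every name into a new list and sort it.
        return sorted(self._tables)

    def num_tables(self) -> int:
        # Reads the size directly: no copy, no sort, nothing to serialize.
        return len(self._tables)


def count_tables_slow(dbs: Iterable[Db]) -> int:
    # Builds and sorts a full name list per database just to take its length.
    return sum(len(db.get_table_names()) for db in dbs)


def count_tables_fast(dbs: Iterable[Db]) -> int:
    # Sums the per-database counts without materializing any names.
    return sum(db.num_tables() for db in dbs)


# Statement types the description names as not changing the table count.
# Assumption: every other statement type may change it.
NO_COUNT_CHANGE = {"USE", "SHOW TABLES", "DESCRIBE"}


def needs_metrics_update(stmt_type: str) -> bool:
    # Skip the metrics refresh for statements that cannot alter the count.
    return stmt_type.upper() not in NO_COUNT_CHANGE
```

Both counting functions return the same total; the second simply avoids the per-name work, which is what dominates when a catalog holds millions of tables.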


