Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2023/12/01 07:23:00 UTC

[jira] [Commented] (IMPALA-12486) Add catalog metrics for ParallelFileMetadataLoader

    [ https://issues.apache.org/jira/browse/IMPALA-12486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791924#comment-17791924 ] 

ASF subversion and git services commented on IMPALA-12486:
----------------------------------------------------------

Commit 9011b81afa33ef7e4b0ec8a367b2713be8917213 in impala's branch refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=9011b81af ]

IMPALA-12486: Add catalog metrics for metadata loading

This patch adds the following catalog metrics which indicate the load on
HDFS for loading file metadata:
 - catalog-server.metadata.file.num-loading-threads: The total size of
   all thread pools used in loading file metadata.
 - catalog-server.metadata.file.num-loading-tasks: The total number of
   unfinished file metadata loading tasks. Each task corresponds to a
   partition.
 - catalog-server.metadata.table.num-loading-file-metadata: The total
   number of tables that are loading file metadata.

Also adds some metrics for metadata loading on all tables. Note that
metadata loading of an HDFS table consists of loading HMS metadata and
HDFS file metadata, etc.
 - catalog-server.metadata.table.num-loading-metadata: The total number
   of tables that are loading metadata.
 - catalog-server.metadata.table.async-loading.num-in-progress: The
   total number of tables that are loading metadata asynchronously,
   e.g. the initial metadata loading triggered by the first access to
   a table.
 - catalog-server.metadata.table.async-loading.queue-len: The total
   number of tables that are waiting for asynchronous loading. If this
   number keeps rising, consider increasing
   --num_metadata_loading_threads.
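
The queue-len guidance above can be turned into a simple trend check. The
following is a minimal sketch, not part of the patch: it assumes the metric
has already been sampled periodically into a list of integers, and the
function name, the window size, and the sample values are hypothetical.

```python
from typing import Sequence

# Metric name taken from the commit message above.
QUEUE_LEN = "catalog-server.metadata.table.async-loading.queue-len"


def queue_is_growing(samples: Sequence[int], min_samples: int = 3) -> bool:
    """Return True if the last `min_samples` readings never decrease
    and the newest reading is higher than the oldest one in that window."""
    if len(samples) < min_samples:
        return False  # Not enough data to call it a trend.
    window = list(samples[-min_samples:])
    non_decreasing = all(a <= b for a, b in zip(window, window[1:]))
    return non_decreasing and window[-1] > window[0]


# Made-up readings, for illustration only.
if queue_is_growing([0, 2, 5]):
    print(QUEUE_LEN + " keeps rising; consider increasing "
          "--num_metadata_loading_threads")
```

A flat queue (for example 4, 4, 4) is deliberately not reported, since a
stable backlog does not by itself show that the loading threads are falling
behind.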

Three metrics about the catalog cache are also added:
 - catalog.num-databases
 - catalog.num-tables
 - catalog.num-functions
Note that the first two are also shown in the WebUI of coordinators. We
plan to deprecate them there and show them only in catalogd's WebUI.

The number of idle and in-use HMS clients is also exposed in this
patch:
 - catalog.hms-client-pool.num-idle
 - catalog.hms-client-pool.num-in-use
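
One way to read the two HMS client metrics together is as a utilization
ratio. This is a rough sketch under the assumption that idle plus in-use
clients approximates the current pool size, which the commit message does
not state. The function name and the sample values are hypothetical.

```python
from typing import Mapping

# Metric names taken from the commit message above.
NUM_IDLE = "catalog.hms-client-pool.num-idle"
NUM_IN_USE = "catalog.hms-client-pool.num-in-use"


def hms_pool_utilization(metrics: Mapping[str, int]) -> float:
    """Return the fraction of HMS clients currently in use.

    Assumes (not confirmed by the source) that idle + in-use clients
    approximates the pool size. Returns 0.0 for an empty pool.
    """
    idle = metrics[NUM_IDLE]
    in_use = metrics[NUM_IN_USE]
    total = idle + in_use
    return in_use / total if total else 0.0


# Made-up snapshot, for illustration only.
snapshot = {NUM_IDLE: 6, NUM_IN_USE: 2}
print(hms_pool_utilization(snapshot))
```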

Tests
 - Launched catalogd locally with load_catalog_in_background=true and
   verified the metrics.
 - Add e2e tests in tests/webserver/test_web_pages.py

Change-Id: Icef7b123bdcb0f5b8572635eeaacd8294990f9ba
Reviewed-on: http://gerrit.cloudera.org:8080/20673
Reviewed-by: Andrew Sherman <as...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Add catalog metrics for ParallelFileMetadataLoader
> --------------------------------------------------
>
>                 Key: IMPALA-12486
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12486
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> The following metrics will be helpful to understand the load on HDFS triggered by catalogd for loading file metadata:
>  * *num-file-metadata-loading-threads:* The total size of all thread pools used in all ParallelFileMetadataLoader instances.
>  * *num-file-metadata-loading-tasks:* The total number of *unfinished* FileMetadataLoader tasks that have been submitted to the pools.
>  * *num-tables-loading-file-metadata:* The total number of tables that are loading file metadata.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
