Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2022/09/08 22:12:00 UTC

[jira] [Commented] (IMPALA-11490) More metrics to debug event processing lagging behind

    [ https://issues.apache.org/jira/browse/IMPALA-11490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17602024#comment-17602024 ] 

ASF subversion and git services commented on IMPALA-11490:
----------------------------------------------------------

Commit bc92661bd3105cb378a3d140e247207959916d16 in impala's branch refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=bc92661bd ]

IMPALA-11490: Add more metrics for event processor

This patch adds more metrics for debugging event processing that lags
behind. The latest event id in HMS is added so users can compare it with
the last synced event id to see how many events are waiting to be
synced. The event times of the last synced event and of the latest event
in HMS are also added; comparing them shows how far catalogd is lagging
behind. The latest event id and event time are updated in a dedicated
thread, so they stay current even when the event-processor thread is
blocked by slow metadata reloading or is waiting for table locks.
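
The comparison described above can be sketched as follows. This is only
an illustration using the Java standard library, not the code from the
patch: the class name EventLagSketch, its method names, and the two
LongSupplier arguments (standing in for whatever call fetches the latest
event id and event time from HMS) are all hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical sketch: names and structure are assumptions, not Impala code.
final class EventLagSketch {
  // Written by the event-processor thread after it syncs an event.
  private final AtomicLong lastSyncedEventId = new AtomicLong();
  private final AtomicLong lastSyncedEventTimeSec = new AtomicLong();
  // Written by the dedicated updater thread only.
  private final AtomicLong latestEventId = new AtomicLong();
  private final AtomicLong latestEventTimeSec = new AtomicLong();

  void recordSynced(long eventId, long eventTimeSec) {
    lastSyncedEventId.set(eventId);
    lastSyncedEventTimeSec.set(eventTimeSec);
  }

  void recordLatest(long eventId, long eventTimeSec) {
    latestEventId.set(eventId);
    latestEventTimeSec.set(eventTimeSec);
  }

  // Polls the latest event id/time on its own thread, so the values keep
  // moving even if the event-processor thread is blocked. Note that a
  // supplier that throws would cancel all further scheduled runs.
  ScheduledExecutorService startUpdater(
      LongSupplier latestId, LongSupplier latestTimeSec, long periodMs) {
    ScheduledExecutorService updater =
        Executors.newSingleThreadScheduledExecutor(r -> {
          Thread t = new Thread(r, "latest-event-updater");
          t.setDaemon(true);
          return t;
        });
    updater.scheduleWithFixedDelay(
        () -> recordLatest(latestId.getAsLong(), latestTimeSec.getAsLong()),
        0, periodMs, TimeUnit.MILLISECONDS);
    return updater;
  }

  // Difference between the latest and the last synced event id.
  long pendingEvents() {
    return Math.max(0, latestEventId.get() - lastSyncedEventId.get());
  }

  // Difference between the two event times, in seconds.
  long lagSeconds() {
    return Math.max(0, latestEventTimeSec.get() - lastSyncedEventTimeSec.get());
  }
}
```

Keeping the two pairs of values in separate atomics means the updater
thread never has to take a lock that the event-processor thread might be
holding.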

This patch also fixes the incorrect metrics for event fetching and
processing durations. Previously we used Timer.getMeanRate(), which
returns the mean rate at which durations are recorded, not the mean
duration. The correct call is Timer.getSnapshot().getMean(). Taking the
snapshot also lets us expose the 75th/95th/99th percentiles.
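
To make the difference between the two values concrete, here is a small
stand-alone sketch. It does not use the Timer class itself; the class
TimerStatsSketch and its methods are hypothetical stand-ins, and the
nearest-rank percentile is a simplification that may not match how a
real snapshot computes percentiles.

```java
import java.util.Arrays;

// Hypothetical stand-in for illustration; not the metrics library's API.
final class TimerStatsSketch {
  // Mean rate: how often a duration was recorded (recordings per second).
  // It says nothing about how long each recorded operation took.
  static double meanRate(int count, double elapsedSeconds) {
    return elapsedSeconds <= 0 ? 0.0 : count / elapsedSeconds;
  }

  // Mean duration: the average of the recorded values themselves.
  static double meanDuration(long[] durationsMs) {
    return Arrays.stream(durationsMs).average().orElse(0.0);
  }

  // Nearest-rank percentile over the recorded values, with q in (0, 1].
  static long percentile(long[] durationsMs, double q) {
    if (durationsMs.length == 0) return 0;
    long[] sorted = durationsMs.clone();
    Arrays.sort(sorted);
    int rank = (int) Math.ceil(q * sorted.length);
    return sorted[Math.max(rank, 1) - 1];
  }
}
```

For four recordings of 100, 200, 300 and 400 ms made over 8 seconds, the
mean rate is 0.5 recordings per second while the mean duration is 250 ms,
which is why reporting the former as a duration was misleading.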

To facilitate metrics collection, the durations of the most recent
event fetch and the most recent event processing run are also exposed.

Tests:
 - Manually verified the metrics when running some Hive workloads

Change-Id: I0e7d40a0d8e140e6b0698936e97b454cb9abdc1b
Reviewed-on: http://gerrit.cloudera.org:8080/18937
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> More metrics to debug event processing lagging behind
> -----------------------------------------------------
>
>                 Key: IMPALA-11490
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11490
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>              Labels: supportability
>
> The event processor can lag behind in many cases, e.g. when processing lots of events on large tables, or when waiting for table locks held by manual refresh or other metadata operations.
> Currently we have a metric for the last synced event id. We should also add a metric for the latest event id in HMS. Users can compare the two to know whether event processing is lagging behind.
> We should also add logs/metrics for tables that take a long time in event processing, especially those that take longer than the event polling interval, so users can decide whether to disable event processing on them or reduce the concurrency of metadata operations on them.
> Some metrics like the average event processing duration over the last 5min, 30min or 1h would also be helpful.
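
One way to picture the windowed average requested in the last point is
the sketch below. It is an assumption about how such a metric could be
kept, not something the issue or the patch specifies: the class
RollingAverageSketch is hypothetical, and it expects callers to pass
non-decreasing timestamps.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of an average over a sliding time window.
final class RollingAverageSketch {
  private final long windowMs;
  // Each entry is {timestampMs, durationMs}, oldest first.
  private final Deque<long[]> samples = new ArrayDeque<>();
  private long sum;

  RollingAverageSketch(long windowMs) {
    this.windowMs = windowMs;
  }

  synchronized void record(long nowMs, long durationMs) {
    samples.addLast(new long[] {nowMs, durationMs});
    sum += durationMs;
    evict(nowMs);
  }

  // Average duration of the samples still inside the window, or 0 if none.
  synchronized double average(long nowMs) {
    evict(nowMs);
    return samples.isEmpty() ? 0.0 : (double) sum / samples.size();
  }

  // Drops samples whose timestamp is at or before (nowMs - windowMs).
  private void evict(long nowMs) {
    while (!samples.isEmpty() && samples.peekFirst()[0] <= nowMs - windowMs) {
      sum -= samples.removeFirst()[1];
    }
  }
}
```

Three instances with windows of 5 minutes, 30 minutes and 1 hour would
cover the intervals mentioned above, at the cost of storing every sample
until it ages out of the longest window.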



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
