Posted to commits@helix.apache.org by "Harry Zhang (JIRA)" <ji...@apache.org> on 2018/09/21 21:33:00 UTC
[jira] [Updated] (HELIX-753) Record top state handoff finished in single cluster data cache refresh
[ https://issues.apache.org/jira/browse/HELIX-753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Harry Zhang updated HELIX-753:
------------------------------
Summary: Record top state handoff finished in single cluster data cache refresh (was: record top state handoff finished in single cluster data cache refresh)
> Record top state handoff finished in single cluster data cache refresh
> ----------------------------------------------------------------------
>
> Key: HELIX-753
> URL: https://issues.apache.org/jira/browse/HELIX-753
> Project: Apache Helix
> Issue Type: Bug
> Reporter: Harry Zhang
> Assignee: Harry Zhang
> Priority: Major
>
> Currently we are calculating top state handoff duration by doing the following:
> - record missing top state when we see a top state missing
> - record top state come back when we see it come back
> - report top state handoff duration
> This is perfectly fine for non-P2P state transitions, since the entire top state handoff process always takes at least two pipeline runs to finish. However, for P2P-enabled clusters, top state handoffs are quick, and when a handoff completes faster than the cluster data refresh stage latency, we miss it entirely. We therefore lose a lot of short top state handoffs, which makes the numbers on ingraph misleading.
> We need to revise the top state handoff metrics implementation so that we don't systematically lose data points (i.e. we are currently losing all short handoffs).
> AC:
> - revise the implementation so we catch those short top state handoffs
> - write new tests to cover the fix, if needed
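The idea above can be sketched in Java. This is a minimal, hypothetical sketch, not Helix's actual implementation: class, method, and field names are assumptions. It illustrates the fix being asked for: instead of measuring a handoff only from the controller's own observation times (which misses any handoff that starts and finishes between two cache refreshes), fall back to timestamps carried on the state records themselves, so a single-refresh handoff still produces a data point.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical tracker for top state handoff durations. "recordMissing" is
// invoked when a pipeline run observes a partition with no top state;
// "recordRecovered" is invoked when a later run sees the top state back.
// A fast (P2P-style) handoff may never be observed as missing at all, so
// recovery falls back to the start timestamp reported with the state change.
public class TopStateHandoffTracker {
    // partition -> time at which the top state was first observed missing
    private final Map<String, Long> missingSince = new HashMap<>();
    private long totalHandoffs = 0;
    private long totalDurationMs = 0;

    // A refresh showed this partition has no top state.
    public void recordMissing(String partition, long observedTimeMs) {
        missingSince.putIfAbsent(partition, observedTimeMs);
    }

    // A refresh showed the top state is back. startTimeMs/endTimeMs are the
    // timestamps carried on the state-transition record, so a handoff that
    // both started and finished since the last refresh is still counted.
    public void recordRecovered(String partition, long startTimeMs, long endTimeMs) {
        Long observedStart = missingSince.remove(partition);
        // Slow handoff: use the time we first saw the top state missing.
        // Fast handoff (never observed missing): use the recorded start time.
        long start = (observedStart != null) ? observedStart : startTimeMs;
        totalHandoffs++;
        totalDurationMs += endTimeMs - start;
    }

    public long getTotalHandoffs() {
        return totalHandoffs;
    }

    public long getAverageDurationMs() {
        return totalHandoffs == 0 ? 0 : totalDurationMs / totalHandoffs;
    }
}
```

With this shape, a handoff shorter than one refresh interval still contributes a duration sample, which is the gap the issue describes.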
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)