You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by "Ivan Mitic (JIRA)" <ji...@apache.org> on 2013/11/11 16:02:17 UTC

[jira] [Updated] (HADOOP-10090) Jobtracker metrics not updated properly after execution of a mapreduce job

     [ https://issues.apache.org/jira/browse/HADOOP-10090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Mitic updated HADOOP-10090:
--------------------------------

    Attachment: OneBoxRepro.png

I was able to repro the problem by looking at JMX metrics thru jconsole. See attached screenshot.

The bug appears to be coming from how the JMX metrics cache is managed in MetricsSourceAdapter.java. To be more specific:
a) MetricsSourceAdapter keeps track of the latest metrics records obtained from the call into MetricsSource#getMetrics (see MetricsSourceAdapter#getMetrics)
b) Metrics system timer thread which periodically polls on all sources, only polls for the metrics that have changed (not all metrics)

Scenario from this bug (JobTracker metrics):
 - Let's assume that the timer runs once per minute
 - At time 0, there were 0 jobs_submitted and 0 jobs_completed, and internal JMX cache reflects this state
 - At time t1, MetricsSystem timer thread runs and there were 1 jobs_submitted, and 0 jobs_completed. Internal JMX cache is not updated.
 - At time t1 + 0.5 minutes, above job is completed, and jobs_completed counter is set to 1.
 - At time t1 + 1 minute, MetricsSystem timer thread runs, and only jobs_completed metric changes from 0 to 1. Internal MetricsSourceAdapter {{lastRecs}} contains only jobs_completed metric.
 - At time t1 + 1 minute + delta, JMX is polled for the current metrics, the current cache is updated using the current {{lastRecs}}, and the final outcome is jobs_submitted = 0, jobs_completed = 1.

Since it is not obvious to me what is the intended behavior around metrics reported thru JMX is, I see two possible directions for the fix:
# JMX always reports the latest snapshot obtained by the MetricsSystem timer thread. This means that JMX does not report the fresh/current metrics state. Each time metrics are fetched from the timer thread, internal MetricsSourceAdapter JMX cache would be refreshed to keep the state in sync.
# JMX always reports the current metric values (with small TTL on the internal cache). In this design, MetricsSourceAdapter would (almost) always have to poll on the source for JMX. Additionally, there would have to be means to mark the source as dirty internally when this happens, to guarantee that the next time timer thread polls for metrics, all metrics are returned (see b. from above).

Option 2 is definitely nicer from users perspective as JMX reflects the current in-memory state of servers. 

Please comment back with your thoughts/recommendations.


> Jobtracker metrics not updated properly after execution of a mapreduce job
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-10090
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10090
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 1.2.1
>            Reporter: Ivan Mitic
>            Assignee: Ivan Mitic
>         Attachments: OneBoxRepro.png
>
>
> After executing a wordcount mapreduce sample job, jobtracker metrics are not updated properly. Often times the response from the jobtracker has higher number of job_completed than job_submitted (for example 8 jobs completed and 7 jobs submitted). 
> Issue reported by Toma Paunovic.



--
This message was sent by Atlassian JIRA
(v6.1#6144)