Posted to dev@eagle.apache.org by "wujinhu (JIRA)" <ji...@apache.org> on 2017/02/22 03:55:44 UTC

[jira] [Created] (EAGLE-920) mr failed job trouble shooting

wujinhu created EAGLE-920:
-----------------------------

             Summary: mr failed job trouble shooting
                 Key: EAGLE-920
                 URL: https://issues.apache.org/jira/browse/EAGLE-920
             Project: Eagle
          Issue Type: Improvement
          Components: App::Job Performance Monitor
    Affects Versions: v0.5.0
            Reporter: wujinhu
            Assignee: wujinhu
             Fix For: v0.5.0


We follow the steps below when we find a failed MapReduce (MR) job.
1. Get the error category distribution of the job via the query API:
query=TaskAttemptErrorCategoryService[@site="sandbox" and @jobId="job_1486726244016_162594"]<@errorCategory>{count}
2. Get the mapping from error category to error message, along with the list of failed task attempts, for one category:
query=JobErrorMappingService[@site="sandbox" and @jobId="job_1486726244016_162594" and @errorCategory="java.lang.RuntimeException"]
3. Dive into a single task attempt:
query=TaskAttemptExecutionService[@site="sandbox" and @taskAttemptId="attempt_1486726244016_162594_m_002451_1"]
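
The three queries above can be assembled programmatically. The following is a minimal Python sketch, not part of the issue itself: the helper names and BASE_URL (host, port, and endpoint path) are hypothetical placeholders, and only the query strings are taken from the steps above. Adjust the URL to match your Eagle deployment.

```python
from urllib.parse import urlencode

# Assumption: placeholder host, port, and path; the issue does not state the endpoint.
BASE_URL = "http://localhost:9090/rest/entities"


def error_category_query(site, job_id):
    """Step 1: count task attempt errors per error category for one job."""
    selector = f'TaskAttemptErrorCategoryService[@site="{site}" and @jobId="{job_id}"]'
    # The group-by and aggregation suffix is a literal, so it is kept out of the f-string.
    return selector + "<@errorCategory>{count}"


def error_mapping_query(site, job_id, error_category):
    """Step 2: error messages and failed task attempts for one error category."""
    return (
        f'JobErrorMappingService[@site="{site}" and @jobId="{job_id}" '
        f'and @errorCategory="{error_category}"]'
    )


def task_attempt_query(site, task_attempt_id):
    """Step 3: execution details of a single task attempt."""
    return (
        f'TaskAttemptExecutionService[@site="{site}" '
        f'and @taskAttemptId="{task_attempt_id}"]'
    )


def build_url(query, base_url=BASE_URL):
    """URL-encode the query so brackets, quotes, and spaces survive transport."""
    return base_url + "?" + urlencode({"query": query})


if __name__ == "__main__":
    site = "sandbox"
    job_id = "job_1486726244016_162594"
    print(build_url(error_category_query(site, job_id)))
    print(build_url(error_mapping_query(site, job_id, "java.lang.RuntimeException")))
    print(build_url(task_attempt_query(site, "attempt_1486726244016_162594_m_002451_1")))
```

Keeping query construction separate from URL encoding means the raw query strings can be compared directly against the ones listed in the steps.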


