Posted to issues@eagle.apache.org by "Jayesh (JIRA)" <ji...@apache.org> on 2017/07/20 17:53:07 UTC

[jira] [Updated] (EAGLE-920) mr failed job trouble shooting

     [ https://issues.apache.org/jira/browse/EAGLE-920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jayesh updated EAGLE-920:
-------------------------
    Fix Version/s:     (was: v0.5.0)
                   v0.5.1

> mr failed job trouble shooting
> ------------------------------
>
>                 Key: EAGLE-920
>                 URL: https://issues.apache.org/jira/browse/EAGLE-920
>             Project: Eagle
>          Issue Type: Improvement
>          Components: App::Job Performance Monitor
>    Affects Versions: v0.5.0
>            Reporter: wujinhu
>            Assignee: wujinhu
>             Fix For: v0.5.1
>
>
> When we find a failed MR job, we follow the steps below.
> 1. Get the error category distribution of the job via the API:
> query=TaskAttemptErrorCategoryService[@site="sandbox" and @jobId="job_1486726244016_162594"]<@errorCategory>{count}
> 2. For a given error category, get the error category to error message mapping and the list of failed task attempts:
> query=JobErrorMappingService[@site="sandbox" and @jobId="job_1486726244016_162594" and @errorCategory="java.lang.RuntimeException"]
> 3. Dive into one of the failed task attempts:
> query=TaskAttemptExecutionService[@site="sandbox" and @taskAttemptId="attempt_1486726244016_162594_m_002451_1"]
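The three queries in the steps above share a fixed shape, so they can be built for any site, job, category, or task attempt with a few helpers. The following is a minimal Python sketch. The query strings reproduce the ones in the issue. The helper names, the base URL http://localhost:9090/rest/entities, and the pageSize parameter are hypothetical assumptions for illustration and are not confirmed by this issue. Check the REST documentation of your Eagle deployment before relying on them.

```python
from urllib.parse import urlencode

# ASSUMPTION: host, port, and "/rest/entities" path are illustrative only.
EAGLE_BASE_URL = "http://localhost:9090/rest/entities"


def error_category_distribution_query(site: str, job_id: str) -> str:
    """Step 1: count failed task attempts per error category for one job."""
    return (
        f'TaskAttemptErrorCategoryService[@site="{site}" and @jobId="{job_id}"]'
        "<@errorCategory>{count}"  # plain string, so the braces are literal
    )


def error_mapping_query(site: str, job_id: str, error_category: str) -> str:
    """Step 2: error messages and failed task attempts for one error category."""
    return (
        f'JobErrorMappingService[@site="{site}" and @jobId="{job_id}"'
        f' and @errorCategory="{error_category}"]'
    )


def task_attempt_query(site: str, task_attempt_id: str) -> str:
    """Step 3: execution details of a single task attempt."""
    return (
        f'TaskAttemptExecutionService[@site="{site}"'
        f' and @taskAttemptId="{task_attempt_id}"]'
    )


def to_url(query: str, page_size: int = 100) -> str:
    """URL-encode a query. ASSUMPTION: "pageSize" is a hypothetical parameter name."""
    return f"{EAGLE_BASE_URL}?{urlencode({'query': query, 'pageSize': page_size})}"


if __name__ == "__main__":
    site = "sandbox"
    job_id = "job_1486726244016_162594"
    # Values below are the ones used in the issue description.
    print(to_url(error_category_distribution_query(site, job_id)))
    print(to_url(error_mapping_query(site, job_id, "java.lang.RuntimeException")))
    print(to_url(task_attempt_query(site, "attempt_1486726244016_162594_m_002451_1")))
```

In a real troubleshooting session, the error category passed to step 2 would be taken from the step 1 result, and the task attempt ID passed to step 3 from the step 2 result. The sketch uses the fixed values from the issue only to show the query shapes.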



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)