Posted to mapreduce-issues@hadoop.apache.org by "Peter Bacsko (JIRA)" <ji...@apache.org> on 2018/09/25 12:29:00 UTC

[jira] [Commented] (MAPREDUCE-7130) Rumen crashes trying to handle MRAppMaster recovery events

    [ https://issues.apache.org/jira/browse/MAPREDUCE-7130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627266#comment-16627266 ] 

Peter Bacsko commented on MAPREDUCE-7130:
-----------------------------------------

I've taken a deeper look at this problem, and I'm no longer sure that my change is related. We have the enum value "SUCCESS" in {{Pre21JobHistoryConstants}} (which is very old; it refers to releases before Hadoop 0.21, which came out many years ago), but the jhist file contains SUCCEEDED. My patch did not change that.

The string SUCCEEDED comes from the {{JobStateInternal}} enum: https://github.com/apache/hadoop/blob/e0f6ffdbad6f43fd43ec57fb68ebf5275b8b9ba0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/JobStateInternal.java#L27

This file was last modified in 2013.

The {{JobUnsuccessfulCompletionEvent}} is generated in {{JobImpl}}: https://github.com/apache/hadoop/blob/e0f6ffdbad6f43fd43ec57fb68ebf5275b8b9ba0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java#L1695-L1711

The critical point here is {{finalState.toString()}}. 

We could be dealing with a special case here: if the job had failed, the state would have been recorded as {{FAILED}}, and that value is present in {{Pre21JobHistoryConstants}}. I think a simple fix is to add {{SUCCEEDED}} to this class. I'm not sure that spending more time on this is worth it. I suppose this is your workaround as well.
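
To illustrate the mismatch and the proposed fix, here is a minimal, self-contained sketch. Everything in it is a hypothetical stand-in: {{LegacyValues}}, {{PatchedValues}}, {{InternalState}} and {{parse}} are not the actual Hadoop classes, and I am assuming the Rumen lookup behaves like a plain {{Enum.valueOf}} call, which I have not verified in detail.

```java
import java.util.Optional;

// Hypothetical stand-ins only; these are NOT the real Hadoop/Rumen classes.
class JobStatusParseSketch {

    // Mimics the situation described above: SUCCESS is known, SUCCEEDED is not.
    enum LegacyValues { SUCCESS, FAILED, KILLED }

    // The same enum with the proposed extra constant added.
    enum PatchedValues { SUCCESS, SUCCEEDED, FAILED, KILLED }

    // Stand-in for the internal job state whose toString() ends up in the jhist file.
    enum InternalState { SUCCEEDED, FAILED, KILLED }

    // Assumed lookup: Enum.valueOf throws IllegalArgumentException for an
    // unknown constant name, which is reported here as an empty Optional.
    static <E extends Enum<E>> Optional<E> parse(Class<E> type, String status) {
        try {
            return Optional.of(Enum.valueOf(type, status));
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // The status string as it would be recorded via finalState.toString().
        String recorded = InternalState.SUCCEEDED.toString();
        System.out.println("legacy:  " + parse(LegacyValues.class, recorded));
        System.out.println("patched: " + parse(PatchedValues.class, recorded));
    }
}
```

With the legacy stand-in the lookup comes back empty (in an unguarded {{valueOf}} call this would be the exception), while the patched stand-in resolves the recorded string.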

[~jlowe] what do you think about this?

> Rumen crashes trying to handle MRAppMaster recovery events
> ----------------------------------------------------------
>
>                 Key: MAPREDUCE-7130
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7130
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tools/rumen
>            Reporter: Jonathan Bender
>            Priority: Minor
>
> In the event of an MRAppMaster recovery, the Job History file gets an event of the following form:
> {code:json}
> {"type":"JOB_KILLED","event":{"org.apache.hadoop.mapreduce.jobhistory.JobUnsuccessfulCompletion":{"jobid":"job_1532048817013_xxxx","finishTime":1534521962641,"finishedMaps":0,"finishedReduces":0,"jobStatus":"SUCCEEDED","diagnostics":{"string":"Job commit succeeded in a prior MRAppMaster attempt before it crashed. Recovering."},"failedMaps":0,"failedReduces":0,"killedMaps":0,"killedReduces":0}}}
> {code}
> The issue seems to be around the SUCCEEDED job status for a JobUnsuccessfulCompletion:
> https://github.com/apache/hadoop/blob/e0f6ffdbad6f43fd43ec57fb68ebf5275b8b9ba0/hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/JobBuilder.java#L609
> Which fails to find the enum here:
> https://github.com/apache/hadoop/blob/e0f6ffdbad6f43fd43ec57fb68ebf5275b8b9ba0/hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/Pre21JobHistoryConstants.java#L50
> I'm not sure if this is an error with the Rumen parser or if the job history file is getting into an invalid state.


