Posted to mapreduce-issues@hadoop.apache.org by "Peter Bacsko (JIRA)" <ji...@apache.org> on 2018/10/08 10:40:00 UTC

[jira] [Comment Edited] (MAPREDUCE-7130) Rumen crashes trying to handle MRAppMaster recovery events

    [ https://issues.apache.org/jira/browse/MAPREDUCE-7130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641641#comment-16641641 ] 

Peter Bacsko edited comment on MAPREDUCE-7130 at 10/8/18 10:39 AM:
-------------------------------------------------------------------

Thanks [~jlowe]. I think there's only the Pre21 enum because Rumen isn't actively maintained. If you look at GitHub, the vast majority of its files were last modified 7 years (!) ago. That explains everything.

Just wondering how much effort we should put into this. Adding this extra enum value is perhaps slightly misleading, but it's still the smallest change that fixes the problem. I was thinking of adding this value {{SUCCEEDED}} plus a short comment explaining this situation.
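
A minimal sketch of the idea, not the actual patch. It assumes the status string from the history event is resolved with {{Enum.valueOf}}, which throws {{IllegalArgumentException}} for a name the enum does not declare. The class {{OutcomeEnumSketch}}, the helpers {{parseOutcome}} and {{isKnown}}, and the shortened constant list are hypothetical stand-ins, not the real contents of {{Pre21JobHistoryConstants}}.

```java
public class OutcomeEnumSketch {

    // Hypothetical, shortened stand-in for the Pre21 enum; the real list may differ.
    enum Values {
        SUCCESS,
        FAILED,
        KILLED,
        // Proposed addition: a JobUnsuccessfulCompletion event can carry this
        // status when the job commit succeeded in a prior MRAppMaster attempt
        // and the job is being recovered.
        SUCCEEDED
    }

    // Resolves a status string to an enum constant.
    // Throws IllegalArgumentException if the name is not declared in Values.
    static Values parseOutcome(String jobStatus) {
        return Values.valueOf(jobStatus);
    }

    // Returns true if the status string maps to a declared constant.
    static boolean isKnown(String jobStatus) {
        try {
            parseOutcome(jobStatus);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // With the extra constant, the recovery event's status resolves.
        System.out.println(parseOutcome("SUCCEEDED"));
        // A name that is still undeclared is rejected, as SUCCEEDED was before.
        System.out.println(isKnown("NOT_A_STATUS"));
    }
}
```

The lookup code itself stays untouched in this sketch, which is why adding the constant is the smallest change. The comment on the constant is there to explain why a "succeeded" value sits next to the Pre21 names.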


was (Author: pbacsko):
Thanks [~jlowe]. I think there's only the Pre21 enum and the reason is that Rumen isn't actively maintained. If you look at Github, several files (in fact, the vast majority of them) were last modified 7 years (!) ago. That explains everything.

Just wondering how much effort we should put into this. Adding this extra enum is perhaps slightly misleading, but it's still the smallest change. Maybe add this enum plus an extra comment explaining this situation.

> Rumen crashes trying to handle MRAppMaster recovery events
> ----------------------------------------------------------
>
>                 Key: MAPREDUCE-7130
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7130
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tools/rumen
>            Reporter: Jonathan Bender
>            Priority: Minor
>
> In the event of an MRAppMaster recovery, the Job History file gets an event of the following form:
> {code:json}
> {"type":"JOB_KILLED","event":{"org.apache.hadoop.mapreduce.jobhistory.JobUnsuccessfulCompletion":{"jobid":"job_1532048817013_xxxx","finishTime":1534521962641,"finishedMaps":0,"finishedReduces":0,"jobStatus":"SUCCEEDED","diagnostics":{"string":"Job commit succeeded in a prior MRAppMaster attempt before it crashed. Recovering."},"failedMaps":0,"failedReduces":0,"killedMaps":0,"killedReduces":0}}}
> {code}
> The issue seems to be around the SUCCEEDED job status for a JobUnsuccessfulCompletion:
> https://github.com/apache/hadoop/blob/e0f6ffdbad6f43fd43ec57fb68ebf5275b8b9ba0/hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/JobBuilder.java#L609
> Which fails to find the enum here:
> https://github.com/apache/hadoop/blob/e0f6ffdbad6f43fd43ec57fb68ebf5275b8b9ba0/hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/Pre21JobHistoryConstants.java#L50
> I'm not sure if this is an error with the Rumen parser or if the job history file is getting into an invalid state.


