You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-issues@hadoop.apache.org by "Thomas Graves (JIRA)" <ji...@apache.org> on 2012/10/17 18:22:03 UTC

[jira] [Created] (MAPREDUCE-4729) job history UI not showing all job attempts

Thomas Graves created MAPREDUCE-4729:
----------------------------------------

             Summary: job history UI not showing all job attempts
                 Key: MAPREDUCE-4729
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: jobhistoryserver
    Affects Versions: 0.23.3
            Reporter: Thomas Graves


We are seeing a case where a job runs but the AM is running out of memory in the first 3 attempts. The job eventually finishes on the 4th attempt.  When you go to the job history UI for that job, it only shows the last attempt.  This is bad since we want to see why the first 3 attempts failed.

The RM web ui shows all 4 attempts. 

Also I tested this locally by running "kill" on the app master and in that case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-4729) job history UI not showing all job attempts

Posted by "Vinod Kumar Vavilapalli (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-4729:
-----------------------------------------------

    Status: Patch Available  (was: Open)
    
> job history UI not showing all job attempts
> -------------------------------------------
>
>                 Key: MAPREDUCE-4729
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 0.23.3
>            Reporter: Thomas Graves
>            Assignee: Vinod Kumar Vavilapalli
>         Attachments: MAPREDUCE-4729-20121031.txt, MAPREDUCE-4729-20121101.txt
>
>
> We are seeing a case where a job runs but the AM is running out of memory in the first 3 attempts. The job eventually finishes on the 4th attempt.  When you go to the job history UI for that job, it only shows the last attempt.  This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-4729) job history UI not showing all job attempts

Posted by "Vinod Kumar Vavilapalli (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-4729:
-----------------------------------------------

    Status: Patch Available  (was: Open)
    
> job history UI not showing all job attempts
> -------------------------------------------
>
>                 Key: MAPREDUCE-4729
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 0.23.3
>            Reporter: Thomas Graves
>            Assignee: Vinod Kumar Vavilapalli
>         Attachments: MAPREDUCE-4729-20121031.txt
>
>
> We are seeing a case where a job runs but the AM is running out of memory in the first 3 attempts. The job eventually finishes on the 4th attempt.  When you go to the job history UI for that job, it only shows the last attempt.  This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488729#comment-13488729 ] 

Robert Joseph Evans commented on MAPREDUCE-4729:
------------------------------------------------

The patch looks good to me.  I like how we short circuit the reading of the history file because we know that what we want to read is in a single chunk. However, I am a bit conflicted about where the parsing of the job history file is happening.  I kind of feel that it should go in the recovery service. In this case we are only doing a partial recovery, not a full recovery.  I can see in the future we would want to add in other things to the partial recovery, like when/if we add in a summary of the history file for fast reading it would be nice to propagate that to the next file no matter what happens so that the history server can show a full view of what is happened in the application.  But, I am fine with it the way it is and if you feel strongly the other way feel free to check it in as is +1. 
                
> job history UI not showing all job attempts
> -------------------------------------------
>
>                 Key: MAPREDUCE-4729
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 0.23.3
>            Reporter: Thomas Graves
>            Assignee: Vinod Kumar Vavilapalli
>         Attachments: MAPREDUCE-4729-20121031.txt
>
>
> We are seeing a case where a job runs but the AM is running out of memory in the first 3 attempts. The job eventually finishes on the 4th attempt.  When you go to the job history UI for that job, it only shows the last attempt.  This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-4729) job history UI not showing all job attempts

Posted by "Jason Lowe (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4729:
----------------------------------

       Resolution: Fixed
    Fix Version/s: 0.23.5
                   2.0.3-alpha
     Hadoop Flags: Reviewed
           Status: Resolved  (was: Patch Available)

Thanks, Vinod!  I commmitted this to trunk, branch-2, and branch-0.23.
                
> job history UI not showing all job attempts
> -------------------------------------------
>
>                 Key: MAPREDUCE-4729
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 0.23.3
>            Reporter: Thomas Graves
>            Assignee: Vinod Kumar Vavilapalli
>             Fix For: 2.0.3-alpha, 0.23.5
>
>         Attachments: MAPREDUCE-4729-20121031.txt, MAPREDUCE-4729-20121101.txt
>
>
> We are seeing a case where a job runs but the AM is running out of memory in the first 3 attempts. The job eventually finishes on the 4th attempt.  When you go to the job history UI for that job, it only shows the last attempt.  This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

Posted by "Thomas Graves (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479306#comment-13479306 ] 

Thomas Graves commented on MAPREDUCE-4729:
------------------------------------------

Ok, so I figured this out.  The job is using output format org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat, which has the OutputCommitter which is set to null.  This caused the MRAppMaster recoveryService to not start:

 "org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Not starting RecoveryService: recoveryEnabled: true recoverySupportedByCommitter: false ApplicationAttemptID: 4"

Since the recovery service didn't start it didn't parse the old job history files, thus didn't have the list of old AMs. 

I think we should fix that so that even if recovery isn't supported we atleast parse and get the previous AM attempt info.
                
> job history UI not showing all job attempts
> -------------------------------------------
>
>                 Key: MAPREDUCE-4729
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 0.23.3
>            Reporter: Thomas Graves
>
> We are seeing a case where a job runs but the AM is running out of memory in the first 3 attempts. The job eventually finishes on the 4th attempt.  When you go to the job history UI for that job, it only shows the last attempt.  This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489354#comment-13489354 ] 

Hudson commented on MAPREDUCE-4729:
-----------------------------------

Integrated in Hadoop-Yarn-trunk #24 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/24/])
    MAPREDUCE-4729. job history UI not showing all job attempts. Contributed by Vinod Kumar Vavilapalli (Revision 1404817)

     Result = SUCCESS
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1404817
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestAMInfos.java

                
> job history UI not showing all job attempts
> -------------------------------------------
>
>                 Key: MAPREDUCE-4729
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 0.23.3
>            Reporter: Thomas Graves
>            Assignee: Vinod Kumar Vavilapalli
>             Fix For: 2.0.3-alpha, 0.23.5
>
>         Attachments: MAPREDUCE-4729-20121031.txt, MAPREDUCE-4729-20121101.txt
>
>
> We are seeing a case where a job runs but the AM is running out of memory in the first 3 attempts. The job eventually finishes on the 4th attempt.  When you go to the job history UI for that job, it only shows the last attempt.  This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489128#comment-13489128 ] 

Hudson commented on MAPREDUCE-4729:
-----------------------------------

Integrated in Hadoop-trunk-Commit #2949 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/2949/])
    MAPREDUCE-4729. job history UI not showing all job attempts. Contributed by Vinod Kumar Vavilapalli (Revision 1404817)

     Result = SUCCESS
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1404817
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestAMInfos.java

                
> job history UI not showing all job attempts
> -------------------------------------------
>
>                 Key: MAPREDUCE-4729
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 0.23.3
>            Reporter: Thomas Graves
>            Assignee: Vinod Kumar Vavilapalli
>             Fix For: 2.0.3-alpha, 0.23.5
>
>         Attachments: MAPREDUCE-4729-20121031.txt, MAPREDUCE-4729-20121101.txt
>
>
> We are seeing a case where a job runs but the AM is running out of memory in the first 3 attempts. The job eventually finishes on the 4th attempt.  When you go to the job history UI for that job, it only shows the last attempt.  This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

Posted by "Jason Lowe (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489098#comment-13489098 ] 

Jason Lowe commented on MAPREDUCE-4729:
---------------------------------------

+1, will commit shortly.  Filed MAPREDUCE-4767 to track the lack of flushing for AMInfo records.
                
> job history UI not showing all job attempts
> -------------------------------------------
>
>                 Key: MAPREDUCE-4729
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 0.23.3
>            Reporter: Thomas Graves
>            Assignee: Vinod Kumar Vavilapalli
>         Attachments: MAPREDUCE-4729-20121031.txt, MAPREDUCE-4729-20121101.txt
>
>
> We are seeing a case where a job runs but the AM is running out of memory in the first 3 attempts. The job eventually finishes on the 4th attempt.  When you go to the job history UI for that job, it only shows the last attempt.  This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

Posted by "Jason Lowe (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488741#comment-13488741 ] 

Jason Lowe commented on MAPREDUCE-4729:
---------------------------------------

I tried testing the patch with a sleep job using -Dyarn.app.mapreduce.am.job.recovery.enable=false and manually killing the ApplicationMaster with a kill -9, but it didn't work.  The log showed this exception:

{noformat}
2012-11-01 14:37:01,543 WARN [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Could not parse the old history file. Will not have old AMinfos 
java.io.IOException: Incompatible event log version: null
	at org.apache.hadoop.mapreduce.jobhistory.EventReader.<init>(EventReader.java:70)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.readJustAMInfos(MRAppMaster.java:915)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.start(MRAppMaster.java:846)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1143)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1378)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1139)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1098)
{noformat}

It looks like the AM is buffering the history file output, and we didn't flush out the AMInfos from previous runs.  When I used a normal kill instead of kill -9, it worked.  We will want to flush/sync the job history file after writing the AMInfos to help guard against unclean teardowns losing prior AM attempts in the history.  This can be fixed in a separate JIRA if we don't want to fix it here.

Couple of other comments on the patch:
* Application attempts start from 1 instead of 0, so the first attempt tries to recover AMInfos when it shouldn't and leads to a large FileNotFoundException stacktrace being logged
* Nit: In RecoveryService.parse there's an extra space logged before a comma.  {{LOG.info("Got an error parsing job-history file "}} should be {{LOG.info("Got an error parsing job-history file"}}
* Nit: The body of the while loop in readJustAMInfos could be a bit cleaner with fewer conditionals.  For example:
{code}
      while ((event = jobHistoryEventReader.getNextEvent()) != null) {
        if (event.getEventType() == EventType.AM_STARTED) {
          amStartedEventsBegan = true;
          AMStartedEvent amStartedEvent = (AMStartedEvent) event;
          amInfos.add(MRBuilderUtils.newAMInfo(
            amStartedEvent.getAppAttemptId(), amStartedEvent.getStartTime(),
            amStartedEvent.getContainerId(),
            StringInterner.weakIntern(amStartedEvent.getNodeManagerHost()),
            amStartedEvent.getNodeManagerPort(),
            amStartedEvent.getNodeManagerHttpPort()));
        } else if (amStartedEventsBegan) {
          // This means AMStartedEvents began and this event is a
          // non-AMStarted event.
          // No need to continue reading all the other events.
          break;
        }
      }
{code}
                
> job history UI not showing all job attempts
> -------------------------------------------
>
>                 Key: MAPREDUCE-4729
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 0.23.3
>            Reporter: Thomas Graves
>            Assignee: Vinod Kumar Vavilapalli
>         Attachments: MAPREDUCE-4729-20121031.txt
>
>
> We are seeing a case where a job runs but the AM is running out of memory in the first 3 attempts. The job eventually finishes on the 4th attempt.  When you go to the job history UI for that job, it only shows the last attempt.  This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-4729) job history UI not showing all job attempts

Posted by "Jason Lowe (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4729:
----------------------------------

    Status: Open  (was: Patch Available)
    
> job history UI not showing all job attempts
> -------------------------------------------
>
>                 Key: MAPREDUCE-4729
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 0.23.3
>            Reporter: Thomas Graves
>            Assignee: Vinod Kumar Vavilapalli
>         Attachments: MAPREDUCE-4729-20121031.txt
>
>
> We are seeing a case where a job runs but the AM is running out of memory in the first 3 attempts. The job eventually finishes on the 4th attempt.  When you go to the job history UI for that job, it only shows the last attempt.  This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478236#comment-13478236 ] 

Robert Joseph Evans commented on MAPREDUCE-4729:
------------------------------------------------

That is a bit odd.  The only thing I can think of is that the jhist files from the previous runs were corrupted some how.  The way we have set things up on the AM an OOM will shut things down very hard with no time for cleanup, so the end of the file may have had some issues. This is just speculation. 
                
> job history UI not showing all job attempts
> -------------------------------------------
>
>                 Key: MAPREDUCE-4729
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 0.23.3
>            Reporter: Thomas Graves
>
> We are seeing a case where a job runs but the AM is running out of memory in the first 3 attempts. The job eventually finishes on the 4th attempt.  When you go to the job history UI for that job, it only shows the last attempt.  This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

Posted by "Vinod Kumar Vavilapalli (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479369#comment-13479369 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-4729:
----------------------------------------------------

bq. I think we should fix that so that even if recovery isn't supported we atleast parse and get the previous AM attempt info.
+1

bq. I'm not a pig expert but perhaps we should also follow up with pig team to see if they should support Recovery
This is a general issue with all OutputFormats. We need to educate users that they need to start implementing recovery.
                
> job history UI not showing all job attempts
> -------------------------------------------
>
>                 Key: MAPREDUCE-4729
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 0.23.3
>            Reporter: Thomas Graves
>
> We are seeing a case where a job runs but the AM is running out of memory in the first 3 attempts. The job eventually finishes on the 4th attempt.  When you go to the job history UI for that job, it only shows the last attempt.  This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

Posted by "Vinod Kumar Vavilapalli (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479201#comment-13479201 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-4729:
----------------------------------------------------

It looks like we retain the history files from all attempts, did you look at them all? Also, please see if you find this log in any of the AM attempts:
{code}
"Got an error parsing job-history file " + historyFile + ", ignoring incomplete events."
{code}
                
> job history UI not showing all job attempts
> -------------------------------------------
>
>                 Key: MAPREDUCE-4729
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 0.23.3
>            Reporter: Thomas Graves
>
> We are seeing a case where a job runs but the AM is running out of memory in the first 3 attempts. The job eventually finishes on the 4th attempt.  When you go to the job history UI for that job, it only shows the last attempt.  This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

Posted by "Vinod Kumar Vavilapalli (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478250#comment-13478250 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-4729:
----------------------------------------------------

IIRC, The AM recovery is tolerant to corrupted records towards the end of file.

Thomas, can you look at the history files directly and see if AMStarted events are getting correctly logged in each generation? Each Job Attempt should have AMStarted events from all the previous generations.
                
> job history UI not showing all job attempts
> -------------------------------------------
>
>                 Key: MAPREDUCE-4729
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 0.23.3
>            Reporter: Thomas Graves
>
> We are seeing a case where a job runs but the AM is running out of memory in the first 3 attempts. The job eventually finishes on the 4th attempt.  When you go to the job history UI for that job, it only shows the last attempt.  This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

Posted by "Thomas Graves (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478393#comment-13478393 ] 

Thomas Graves commented on MAPREDUCE-4729:
------------------------------------------

The jobhistory file itself only has one AM_STARTED event which is for the 4th attempt. So it must be the jhist file is getting lost of corrupt from the other attempts somehow.
                
> job history UI not showing all job attempts
> -------------------------------------------
>
>                 Key: MAPREDUCE-4729
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 0.23.3
>            Reporter: Thomas Graves
>
> We are seeing a case where a job runs but the AM is running out of memory in the first 3 attempts. The job eventually finishes on the 4th attempt.  When you go to the job history UI for that job, it only shows the last attempt.  This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489049#comment-13489049 ] 

Hadoop QA commented on MAPREDUCE-4729:
--------------------------------------

{color:green}+1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12551761/MAPREDUCE-4729-20121101.txt
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2980//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2980//console

This message is automatically generated.
                
> job history UI not showing all job attempts
> -------------------------------------------
>
>                 Key: MAPREDUCE-4729
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 0.23.3
>            Reporter: Thomas Graves
>            Assignee: Vinod Kumar Vavilapalli
>         Attachments: MAPREDUCE-4729-20121031.txt, MAPREDUCE-4729-20121101.txt
>
>
> We are seeing a case where a job runs but the AM is running out of memory in the first 3 attempts. The job eventually finishes on the 4th attempt.  When you go to the job history UI for that job, it only shows the last attempt.  This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

Posted by "Thomas Graves (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479309#comment-13479309 ] 

Thomas Graves commented on MAPREDUCE-4729:
------------------------------------------

I'm not a pig expert but perhaps we should also follow up with pig team to see if they should support Recovery
                
> job history UI not showing all job attempts
> -------------------------------------------
>
>                 Key: MAPREDUCE-4729
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 0.23.3
>            Reporter: Thomas Graves
>
> We are seeing a case where a job runs but the AM is running out of memory in the first 3 attempts. The job eventually finishes on the 4th attempt.  When you go to the job history UI for that job, it only shows the last attempt.  This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489400#comment-13489400 ] 

Hudson commented on MAPREDUCE-4729:
-----------------------------------

Integrated in Hadoop-Hdfs-trunk #1214 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1214/])
    MAPREDUCE-4729. job history UI not showing all job attempts. Contributed by Vinod Kumar Vavilapalli (Revision 1404817)

     Result = FAILURE
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1404817
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestAMInfos.java

                
> job history UI not showing all job attempts
> -------------------------------------------
>
>                 Key: MAPREDUCE-4729
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 0.23.3
>            Reporter: Thomas Graves
>            Assignee: Vinod Kumar Vavilapalli
>             Fix For: 2.0.3-alpha, 0.23.5
>
>         Attachments: MAPREDUCE-4729-20121031.txt, MAPREDUCE-4729-20121101.txt
>
>
> We are seeing a case where a job runs but the AM is running out of memory in the first 3 attempts. The job eventually finishes on the 4th attempt.  When you go to the job history UI for that job, it only shows the last attempt.  This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-4729) job history UI not showing all job attempts

Posted by "Vinod Kumar Vavilapalli (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-4729:
-----------------------------------------------

    Attachment: MAPREDUCE-4729-20121031.txt

bq. I think we should fix that so that even if recovery isn't supported we atleast parse and get the previous AM attempt info.
Attached patch does this.
                
> job history UI not showing all job attempts
> -------------------------------------------
>
>                 Key: MAPREDUCE-4729
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 0.23.3
>            Reporter: Thomas Graves
>            Assignee: Vinod Kumar Vavilapalli
>         Attachments: MAPREDUCE-4729-20121031.txt
>
>
> We are seeing a case where a job runs but the AM is running out of memory in the first 3 attempts. The job eventually finishes on the 4th attempt.  When you go to the job history UI for that job, it only shows the last attempt.  This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489395#comment-13489395 ] 

Hudson commented on MAPREDUCE-4729:
-----------------------------------

Integrated in Hadoop-Hdfs-0.23-Build #423 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/423/])
    svn merge -c 1404817 FIXES: MAPREDUCE-4729. job history UI not showing all job attempts. Contributed by Vinod Kumar Vavilapalli (Revision 1404825)

     Result = SUCCESS
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1404825
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestAMInfos.java

                
> job history UI not showing all job attempts
> -------------------------------------------
>
>                 Key: MAPREDUCE-4729
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 0.23.3
>            Reporter: Thomas Graves
>            Assignee: Vinod Kumar Vavilapalli
>             Fix For: 2.0.3-alpha, 0.23.5
>
>         Attachments: MAPREDUCE-4729-20121031.txt, MAPREDUCE-4729-20121101.txt
>
>
> We are seeing a case where a job runs but the AM is running out of memory in the first 3 attempts. The job eventually finishes on the 4th attempt.  When you go to the job history UI for that job, it only shows the last attempt.  This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-4729) job history UI not showing all job attempts

Posted by "Vinod Kumar Vavilapalli (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-4729:
-----------------------------------------------

    Attachment: MAPREDUCE-4729-20121101.txt

Thanks for testing this Jason. And for the comments too.

bq. We will want to flush/sync the job history file after writing the AMInfos to help guard against unclean teardowns losing prior AM attempts in the history.
Wanted to this, but ran into couple more not so critical bugs - setupEventWriter() is getting called for all AMStarted events! Let's fix both in a separate ticket.

Fixed the other issues. Good catch on the AttemptId part.
                
> job history UI not showing all job attempts
> -------------------------------------------
>
>                 Key: MAPREDUCE-4729
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 0.23.3
>            Reporter: Thomas Graves
>            Assignee: Vinod Kumar Vavilapalli
>         Attachments: MAPREDUCE-4729-20121031.txt, MAPREDUCE-4729-20121101.txt
>
>
> We are seeing a case where a job runs but the AM is running out of memory in the first 3 attempts. The job eventually finishes on the 4th attempt.  When you go to the job history UI for that job, it only shows the last attempt.  This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

Posted by "Vinod Kumar Vavilapalli (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489009#comment-13489009 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-4729:
----------------------------------------------------

bq.  However, I am a bit conflicted about where the parsing of the job history file is happening. I kind of feel that it should go in the recovery service.
I am not sure either ways but RecoveryService is wired into the AppMaster and is selected dynamically depending on whether recovery is enabled or not. We can rewire this, but that's a slightly bigger change. I'd like to keep this as is.
                
> job history UI not showing all job attempts
> -------------------------------------------
>
>                 Key: MAPREDUCE-4729
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 0.23.3
>            Reporter: Thomas Graves
>            Assignee: Vinod Kumar Vavilapalli
>         Attachments: MAPREDUCE-4729-20121031.txt
>
>
> We are seeing a case where a job runs but the AM is running out of memory in the first 3 attempts. The job eventually finishes on the 4th attempt.  When you go to the job history UI for that job, it only shows the last attempt.  This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488431#comment-13488431 ] 

Hadoop QA commented on MAPREDUCE-4729:
--------------------------------------

{color:green}+1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12551662/MAPREDUCE-4729-20121031.txt
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2974//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2974//console

This message is automatically generated.
                
> job history UI not showing all job attempts
> -------------------------------------------
>
>                 Key: MAPREDUCE-4729
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 0.23.3
>            Reporter: Thomas Graves
>            Assignee: Vinod Kumar Vavilapalli
>         Attachments: MAPREDUCE-4729-20121031.txt
>
>
> We are seeing a case where a job runs but the AM is running out of memory in the first 3 attempts. The job eventually finishes on the 4th attempt.  When you go to the job history UI for that job, it only shows the last attempt.  This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4729) job history UI not showing all job attempts

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489431#comment-13489431 ] 

Hudson commented on MAPREDUCE-4729:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #1244 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1244/])
    MAPREDUCE-4729. job history UI not showing all job attempts. Contributed by Vinod Kumar Vavilapalli (Revision 1404817)

     Result = FAILURE
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1404817
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestAMInfos.java

                
> job history UI not showing all job attempts
> -------------------------------------------
>
>                 Key: MAPREDUCE-4729
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 0.23.3
>            Reporter: Thomas Graves
>            Assignee: Vinod Kumar Vavilapalli
>             Fix For: 2.0.3-alpha, 0.23.5
>
>         Attachments: MAPREDUCE-4729-20121031.txt, MAPREDUCE-4729-20121101.txt
>
>
> We are seeing a case where a job runs but the AM is running out of memory in the first 3 attempts. The job eventually finishes on the 4th attempt.  When you go to the job history UI for that job, it only shows the last attempt.  This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (MAPREDUCE-4729) job history UI not showing all job attempts

Posted by "Vinod Kumar Vavilapalli (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli reassigned MAPREDUCE-4729:
--------------------------------------------------

    Assignee: Vinod Kumar Vavilapalli
    
> job history UI not showing all job attempts
> -------------------------------------------
>
>                 Key: MAPREDUCE-4729
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4729
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 0.23.3
>            Reporter: Thomas Graves
>            Assignee: Vinod Kumar Vavilapalli
>         Attachments: MAPREDUCE-4729-20121031.txt
>
>
> We are seeing a case where a job runs but the AM is running out of memory in the first 3 attempts. The job eventually finishes on the 4th attempt.  When you go to the job history UI for that job, it only shows the last attempt.  This is bad since we want to see why the first 3 attempts failed.
> The RM web ui shows all 4 attempts. 
> Also I tested this locally by running "kill" on the app master and in that case the history server UI does show all attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira