Posted to mapreduce-issues@hadoop.apache.org by "Mit Desai (JIRA)" <ji...@apache.org> on 2014/01/31 22:26:14 UTC

[jira] [Commented] (MAPREDUCE-5688) MRAppMaster causes TestStagingCleanup to fail intermittently with JDK7

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13888185#comment-13888185 ] 

Mit Desai commented on MAPREDUCE-5688:
--------------------------------------

Thanks, Jon, for the review!

bq. Re-title this jira since this is not a test problem according to the patch, but a race condition in the MRAppMaster that is exposed most frequently via this test.
I have renamed this JIRA accordingly. One clarification: I do not see this problem as a race. The bug is exposed when this test runs under JDK7, where the test methods execute in a random order.

bq. Add your analysis to the jira so that the actual problem is documented and captured for future use.
This failure is intermittent. It occurs only when the test methods in TestStagingCleanup run in a particular order, for example testDeletionofStagingOnReboot() followed by testDeletionofStagingOnKillLastTry().

The failure originates in notifyIsLastAMRetry(). When this function is called, it calls setForcejobCompletion(). If appMaster.stop() is called after setForcejobCompletion(), it tries to stop an appMaster that was already forced to stop, and it hits an NPE as a result. If appMaster.stop() is called first, we do not get the NPE when forceJobCompletion is attempted afterwards, because a null check already guards that path.
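
To illustrate the ordering described above, here is a minimal, self-contained sketch. It is not the MRAppMaster code and not the attached patch: the class StopOrderSketch, the HistoryHandler stand-in, its writer field and all method names are hypothetical, chosen only to show why "force completion, then stop" can throw while "stop, then force completion" does not. The guarded stop at the end is one possible way to make the order irrelevant, stated as an assumption rather than as what the patch does.

```java
// Hypothetical stand-ins only; none of these types are real Hadoop classes.
public class StopOrderSketch {

    /** Stand-in for a service that holds a resource released on completion. */
    static class HistoryHandler {
        private StringBuilder writer = new StringBuilder("open");

        /** Forced completion releases the resource; it is guarded by a null check. */
        void forceCompletion() {
            if (writer != null) {
                writer = null;
            }
        }

        /** Unguarded stop: dereferences writer without checking it first. */
        void stopUnguarded() {
            writer.append(" closed"); // throws NullPointerException if writer is null
            writer = null;
        }

        /** Guarded stop (assumed fix): safe regardless of call order. */
        void stopGuarded() {
            if (writer != null) {
                writer.append(" closed");
                writer = null;
            }
        }
    }

    /** Runs the action and reports whether it threw a NullPointerException. */
    static boolean throwsNpe(Runnable action) {
        try {
            action.run();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    /** Failing order: completion is forced first, then the unguarded stop runs. */
    static boolean forceThenUnguardedStop() {
        HistoryHandler h = new HistoryHandler();
        h.forceCompletion();
        return throwsNpe(h::stopUnguarded);
    }

    /** Passing order: stop runs first, and the later forced completion is a no-op. */
    static boolean unguardedStopThenForce() {
        HistoryHandler h = new HistoryHandler();
        h.stopUnguarded();
        return throwsNpe(h::forceCompletion);
    }

    /** With a null check in stop, the failing order no longer throws. */
    static boolean forceThenGuardedStop() {
        HistoryHandler h = new HistoryHandler();
        h.forceCompletion();
        return throwsNpe(h::stopGuarded);
    }

    public static void main(String[] args) {
        System.out.println("force, then unguarded stop throws NPE: " + forceThenUnguardedStop());
        System.out.println("unguarded stop, then force throws NPE: " + unguardedStopThenForce());
        System.out.println("force, then guarded stop throws NPE:   " + forceThenGuardedStop());
    }
}
```

In this sketch only the first sequence throws, which mirrors why the failure depends on which calls happen first and not on thread timing.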

hook.run() is also called in testDeletionofStagingOnKill(), but we do not get the NPE in that case. The reason is that this test allows 4 app attempts: _MRAppMaster appMaster = new TestMRApp(attemptId, mockAlloc, 4);_
whereas testDeletionofStagingOnKillLastTry() allows only 1 attempt, to make sure there is no retry: _MRAppMaster appMaster = new TestMRApp(attemptId, mockAlloc, 1); //no retry_

bq. Please determine if the java7 label is still accurate based on your analysis
We still need the java7 label, as TestStagingCleanup will not always fail without this fix. It fails only when the tests run in a particular order.

> MRAppMaster causes TestStagingCleanup to fail intermittently with JDK7
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5688
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5688
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 2.3.0
>            Reporter: Mit Desai
>            Assignee: Mit Desai
>              Labels: java7
>         Attachments: MAPREDUCE-5688.patch
>
>
> Due to random ordering in JDK7, the test TestStagingCleanup#testDeletionofStagingOnKillLastTry is failing
> {noformat}
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.231 sec <<< FAILURE!
> test(org.apache.hadoop.mapreduce.v2.app.TestStagingCleanup)  Time elapsed: 3882 sec  <<< ERROR!
> java.lang.NullPointerException
> 	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:349)
> 	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> 	at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
> 	at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> 	at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:159)
> 	at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
> 	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1399)
> 	at org.apache.hadoop.mapreduce.v2.app.TestStagingCleanup.testDeletionofStagingOnKillLastTry(TestStagingCleanup.java:239)
> 	at org.apache.hadoop.mapreduce.v2.app.TestStagingCleanup.test(TestStagingCleanup.java:82)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at junit.framework.TestCase.runTest(TestCase.java:168)
> 	at junit.framework.TestCase.runBare(TestCase.java:134)
> 	at junit.framework.TestResult$1.protect(TestResult.java:110)
> 	at junit.framework.TestResult.runProtected(TestResult.java:128)
> 	at junit.framework.TestResult.run(TestResult.java:113)
> 	at junit.framework.TestCase.run(TestCase.java:124)
> 	at junit.framework.TestSuite.runTest(TestSuite.java:243)
> 	at junit.framework.TestSuite.run(TestSuite.java:238)
> 	at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:242)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:137)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
> 	at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
> 	at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
> 	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
> 	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)