Posted to mapreduce-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2018/01/11 17:17:00 UTC

[jira] [Updated] (MAPREDUCE-7020) Task timeout in uber mode can crash AM

     [ https://issues.apache.org/jira/browse/MAPREDUCE-7020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-7020:
----------------------------------
    Affects Version/s: 2.7.6
                       2.8.4
                       2.9.1
                       2.10.0
                       3.0.1
                       3.1.0
              Summary: Task timeout in uber mode can crash AM  (was: TestUberAM is failing)
     Target Version/s: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4, 2.7.6
          Component/s:     (was: test)
                       mr-am

A lot of important work happens in the {{done}} method, such as final counter updates and task commit. Why was that call removed?

Does ReduceTask need a similar update, or is this problem somehow specific to only map tasks?

I'm wondering if we should prevent Task from calling System.exit when running in uber mode and instead let the AM decide what to do when a task fails. That may be a more general fix, as it could cover other scenarios where a task tears down the uber AM harshly and leaves no history.
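
To make the suggestion concrete, here is a rough, standalone sketch of the idea. It is not taken from the patch or from the Hadoop code base: every name in it (FatalErrorHandler, ExitingHandler, ReportingHandler, SketchTask, handlerFor) is hypothetical, and the real Task/AM interaction is more involved. It only illustrates routing a fatal task error through a handler that exits the JVM in a normal container but just records the failure in uber mode, so the AM survives and can decide what to do.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch only; none of these types exist in Hadoop.
final class UberExitSketch {

    /** Decides what happens when a task hits a fatal error. */
    interface FatalErrorHandler {
        void onFatalError(String taskId, Throwable cause);
    }

    /** Non-uber case: the task owns its JVM, so exiting is acceptable. */
    static final class ExitingHandler implements FatalErrorHandler {
        @Override
        public void onFatalError(String taskId, Throwable cause) {
            System.err.println(taskId + " failed: " + cause);
            System.exit(1);
        }
    }

    /** Uber case: the JVM belongs to the AM, so only record the failure. */
    static final class ReportingHandler implements FatalErrorHandler {
        private final List<String> failedTasks = new ArrayList<>();

        @Override
        public synchronized void onFatalError(String taskId, Throwable cause) {
            failedTasks.add(taskId);
        }

        /** Returns a read-only snapshot of the task ids reported so far. */
        synchronized List<String> failedTasks() {
            return Collections.unmodifiableList(new ArrayList<>(failedTasks));
        }
    }

    /** Stand-in for a task that never calls System.exit directly. */
    static final class SketchTask {
        private final String id;
        private final FatalErrorHandler handler;

        SketchTask(String id, FatalErrorHandler handler) {
            this.id = id;
            this.handler = handler;
        }

        void run(Runnable body) {
            try {
                body.run();
            } catch (RuntimeException e) {
                // Delegate the decision instead of tearing down the JVM here.
                handler.onFatalError(id, e);
            }
        }
    }

    /** Picks the handler based on whether the task runs inside the AM. */
    static FatalErrorHandler handlerFor(boolean uberMode, ReportingHandler reporter) {
        return uberMode ? reporter : new ExitingHandler();
    }

    public static void main(String[] args) {
        ReportingHandler reporter = new ReportingHandler();
        SketchTask task = new SketchTask("attempt_m_000000_0", handlerFor(true, reporter));
        task.run(() -> { throw new RuntimeException("simulated task timeout"); });
        // The process is still alive, so the "AM" can fail the job and write history.
        System.out.println("AM still running; failed tasks: " + reporter.failedTasks());
    }
}
```

With something along these lines, the uber AM would see the failure as an ordinary task failure and could still write job history before shutting down, rather than disappearing mid-job.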

> Task timeout in uber mode can crash AM
> --------------------------------------
>
>                 Key: MAPREDUCE-7020
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7020
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4, 2.7.6
>            Reporter: Akira Ajisaka
>            Assignee: Peter Bacsko
>         Attachments: MAPREDUCE-7020-001.patch
>
>
> TestUberAM is failing
> {noformat}
> java.lang.AssertionError: No AppMaster log found! expected:<1> but was:<2>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:743)
> 	at org.junit.Assert.assertEquals(Assert.java:118)
> 	at org.junit.Assert.assertEquals(Assert.java:555)
> 	at org.apache.hadoop.mapreduce.v2.TestMRJobs.testThreadDumpOnTaskTimeout(TestMRJobs.java:1228)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {noformat}
> https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/614/testReport/junit/org.apache.hadoop.mapreduce.v2/TestUberAM/testThreadDumpOnTaskTimeout/



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
