You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-issues@hadoop.apache.org by "Gergo Repas (JIRA)" <ji...@apache.org> on 2018/01/02 18:00:05 UTC
[jira] [Updated] (MAPREDUCE-7028) Concurrent task progress updates causing NPE in Application Master

     [ https://issues.apache.org/jira/browse/MAPREDUCE-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gergo Repas updated MAPREDUCE-7028:
-----------------------------------
    Attachment: MAPREDUCE-7028.002.patch

[~miklos.szegedi@cloudera.com] Thanks for the review, I addressed your comments. Regarding using locks vs retry-logic: I agree with [~jlowe]'s comment, status updates are sent in a synchronous manner from the task, and it must be the RPC retry logic causing the concurrent updates. And also, even the locking would not ensure that the update that was first sent will be processed first. So I have not changed the retry logic to locking for now.

> Concurrent task progress updates causing NPE in Application Master
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7028
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7028
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4, 2.7.6
>            Reporter: Gergo Repas
>            Assignee: Gergo Repas
>         Attachments: MAPREDUCE-7028.000.patch, MAPREDUCE-7028.001.patch, MAPREDUCE-7028.002.patch
>
>
> Concurrent task progress updates can cause a NullPointerException in the Application Master (stack trace is with code at current trunk):
> {quote}
> 2017-12-20 06:49:42,369 INFO [IPC Server handler 9 on 39501] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1513780867907_0001_m_000002_0 is : 0.02677883
> 2017-12-20 06:49:42,369 INFO [IPC Server handler 13 on 39501] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1513780867907_0001_m_000002_0 is : 0.02677883
> 2017-12-20 06:49:42,383 FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
> java.lang.NullPointerException
>         at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$StatusUpdater.transition(TaskAttemptImpl.java:2450)
>         at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$StatusUpdater.transition(TaskAttemptImpl.java:2433)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>         at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1362)
>         at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:154)
>         at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1543)
>         at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1535)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
>         at java.lang.Thread.run(Thread.java:748)
> 2017-12-20 06:49:42,385 INFO [IPC Server handler 13 on 39501] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1513780867907_0001_m_000002_0 is : 0.02677883
> 2017-12-20 06:49:42,386 INFO [AsyncDispatcher ShutDown handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
> {quote}
> This happened naturally in several big wordcount runs, and I could reproduce this reliably by artificially making task updates more frequent.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org