Posted to dev@reef.apache.org by "Markus Weimer (JIRA)" <ji...@apache.org> on 2015/08/19 21:57:45 UTC

[jira] [Resolved] (REEF-563) Evaluators that are kept alive are not able to re-register with the driver

     [ https://issues.apache.org/jira/browse/REEF-563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Weimer resolved REEF-563.
--------------------------------
       Resolution: Fixed
    Fix Version/s: 0.13

Fixed via [#380|https://github.com/apache/incubator-reef/pull/380]

> Evaluators that are kept alive are not able to re-register with the driver
> --------------------------------------------------------------------------
>
>                 Key: REEF-563
>                 URL: https://issues.apache.org/jira/browse/REEF-563
>             Project: REEF
>          Issue Type: Sub-task
>          Components: REEF Driver, REEF.NET Driver
>            Reporter: Andrew Chung
>            Assignee: Andrew Chung
>             Fix For: 0.13
>
>
> This stems from 3 issues:
> 1. The queryUri used to find the driver HTTP endpoint is missing a '/' after the application ID.
> 2. YARN currently redirects the proxy call to the primary RM via a meta-refresh, which our recovery mechanism does not handle because it assumes there are no redirects. See YARN-2605.
> 3. The driver does not recognize the evaluator trying to contact it and fails with the following exception:
> {code}
> java.lang.RuntimeException: Contact from unknown Evaluator with identifier 'container_e02_1438245500443_0040_01_000004' with state 'RUNNING'
> 	at org.apache.reef.runtime.common.driver.evaluator.EvaluatorHeartbeatHandler.onNext(EvaluatorHeartbeatHandler.java:72)
> 	at org.apache.reef.runtime.common.driver.evaluator.EvaluatorHeartbeatHandler.onNext(EvaluatorHeartbeatHandler.java:36)
> 	at org.apache.reef.wake.remote.impl.HandlerContainer.onNext(HandlerContainer.java:146)
> 	at org.apache.reef.wake.remote.impl.HandlerContainer.onNext(HandlerContainer.java:37)
> 	at org.apache.reef.wake.remote.impl.OrderedPullEventHandler.onNext(OrderedRemoteReceiverStage.java:171)
> 	at org.apache.reef.wake.remote.impl.OrderedPullEventHandler.onNext(OrderedRemoteReceiverStage.java:152)
> 	at org.apache.reef.wake.impl.ThreadPoolStage$1.run(ThreadPoolStage.java:181)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}
> Item 3 will be addressed as part of REEF-560 rather than in this issue.
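
Items 1 and 2 above can be illustrated with a minimal Java sketch. It is not the code from pull request #380. The class name, the method names, the `proxy/` path layout, the host names, and the shape of the meta-refresh tag are all assumptions made for illustration; the application ID is simply the one implied by the container ID in the stack trace.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of REEF: shows the two fixes in isolation.
final class DriverEndpointSketch {

  // Assumed shape of a meta-refresh tag, e.g.
  // <meta http-equiv="refresh" content="3; url=http://host/path">
  private static final Pattern META_REFRESH = Pattern.compile(
      "<meta\\s+http-equiv\\s*=\\s*[\"']?refresh[\"']?\\s+content\\s*=\\s*[\"']"
          + "[^\"']*?url\\s*=\\s*([^\"'>\\s]+)",
      Pattern.CASE_INSENSITIVE);

  // Item 1: always terminate the query URI with '/' after the application ID.
  // The "proxy/" segment is an assumed layout for the RM web proxy.
  static String buildQueryUri(String rmWebAddress, String applicationId) {
    String base = rmWebAddress.endsWith("/") ? rmWebAddress : rmWebAddress + "/";
    return base + "proxy/" + applicationId + "/";
  }

  // Item 2: if the response body is a meta-refresh page, return the URL
  // it points to so the caller can retry there; otherwise return empty.
  static Optional<String> metaRefreshTarget(String html) {
    Matcher m = META_REFRESH.matcher(html);
    return m.find() ? Optional.of(m.group(1)) : Optional.empty();
  }

  public static void main(String[] args) {
    String uri = buildQueryUri("http://rm-host:8088", "application_1438245500443_0040");
    System.out.println(uri);

    String body = "<html><head><meta http-equiv=\"refresh\" "
        + "content=\"3; url=http://rm-active:8088/proxy/application_1438245500443_0040/\">"
        + "</head></html>";
    System.out.println(metaRefreshTarget(body).orElse("no redirect"));
  }
}
```

A caller would request the URI from `buildQueryUri`, pass the response body to `metaRefreshTarget`, and repeat the request against the returned URL when one is present. A real client would also need a bound on the number of redirects it follows.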



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)