Posted to dev@reef.apache.org by "Andrew Chung (JIRA)" <ji...@apache.org> on 2015/08/15 22:09:45 UTC

[jira] [Updated] (REEF-563) Evaluators that are kept alive are not able to re-register with the driver

     [ https://issues.apache.org/jira/browse/REEF-563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Chung updated REEF-563:
------------------------------
    Description: 
This stems from 3 issues:
1. The queryUri used to find the driver HTTP endpoint is missing a '/' after the application ID.
2. YARN currently redirects the proxy call to the primary RM via a meta-refresh, which our recovery mechanism does not handle because it assumes there are no redirects. See https://issues.apache.org/jira/browse/YARN-2605.
3. The driver does not recognize the evaluator trying to contact it and fails with the following exception:
{code}
java.lang.RuntimeException: Contact from unknown Evaluator with identifier 'container_e02_1438245500443_0040_01_000004' with state 'RUNNING'
	at org.apache.reef.runtime.common.driver.evaluator.EvaluatorHeartbeatHandler.onNext(EvaluatorHeartbeatHandler.java:72)
	at org.apache.reef.runtime.common.driver.evaluator.EvaluatorHeartbeatHandler.onNext(EvaluatorHeartbeatHandler.java:36)
	at org.apache.reef.wake.remote.impl.HandlerContainer.onNext(HandlerContainer.java:146)
	at org.apache.reef.wake.remote.impl.HandlerContainer.onNext(HandlerContainer.java:37)
	at org.apache.reef.wake.remote.impl.OrderedPullEventHandler.onNext(OrderedRemoteReceiverStage.java:171)
	at org.apache.reef.wake.remote.impl.OrderedPullEventHandler.onNext(OrderedRemoteReceiverStage.java:152)
	at org.apache.reef.wake.impl.ThreadPoolStage$1.run(ThreadPoolStage.java:181)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
{code}
This will be incorporated into the work of REEF-560 instead of being covered by this item.
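
For illustration only, issues 1 and 2 above can be sketched in Java as follows. This is not REEF's actual recovery code: the class and method names, the proxy base URL, and the exact shape of the meta-refresh tag are all assumptions made for the example. The sketch only shows the two ideas, namely always terminating the query URI with a '/' after the application ID, and looking for a meta-refresh target in the response body instead of assuming there is no redirect.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of REEF.
final class DriverEndpointQuerySketch {

  // Assumed tag shape: <meta http-equiv="refresh" content="N; url=TARGET">.
  // Matching is case-insensitive; the real markup emitted by YARN may differ.
  private static final Pattern META_REFRESH = Pattern.compile(
      "<meta\\s+http-equiv\\s*=\\s*[\"']?refresh[\"']?\\s+content\\s*=\\s*[\"']"
          + "\\s*\\d+\\s*;\\s*url\\s*=\\s*([^\"'>\\s]+)",
      Pattern.CASE_INSENSITIVE);

  private DriverEndpointQuerySketch() {
  }

  // Issue 1: build the query URI so that it always ends with '/' after the
  // application ID. proxyBase is an assumed RM proxy prefix.
  static String buildQueryUri(final String proxyBase, final String applicationId) {
    final String base = proxyBase.endsWith("/") ? proxyBase : proxyBase + "/";
    return base + applicationId + "/";
  }

  // Issue 2: if the response body is a meta-refresh page, return the URL it
  // points to so that the caller can repeat the query against that URL.
  static Optional<URI> findMetaRefreshTarget(final String html) {
    final Matcher matcher = META_REFRESH.matcher(html);
    return matcher.find() ? Optional.of(URI.create(matcher.group(1))) : Optional.<URI>empty();
  }

  public static void main(final String[] args) {
    // Host name and application ID are placeholders for the example.
    final String queryUri =
        buildQueryUri("http://rm.example.com:8088/proxy", "application_1438245500443_0040");
    System.out.println(queryUri);

    final String body = "<html><head><meta http-equiv=\"refresh\" "
        + "content=\"0; url=http://rm2.example.com:8088/proxy/application_1438245500443_0040/\">"
        + "</head></html>";
    System.out.println(findMetaRefreshTarget(body));
  }
}
```

A caller would first request the URI returned by buildQueryUri and, if findMetaRefreshTarget finds a target in the body, repeat the request against that target, ideally with a bound on the number of hops so that a redirect loop cannot stall recovery.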

  was:
This stems from 2 issues:
1. the queryUri to find the driver http endpoint is missing a '/' after application ID.
2. The driver does not recognize the evaluator trying to contact it and receives exception:
{code}
java.lang.RuntimeException: Contact from unknown Evaluator with identifier 'container_e02_1438245500443_0040_01_000004' with state 'RUNNING'
	at org.apache.reef.runtime.common.driver.evaluator.EvaluatorHeartbeatHandler.onNext(EvaluatorHeartbeatHandler.java:72)
	at org.apache.reef.runtime.common.driver.evaluator.EvaluatorHeartbeatHandler.onNext(EvaluatorHeartbeatHandler.java:36)
	at org.apache.reef.wake.remote.impl.HandlerContainer.onNext(HandlerContainer.java:146)
	at org.apache.reef.wake.remote.impl.HandlerContainer.onNext(HandlerContainer.java:37)
	at org.apache.reef.wake.remote.impl.OrderedPullEventHandler.onNext(OrderedRemoteReceiverStage.java:171)
	at org.apache.reef.wake.remote.impl.OrderedPullEventHandler.onNext(OrderedRemoteReceiverStage.java:152)
	at org.apache.reef.wake.impl.ThreadPoolStage$1.run(ThreadPoolStage.java:181)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
{code}


> Evaluators that are kept alive are not able to re-register with the driver
> --------------------------------------------------------------------------
>
>                 Key: REEF-563
>                 URL: https://issues.apache.org/jira/browse/REEF-563
>             Project: REEF
>          Issue Type: Sub-task
>          Components: REEF Driver, REEF.NET Driver
>            Reporter: Andrew Chung
>            Assignee: Andrew Chung
>
> This stems from 3 issues:
> 1. The queryUri used to find the driver HTTP endpoint is missing a '/' after the application ID.
> 2. YARN currently redirects the proxy call to the primary RM via a meta-refresh, which our recovery mechanism does not handle because it assumes there are no redirects. See https://issues.apache.org/jira/browse/YARN-2605.
> 3. The driver does not recognize the evaluator trying to contact it and fails with the following exception:
> {code}
> java.lang.RuntimeException: Contact from unknown Evaluator with identifier 'container_e02_1438245500443_0040_01_000004' with state 'RUNNING'
> 	at org.apache.reef.runtime.common.driver.evaluator.EvaluatorHeartbeatHandler.onNext(EvaluatorHeartbeatHandler.java:72)
> 	at org.apache.reef.runtime.common.driver.evaluator.EvaluatorHeartbeatHandler.onNext(EvaluatorHeartbeatHandler.java:36)
> 	at org.apache.reef.wake.remote.impl.HandlerContainer.onNext(HandlerContainer.java:146)
> 	at org.apache.reef.wake.remote.impl.HandlerContainer.onNext(HandlerContainer.java:37)
> 	at org.apache.reef.wake.remote.impl.OrderedPullEventHandler.onNext(OrderedRemoteReceiverStage.java:171)
> 	at org.apache.reef.wake.remote.impl.OrderedPullEventHandler.onNext(OrderedRemoteReceiverStage.java:152)
> 	at org.apache.reef.wake.impl.ThreadPoolStage$1.run(ThreadPoolStage.java:181)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}
> This will be incorporated into the work of REEF-560 instead of being covered by this item.


