Posted to dev@reef.apache.org by "Sergiy Matusevych (JIRA)" <ji...@apache.org> on 2017/04/18 23:54:41 UTC
[jira] [Commented] (REEF-1782) REEF-on-REEF host driver closes prematurely
[ https://issues.apache.org/jira/browse/REEF-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973764#comment-15973764 ]
Sergiy Matusevych commented on REEF-1782:
-----------------------------------------
To reproduce, make sure you have a Hadoop 2.7.3+ cluster (earlier versions of YARN have a bug that prevents REEF from running in Unmanaged AM mode), and run
{code}
./bin/run.sh org.apache.reef.examples.reefonreef.Launch
{code}
on Linux, or
{code}
.\bin\runreef.ps1 -VerboseLog -Jars .\lang\java\reef-examples\target\reef-examples-0.16.0-SNAPSHOT-shaded.jar -Class org.apache.reef.examples.reefonreef.Launch
{code}
in Windows PowerShell.
> REEF-on-REEF host driver closes prematurely
> -------------------------------------------
>
> Key: REEF-1782
> URL: https://issues.apache.org/jira/browse/REEF-1782
> Project: REEF
> Issue Type: Bug
> Components: REEF Driver, REEF Runtime YARN
> Environment: YARN 2.7.3+
> Reporter: Sergiy Matusevych
> Assignee: Sergiy Matusevych
> Labels: bug, yarn
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> The REEF-on-REEF application runs on YARN, and the inner application completes successfully; however, the host application's driver closes prematurely and is reported with the {{FAILED/FAILED}} status in the RM:
> {code}
> $ yarn application -list -appStates ALL
> Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
> application_1492554568254_0013 REEF-on-REEF:host YARN hadoop root.hadoop FAILED FAILED 100% http://cisl-linux-070:8088/cluster/app/application_1492554568254_0013
> application_1492554568254_0014 REEF-on-REEF:hello YARN hadoop root.hadoop FINISHED SUCCEEDED 100% N/A
> {code}
> Most likely, this happens because, on completion, the inner application closes some resources that either belong to the host app or are shared with it.
> Here's a fragment of the driver log:
> {code}
> 2017-04-18 19:15:52,332 INFO reef.examples.reefonreef.ReefOnReefDriver.onNext main | REEF-on-REEF inner job application_1492554568254_0014 completed: state DONE
> 2017-04-18 19:15:52,332 FINER reef.runtime.common.REEFEnvironment.close main | ENTRY
> 2017-04-18 19:15:52,332 FINER reef.wake.time.runtime.RuntimeClock.close main | ENTRY
> 2017-04-18 19:15:52,332 FINER reef.wake.time.runtime.RuntimeClock.close main | RETURN Clock has already been closed
> 2017-04-18 19:15:52,332 FINER reef.runtime.common.launch.REEFErrorHandler.close main | ENTRY
> 2017-04-18 19:15:52,332 FINER reef.runtime.common.utils.RemoteManager.close main | ENTRY
> 2017-04-18 19:15:52,332 FINE reef.wake.remote.impl.DefaultRemoteManagerImplementation.close main | RemoteManager: REEF_UNMANAGED_DRIVER Closing remote manager id: socket://10.200.91.65:16952
> 2017-04-18 19:15:52,332 FINE reef.wake.remote.impl.DefaultRemoteManagerImplementation.close main | RemoteManager: REEF_UNMANAGED_DRIVER already closed
> 2017-04-18 19:15:52,332 FINER reef.runtime.common.utils.RemoteManager.close main | RETURN
> 2017-04-18 19:15:52,332 FINER reef.runtime.common.launch.REEFErrorHandler.close main | RETURN
> 2017-04-18 19:15:52,332 FINER reef.runtime.common.REEFEnvironment.close main | RETURN
> 2017-04-18 19:15:52,332 INFO reef.examples.reefonreef.ReefOnReefDriver.onNext main | REEF-on-REEF host job REEF-on-REEF:host completed: inner app application_1492554568254_0014 status SUBMITTED
> {code}
> i.e. some driver resources have already been closed by the end of the inner app.
> Another good test for that behavior would be running *two* inner applications in Unmanaged AM mode sequentially from the same host driver.
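The hypothesis in the issue description (the inner application closing resources that are shared with the host driver) and the proposed test (two inner applications run sequentially from the same host) can be illustrated with a small, self-contained Java sketch. Everything in it is hypothetical: SharedResource and SequentialInnerApps are illustration-only names, not REEF classes, and the reference counting is just one possible way to keep a shared resource alive until its last user is done, not necessarily how REEF does or should handle it.

```java
/** Hypothetical test harness: one host and two sequential inner applications. Not REEF code. */
final class SequentialInnerApps {

    /** Simulates an inner application that borrows the shared resource and closes it on completion. */
    static void runInnerApp(SharedResource shared) {
        try (SharedResource r = shared.acquire()) {
            // the inner job would run here
        }
    }

    public static void main(String[] args) {
        SharedResource shared = new SharedResource();
        try (SharedResource host = shared.acquire()) {
            runInnerApp(shared); // first inner application
            runInnerApp(shared); // would throw if the first one had really closed the resource
            System.out.println("closed while host is running: " + shared.isClosed());
        }
        System.out.println("closed after host exit: " + shared.isClosed());
    }
}

/** Hypothetical stand-in for a resource shared by the host and inner drivers. Not a REEF class. */
final class SharedResource implements AutoCloseable {
    private int users = 0;
    private boolean closed = false;

    /** Registers one more user; fails if the resource has already been shut down. */
    synchronized SharedResource acquire() {
        if (closed) {
            throw new IllegalStateException("resource already closed");
        }
        users++;
        return this;
    }

    /** Releases one user; the resource shuts down only when the last user releases it. */
    @Override
    public synchronized void close() {
        if (!closed && users > 0 && --users == 0) {
            closed = true;
        }
    }

    synchronized boolean isClosed() {
        return closed;
    }
}
```

If close() shut the resource down unconditionally instead of counting users, the second runInnerApp call would fail in acquire(), which is the kind of symptom the two-inner-applications test is meant to expose.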