Posted to dev@reef.apache.org by "Sergiy Matusevych (JIRA)" <ji...@apache.org> on 2017/04/18 23:54:41 UTC

[jira] [Commented] (REEF-1782) REEF-on-REEF host driver closes prematurely

    [ https://issues.apache.org/jira/browse/REEF-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973764#comment-15973764 ] 

Sergiy Matusevych commented on REEF-1782:
-----------------------------------------

To reproduce, make sure you have a Hadoop 2.7.3+ cluster (earlier versions of YARN have a bug that prevents REEF from running in Unmanaged AM mode), and run
{code}
./bin/run.sh org.apache.reef.examples.reefonreef.Launch
{code}
on Linux, or
{code}
.\bin\runreef.ps1 -VerboseLog -Jars .\lang\java\reef-examples\target\reef-examples-0.16.0-SNAPSHOT-shaded.jar -Class org.apache.reef.examples.reefonreef.Launch
{code}
in Windows PowerShell.
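
For illustration only, here is a minimal Java sketch of the failure mode suspected in the issue description quoted below: an inner environment closing a resource that it shares with the host. This is not REEF code. The names SharedCloseSketch, SharedResource, Environment and runSequentially are hypothetical stand-ins, and the sketch assumes the hypothesis from the description is correct, which has not been confirmed.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedCloseSketch {

  /** Hypothetical stand-in for a shared resource such as a clock or a remote manager. */
  static final class SharedResource implements AutoCloseable {
    private boolean closed = false;

    @Override
    public void close() {
      // Idempotent close: a second call only finds the resource already closed.
      closed = true;
    }

    boolean isClosed() {
      return closed;
    }
  }

  /** Hypothetical environment that closes whatever resource it was handed. */
  static final class Environment implements AutoCloseable {
    private final String name;
    private final SharedResource resource;

    Environment(final String name, final SharedResource resource) {
      this.name = name;
      this.resource = resource;
    }

    /** Reports whether the resource was still usable when this environment ran. */
    String run() {
      return resource.isClosed()
          ? name + ": FAILED (resource already closed)"
          : name + ": OK";
    }

    @Override
    public void close() {
      // The suspected bug: the environment closes a resource it does not own.
      resource.close();
    }
  }

  /** Runs the given number of inner jobs one after another inside a single host. */
  static List<String> runSequentially(final int innerJobs) {
    final SharedResource shared = new SharedResource();
    final List<String> results = new ArrayList<>();
    try (Environment host = new Environment("host", shared)) {
      for (int i = 1; i <= innerJobs; i++) {
        try (Environment inner = new Environment("inner-" + i, shared)) {
          results.add(inner.run());
        }
      }
      // The host checks the shared resource after all inner jobs have finished.
      results.add(host.run());
    }
    return results;
  }

  public static void main(final String[] args) {
    for (final String line : runSequentially(2)) {
      System.out.println(line);
    }
  }
}
```

In this sketch the first inner job succeeds, while the second inner job and the host both find the shared resource already closed. That is the same pattern the sequential two-application test proposed in the description is meant to expose.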

> REEF-on-REEF host driver closes prematurely
> -------------------------------------------
>
>                 Key: REEF-1782
>                 URL: https://issues.apache.org/jira/browse/REEF-1782
>             Project: REEF
>          Issue Type: Bug
>          Components: REEF Driver, REEF Runtime YARN
>         Environment: YARN 2.7.3+
>            Reporter: Sergiy Matusevych
>            Assignee: Sergiy Matusevych
>              Labels: bug, yarn
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> The REEF-on-REEF application runs on YARN, and the inner application completes successfully; however, the host application's driver closes prematurely and ends up with the {{FAILED/FAILED}} status in RM:
> {code}
> $ yarn application -list -appStates ALL
>                 Application-Id      Application-Name        Application-Type          User           Queue                   State             Final-State             Progress                        Tracking-URL
> application_1492554568254_0013     REEF-on-REEF:host                    YARN        hadoop      root.hadoop                 FAILED                  FAILED                 100% http://cisl-linux-070:8088/cluster/app/application_1492554568254_0013
> application_1492554568254_0014    REEF-on-REEF:hello                    YARN        hadoop      root.hadoop               FINISHED               SUCCEEDED                 100%                                 N/A
> {code}
> Most likely, this happens because, on completion, the inner application closes some resources that either belong to the host app or are shared with it.
> Here's a fragment of the driver log:
> {code}
> 2017-04-18 19:15:52,332 INFO reef.examples.reefonreef.ReefOnReefDriver.onNext main | REEF-on-REEF inner job application_1492554568254_0014 completed: state DONE
> 2017-04-18 19:15:52,332 FINER reef.runtime.common.REEFEnvironment.close main | ENTRY
> 2017-04-18 19:15:52,332 FINER reef.wake.time.runtime.RuntimeClock.close main | ENTRY
> 2017-04-18 19:15:52,332 FINER reef.wake.time.runtime.RuntimeClock.close main | RETURN Clock has already been closed
> 2017-04-18 19:15:52,332 FINER reef.runtime.common.launch.REEFErrorHandler.close main | ENTRY
> 2017-04-18 19:15:52,332 FINER reef.runtime.common.utils.RemoteManager.close main | ENTRY
> 2017-04-18 19:15:52,332 FINE reef.wake.remote.impl.DefaultRemoteManagerImplementation.close main | RemoteManager: REEF_UNMANAGED_DRIVER Closing remote manager id: socket://10.200.91.65:16952
> 2017-04-18 19:15:52,332 FINE reef.wake.remote.impl.DefaultRemoteManagerImplementation.close main | RemoteManager: REEF_UNMANAGED_DRIVER already closed
> 2017-04-18 19:15:52,332 FINER reef.runtime.common.utils.RemoteManager.close main | RETURN
> 2017-04-18 19:15:52,332 FINER reef.runtime.common.launch.REEFErrorHandler.close main | RETURN
> 2017-04-18 19:15:52,332 FINER reef.runtime.common.REEFEnvironment.close main | RETURN
> 2017-04-18 19:15:52,332 INFO reef.examples.reefonreef.ReefOnReefDriver.onNext main | REEF-on-REEF host job REEF-on-REEF:host completed: inner app application_1492554568254_0014 status SUBMITTED
> {code}
> i.e. some driver resources have already been closed by the time the inner app completes.
> Another good test for this behavior would be to run *two* inner applications in Unmanaged AM mode sequentially from the same host driver.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)