Posted to dev@twill.apache.org by "Terence Yim (JIRA)" <ji...@apache.org> on 2014/10/21 19:27:34 UTC

[jira] [Commented] (TWILL-103) YarnTwillRunnerService lookup() fails to find application if called immediately after startAndWait()

    [ https://issues.apache.org/jira/browse/TWILL-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178681#comment-14178681 ] 

Terence Yim commented on TWILL-103:
-----------------------------------

The delay in lookup() occurs because all ZooKeeper interactions in Twill are performed asynchronously. TwillRunnerService learns which applications are running by fetching and watching ZooKeeper, hence the delay. One way to provide a better experience is to have TwillRunner expose methods for watching changes to applications.
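
Until such watch methods exist, a caller can poll with a timeout instead of using a fixed sleep. The following is only a minimal sketch of that workaround, not the watch-based API proposed above and not part of Twill: the class LookupPoller and the method awaitNonNull are hypothetical names, and the lookup is passed in as a generic Supplier so the example runs without Twill on the classpath.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper (not part of Twill): polls a lookup until it yields a value. */
public final class LookupPoller {

  private LookupPoller() { }

  /**
   * Calls the given lookup repeatedly until it returns a non-null value
   * or the timeout elapses. Returns null if the timeout is reached first.
   */
  public static <T> T awaitNonNull(Supplier<T> lookup, long timeout, TimeUnit unit,
                                   long pollIntervalMillis) throws InterruptedException {
    long deadline = System.nanoTime() + unit.toNanos(timeout);
    while (true) {
      T result = lookup.get();
      if (result != null) {
        return result;
      }
      long remainingNanos = deadline - System.nanoTime();
      if (remainingNanos <= 0) {
        return null;
      }
      // Sleep for the poll interval, but never past the deadline.
      long remainingMillis = Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNanos));
      Thread.sleep(Math.min(pollIntervalMillis, remainingMillis));
    }
  }
}
```

With a started runner, the supplier would presumably wrap the lookup call, for example awaitNonNull(() -> runner.lookup(appName, runId), 10, TimeUnit.SECONDS, 100), where runner, appName and runId are placeholders for the caller's own values. The timeout bounds the wait when the application really is not running, which a bare sleep cannot distinguish from a slow ZooKeeper fetch.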

> YarnTwillRunnerService lookup() fails to find application if called immediately after startAndWait()
> ----------------------------------------------------------------------------------------------------
>
>                 Key: TWILL-103
>                 URL: https://issues.apache.org/jira/browse/TWILL-103
>             Project: Apache Twill
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 0.3.0-incubating, 0.4.0-incubating, 0.5.0-incubating
>            Reporter: Mike Walch
>
> While TwillRunnerService.startAndWait() requests application/controller state from ZooKeeper, there may be a delay in retrieving this state. This can cause subsequent lookup() calls to return null even if a Twill application is running.
> While this can be prevented by adding a one-second sleep after startAndWait() is called, it would be better if startAndWait() were modified to not return until all state has been retrieved from ZooKeeper.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)